7467 Commits

Author SHA1 Message Date
Benoit Steiner
72d2cf642e Deleted the coordinate-based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of Xcode. 2016-02-22 15:29:41 -08:00
Benoit Steiner
6270d851e3 Declare the half float type as arithmetic. 2016-02-22 13:59:33 -08:00
Benoit Steiner
5cd00068c0 Include <iostream> in the tensor header since we now use it to better report CUDA initialization errors 2016-02-22 13:59:03 -08:00
Benoit Steiner
257b640463 Fixed compilation warning generated by clang 2016-02-21 22:43:37 -08:00
Benoit Steiner
584832cb3c Implemented the ptranspose function on half floats 2016-02-21 12:44:53 -08:00
Benoit Steiner
e644f60907 Pulled latest updates from trunk 2016-02-21 20:24:59 +00:00
Benoit Steiner
95fceb6452 Added the ability to compute the absolute value of a half float 2016-02-21 20:24:11 +00:00
Benoit Steiner
ed69cbeef0 Added some debugging information to the test to figure out why it fails sometimes 2016-02-21 11:20:20 -08:00
Benoit Steiner
96a24b05cc Optimized casting of tensors in the case where the casting happens to be a no-op 2016-02-21 11:16:15 -08:00
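The cast<T>() entry point this commit optimizes can be sketched as follows; the "no-op" case is the one where source and destination scalar types already match. A minimal sketch, assuming the unsupported CXX11 Tensor module:

    #include <unsupported/Eigen/CXX11/Tensor>

    int main() {
      Eigen::Tensor<float, 2> t(3, 3);
      t.setRandom();

      // Regular cast: float -> double, element by element.
      Eigen::Tensor<double, 2> d = t.cast<double>();

      // "No-op" cast: source and target scalar types are identical, the case
      // this commit short-circuits instead of going through a conversion op.
      Eigen::Tensor<float, 2> same = t.cast<float>();
      return 0;
    }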
Benoit Steiner
203490017f Prevent unnecessary Index to int conversions 2016-02-21 08:49:36 -08:00
Benoit Steiner
9ff269a1d3 Moved some of the fp16 operators outside the Eigen namespace to work around some nvcc limitations. 2016-02-20 07:47:23 +00:00
Benoit Steiner
1e6fe6f046 Fixed the float16 tensor test. 2016-02-20 07:44:17 +00:00
Rasmus Munk Larsen
8eb127022b Get rid of duplicate code. 2016-02-19 16:33:30 -08:00
Rasmus Munk Larsen
d5e2ec7447 Speed up tensor FFT by up to ~25-50%.
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_tensor_fft_single_1D_cpu/8            132       134     -1.5%
BM_tensor_fft_single_1D_cpu/9           1162      1229     -5.8%
BM_tensor_fft_single_1D_cpu/16           199       195     +2.0%
BM_tensor_fft_single_1D_cpu/17          2587      2267    +12.4%
BM_tensor_fft_single_1D_cpu/32           373       341     +8.6%
BM_tensor_fft_single_1D_cpu/33          5922      4879    +17.6%
BM_tensor_fft_single_1D_cpu/64           797       675    +15.3%
BM_tensor_fft_single_1D_cpu/65         13580     10481    +22.8%
BM_tensor_fft_single_1D_cpu/128         1753      1375    +21.6%
BM_tensor_fft_single_1D_cpu/129        31426     22789    +27.5%
BM_tensor_fft_single_1D_cpu/256         4005      3008    +24.9%
BM_tensor_fft_single_1D_cpu/257        70910     49549    +30.1%
BM_tensor_fft_single_1D_cpu/512         8989      6524    +27.4%
BM_tensor_fft_single_1D_cpu/513       165402    107751    +34.9%
BM_tensor_fft_single_1D_cpu/999       198293    115909    +41.5%
BM_tensor_fft_single_1D_cpu/1ki        21289     14143    +33.6%
BM_tensor_fft_single_1D_cpu/1k        361980    233355    +35.5%
BM_tensor_fft_double_1D_cpu/8            138       131     +5.1%
BM_tensor_fft_double_1D_cpu/9           1253      1133     +9.6%
BM_tensor_fft_double_1D_cpu/16           218       200     +8.3%
BM_tensor_fft_double_1D_cpu/17          2770      2392    +13.6%
BM_tensor_fft_double_1D_cpu/32           406       368     +9.4%
BM_tensor_fft_double_1D_cpu/33          6418      5153    +19.7%
BM_tensor_fft_double_1D_cpu/64           856       728    +15.0%
BM_tensor_fft_double_1D_cpu/65         14666     11148    +24.0%
BM_tensor_fft_double_1D_cpu/128         1913      1502    +21.5%
BM_tensor_fft_double_1D_cpu/129        36414     24072    +33.9%
BM_tensor_fft_double_1D_cpu/256         4226      3216    +23.9%
BM_tensor_fft_double_1D_cpu/257        86638     52059    +39.9%
BM_tensor_fft_double_1D_cpu/512         9397      6939    +26.2%
BM_tensor_fft_double_1D_cpu/513       203208    114090    +43.9%
BM_tensor_fft_double_1D_cpu/999       237841    125583    +47.2%
BM_tensor_fft_double_1D_cpu/1ki        20921     15392    +26.4%
BM_tensor_fft_double_1D_cpu/1k        455183    250763    +44.9%
BM_tensor_fft_single_2D_cpu/8           1051      1005     +4.4%
BM_tensor_fft_single_2D_cpu/9          16784     14837    +11.6%
BM_tensor_fft_single_2D_cpu/16          4074      3772     +7.4%
BM_tensor_fft_single_2D_cpu/17         75802     63884    +15.7%
BM_tensor_fft_single_2D_cpu/32         20580     16931    +17.7%
BM_tensor_fft_single_2D_cpu/33        345798    278579    +19.4%
BM_tensor_fft_single_2D_cpu/64         97548     81237    +16.7%
BM_tensor_fft_single_2D_cpu/65       1592701   1227048    +23.0%
BM_tensor_fft_single_2D_cpu/128       472318    384303    +18.6%
BM_tensor_fft_single_2D_cpu/129      7038351   5445308    +22.6%
BM_tensor_fft_single_2D_cpu/256      2309474   1850969    +19.9%
BM_tensor_fft_single_2D_cpu/257     31849182  23797538    +25.3%
BM_tensor_fft_single_2D_cpu/512     10395194   8077499    +22.3%
BM_tensor_fft_single_2D_cpu/513     144053843  104242541    +27.6%
BM_tensor_fft_single_2D_cpu/999     279885833  208389718    +25.5%
BM_tensor_fft_single_2D_cpu/1ki     45967677  36070985    +21.5%
BM_tensor_fft_single_2D_cpu/1k      619727095  456489500    +26.3%
BM_tensor_fft_double_2D_cpu/8           1110      1016     +8.5%
BM_tensor_fft_double_2D_cpu/9          17957     15768    +12.2%
BM_tensor_fft_double_2D_cpu/16          4558      4000    +12.2%
BM_tensor_fft_double_2D_cpu/17         79237     66901    +15.6%
BM_tensor_fft_double_2D_cpu/32         21494     17699    +17.7%
BM_tensor_fft_double_2D_cpu/33        357962    290357    +18.9%
BM_tensor_fft_double_2D_cpu/64        105179     87435    +16.9%
BM_tensor_fft_double_2D_cpu/65       1617143   1288006    +20.4%
BM_tensor_fft_double_2D_cpu/128       512848    419397    +18.2%
BM_tensor_fft_double_2D_cpu/129      7271322   5636884    +22.5%
BM_tensor_fft_double_2D_cpu/256      2415529   1922032    +20.4%
BM_tensor_fft_double_2D_cpu/257     32517952  24462177    +24.8%
BM_tensor_fft_double_2D_cpu/512     10724898   8287617    +22.7%
BM_tensor_fft_double_2D_cpu/513     146007419  108603266    +25.6%
BM_tensor_fft_double_2D_cpu/999     296351330  221885776    +25.1%
BM_tensor_fft_double_2D_cpu/1ki     59334166  48357539    +18.5%
BM_tensor_fft_double_2D_cpu/1k      666660132  483840349    +27.4%
2016-02-19 16:29:23 -08:00
Gael Guennebaud
d90a2dac5e merge 2016-02-19 23:01:27 +01:00
Gael Guennebaud
485823b5f5 Add COD and BDCSVD in list of benched solvers. 2016-02-19 23:00:33 +01:00
Gael Guennebaud
2af04f1a57 Extend unit test to stress smart_copy with empty input/output. 2016-02-19 22:59:28 +01:00
Gael Guennebaud
6fa35bbd28 bug #1170: skip calls to memcpy/memmove for empty input. 2016-02-19 22:58:52 +01:00
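The issue behind bug #1170 is that calling memcpy/memmove with a null pointer is undefined behavior even when the size is zero. A simplified sketch of the kind of guard involved (the helper name is illustrative; the real Eigen::internal::smart_copy additionally dispatches on whether the type can be copied bytewise):

    #include <cstddef>
    #include <cstring>

    // Illustration only: forward to memcpy just when there is data, so an
    // empty input (possibly with null pointers) never reaches memcpy.
    template <typename T>
    void copy_trivially(const T* start, const T* end, T* target) {
      std::ptrdiff_t n = end - start;
      if (n > 0)
        std::memcpy(target, start, static_cast<std::size_t>(n) * sizeof(T));
    }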
Benoit Steiner
46fc23f91c Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues. 2016-02-19 13:44:22 -08:00
Gael Guennebaud
6f0992c05b Fix nesting type and complete reflection methods of Block expressions. 2016-02-19 22:21:02 +01:00
Gael Guennebaud
f3643eec57 Add typedefs for the return type of all block methods. 2016-02-19 22:15:01 +01:00
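These typedefs let callers name the result of block() without spelling out the Eigen::Block template arguments by hand. A hedged sketch; the exact names used here (BlockXpr, ConstBlockXpr) follow Eigen's existing naming convention but are assumptions:

    #include <Eigen/Dense>

    int main() {
      Eigen::MatrixXd A = Eigen::MatrixXd::Random(4, 4);

      // Name the block's type via the typedef instead of Eigen::Block<Eigen::MatrixXd>.
      Eigen::MatrixXd::BlockXpr topLeft = A.block(0, 0, 2, 2);
      topLeft.setZero();  // writes through to A

      const Eigen::MatrixXd& cA = A;
      Eigen::MatrixXd::ConstBlockXpr view = cA.block(0, 0, 2, 2);
      return static_cast<int>(view.sum());
    }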
Benoit Steiner
670db7988d Updated the contraction code to make it compatible with half floats. 2016-02-19 13:03:26 -08:00
Benoit Steiner
180156ba1a Added support for tensor reductions on half floats 2016-02-19 10:05:59 -08:00
Benoit Steiner
5c4901b83a Implemented the scalar division of 2 half floats 2016-02-19 10:03:19 -08:00
Benoit Steiner
f268db1c4b Added the ability to query the minor version of a CUDA device 2016-02-19 16:31:04 +00:00
Benoit Steiner
a08d2ff0c9 Started to work on contractions and reductions using half floats 2016-02-19 15:59:59 +00:00
Benoit Steiner
f3352e0fb0 Don't make the array constructors explicit 2016-02-19 15:58:57 +00:00
Benoit Steiner
f7cb755299 Added support for operators +=, -=, *= and /= on CUDA half floats 2016-02-19 15:57:26 +00:00
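A heavily hedged CUDA sketch of what the new compound operators allow inside device code. The kernel name and launch setup are made up, and it assumes the half type is exposed as Eigen::half by the tensor headers when built with nvcc and CUDA >= 7.5 (see the entry further down about disabling FP16 on older CUDA versions):

    #include <cuda_runtime.h>
    #include <unsupported/Eigen/CXX11/Tensor>

    __global__ void scale_and_shift(Eigen::half* data, Eigen::half scale,
                                    Eigen::half shift, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) {
        data[i] *= scale;  // compound multiply on a half float
        data[i] += shift;  // compound add
      }
    }

    int main() {
      const int n = 256;
      Eigen::half* d = nullptr;
      cudaMalloc(&d, n * sizeof(Eigen::half));
      cudaMemset(d, 0, n * sizeof(Eigen::half));
      scale_and_shift<<<1, n>>>(d, Eigen::half(2.0f), Eigen::half(1.0f), n);
      cudaDeviceSynchronize();
      cudaFree(d);
      return 0;
    }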
Benoit Steiner
dc26459b99 Implemented protate() for CUDA 2016-02-19 15:16:54 +00:00
Benoit Steiner
cd042dbbfd Fixed a bug in the tensor type converter 2016-02-19 15:03:26 +00:00
Benoit Steiner
ac5d706a94 Added support for simple coefficient-wise tensor expressions using half floats on CUDA devices 2016-02-19 08:19:12 +00:00
Benoit Steiner
0606a0a39b FP16 types on CUDA are only available starting with CUDA 7.5. Disable them when using an older version of CUDA 2016-02-18 23:15:23 -08:00
Benoit Steiner
f36c0c2c65 Added regression test for float16 2016-02-19 06:23:28 +00:00
Benoit Steiner
7151bd8768 Reverted unintended changes introduced by a bad merge 2016-02-19 06:20:50 +00:00
Benoit Steiner
1304e1fb5e Pulled latest updates from trunk 2016-02-19 06:17:02 +00:00
Benoit Steiner
17b9fbed34 Added preliminary support for half floats on CUDA GPU. For now we can simply convert floats into half floats and vice versa 2016-02-19 06:16:07 +00:00
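The conversion this message describes boils down to constructing the half type from a float and casting back. A minimal host-side sketch, assuming the type is Eigen::half and the build enables the CUDA fp16 support described in the surrounding commits:

    #include <unsupported/Eigen/CXX11/Tensor>

    int main() {
      Eigen::half h(3.14159f);             // float -> half (stored with roughly 3 decimal digits)
      float back = static_cast<float>(h);  // half -> float
      return back > 3.0f ? 0 : 1;
    }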
Benoit Steiner
8ce46f9d89 Improved implementation of ptanh for SSE and AVX 2016-02-18 13:24:34 -08:00
Eugene Brevdo
832380c455 Merged eigen/eigen into default 2016-02-17 14:44:06 -08:00
Eugene Brevdo
06a2bc7c9c Tiny bugfix in SpecialFunctions: some compilers don't like doubles implicitly downcast to floats in an array constructor. 2016-02-17 14:41:59 -08:00
Gael Guennebaud
f6f057bb7d bug #1166: fix shortcoming in gemv when the destination is not a vector at compile-time. 2016-02-15 21:43:07 +01:00
Gael Guennebaud
8e1f1ba6a6 Import wiki's paragraph: "I disabled vectorization, but I'm still getting annoyed about alignment issues" 2016-02-12 22:16:59 +01:00
Gael Guennebaud
c8b4c4b48a bug #795: mention allocate_shared as a candidate for aligned_allocator. 2016-02-12 22:09:16 +01:00
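bug #795 is about documenting that std::allocate_shared can be combined with Eigen::aligned_allocator, so that fixed-size vectorizable types held in a shared_ptr get the alignment they require. A small sketch of that pattern:

    #include <memory>
    #include <Eigen/Dense>

    int main() {
      // The shared_ptr's storage comes from aligned_allocator, so the
      // Vector4f object is placed in suitably aligned memory.
      std::shared_ptr<Eigen::Vector4f> v =
          std::allocate_shared<Eigen::Vector4f>(
              Eigen::aligned_allocator<Eigen::Vector4f>(), Eigen::Vector4f::Zero());
      *v += Eigen::Vector4f::Ones();
      return 0;
    }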
Gael Guennebaud
6eff3e5185 Fix triangularView versus triangularPart. 2016-02-12 17:09:28 +01:00
Gael Guennebaud
4252af6897 Remove dead code. 2016-02-12 16:13:35 +01:00
Gael Guennebaud
2f5f56a820 Fix usage of evaluator in sparse * permutation products. 2016-02-12 16:13:16 +01:00
Gael Guennebaud
0a537cb2d8 bug #901: fix triangular-view with unit diagonal of sparse rectangular matrices. 2016-02-12 15:58:31 +01:00
Gael Guennebaud
b35d1a122e Fix unit test: accessing elements in a deque by offsetting a pointer to another element causes undefined behavior. 2016-02-12 15:31:16 +01:00
Benoit Steiner
9e3f3a2d27 Deleted outdated comment 2016-02-11 17:27:35 -08:00
Benoit Steiner
de345eff2e Added a method to conjugate the content of a tensor or the result of a tensor expression. 2016-02-11 16:34:07 -08:00
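The new method mirrors the matrix API: calling conjugate() on a tensor (or tensor expression) of complex values yields the element-wise complex conjugate. A short sketch, assuming the unsupported Tensor module:

    #include <complex>
    #include <unsupported/Eigen/CXX11/Tensor>

    int main() {
      Eigen::Tensor<std::complex<float>, 1> t(4);
      t.setValues({{1.0f, 2.0f}, {3.0f, -4.0f}, {0.0f, 1.0f}, {2.0f, 0.0f}});

      // Element-wise complex conjugate of the tensor (or of any tensor expression).
      Eigen::Tensor<std::complex<float>, 1> c = t.conjugate();
      return c(1).imag() == 4.0f ? 0 : 1;
    }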
Benoit Steiner
17e93ba148 Pulled latest updates from trunk 2016-02-11 15:05:38 -08:00