eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-09-27 08:43:14 +08:00

Author	SHA1	Message	Date
Rasmus Munk Larsen	48c635e223	Add a simple cost model to prevent Eigen's parallel GEMM from using too many threads when the inner dimension is small. Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.

Improvements in Wall time:

Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)    New (ns)  Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1                    3088        1610       +47.9%
BM_OuterishProd/64/4                    3562        2414       +32.2%
BM_OuterishProd/64/32                   8861        7815       +11.8%
BM_OuterishProd/128/1                  11363        6504       +42.8%
BM_OuterishProd/128/4                  11128        9794       +12.0%
BM_OuterishProd/128/64                 27691       27396        +1.1%
BM_OuterishProd/256/1                  33214       28123       +15.3%
BM_OuterishProd/256/4                  34312       36818        -7.3%
BM_OuterishProd/256/128               174866      176398        -0.9%
BM_OuterishProd/512/1                7963684      104224       +98.7%
BM_OuterishProd/512/4                7987913      112867       +98.6%
BM_OuterishProd/512/256              8198378     1306500       +84.1%
BM_OuterishProd/1k/1                 7356256      324432       +95.6%
BM_OuterishProd/1k/4                 8129616      331621       +95.9%
BM_OuterishProd/1k/512              27265418     7517538       +72.4%

Improvements in CPU time:

Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)    New (ns)  Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1                    6169        1608       +73.9%
BM_OuterishProd/64/4                    7117        2412       +66.1%
BM_OuterishProd/64/32                  17702       15616       +11.8%
BM_OuterishProd/128/1                  45415        6498       +85.7%
BM_OuterishProd/128/4                  44459        9786       +78.0%
BM_OuterishProd/128/64                110657      109489        +1.1%
BM_OuterishProd/256/1                 265158       28101       +89.4%
BM_OuterishProd/256/4                 274234      183885       +32.9%
BM_OuterishProd/256/128              1397160     1408776        -0.8%
BM_OuterishProd/512/1               78947048      520703       +99.3%
BM_OuterishProd/512/4               86955578     1349742       +98.4%
BM_OuterishProd/512/256             74701613    15584661       +79.1%
BM_OuterishProd/1k/1                78352601     3877911       +95.1%
BM_OuterishProd/1k/4                78521643     3966221       +94.9%
BM_OuterishProd/1k/512             258104736    89480530       +65.3%	2016-10-06 10:33:10 -07:00
Gael Guennebaud	80b5133789	Fix compilation of qr.inverse() for column and full pivoting variants.	2016-10-06 09:55:50 +02:00
Benoit Steiner	4387433acf	Increased the robustness of the reduction tests on fp16	2016-10-05 10:42:41 -07:00
Benoit Steiner	aad20d700d	Increase the tolerance to numerical noise.	2016-10-05 10:39:24 -07:00
Benoit Steiner	8b69d5d730	::rand() returns a signed integer on win32	2016-10-05 08:55:02 -07:00
Benoit Steiner	ed7a220b04	Fixed a typo that impacts windows builds	2016-10-05 08:51:31 -07:00
Benoit Steiner	ceee1c008b	Silenced compilation warning	2016-10-04 18:47:53 -07:00
Benoit Steiner	698ff69450	Properly characterize the CUDA packet primitives for fp16 as device only	2016-10-04 16:53:30 -07:00
Benoit Steiner	6af5ac7e27	Cleanup the cuda executor code.	2016-10-04 08:52:13 -07:00
Benoit Steiner	2f6d1607c8	Cleaned up the random number generation code.	2016-10-04 08:38:23 -07:00
Benoit Steiner	881b90e984	Use explicit type casting to generate packets of zeros.	2016-10-04 08:23:38 -07:00
Benoit Steiner	616a7a1912	Improved support for compiling CUDA code with clang as the host compiler	2016-10-03 17:09:33 -07:00
Benoit Steiner	409e887d78	Added support for constant std::complex numbers on GPU	2016-10-03 11:06:24 -07:00
Gael Guennebaud	9d6d0dff8f	bug #1317 : fix performance regression with some Block expressions and clang by helping it to remove dead code. The trick is to get rid of the nested expression in the evaluator by copying only the required information (here, the strides).	2016-10-01 15:37:00 +02:00
Gael Guennebaud	8b84801f7f	bug #1310 : workaround a compilation regression from 3.2 regarding triangular * homogeneous	2016-09-30 22:49:59 +02:00
Gael Guennebaud	67b4f45836	Fix angle range	2016-09-30 12:46:33 +02:00
Gael Guennebaud	27f3970453	Remove std:: prefix	2016-09-30 12:40:41 +02:00
Gael Guennebaud	3860a0bc8f	bug #1312 : Quaternion to AxisAngle conversion now ensures the angle will be in the range [-pi,pi]. This also increases accuracy when q.w is negative.	2016-09-29 23:23:35 +02:00
Gael Guennebaud	33500050c3	bug #1308 : fix compilation of some small products involving nullary-expressions.	2016-09-29 09:40:44 +02:00
Benoit Steiner	27d7628f16	Updated the list of warnings to reflect the new message ids introduced in cuda 8.0	2016-09-28 17:42:59 -07:00
Benoit Steiner	2bda1b0d93	Updated the tensor sum and mean reducer to enable them to process complex numbers on cuda gpus.	2016-09-28 17:08:41 -07:00
Gael Guennebaud	f3a00dd2b5	Merged in sergiu/eigen (pull request PR-229) Disabled MSVC level 4 warning C4714	2016-09-27 09:28:08 +02:00
Gael Guennebaud	892afb9416	Add debug info.	2016-09-26 23:53:57 +02:00
Gael Guennebaud	779774f98c	bug #1311 : fix alignment logic in some cases of (scalar*small).lazyProduct(small)	2016-09-26 23:53:40 +02:00
Benoit Steiner	6565f8d60f	Made the initialization of a CUDA device thread safe.	2016-09-26 11:00:32 -07:00
Gael Guennebaud	48dfe98abd	bug #1308 : fix compilation of vector * rowvector::nullary.	2016-09-25 14:54:35 +02:00
Sergiu Deitsch	fe29157d02	Disabled MSVC level 4 warning C4714. The level 4 warning (/W4) warns about functions marked as __forceinline that were not inlined, and generates a lot of noise.	2016-09-25 14:25:47 +02:00
Gael Guennebaud	86caba838d	bug #1304 : fix Projective * scaling and Projective *= scaling	2016-09-23 13:41:21 +02:00
Gael Guennebaud	b9f7a17e47	Add missing file.	2016-09-23 10:26:08 +02:00
Benoit Steiner	1301d744f8	Made the gaussian generator usable on GPU	2016-09-22 19:04:44 -07:00
Benoit Steiner	2a69290ddb	Added a specialization of Eigen::numext::real and Eigen::numext::imag for std::complex<T> to be used when compiling a cuda kernel. This is unfortunately necessary to be able to process complex numbers from a CUDA kernel on MacOS.	2016-09-22 15:52:23 -07:00
Gael Guennebaud	3946768916	Added tag 3.3-rc1 for changeset 77e27fbeee7acb289d7df809fc09a8cc8ee94eb7	2016-09-22 22:38:36 +02:00
Gael Guennebaud	77e27fbeee	bump to 3.3-rc1	2016-09-22 22:37:39 +02:00
Gael Guennebaud	2ada122bc6	merge	2016-09-22 22:33:18 +02:00
Gael Guennebaud	8f2bdde373	merge	2016-09-22 22:32:55 +02:00
Gael Guennebaud	ba0f844d6b	Backout changeset ce3557ca69742af477546d031d644a6dab1ff614	2016-09-22 22:28:51 +02:00
Gael Guennebaud	9bcdc8b756	Add a nullary-functor example performing index-based sub-matrices.	2016-09-22 22:27:54 +02:00
Benoit Steiner	50e3bbfc90	Calls x.imag() instead of imag(x) when x is a complex number since the former is a constexpr while the latter isn't. This fixes compilation errors triggered by nvcc on Mac.	2016-09-22 13:17:25 -07:00
Gael Guennebaud	ca3746c6f8	Bypass identity reflectors.	2016-09-22 22:07:13 +02:00
Felix Gruber	8bde7da086	fix documentation of LinSpaced. The index of the highest value in a LinSpaced is size-1.	2016-09-22 14:50:07 +02:00
Gael Guennebaud	66cbabafed	Add a note regarding gcc bug #72867	2016-09-22 11:18:52 +02:00
Christoph Hertzberg	4b377715d7	Do not manually add absolute path to boost-library. Also set C++ standard for blaze to C++14	2016-09-22 00:10:47 +02:00
Gael Guennebaud	aecc51a3e8	fix typo	2016-09-21 21:53:00 +02:00
Gael Guennebaud	1fc3a21ed0	Disable a failure test if extended double precision is in use (x87)	2016-09-21 20:09:07 +02:00
Gael Guennebaud	9fa2c8650e	Fix alignment of statically allocated temporaries in symv and trmv.	2016-09-21 17:34:24 +02:00
Gael Guennebaud	ac5377e161	Improve cost estimation of complex division	2016-09-21 17:26:04 +02:00
Gael Guennebaud	5269d11935	Fix compilation with ICC.	2016-09-21 17:08:51 +02:00
Benoit Steiner	26f9907542	Added missing typedefs	2016-09-20 12:58:03 -07:00
RJ Ryan	608b1acd6d	Don't use c++11 features and fix include.	2016-09-20 07:49:05 -07:00
RJ Ryan	b2c6dc48d9	Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op.	2016-09-20 07:18:20 -07:00
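
The Quaternion to AxisAngle entry above (bug #1312) describes keeping the resulting angle in a bounded range and staying accurate when q.w is negative. One common way to get that behaviour is sketched below in plain C++. This is an illustration under stated assumptions, not the code from the commit: the `Quat` and `AxisAngle` structs and the `toAxisAngle` function are hypothetical stand-ins rather than Eigen types, the input is assumed to be a unit quaternion, and this particular variant yields an angle in [0, pi] (a subset of the range named in the commit message).

```cpp
#include <cmath>

// Hypothetical plain structs for illustration; these are not Eigen types.
struct Quat { double w, x, y, z; };              // assumed to be unit length
struct AxisAngle { double angle, ax, ay, az; };  // angle in radians

// Convert a unit quaternion to axis-angle form.
// atan2(|v|, |w|) keeps the angle in [0, pi] and stays accurate when
// |w| is close to 1; a negative w is absorbed by flipping the axis sign.
AxisAngle toAxisAngle(const Quat& q) {
    double n = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z);
    if (n == 0.0) {
        // Identity rotation: the axis is arbitrary, pick x by convention.
        return {0.0, 1.0, 0.0, 0.0};
    }
    double angle = 2.0 * std::atan2(n, std::fabs(q.w));
    if (q.w < 0.0) n = -n;  // q and -q encode the same rotation
    return {angle, q.x / n, q.y / n, q.z / n};
}
```

With this formulation, a quaternion and its negation map to the same axis and angle, which is the property that matters when q.w is negative.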

8633 Commits