eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-04-29 07:14:12 +08:00

Author	SHA1	Message	Date
Rasmus Munk Larsen	225ab040e0	Remove unused packet op "palign". Clean up a compiler warning in c++03 mode in AVX512/Complex.h.	2020-05-07 17:14:26 -07:00
Rasmus Munk Larsen	74ec8e6618	Make size odd for transposeInPlace test to make sure we hit the scalar path.	2020-05-07 17:29:56 +00:00
Rasmus Munk Larsen	ab773c7e91	Extend support for Packet16b:
* Add ptranspose<,4> to support matmul and add unit test for Matrix<bool> Matrix<bool>
* work around a bug in slicing of Tensor<bool>.
* Add tensor tests
This speeds up matmul for boolean matrices by about 10x
name                 old time/op  new time/op  delta
BM_MatMul<bool>/8    267ns ± 0%   479ns ± 0%   +79.25%  (p=0.008 n=5+5)
BM_MatMul<bool>/32   6.42µs ± 0%  0.87µs ± 0%  -86.50%  (p=0.008 n=5+5)
BM_MatMul<bool>/64   43.3µs ± 0%  5.9µs ± 0%   -86.42%  (p=0.008 n=5+5)
BM_MatMul<bool>/128  315µs ± 0%   44µs ± 0%    -85.98%  (p=0.008 n=5+5)
BM_MatMul<bool>/256  2.41ms ± 0%  0.34ms ± 0%  -85.68%  (p=0.008 n=5+5)
BM_MatMul<bool>/512  18.8ms ± 0%  2.7ms ± 0%   -85.53%  (p=0.008 n=5+5)
BM_MatMul<bool>/1k   149ms ± 0%   22ms ± 0%    -85.40%  (p=0.008 n=5+5)	2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen	b47c777993	Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.TransposeInPlace.float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
(Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".TransposeInPlace.float.*" experimental/users/rmlarsen/bench:matmul_bench)
name                            old time/op  new time/op  delta
BM_TransposeInPlace<float>/4    9.84ns ± 0%  6.51ns ± 0%  -33.80%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/8    23.6ns ± 1%  17.6ns ± 0%  -25.26%  (p=0.016 n=5+4)
BM_TransposeInPlace<float>/16   78.8ns ± 0%  60.3ns ± 0%  -23.50%  (p=0.029 n=4+4)
BM_TransposeInPlace<float>/32   302ns ± 0%   229ns ± 0%   -24.40%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/59   1.03µs ± 0%  0.84µs ± 1%  -17.87%  (p=0.016 n=5+4)
BM_TransposeInPlace<float>/64   1.20µs ± 0%  0.89µs ± 1%  -25.81%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/128  8.96µs ± 0%  3.82µs ± 2%  -57.33%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/256  152µs ± 3%   17µs ± 2%    -89.06%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/512  837µs ± 1%   208µs ± 0%   -75.15%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/1k   4.28ms ± 2%  1.08ms ± 2%  -74.72%  (p=0.008 n=5+5)	2020-04-28 16:08:16 +00:00
Rasmus Munk Larsen	e80ec24357	Remove unused packet op "preduxp".	2020-04-23 18:17:14 +00:00
Rasmus Munk Larsen	2f6ddaa25c	Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x. Benchmark numbers for the logical and of two NxN tensors:
name                                    old time/op  new time/op  delta
BM_booleanAnd_1T/3 [using 1 threads]    14.6ns ± 0%  14.4ns ± 0%  -0.96%
BM_booleanAnd_1T/4 [using 1 threads]    20.5ns ±12%  9.0ns ± 0%   -56.07%
BM_booleanAnd_1T/7 [using 1 threads]    41.7ns ± 0%  10.5ns ± 0%  -74.87%
BM_booleanAnd_1T/8 [using 1 threads]    52.1ns ± 0%  10.1ns ± 0%  -80.59%
BM_booleanAnd_1T/10 [using 1 threads]   76.3ns ± 0%  13.8ns ± 0%  -81.87%
BM_booleanAnd_1T/15 [using 1 threads]   167ns ± 0%   16ns ± 0%    -90.45%
BM_booleanAnd_1T/16 [using 1 threads]   188ns ± 0%   16ns ± 0%    -91.57%
BM_booleanAnd_1T/31 [using 1 threads]   667ns ± 0%   34ns ± 0%    -94.83%
BM_booleanAnd_1T/32 [using 1 threads]   710ns ± 0%   35ns ± 0%    -95.01%
BM_booleanAnd_1T/64 [using 1 threads]   2.80µs ± 0%  0.11µs ± 0%  -95.93%
BM_booleanAnd_1T/128 [using 1 threads]  11.2µs ± 0%  0.4µs ± 0%   -96.11%
BM_booleanAnd_1T/256 [using 1 threads]  44.6µs ± 0%  2.5µs ± 0%   -94.31%
BM_booleanAnd_1T/512 [using 1 threads]  178µs ± 0%   10µs ± 0%    -94.35%
BM_booleanAnd_1T/1k [using 1 threads]   717µs ± 0%   78µs ± 1%    -89.07%
BM_booleanAnd_1T/2k [using 1 threads]   2.87ms ± 0%  0.31ms ± 1%  -89.08%
BM_booleanAnd_1T/4k [using 1 threads]   11.7ms ± 0%  1.9ms ± 4%   -83.55%
BM_booleanAnd_1T/10k [using 1 threads]  70.3ms ± 0%  17.2ms ± 4%  -75.48%	2020-04-20 20:16:28 +00:00
Christoph Hertzberg	d46d726e9d	CommaInitializer wrongfully asserted for 0-sized blocks. The commainitializer unit-test never actually called `test_block_recursion`, which also was not correctly implemented and would have caused too deep template recursion.	2020-04-13 16:41:20 +02:00
Antonio Sanchez	8e875719b3	Replace norm() with squaredNorm() to address integer overflows. For random matrices with integer coefficients, many of the tests here lead to integer overflows. When taking the norm() of a row/column, the squaredNorm() often overflows to a negative value, leading to domain errors when taking the sqrt(). This leads to a crash on some systems. By replacing the norm() call by a squaredNorm(), the values still overflow, but at least there is no domain error. Addresses https://gitlab.com/libeigen/eigen/-/issues/1856	2020-04-07 19:48:28 +00:00
Rasmus Munk Larsen	4fd5d1477b	Fix packetmath test build for AVX.	2020-03-27 17:05:39 +00:00
Rasmus Munk Larsen	55c8fe8d0f	Fix bug in `52d54278be`	2020-03-27 16:41:15 +00:00
Joel Holdsworth	52d54278be	Additional NEON packet-math operations	2020-03-26 20:18:19 +00:00
Aaron Franke	5c22c7a7de	Make file formatting comply with POSIX and Unix standards: UTF-8, LF, no BOM, and newlines at the end of files	2020-03-23 18:09:02 +00:00
Joel Holdsworth	d5c665742b	Add absolute_difference coefficient-wise binary Array function	2020-03-19 17:45:20 +00:00
Joel Holdsworth	54aa8fa186	Implement integer square-root for NEON	2020-03-19 17:05:13 +00:00
Joel Holdsworth	88337acae2	test/packetmath: Add tests for all integer types	2020-03-10 22:46:19 +00:00
Joel Holdsworth	9e68977578	test/packetmath: Made negate non-mandatory	2020-03-10 22:46:19 +00:00
Rasmus Munk Larsen	6ac37768a9	Revert "add some static checks for packet-picking logic" This reverts commit 776960024585b907acc4abc3c59aef605941bb75	2020-02-25 01:07:04 +00:00
Rasmus Munk Larsen	87cfa4862f	Revert "Disable test in test/vectorization_logic.cpp, which is currently failing with AVX." This reverts commit b625adffd877639ff5cbe51ea154e1905a3b405c	2020-02-25 01:04:56 +00:00
Rasmus Munk Larsen	b625adffd8	Disable test in test/vectorization_logic.cpp, which is currently failing with AVX.	2020-02-24 23:28:25 +00:00
Francesco Mazzoli	7769600245	add some static checks for packet-picking logic	2020-02-07 18:16:16 +01:00
Christoph Hertzberg	1d0c45122a	Removing executable bit from file mode	2020-01-11 15:02:29 +01:00
Christoph Hertzberg	35219cea68	Bug #1790 : Make `areApprox` check `numext::isnan` instead of bitwise equality (NaNs don't have to be bitwise equal).	2020-01-11 14:57:22 +01:00
Srinivas Vasudevan	2e099e8d8f	Added special_packetmath test and tweaked bounds on tests. Refactor shared packetmath code to header file. (Squashed from PR !38)	2020-01-11 10:31:21 +00:00
Christoph Hertzberg	8333e03590	Use data.data() instead of &data (since it is not obvious that Array is trivially copyable)	2020-01-09 11:38:19 +01:00
Ilya Tokar	19876ced76	Bug #1785 : Introduce numext::rint. This provides a new op that matches std::rint and the previous behavior of pround. Also adds the corresponding unsupported/../Tensor op. Performance is the same as e.g. floor (tested SSE/AVX).	2020-01-07 21:22:44 +00:00
Everton Constantino	eedb7eeacf	Protecting integer_types's long long test with a check to see if we have CXX11 support.	2020-01-07 14:35:35 +00:00
Christoph Hertzberg	870e53c0f2	Bug #1788 : Fix rule-of-three violations inside the stable modules. This fixes deprecated-copy warnings when compiling with GCC>=9. Also protect some additional Base-constructors from getting called by user code (#1587)	2019-12-19 17:30:11 +01:00
Christoph Hertzberg	6965f6de7f	Fix unit-test which I broke in previous fix	2019-12-19 13:42:14 +01:00
Christoph Hertzberg	72166d0e6e	Fix some maybe-uninitialized warnings	2019-12-18 18:26:20 +01:00
Christoph Hertzberg	5a3eaf88ac	Workaround class-memaccess warnings on newer GCC versions	2019-12-18 16:37:26 +01:00
Rasmus Munk Larsen	a566074480	Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).
This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in `66f07efeae`), but uses the more accurate approximation 1/(1+exp(-x)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.
This change also contains a few improvements to speed up the original float specialization of logistic:
- Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_expect and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
- Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).
The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.
The benchmarks below repeat calls u = v.logistic() (u = v.tanh(), respectively) where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].
Benchmark numbers for logistic:
Before:
         Benchmark                Time(ns)  CPU(ns)  Iterations
-----------------------------------------------------------------
SSE      BM_eigen_logistic_float  4467      4468     155835      model_time: 4827
AVX      BM_eigen_logistic_float  2347      2347     299135      model_time: 2926
AVX+FMA  BM_eigen_logistic_float  1467      1467     476143      model_time: 2926
AVX512   BM_eigen_logistic_float  805       805      858696      model_time: 1463
After:
         Benchmark                Time(ns)  CPU(ns)  Iterations
-----------------------------------------------------------------
SSE      BM_eigen_logistic_float  2589      2590     270264      model_time: 4827
AVX      BM_eigen_logistic_float  1428      1428     489265      model_time: 2926
AVX+FMA  BM_eigen_logistic_float  1059      1059     662255      model_time: 2926
AVX512   BM_eigen_logistic_float  673       673      1000000     model_time: 1463
Benchmark numbers for tanh:
Before:
         Benchmark            Time(ns)  CPU(ns)  Iterations
-----------------------------------------------------------------
SSE      BM_eigen_tanh_float  2391      2391     292624      model_time: 4242
AVX      BM_eigen_tanh_float  1256      1256     554662      model_time: 2633
AVX+FMA  BM_eigen_tanh_float  823       823      866267      model_time: 1609
AVX512   BM_eigen_tanh_float  443       443      1578999     model_time: 805
After:
         Benchmark            Time(ns)  CPU(ns)  Iterations
-----------------------------------------------------------------
SSE      BM_eigen_tanh_float  2588      2588     273531      model_time: 4242
AVX      BM_eigen_tanh_float  1536      1536     452321      model_time: 2633
AVX+FMA  BM_eigen_tanh_float  1007      1007     694681      model_time: 1609
AVX512   BM_eigen_tanh_float  471       471      1472178     model_time: 805	2019-12-16 21:33:42 +00:00
Ilya Tokar	06e99aaf40	Bug 1785: fix pround on x86 to use the same rounding mode as std::round. This also adds pset1frombits helper to Packet[24]d. Makes round ~45% slower for SSE: 1.65µs ± 1% before vs 2.45µs ± 2% after, still an order of magnitude faster than the scalar version: 33.8µs ± 2%.	2019-12-12 17:38:53 -05:00
Srinivas Vasudevan	88062b7fed	Fix implementation of complex expm1. Add tests that fail with previous implementation, but pass with the current one.	2019-12-12 01:56:54 +00:00
Joel Holdsworth	1b6e0395e6	Added io test	2019-12-11 18:22:57 +00:00
Gael Guennebaud	6358599ecb	Fix QuaternionBase::cast for quaternion map and wrapper.	2019-12-03 14:51:14 +01:00
Gael Guennebaud	7745f69013	bug #1776 : fix vector-wise STL iterator's operator-> using a proxy as pointer type. This changeset fixes also the value_type definition.	2019-12-03 14:40:15 +01:00
Joel Holdsworth	743c925286	test/packetmath: Silence alignment warnings	2019-11-05 19:06:12 +00:00
Hans Johnson	8c8cab1afd	STYLE: Convert CMake-language commands to lower case Ancient CMake versions required upper-case commands. Later command names became case-insensitive. Now the preferred style is lower-case.	2019-10-31 11:36:37 -05:00
Hans Johnson	6fb3e5f176	STYLE: Remove CMake-language block-end command arguments Ancient versions of CMake required else(), endif(), and similar block termination commands to have arguments matching the command starting the block. This is no longer the preferred style.	2019-10-31 11:36:27 -05:00
Rasmus Munk Larsen	f1e8307308	1. Fix a bug in psqrt and make it return 0 for +inf arguments.
2. Simplify handling of special cases by taking advantage of the fact that the builtin vrsqrt approximation handles negative, zero and +inf arguments correctly. This speeds up the SSE and AVX implementations by ~20%.
3. Make the Newton-Raphson formula used for rsqrt more numerically robust:
   Before: y = y * (1.5 - x/2 * y^2)
   After: y = y * (1.5 - y * (x/2) * y)
   Forming y^2 can overflow for very large or very small (denormalized) values of x, while x*y ~= 1. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision.
4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration.
Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o	2019-11-15 17:09:46 -08:00
Gael Guennebaud	8af045a287	bug #1774 : fix VectorwiseOp::begin()/end() return types regarding constness.	2019-11-14 11:45:52 +01:00
Gael Guennebaud	8496f86f84	Enable CompleteOrthogonalDecomposition::pseudoInverse with non-square fixed-size matrices.	2019-11-13 21:16:53 +01:00
Gael Guennebaud	e7d8ba747c	bug #1752 : make is_convertible equivalent to the std c++11 equivalent and fallback to std::is_convertible when c++11 is enabled.	2019-10-10 17:41:47 +02:00
Gael Guennebaud	fb557aec5c	bug #1752 : disable some is_convertible tests for recent compilers.	2019-10-10 11:40:21 +02:00
Gael Guennebaud	36da231a41	Disable an expected warning in unit test	2019-10-08 16:28:14 +02:00
Gael Guennebaud	87427d2eaa	PR 719: fix real/imag namespace conflict	2019-10-08 09:15:17 +02:00
Rasmus Larsen	d38e6fbc27	Merged in rmlarsen/eigen (pull request PR-704) Add generic PacketMath implementation of the Error Function (erf).	2019-09-24 23:40:29 +00:00
Christoph Hertzberg	efd9867ff0	bug #1746 : Removed implementation of standard copy-constructor and standard copy-assign-operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added unit-test to rvalue_types.	2019-09-24 11:09:58 +02:00
Rasmus Munk Larsen	6de5ed08d8	Add generic PacketMath implementation of the Error Function (erf).	2019-09-19 12:48:30 -07:00
Srinivas Vasudevan	df0816b71f	Merging eigen/eigen.	2019-09-16 19:33:29 -04:00
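
The Newton-Raphson reordering described in commit f1e8307308 can be illustrated with a small scalar sketch. This is not Eigen's packet code: the function names `rsqrt_step_before` and `rsqrt_step_after` are hypothetical, and the `volatile` temporaries are only an assumption used here to force each intermediate to be rounded to single precision. It assumes IEEE-754 floats with denormals enabled (no flush-to-zero).

```cpp
#include <cmath>

// One Newton-Raphson refinement of y ~= 1/sqrt(x), written in the "Before"
// form from the commit message: y * (1.5 - x/2 * y^2).
// Hypothetical scalar helper, not part of Eigen.
float rsqrt_step_before(float x, float y) {
  volatile float y2 = y * y;  // y^2 is formed first and may overflow to +inf
  return y * (1.5f - (x / 2) * y2);
}

// The same step in the "After" form: y * (1.5 - y * (x/2) * y).
// y * (x/2) is formed first, so y^2 is never materialized.
float rsqrt_step_after(float x, float y) {
  volatile float t = y * (x / 2);  // stays small when x is tiny and y is huge
  return y * (1.5f - t * y);
}
```

For a denormal input such as x = 1e-42f with a starting guess near 1e21, y * y exceeds the largest finite float, so the first form returns a non-finite value, while the second form stays finite and moves toward 1/sqrt(x).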


2389 Commits