eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-20 20:11:07 +08:00

Author	SHA1	Message	Date
Rasmus Munk Larsen	8f2c6f0aa6	Make preciprocal IEEE compliant w.r.t. 1/0 and 1/inf.	2022-01-26 20:38:05 +00:00
Erik Schultheis	d271a7d545	reduce float warnings (comparisons and implicit conversions)	2022-01-26 18:16:19 +00:00
Rasmus Munk Larsen	ea2c02060c	Add reciprocal packet op and fast specializations for float with SSE, AVX, and AVX512.	2022-01-21 23:49:18 +00:00
Kolja Brix	afa616bc9e	Fix some typos found	2021-09-23 15:22:00 +00:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Nathan Luehr	7e6a1c129c	Device implementation of log for std::complex types.	2021-05-11 22:02:21 +00:00
Antonio Sanchez	bde6741641	Improved std::complex sqrt and rsqrt.

Replaces `std::sqrt` with `complex_sqrt` for all platforms (previously `complex_sqrt` was only used for CUDA and MSVC), and implements custom `complex_rsqrt`.

Also introduces `numext::rsqrt` to simplify implementation, and modified `numext::hypot` to adhere to IEC 60559 for special cases.

The `complex_sqrt` and `complex_rsqrt` implementations were found to be significantly faster than `std::sqrt<std::complex<T>>` and `1/numext::sqrt<std::complex<T>>`.

Benchmark file attached.

```
GCC 10, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>            9.21 ns         9.21 ns     73225448
BM_StdSqrt<std::complex<float>>         17.1 ns         17.1 ns     40966545
BM_Sqrt<std::complex<double>>           8.53 ns         8.53 ns     81111062
BM_StdSqrt<std::complex<double>>        21.5 ns         21.5 ns     32757248
BM_Rsqrt<std::complex<float>>           10.3 ns         10.3 ns     68047474
BM_DivSqrt<std::complex<float>>         16.3 ns         16.3 ns     42770127
BM_Rsqrt<std::complex<double>>          11.3 ns         11.3 ns     61322028
BM_DivSqrt<std::complex<double>>        16.5 ns         16.5 ns     42200711

Clang 11, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>            7.46 ns         7.45 ns     90742042
BM_StdSqrt<std::complex<float>>         16.6 ns         16.6 ns     42369878
BM_Sqrt<std::complex<double>>           8.49 ns         8.49 ns     81629030
BM_StdSqrt<std::complex<double>>        21.8 ns         21.7 ns     31809588
BM_Rsqrt<std::complex<float>>           8.39 ns         8.39 ns     82933666
BM_DivSqrt<std::complex<float>>         14.4 ns         14.4 ns     48638676
BM_Rsqrt<std::complex<double>>          9.83 ns         9.82 ns     70068956
BM_DivSqrt<std::complex<double>>        15.7 ns         15.7 ns     44487798

Clang 9, Pixel 2, aarch64:
---------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>            24.2 ns         24.1 ns     28616031
BM_StdSqrt<std::complex<float>>          104 ns          103 ns      6826926
BM_Sqrt<std::complex<double>>           31.8 ns         31.8 ns     22157591
BM_StdSqrt<std::complex<double>>         128 ns          128 ns      5437375
BM_Rsqrt<std::complex<float>>           31.9 ns         31.8 ns     22384383
BM_DivSqrt<std::complex<float>>         99.2 ns         98.9 ns      7250438
BM_Rsqrt<std::complex<double>>          46.0 ns         45.8 ns     15338689
BM_DivSqrt<std::complex<double>>         119 ns          119 ns      5898944
```	2021-01-17 08:50:57 -08:00
Antonio Sanchez	352f1422d3	Remove `inf` local variable. Apparently `inf` is a macro on iOS for `std::numeric_limits<T>::infinity()`, causing a compile error here. We don't need the local anyways since it's only used in one spot.	2021-01-12 10:33:15 -08:00
Antonio Sanchez	f149e0ebc3	Fix MSVC complex sqrt and packetmath test. MSVC incorrectly handles `inf` cases for `std::sqrt<std::complex<T>>`. Here we replace it with a custom version (currently used on GPU). Also fixed the `packetmath` test, which previously skipped several corner cases since `CHECK_CWISE1` only tests the first `PacketSize` elements.	2021-01-08 01:17:19 +00:00
David Tellenbach	4091f6b25c	Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD	2020-10-09 02:05:05 +02:00
Rasmus Munk Larsen	a566074480	Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).

This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in `66f07efeae`), but uses the more accurate approximation 1/(1+exp(-x)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.

This change also contains a few improvements to speed up the original float specialization of logistic:
- Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_expect and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
- Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).

The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.

The benchmarks below time repeated calls to u = v.logistic() (u = v.tanh(), respectively), where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].

Benchmark numbers for logistic:

Before:
Benchmark                  Time(ns)    CPU(ns)   Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        4467       4468       155835  model_time: 4827
AVX
BM_eigen_logistic_float        2347       2347       299135  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1467       1467       476143  model_time: 2926
AVX512
BM_eigen_logistic_float         805        805       858696  model_time: 1463

After:
Benchmark                  Time(ns)    CPU(ns)   Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        2589       2590       270264  model_time: 4827
AVX
BM_eigen_logistic_float        1428       1428       489265  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1059       1059       662255  model_time: 2926
AVX512
BM_eigen_logistic_float         673        673      1000000  model_time: 1463

Benchmark numbers for tanh:

Before:
Benchmark                  Time(ns)    CPU(ns)   Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float            2391       2391       292624  model_time: 4242
AVX
BM_eigen_tanh_float            1256       1256       554662  model_time: 2633
AVX+FMA
BM_eigen_tanh_float             823        823       866267  model_time: 1609
AVX512
BM_eigen_tanh_float             443        443      1578999  model_time: 805

After:
Benchmark                  Time(ns)    CPU(ns)   Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float            2588       2588       273531  model_time: 4242
AVX
BM_eigen_tanh_float            1536       1536       452321  model_time: 2633
AVX+FMA
BM_eigen_tanh_float            1007       1007       694681  model_time: 1609
AVX512
BM_eigen_tanh_float             471        471      1472178  model_time: 805	2019-12-16 21:33:42 +00:00
Rasmus Munk Larsen	73a8d572f5	Clamp tanh approximation outside [-c, c] where c is the smallest value where the approximation is exactly +/-1. Without FMA, c = 7.90531110763549805, with FMA c = 7.99881172180175781.	2019-12-12 19:34:25 +00:00
Rasmus Munk Larsen	13ef08e5ac	Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h.	2019-09-27 13:56:04 -07:00
Rasmus Munk Larsen	6de5ed08d8	Add generic PacketMath implementation of the Error Function (erf).	2019-09-19 12:48:30 -07:00
Andrea Bocci	f7124b3e46	Extend CUDA support to matrix inversion and selfadjointeigensolver	2018-06-11 18:33:24 +02:00
Christoph Hertzberg	4d392d93aa	Make hypot_impl compile again for types with expression-templates (e.g., boost::multiprecision)	2018-04-13 19:01:37 +02:00
Gael Guennebaud	4213b63f5c	Factorize code between numext::hypot and scalar_hypot_op functor.	2018-04-04 15:12:43 +02:00
Gael Guennebaud	e116f6847e	bug #1521 : avoid signalling NaN in hypot and make it std::complex<> friendly.	2018-04-04 13:47:23 +02:00
Rasmus Munk Larsen	ae3e43a125	Remove extra space.	2017-01-24 16:16:39 -08:00
Rasmus Munk Larsen	5e144bbaa4	Make NaN propagation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op. See #1373 for details.	2017-01-24 13:32:50 -08:00
Gael Guennebaud	66cbabafed	Add a note regarding gcc bug #72867	2016-09-22 11:18:52 +02:00
Gael Guennebaud	68e803a26e	Fix warning	2016-08-30 09:21:57 +02:00
Gael Guennebaud	fd9caa1bc2	bug #1282 : fix implicit double to float conversion warning	2016-08-28 22:45:56 +02:00
Gael Guennebaud	a4c266f827	Factorize the 4 copies of tanh implementations, make numext::tanh consistent with array::tanh, enable fast tanh in fast-math mode only.	2016-08-23 14:23:08 +02:00

24 Commits