eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-08-03 02:30:38 +08:00

Author	SHA1	Message	Date
Sean McBride	f1b9692d63	Removed EIGEN_UNUSED decorations from many functions that are in fact used	2022-03-03 20:19:33 +00:00
Antonio Sánchez	9c07e201ff	Modified sqrt/rsqrt for denormal handling.	2022-03-02 17:20:47 +00:00
Rasmus Munk Larsen	8b875dbef1	Changes to fast SQRT/RSQRT	2022-02-23 17:32:21 +00:00
Rasmus Munk Larsen	979fdd58a4	Add generic fast psqrt and prsqrt impls and make them correct for 0, +Inf, NaN, and negative arguments.	2022-02-05 00:20:13 +00:00
Rasmus Munk Larsen	ea2c02060c	Add reciprocal packet op and fast specializations for float with SSE, AVX, and AVX512.	2022-01-21 23:49:18 +00:00
Kolja Brix	8d81a2339c	Reduce usage of reserved names	2022-01-10 20:53:29 +00:00
Rasmus Munk Larsen	96dc37a03b	Some fixes/cleanups for numeric_limits & fix for related bug in psqrt	2022-01-07 01:10:17 +00:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Antonio Sanchez	b2126fd6b5	Fix pfrexp/pldexp for half. The recent addition of vectorized pow (!330) relies on `pfrexp` and `pldexp`. This was missing for `Eigen::half` and `Eigen::bfloat16`. Adding tests for these packet ops also exposed an issue with handling negative values in `pfrexp`, returning an incorrect exponent. Added the missing implementations, corrected the exponent in `pfrexp1`, and added `packetmath` tests.	2021-01-21 19:32:28 +00:00
Rasmus Munk Larsen	05754100fe	* Add iterative psqrt<double> for AVX and SSE when FMA is available. This provides a ~10% speedup. * Write iterative sqrt explicitly in terms of pmadd. This gives up to 7% speedup for psqrt<float> with AVX & SSE with FMA. * Remove iterative psqrt<double> for NEON, because the initial rsqrt approximation is not accurate enough for convergence in 2 Newton-Raphson steps and with 3 steps, just calling the builtin sqrt insn is faster. The following benchmarks were compiled with clang "-O2 -fast-math -mfma" and with and without -mavx. AVX+FMA (float) name old cpu/op new cpu/op delta BM_eigen_sqrt_float/1 1.08ns ± 0% 1.09ns ± 1% ~ BM_eigen_sqrt_float/8 2.07ns ± 0% 2.08ns ± 1% ~ BM_eigen_sqrt_float/64 12.4ns ± 0% 12.4ns ± 1% ~ BM_eigen_sqrt_float/512 95.7ns ± 0% 95.5ns ± 0% ~ BM_eigen_sqrt_float/4k 776ns ± 0% 763ns ± 0% -1.67% BM_eigen_sqrt_float/32k 6.57µs ± 1% 6.13µs ± 0% -6.69% BM_eigen_sqrt_float/256k 83.7µs ± 3% 83.3µs ± 2% ~ BM_eigen_sqrt_float/1M 335µs ± 2% 332µs ± 2% ~ SSE+FMA (float) name old cpu/op new cpu/op delta BM_eigen_sqrt_float/1 1.08ns ± 0% 1.09ns ± 0% ~ BM_eigen_sqrt_float/8 2.07ns ± 0% 2.06ns ± 0% ~ BM_eigen_sqrt_float/64 12.4ns ± 0% 12.4ns ± 1% ~ BM_eigen_sqrt_float/512 95.7ns ± 0% 96.3ns ± 4% ~ BM_eigen_sqrt_float/4k 774ns ± 0% 763ns ± 0% -1.50% BM_eigen_sqrt_float/32k 6.58µs ± 2% 6.11µs ± 0% -7.06% BM_eigen_sqrt_float/256k 82.7µs ± 1% 82.6µs ± 1% ~ BM_eigen_sqrt_float/1M 330µs ± 1% 329µs ± 2% ~ SSE+FMA (double) BM_eigen_sqrt_double/1 1.63ns ± 0% 1.63ns ± 0% ~ BM_eigen_sqrt_double/8 6.51ns ± 0% 6.08ns ± 0% -6.68% BM_eigen_sqrt_double/64 52.1ns ± 0% 46.5ns ± 1% -10.65% BM_eigen_sqrt_double/512 417ns ± 0% 374ns ± 1% -10.29% BM_eigen_sqrt_double/4k 3.33µs ± 0% 2.97µs ± 1% -11.00% BM_eigen_sqrt_double/32k 26.7µs ± 0% 23.7µs ± 0% -11.07% BM_eigen_sqrt_double/256k 213µs ± 0% 206µs ± 1% -3.31% BM_eigen_sqrt_double/1M 862µs ± 0% 870µs ± 2% +0.96% AVX+FMA (double) name old cpu/op new cpu/op delta BM_eigen_sqrt_double/1 1.63ns ± 0% 1.63ns ± 0% ~ BM_eigen_sqrt_double/8 6.51ns ± 0% 6.06ns ± 0% -6.95% BM_eigen_sqrt_double/64 52.1ns ± 0% 46.5ns ± 1% -10.80% BM_eigen_sqrt_double/512 417ns ± 0% 373ns ± 1% -10.59% BM_eigen_sqrt_double/4k 3.33µs ± 0% 2.97µs ± 1% -10.79% BM_eigen_sqrt_double/32k 26.7µs ± 0% 23.8µs ± 0% -10.94% BM_eigen_sqrt_double/256k 214µs ± 0% 208µs ± 2% -2.76% BM_eigen_sqrt_double/1M 866µs ± 3% 923µs ± 7% ~	2020-12-16 18:16:11 +00:00
Rasmus Munk Larsen	f9fac1d5b0	Add log2() to Eigen.	2020-12-04 21:45:09 +00:00
Rasmus Munk Larsen	f23dc5b971	Revert "Add log2() operator to Eigen" This reverts commit 4d91519a9be061da5d300079fca17dd0b9328050.	2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen	4d91519a9b	Add log2() operator to Eigen	2020-12-03 22:31:44 +00:00
Antonio Sanchez	a3b300f1af	Implement missing AVX half ops. Minimal implementation of AVX `Eigen::half` ops to bring in line with `bfloat16`. Allows `packetmath_13` to pass. Also adjusted `bfloat16` packet traits to match the supported set of ops (e.g. Bessel is not actually implemented).	2020-11-24 16:46:41 +00:00
Guoqiang QI	4700713faf	Add AVX plog<Packet4d> and AVX512 plog<Packet8d> ops, also unified AVX512 plog<Packet16f> op with generic API	2020-10-15 00:54:45 +00:00
Sheng Yang	56b3e3f3f8	AVX path for BF16	2020-07-14 01:34:03 +00:00
Rasmus Munk Larsen	f1e8307308	1. Fix a bug in psqrt and make it return 0 for +inf arguments. 2. Simplify handling of special cases by taking advantage of the fact that the builtin vrsqrt approximation handles negative, zero and +inf arguments correctly. This speeds up the SSE and AVX implementations by ~20%. 3. Make the Newton-Raphson formula used for rsqrt more numerically robust: Before: y = y * (1.5 - x/2 * y^2) After: y = y * (1.5 - y * (x/2) * y) Forming y^2 can overflow for very large or very small (denormalized) values of x, while x*y ~= 1. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision. 4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration. Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o	2019-11-15 17:09:46 -08:00
Rasmus Munk Larsen	13ef08e5ac	Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h.	2019-09-27 13:56:04 -07:00
Rasmus Munk Larsen	6de5ed08d8	Add generic PacketMath implementation of the Error Function (erf).	2019-09-19 12:48:30 -07:00
Rasmus Munk Larsen	a3298b22ec	Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments. Depending on instruction set, significant speedups are observed for the vectorized path: log1p wall time is reduced 60-93% (2.5x - 15x speedup) expm1 wall time is reduced 0-85% (1x - 7x speedup) The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly. Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM	2019-08-12 13:53:28 -07:00
Gael Guennebaud	c68bd2fa7a	Cleanup	2018-11-30 14:32:31 +01:00
Gael Guennebaud	b477d60bc6	Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX)	2018-11-30 11:26:30 +01:00
Gael Guennebaud	fa7fd61eda	Unify SSE/AVX psin functions. It is based on the SSE version which is much more accurate, though very slightly slower. This changeset also includes the following required changes: - add packet-float to packet-int type traits - add packet float<->int reinterpret casts - add faster pselect for AVX based on blendv	2018-11-27 22:41:51 +01:00
Gael Guennebaud	7655a8af6e	cleanup	2018-11-26 23:21:29 +01:00
Gael Guennebaud	502f92fa10	Unify SSE and AVX pexp for double.	2018-11-26 23:12:44 +01:00
Gael Guennebaud	cf8b85d5c5	Unify SSE and AVX implementation of pexp	2018-11-26 16:36:19 +01:00
Gael Guennebaud	2c44c40114	First step toward a unification of packet log implementation, currently only SSE and AVX are unified. To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.	2018-11-26 14:21:24 +01:00
Rasmus Munk Larsen	47150af1c8	Fix copy-paste error: Must use _mm256_cmp_ps for AVX.	2016-10-12 08:34:39 -07:00
Rasmus Munk Larsen	7f67e6dfdb	Update comment for fast sqrt.	2016-10-04 15:09:11 -07:00
Rasmus Munk Larsen	3ed67cb0bb	Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments. Benchmark speed in Giga-sqrts/s on Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz: Fast=1: SSE 2.529G, AVX 4.380G; Fast=0: SSE 1.944G, AVX 1.898G; Fast=1 fixed: SSE 2.214G, AVX 3.739G. This table illustrates the worst case in terms of speed impact: It was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the different versions become almost negligible.	2016-10-04 14:22:56 -07:00
Gael Guennebaud	a4c266f827	Factorize the 4 copies of tanh implementations, make numext::tanh consistent with array::tanh, enable fast tanh in fast-math mode only.	2016-08-23 14:23:08 +02:00
Benoit Steiner	8ce46f9d89	Improved implementation of ptanh for SSE and AVX	2016-02-18 13:24:34 -08:00
Benoit Steiner	6d8b1dce06	Avoid implicit cast from double to float.	2016-02-10 18:07:11 -08:00
Benoit Steiner	2d523332b3	Optimized implementation of the hyperbolic tangent function for AVX	2016-02-10 08:48:05 -08:00
Gael Guennebaud	d2e288ae50	Workaround compilers that do not even define _mm256_set_m128.	2015-12-24 16:53:43 +01:00
Benoit Steiner	6d777e1bc7	Fixed a typo.	2015-12-18 19:25:50 -08:00
Gael Guennebaud	3abd8470ca	bug #1140 : remove custom definition and use of _mm256_setr_m128	2015-12-18 14:18:59 +01:00
Gael Guennebaud	75861f6650	bug #1069 : fix AVX support on MSVC (use of non portable C-style cast)	2015-09-28 10:08:26 +02:00
Benoit Steiner	1dded10cb7	Added a double-precision implementation of the exp() function for AVX.	2015-05-04 10:42:51 -07:00
Benoit Steiner	0196141938	Fixed the optimized AVX implementation of the fast rsqrt function	2015-03-02 13:49:39 -08:00
Benoit Steiner	4fd7f47692	Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined.	2015-03-02 09:38:47 -08:00
Benoit Steiner	f41b1f1666	Added support for fast reciprocal square root computation.	2015-02-26 09:42:41 -08:00
Benoit Steiner	0927801a84	Optimized version of the sin(), exp(), log() and sqrt() function for AVX	2015-02-13 16:07:08 -08:00

43 Commits
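
The message of commit f1e8307308 in the log above describes rewriting the Newton-Raphson step for rsqrt so that y^2 is never formed on its own. The scalar sketch below illustrates that difference only; it is not Eigen's packet code, and the function names `rsqrt_step_before` and `rsqrt_step_after` are hypothetical. It assumes IEEE single precision with denormals enabled (no flush-to-zero).

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson refinement of y ~ 1/sqrt(x), following the "Before"
// formula from the commit message: y * (1.5 - x/2 * y^2).
// y * y is formed first, so it can overflow to +inf when x is a tiny
// (denormal) value and y is correspondingly huge.
inline float rsqrt_step_before(float x, float y) {
  assert(x > 0.0f);
  const float y2 = y * y;
  return y * (1.5f - (0.5f * x) * y2);
}

// The same step following the "After" formula: y * (1.5 - y * (x/2) * y).
// y is multiplied by x/2 first, so the intermediate stays small and the
// final product (x/2) * y * y stays close to 0.5 near convergence.
inline float rsqrt_step_after(float x, float y) {
  assert(x > 0.0f);
  const float t = y * (0.5f * x);
  return y * (1.5f - t * y);
}
```

With a denormal input such as x = 1e-40f and a starting guess y = 1e20f, the first form produces a non-finite value because y * y exceeds the largest float, while the second stays finite and close to 1e20.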