14 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
a3298b22ec Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)

The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.

Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
2019-08-12 13:53:28 -07:00
Gael Guennebaud
3492a1ca74 fix plog(+inf) with AVX512 2019-01-09 16:53:37 +01:00
Gael Guennebaud
fa87f9d876 Add psin/pcos on AVX512 -> almost for free, at last! 2018-11-30 14:33:13 +01:00
Gael Guennebaud
0f780bb0b4 Fix float-to-double warning 2018-10-16 09:19:45 +02:00
Gael Guennebaud
97e2c808e9 Fix avx512 plog(NaN) to return NaN instead of +inf 2018-10-11 10:13:13 +02:00
Gael Guennebaud
b3f66d29a5 Enable avx512 plog with clang 2018-10-11 10:12:21 +02:00
Mark D Ryan
bc615e4585 Re-enable FMA for fast sqrt functions 2018-07-30 13:21:00 +02:00
Mark D Ryan
e79c5149bf Fix AVX512 implementations of psqrt
This commit fixes the AVX512 implementations of psqrt in the same
way that 3ed67cb0bb4af65fbf243df598604a8c7630bf7d
 fixed the AVX2 version of this function.  The
AVX512 versions of psqrt incorrectly return -0.0 for negative
values, instead of NaN.  Fixing the issues requires adding
some additional instructions that slow down the algorithms.  A
similar test to the one used in 3ed67cb0bb4af65fbf243df598604a8c7630bf7d
 shows that the
corrected Packet16f code runs at 73% of the speed of the existing code,
while the corrected Packed8d function runs at 68% of the original.
2018-06-25 05:05:02 -07:00
Jayaram Bobba
b7b868d1c4 fix AVX512 plog 2018-04-20 13:39:18 -07:00
Gael Guennebaud
40b4bf3d32 AVX512: _mm512_rsqrt28_ps is available for AVX512ER only 2018-04-03 14:36:27 +02:00
Gael Guennebaud
7b0630315f AVX512: fix psqrt and prsqrt 2018-04-03 14:12:50 +02:00
Benoit Steiner
d37ee89ca8 Disabled some of the AVX512 primitives on compilers that don't support them 2016-04-29 12:50:29 -07:00
Benoit Steiner
3ca1ae2bb7 Commented out the version of pexp<Packet8d> since it fails to compile with gcc 5.3 2016-02-04 13:49:06 -08:00
Benoit Steiner
23f69ab936 Added implementations of pexp, plog, psqrt, and prsqrt optimized for AVX512 2016-02-04 10:36:36 -08:00