| Rasmus Munk Larsen | ce62177b5b | Vectorize atanh & add a missing definition and unit test for atan. | 2023-02-21 03:14:05 +00:00 |
| Charles Schlosser | 049a144798 | Add typed logicals | 2023-02-18 01:23:47 +00:00 |
| Chip Kerchner | e797974689 | Add and enable Packet int divide for Power10. | 2023-02-17 19:04:18 +00:00 |
| Antonio Sánchez | a16fb889dd | Guard complex sqrt on old MSVC compilers. | 2023-02-16 19:47:00 +00:00 |
| Charles Schlosser | 71a8e60a7a | Tweak pasin_float, fix psqrt_complex | 2023-02-15 01:01:14 +00:00 |
| Antonio Sánchez | 384269937f | More NEON packetmath fixes. | 2023-02-14 21:45:25 +00:00 |
| Antonio Sánchez | 2dfbf1b251 | Fix NEON make_packet2f. | 2023-02-14 16:52:07 +00:00 |
| Chip Kerchner | 4a03409569 | Fix problem with array conversions BF16->F32 in Power. | 2023-02-13 21:30:45 +00:00 |
| Chip Kerchner | 0ecae61568 | Disable array BF16 to F32 conversions in Power | 2023-02-10 20:06:58 +00:00 |
| Chip Kerchner | fba12e02b3 | Fold extra column calculations into an extra MMA accumulator and other bfloat16 MMA GEMM improvements | 2023-02-10 17:32:06 +00:00 |
| Chip Kerchner | 79cfc74f4d | Revert ODR changes and make gemm_extra_cols and gemm_complex_extra_cols EIGEN_ALWAYS_INLINE to avoid external functions. | 2023-02-10 17:05:07 +00:00 |
| Alexander Grund | f9659d91f1 | Fix ODR violation with gemm_extra_cols on PPC | 2023-02-09 22:16:06 +00:00 |
| Charles Schlosser | 325e3063d9 | Optimize psign | 2023-02-09 22:15:26 +00:00 |
| Antonio Sánchez | 0a5392d606 | Fix MSVC arm build. | 2023-02-08 21:46:37 +00:00 |
| Chip Kerchner | e71f88abce | Change in Power eigen_asserts to eigen_internal_asserts since it is putting unnecessary error checking and assertions without NDEBUG. | 2023-02-08 00:57:30 +00:00 |
| Antonio Sánchez | f6cc359e10 | More EIGEN_DEVICE_FUNC fixes for CUDA 10/11/12. | 2023-02-03 19:18:45 +00:00 |
| Chip Kerchner | 4a58f30aa0 | Fix pre-POWER8_VECTOR bugs in pcmp_lt and pnegate and reactivate psqrt. | 2023-01-31 19:40:24 +00:00 |
| Rasmus Munk Larsen | 12ad99ce60 | Remove unused variables from GenericPacketMathFunctions.h | 2023-01-29 18:10:28 +00:00 |
| Charles Schlosser | 0471e61b4c | Optimize various mathematical packet ops | 2023-01-28 01:34:26 +00:00 |
| Chip Kerchner | ab8725d947 | Turn off vectorize version of rsqrt - doesn't match generic version | 2023-01-27 18:28:54 +00:00 |
| Chip Kerchner | 6fc9de7d93 | Fix slowdown in bfloat16 MMA when rows is not a multiple of 8 or columns is not a multiple of 4. | 2023-01-25 18:22:20 +00:00 |
| Sean McBride | d70b4864d9 | issue #2581: review and cleanup of compiler version checks | 2023-01-17 18:58:34 +00:00 |
| Mehdi Goli | b523120687 | [SYCL-2020 Support] Enabling Intel DPCPP Compiler support to Eigen | 2023-01-16 07:04:08 +00:00 |
| Sergey Fedorov | 4d05765345 | Altivec fixes for Darwin: do not use unsupported VSX insns | 2023-01-12 16:33:33 +00:00 |
| Martin Burchell | c54785b071 | Fix error: unused parameter 'tmp' [-Werror,-Wunused-parameter] on clang/32-bit arm | 2023-01-10 21:15:28 +00:00 |
| Chip Kerchner | d20fe21ae4 | Improve performance for Power10 MMA bfloat16 GEMM | 2023-01-06 23:08:37 +00:00 |
| Ryan Senanayake | fe7f527787 | Fix guard macros for emulated FP16 operators on GPU | 2023-01-06 22:02:51 +00:00 |
| Arthur | 311cc0f9cc | Enable NEON pcmp, plset, and complex psqrt | 2022-12-22 05:38:34 +00:00 |
| Antonio Sánchez | bb6675caf7 | Fix incorrect NEON native fp16 multiplication. | 2022-12-19 20:46:44 +00:00 |
| Arthur Feeney | c4fb6af24b | Enable NEON pabs for unsigned int types | 2022-12-19 17:07:36 +00:00 |
| Lianhuang Li | d194167149 | Fix the bug using neon instruction fmla for data type half | 2022-12-01 17:28:57 +00:00 |
| Pedro Caldeira | 31ab62d347 | Add support for Power10 (AltiVec) MMA instructions for bfloat16. | 2022-11-30 23:33:37 +00:00 |
| Charles Schlosser | 02805bd56c | Fix AVX2 psignbit | 2022-11-16 13:43:11 +00:00 |
| Chip Kerchner | 399ce1ed63 | Fix duplicate execution code for Power 8 Altivec in pstore_partial. | 2022-11-16 13:41:42 +00:00 |
| Antonio Sánchez | 8588d8c74b | Correct pnegate for floating-point zero. | 2022-11-15 18:07:23 +00:00 |
| Antonio Sanchez | 5eacb9e117 | Put brackets around unsigned type names. | 2022-11-15 09:09:45 -08:00 |
| Antonio Sánchez | 37e40dca85 | Fix ambiguity in PPC for vec_splats call. | 2022-11-14 18:58:16 +00:00 |
| Charles Schlosser | 9b6d624eab | fix neon | 2022-11-08 20:03:01 +00:00 |
| Rasmus Munk Larsen | 7e398e9436 | Add missing return keyword in psignbit for NEON. | 2022-11-04 16:13:09 +00:00 |
| Charles Schlosser | 82b152dbe7 | Add signbit function | 2022-11-04 00:31:20 +00:00 |
| Antonio Sánchez | 886aad1361 | Disable patan for double on PPC. | 2022-10-27 17:56:08 +00:00 |
| Rasmus Munk Larsen | 462758e8a3 | Don't use generic sign function for sign(complex) unless it is vectorizable | 2022-10-12 16:03:29 +00:00 |
| Rasmus Munk Larsen | 72db3f0fa5 | Remove references to M_PI_2 and M_PI_4. | 2022-10-11 00:27:16 +00:00 |
| Rasmus Munk Larsen | e95c4a837f | Simpler range reduction strategy for atan<float>(). | 2022-10-04 18:11:00 +00:00 |
| Antonio Sánchez | 80efbfdeda | Unconditionally enable CXX11 math. | 2022-10-04 17:37:47 +00:00 |
| Antonio Sánchez | e5794873cb | Replace assert with eigen_assert. | 2022-10-04 17:11:23 +00:00 |
| Rasmus Munk Larsen | 1414a76fa9 | Only vectorize atan<double> for Altivec if VSX is available. | 2022-10-03 22:06:58 +00:00 |
| Rasmus Munk Larsen | c475228b28 | Vectorize atan() for double. | 2022-10-01 01:49:30 +00:00 |
| Rasmus Munk Larsen | 1e1848fdb1 | Add a vectorized implementation of atan2 to Eigen. | 2022-09-28 20:46:49 +00:00 |
| Rasmus Munk Larsen | 13b69fc1b0 | Try to reduce compilation time/memory for GEBP kernel using EIGEN_IF_CONSTEXPR | 2022-09-23 20:09:42 +00:00 |