5029 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
6156797016 Revert "Add template to specify QR permutation index type, Fix ColPivHouseholderQR Lapacke bindings" (This reverts commit be7791e097c1fc21d4f2e8713467431784f3a4fd) 2023-01-11 18:50:52 +00:00
Charles Schlosser
be7791e097 Add template to specify QR permutation index type, Fix ColPivHouseholderQR Lapacke bindings 2023-01-11 15:57:28 +00:00
Martin Burchell
c54785b071 Fix error: unused parameter 'tmp' [-Werror,-Wunused-parameter] on clang/32-bit arm 2023-01-10 21:15:28 +00:00
Chip Kerchner
d20fe21ae4 Improve performance for Power10 MMA bfloat16 GEMM 2023-01-06 23:08:37 +00:00
Ryan Senanayake
fe7f527787 Fix guard macros for emulated FP16 operators on GPU 2023-01-06 22:02:51 +00:00
Antonio Sánchez
262194f12c Fix a bunch of minor build and test issues. 2023-01-06 16:37:26 +00:00
Antonio Sánchez
3564668908 Fix overalign check. 2023-01-05 17:10:48 +00:00
Charles Schlosser
f3929ac7ed Fix EIGEN_HAS_CXX17_OVERALIGN for icc 2023-01-03 17:30:10 +00:00
Arthur
311cc0f9cc Enable NEON pcmp, plset, and complex psqrt 2022-12-22 05:38:34 +00:00
Antonio Sánchez
dbf7ae6f9b Fix up C++ version detection macros and cmake tests. 2022-12-20 18:06:03 +00:00
Antonio Sánchez
bb6675caf7 Fix incorrect NEON native fp16 multiplication. 2022-12-19 20:46:44 +00:00
Arthur Feeney
c4fb6af24b Enable NEON pabs for unsigned int types 2022-12-19 17:07:36 +00:00
Alexander Richardson
37de432907 Avoid using std::raise() for divide by zero 2022-12-14 20:06:16 +00:00
Alexander Richardson
62de593c40 Allow std::initializer_list constructors in constexpr expressions 2022-12-14 17:05:37 +00:00
Charles Schlosser
6d3e3678b4 optimize equalspace packetop 2022-12-13 01:22:25 +00:00
Charles Schlosser
2004831941 add EqualSpaced / setEqualSpaced 2022-12-13 00:54:57 +00:00
Antonio Sánchez
03c9b4738c Enable direct access for NestByValue. 2022-12-07 18:21:45 +00:00
Chip Kerchner
b59f18b4f7 Increase L2 and L3 cache size for Power10. 2022-12-07 18:20:33 +00:00
Lianhuang Li
d194167149 Fix the bug using neon instruction fmla for data type half 2022-12-01 17:28:57 +00:00
Pedro Caldeira
31ab62d347 Add support for Power10 (AltiVec) MMA instructions for bfloat16. 2022-11-30 23:33:37 +00:00
Antonio Sánchez
2260e11eb0 Fix reshape strides when input has non-zero inner stride. 2022-11-29 19:39:29 +00:00
Charles Schlosser
044f3f6234 Fix bug in handmade_aligned_realloc 2022-11-18 22:35:31 +00:00
Charles Schlosser
02805bd56c Fix AVX2 psignbit 2022-11-16 13:43:11 +00:00
Chip Kerchner
399ce1ed63 Fix duplicate execution code for Power 8 Altivec in pstore_partial. 2022-11-16 13:41:42 +00:00
Gabriele Buondonno
6431dfdb50 Cross product for vectors of size 2. Fixes #1037 2022-11-15 22:39:42 +00:00
Antonio Sánchez
8588d8c74b Correct pnegate for floating-point zero. 2022-11-15 18:07:23 +00:00
Antonio Sanchez
5eacb9e117 Put brackets around unsigned type names. 2022-11-15 09:09:45 -08:00
Antonio Sánchez
37e40dca85 Fix ambiguity in PPC for vec_splats call. 2022-11-14 18:58:16 +00:00
Charles Schlosser
9b6d624eab fix neon 2022-11-08 20:03:01 +00:00
Rasmus Munk Larsen
7e398e9436 Add missing return keyword in psignbit for NEON. 2022-11-04 16:13:09 +00:00
Charles Schlosser
82b152dbe7 Add signbit function 2022-11-04 00:31:20 +00:00
Antonio Sanchez
01a31b81b2 Remove unused parameter name. 2022-11-01 15:51:25 -07:00
Antonio Sánchez
c5b896c5a3 Allow empty matrices to be resized. 2022-10-27 20:33:35 +00:00
Antonio Sánchez
886aad1361 Disable patan for double on PPC. 2022-10-27 17:56:08 +00:00
Antonio Sánchez
ab407b2b6e Fix handmade_aligned_malloc offset computation. 2022-10-27 17:33:47 +00:00
Charles Schlosser
a226371371 Change handmade_aligned_malloc/realloc/free to store a 1 byte offset instead of absolute address 2022-10-22 22:51:31 +00:00
Rasmus Munk Larsen
3bb6a48d8c Fix bug atan2 2022-10-12 23:49:32 +00:00
Rasmus Munk Larsen
14c847dc0e Refactor special values test for pow, and add a similar test for atan2 2022-10-12 20:12:08 +00:00
Rasmus Munk Larsen
462758e8a3 Don't use generic sign function for sign(complex) unless it is vectorizable 2022-10-12 16:03:29 +00:00
Rasmus Munk Larsen
c0d6a72611 Use pnegate(pzero(x)) as a generic way to generate -0.0. Some compiler do not handle the literal -0.0 properly in fastmath mode. 2022-10-12 01:57:05 +00:00
Rasmus Munk Larsen
3167544873 Handle NaN inputs to atan2. 2022-10-10 19:36:36 -07:00
Rasmus Munk Larsen
72db3f0fa5 Remove references to M_PI_2 and M_PI_4. 2022-10-11 00:27:16 +00:00
Rasmus Munk Larsen
e95c4a837f Simpler range reduction strategy for atan<float>(). 2022-10-04 18:11:00 +00:00
Antonio Sánchez
80efbfdeda Unconditionally enable CXX11 math. 2022-10-04 17:37:47 +00:00
Antonio Sánchez
e5794873cb Replace assert with eigen_assert. 2022-10-04 17:11:23 +00:00
Rasmus Munk Larsen
1414a76fa9 Only vectorize atan<double> for Altivec if VSX is available. 2022-10-03 22:06:58 +00:00
Rasmus Munk Larsen
c475228b28 Vectorize atan() for double. 2022-10-01 01:49:30 +00:00
Rasmus Munk Larsen
1e1848fdb1 Add a vectorized implementation of atan2 to Eigen. 2022-09-28 20:46:49 +00:00
Rasmus Munk Larsen
b3bf8d6a13 Try to reduce size of GEBP kernel for non-ARM targets. 2022-09-28 02:37:18 +00:00
Rasmus Munk Larsen
13b69fc1b0 Try to reduce compilation time/memory for GEBP kernel using EIGEN_IF_CONSTEXPR 2022-09-23 20:09:42 +00:00