12762 Commits

Author SHA1 Message Date
Antonio Sánchez
7b6623af30 Fix special packetmath erfc flushing for ARM32. 2024-12-07 01:42:30 +00:00
Antonio Sánchez
fd48fbb260 Update rocm docker again again. 2024-12-06 22:13:53 +00:00
Antonio Sánchez
a885340ba5 Update rocm docker again. 2024-12-06 17:19:31 +00:00
Antonio Sanchez
45a8478d09 Update rocm docker image in CI. 2024-12-06 07:14:59 -08:00
Antonio Sánchez
de4afcf414 Add a deploy phase to the CI that tags the latest nightly pipeline if it passes. 2024-12-05 15:28:18 +00:00
Charles Schlosser
5e8916050b move constructor / move assignment doc strings 2024-12-04 17:42:20 +00:00
Charles Schlosser
77a073aaa8 fix checkformat ci stage 2024-12-04 02:45:52 +00:00
Charles Schlosser
41e46ed243 fix IOFormat alignment 2024-12-04 01:13:48 +00:00
Charles Schlosser
a0d32e40d9 fix map fill logic 2024-11-30 13:39:02 +00:00
Charles Schlosser
d34b100c13 Fix UB in setZero 2024-11-27 19:32:14 +00:00
Rasmus Munk Larsen
f19a6803c8 Refactor special case handling in pow(x,y) and revert to repeated squaring for <float,int> 2024-11-27 00:24:21 +00:00
Rasmus Munk Larsen
5064cb7d5e Add test for using pcast on scalars. 2024-11-25 22:27:26 -08:00
Rasmus Munk Larsen
1ea61a5d26 Improve pow(x,y): 25% speedup, increase accuracy for integer exponents. 2024-11-26 06:13:48 +00:00
Charles Schlosser
8ad4344ca7 optimize setConstant, setZero 2024-11-22 03:39:19 +00:00
Rasmus Munk Larsen
5610a13b77 Simplify and speed up pow() by 5-6% 2024-11-20 12:45:00 +00:00
Rasmus Munk Larsen
6c6ce9d06b Enable vectorized erf<double>(x) for SSE and AVX, which was accidentally removed in merge request 1750. 2024-11-19 22:14:29 +00:00
Rasmus Munk Larsen
e7c799b7c9 Prevent premature overflow to infinity in exp(x). The changes also provide a 3-4% speedup. 2024-11-19 13:08:18 -08:00
Rasmus Munk Larsen
00af47102d Revert 040180078d 2024-11-19 10:25:16 -08:00
Rasmus Munk Larsen
8ee6f8475a Speed up exp(x). 2024-11-19 17:50:34 +00:00
Charles Schlosser
93ec5450cb disable fill_n optimization for msvc 2024-11-19 01:38:48 +00:00
Rasmus Munk Larsen
0af6ab4b76 Remove unnecessary check for HasBlend trait. 2024-11-18 21:16:45 +00:00
Rasmus Munk Larsen
d5eec781b7 Get rid of redundant computation for large arguments to erf(x). 2024-11-18 10:51:58 -08:00
Tyler Veness
2fc63808e4 Fix C++20 constexpr test compilation failures 2024-11-18 01:56:55 +00:00
Rasmus Munk Larsen
5133c836c0 Vectorize erf(x) for double. 2024-11-16 19:05:16 +00:00
Conrad Poelman
d6e3b528b2 Update Assign_MKL.h to cast disparate enum type to int, so it can be compared... 2024-11-15 20:00:29 +00:00
breathe1
040180078d Ensure that destructor's needed by lldb make it into binary in non-inlined fashion 2024-11-15 17:15:09 +00:00
Tyler Veness
0fb2ed140d Make element accessors constexpr 2024-11-14 01:05:29 +00:00
Charles Schlosser
8b4efc8ed8 check_size_for_overflow: use numeric limits instead of c99 macro 2024-11-13 00:35:35 +00:00
Charles Schlosser
489dbbc651 make fixed_size matrices conform to std::is_standard_layout 2024-11-12 23:34:26 +00:00
Rasmus Munk Larsen
283d871a3f Add missing EIGEN_DEVICE_FUNCTION decorations. 2024-11-08 14:25:57 -08:00
Rasmus Munk Larsen
0d366f6532 Vectorize erfc(x) for double and improve erfc(x) for float. 2024-11-08 17:21:11 +00:00
Charles Schlosser
8adf43640e more avx predux_any 2024-11-07 19:58:48 +00:00
Charles Schlosser
bc424f617a add missing avx predux_any functions 2024-11-07 19:11:29 +00:00
Charles Schlosser
e52ac76ca3 use EIGEN_CPLUSPLUS instead of checking cpp version 2024-11-06 17:25:22 +00:00
Rasmus Munk Larsen
122be167cd Revert "make fixed-size objects trivially move assignable" 2024-11-06 01:09:38 +00:00
Tobias Wood
d49021212b Tensor Roll / Circular Shift / Rotate 2024-11-05 14:10:19 +00:00
Charles Schlosser
bb73be8a2e make fixed-size objects trivially move assignable 2024-11-04 17:55:27 +00:00
Antonio Sánchez
7fd305ecae Fix GPU builds. 2024-11-01 04:50:03 +00:00
Morris Hafner
c8267654f2 Don't use __builtin_alloca_with_align with nvc++ 2024-10-30 18:02:08 +00:00
Tyler Veness
84c446df2c Fix macro redefinition warning in FFTW test 2024-10-30 17:18:42 +00:00
Antonio Sánchez
a9584d8e3c Fix clang6 failures. 2024-10-30 14:41:50 +00:00
Antonio Sánchez
dd4c2805d9 Fix clang6 failures. 2024-10-29 22:18:30 +00:00
Antonio Sánchez
9e962d9c54 Fix OOB access in triangular matrix multiplication. 2024-10-29 19:07:07 +00:00
Antonio Sánchez
695e49d1bd Fix NVCC builds for CUDA 10+. 2024-10-29 18:38:14 +00:00
Antonio Sánchez
dae09773fc Don't pass matrices by value. 2024-10-29 18:19:02 +00:00
Rasmus Munk Larsen
c23ec3420e Add tests for sizeof() with one dynamic dimension. 2024-10-28 13:48:53 -07:00
Rasmus Munk Larsen
58b252e5b3 Fix typo in PacketMath.h 2024-10-28 18:19:52 +00:00
Rasmus Munk Larsen
6c04d0cd68 Add missing exp2 definition for Altivec. 2024-10-28 18:12:36 +00:00
Peter Gavin
b15ebb1c2d add nextafter for bfloat16 2024-10-26 00:08:25 +00:00
Rasmus Munk Larsen
53b83cddf9 Include <type_traits> in main.h for std::is_trivial* 2024-10-25 20:55:51 +00:00