Rasmus Munk Larsen | 6c04d0cd68 | Add missing exp2 definition for Altivec. | 2024-10-28 18:12:36 +00:00
Peter Gavin | b15ebb1c2d | add nextafter for bfloat16 | 2024-10-26 00:08:25 +00:00
Charles Schlosser | 37563856c9 | Fix stack allocation assert | 2024-10-25 17:02:43 +00:00
Rasmus Munk Larsen | 3f067c4850 | Add exp2() as a packet op and array method. | 2024-10-22 22:09:34 +00:00
Charles Schlosser | 4e5136d239 | make fixed size matrices and arrays trivially_default_constructible | 2024-10-21 17:10:15 +00:00
Antonio Sánchez | b396a6fbb2 | Add free-function swap. | 2024-10-14 15:51:40 +00:00
Charles Schlosser | 820e8a45fb | add compile time info to reverse in place | 2024-10-13 17:55:56 +00:00
Rasmus Munk Larsen | 74dcfbbd0f | Use ppolevl for polynomial evaluation in more places. | 2024-10-07 13:27:28 -07:00
Antonio Sánchez | 12068cbcdb | Fix inverse evaluator for running on CUDA device. | 2024-10-01 20:59:54 +00:00
Rasmus Munk Larsen | 8e8c319087 | Add missing EIGEN_DEVICE_FUNC annotations. | 2024-10-01 11:40:58 -07:00
Sean McBride | b6b8b54e5e | Fixed issue #2858: removed unneeded call to _mm_setzero_si128 | 2024-09-24 16:29:45 +00:00
Frédéric BRIOL | 2a3465102a | Refactor code to use constexpr for data() functions. | 2024-09-23 16:43:53 +00:00
Charles Schlosser | 2d4c9b400c | make fixed size matrices and arrays trivially_copy_constructible and trivially_move_constructible | 2024-09-17 17:43:36 +00:00
Antonio Sánchez | 132f281f50 | Fix generic ceil for SSE2. | 2024-09-14 01:31:21 +00:00
Charles Schlosser | 84282c42fc | optimize new dot product | 2024-09-11 21:40:43 +00:00
Charles Schlosser | fb477b8be1 | Better dot products | 2024-09-10 21:02:31 +00:00
qile lin | 072ec9d954 | Fix a bug for pcmp_lt_or_nan and Add sqrt support for SVE | 2024-09-04 21:45:39 +00:00
Rasmus Munk Larsen | 9315389795 | Fix bug in bug fix for atanh. | 2024-09-04 09:37:59 -07:00
Rasmus Munk Larsen | f33af052e0 | Fix bug for atanh(-1). | 2024-09-03 20:54:01 +00:00
Rasmus Munk Larsen | 66927f7807 | Fix out-of-range arguments to _mm_permute_pd. | 2024-08-30 17:31:52 +00:00
Rasmus Munk Larsen | bbdabebf44 | Vectorize atanh<double>. Make atanh(x) standard compliant for |x| >= 1. | 2024-08-30 17:27:55 +00:00
Morris Hafner | 26e2c4f617 | Add nvc++ support | 2024-08-30 12:34:48 +00:00
Charles Schlosser | 648bce6cae | SSE/AVX Complex FMA | 2024-08-29 17:37:57 +00:00
Charles Schlosser | 9d3d37c5b7 | Complex Numtraits::HasSign and nmsub test | 2024-08-28 03:02:47 +00:00
qile lin | 3b5a1b4157 | sve instrinsics with "_x" suffix will be faster than "_z" suffix | 2024-08-23 12:52:22 +00:00
Rasmus Munk Larsen | 98f1ac5e65 | Fix breakage in GPU build. | 2024-08-23 06:08:37 +00:00
Tobias Wood | 2bf8fe1489 | NEON Complex Intrinsics | 2024-08-22 22:46:16 +00:00
Rasmus Munk Larsen | f91f8e9ab9 | Consolidate float and double implementations of patan(). | 2024-08-21 20:44:18 +00:00
Charles Schlosser | 87239e058a | vectorize squaredNorm() for complex types | 2024-08-21 10:54:17 +00:00
Rasmus Munk Larsen | 32d95bb097 | Add vectorized implementation of tanh<double> | 2024-08-21 02:29:45 +00:00
Rasmus Munk Larsen | cc240eea2f | Speed up and improve accuracy of tanh. | 2024-08-16 23:46:28 +00:00
Rasmus Munk Larsen | 92e373e6f5 | Speed up StableNorm for non-trivial sizes and improve consistency between aligned and unaligned inputs. | 2024-08-14 21:42:04 +00:00
Rasmus Munk Larsen | ab310943d6 | Add a yield instruction in the two spinloops of the threaded matmul implementation. | 2024-08-09 10:48:24 -07:00
Rasmus Munk Larsen | 99ffad1971 | A few cleanups to threaded product code and test. | 2024-08-09 09:35:23 -07:00
Charles Schlosser | 59498c96fe | SSE/AVX use fmaddsub for complex products | 2024-08-05 21:26:05 +00:00
Tyler Veness | d14b0a4e53 | Remove C++23 check around has_denorm deprecation suppression | 2024-08-03 21:34:27 +00:00
Jatin Chaudhary | 24db460503 | hlog symbol lookup should not restricted to global namespace | 2024-08-03 03:59:13 +00:00
Alexander Grund | 767e60e290 | Fix Woverflow warnings in PacketMathFP16 | 2024-08-03 03:57:18 +00:00
Alexander Grund | 8025683226 | Fix conversion of Eigen::half to _Float16 in AVX512 code | 2024-08-03 03:49:51 +00:00
Rasmus Munk Larsen | 2b7b7aac57 | Speed up complex * complex matrix multiplication. | 2024-08-02 20:40:53 +00:00
Mike Taves | c593e9e948 | Fix typos | 2024-08-02 00:06:24 +00:00
Eugene Zhulenev | fd98cc49f1 | Avoid atomic false sharing in RunQueue | 2024-08-01 17:41:16 +00:00
Frédéric Chapoton | 6331da95eb | fixing a lot of typos | 2024-07-30 22:15:49 +00:00
Rasmus Munk Larsen | d791d48859 | Fix AVX512FP16 build failure | 2024-06-18 22:34:32 +00:00
Charles Schlosser | 2fae4d7a77 | Revert "fix scalar pselect" | 2024-06-15 20:02:28 +00:00
Charles Schlosser | b430eb31e2 | AVX512F double->int64_t cast | 2024-06-15 17:45:02 +00:00
Charles Schlosser | 02bcf9b591 | fix scalar pselect | 2024-06-10 17:30:22 +00:00
Louis David | 392b95bdf1 | allow pointer_based_stl_iterator to conform to the contiguous_iterator concept if we are in c++20 | 2024-06-06 21:38:09 +00:00
Victor Ceballos | 27f8176254 | fixing warning C5054: operator '==': deprecated between enumerations of different types | 2024-06-04 16:44:13 +03:00
Charles Schlosser | eac6355df2 | Fix warnings created by other warnings fix | 2024-06-01 03:37:04 +00:00