Charles Schlosser
|
0471e61b4c
|
Optimize various mathematical packet ops
|
2023-01-28 01:34:26 +00:00 |
|
Antonio Sánchez
|
17ae83a966
|
Fix bugs exposed by enabling GPU asserts.
|
2023-01-27 21:43:00 +00:00 |
|
Chip Kerchner
|
ab8725d947
|
Turn off vectorize version of rsqrt - doesn't match generic version
|
2023-01-27 18:28:54 +00:00 |
|
Charles Schlosser
|
6d9f662a70
|
Tweak atan2
|
2023-01-26 17:38:21 +00:00 |
|
Chip Kerchner
|
6fc9de7d93
|
Fix slowdown in bfloat16 MMA when rows is not a multiple of 8 or columns is not a multiple of 4.
|
2023-01-25 18:22:20 +00:00 |
|
Rasmus Munk Larsen
|
576448572f
|
More fixes for __GNUC_PATCHLEVEL__.
|
2023-01-23 17:04:24 +00:00 |
|
Rasmus Munk Larsen
|
164ddf75ab
|
Use __GNUC_PATCHLEVEL__ rather than __GNUC_PATCH__, according to the documentation https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
|
2023-01-23 16:56:14 +00:00 |
|
Antonio Sánchez
|
08c961e837
|
Add custom ODR-safe assert.
|
2023-01-20 17:38:13 +00:00 |
|
Sean McBride
|
d70b4864d9
|
issue #2581: review and cleanup of compiler version checks
|
2023-01-17 18:58:34 +00:00 |
|
Mehdi Goli
|
b523120687
|
[SYCL-2020 Support] Enabling Intel DPCPP Compiler support to Eigen
|
2023-01-16 07:04:08 +00:00 |
|
tttapa
|
bae119bb7e
|
Support per-thread is_malloc_allowed() state
|
2023-01-16 01:34:56 +00:00 |
|
Antonio Sánchez
|
2e61c0c6b4
|
Add missing EIGEN_DEVICE_FUNC in a few places when called by asserts.
|
2023-01-15 02:06:17 +00:00 |
|
Charles Schlosser
|
68082b8226
|
Fix QR, again
|
2023-01-13 03:23:17 +00:00 |
|
Sergey Fedorov
|
4d05765345
|
Altivec fixes for Darwin: do not use unsupported VSX insns
|
2023-01-12 16:33:33 +00:00 |
|
Rasmus Munk Larsen
|
6156797016
|
Revert "Add template to specify QR permutation index type, Fix ColPivHouseholderQR Lapacke bindings"
This reverts commit be7791e097c1fc21d4f2e8713467431784f3a4fd
|
2023-01-11 18:50:52 +00:00 |
|
Charles Schlosser
|
be7791e097
|
Add template to specify QR permutation index type, Fix ColPivHouseholderQR Lapacke bindings
|
2023-01-11 15:57:28 +00:00 |
|
Martin Burchell
|
c54785b071
|
Fix error: unused parameter 'tmp' [-Werror,-Wunused-parameter] on clang/32-bit arm
|
2023-01-10 21:15:28 +00:00 |
|
Chip Kerchner
|
d20fe21ae4
|
Improve performance for Power10 MMA bfloat16 GEMM
|
2023-01-06 23:08:37 +00:00 |
|
Ryan Senanayake
|
fe7f527787
|
Fix guard macros for emulated FP16 operators on GPU
|
2023-01-06 22:02:51 +00:00 |
|
Antonio Sánchez
|
262194f12c
|
Fix a bunch of minor build and test issues.
|
2023-01-06 16:37:26 +00:00 |
|
Antonio Sánchez
|
3564668908
|
Fix overalign check.
|
2023-01-05 17:10:48 +00:00 |
|
Charles Schlosser
|
f3929ac7ed
|
Fix EIGEN_HAS_CXX17_OVERALIGN for icc
|
2023-01-03 17:30:10 +00:00 |
|
Arthur
|
311cc0f9cc
|
Enable NEON pcmp, plset, and complex psqrt
|
2022-12-22 05:38:34 +00:00 |
|
Antonio Sánchez
|
dbf7ae6f9b
|
Fix up C++ version detection macros and cmake tests.
|
2022-12-20 18:06:03 +00:00 |
|
Antonio Sánchez
|
bb6675caf7
|
Fix incorrect NEON native fp16 multiplication.
|
2022-12-19 20:46:44 +00:00 |
|
Arthur Feeney
|
c4fb6af24b
|
Enable NEON pabs for unsigned int types
|
2022-12-19 17:07:36 +00:00 |
|
Alexander Richardson
|
37de432907
|
Avoid using std::raise() for divide by zero
|
2022-12-14 20:06:16 +00:00 |
|
Alexander Richardson
|
62de593c40
|
Allow std::initializer_list constructors in constexpr expressions
|
2022-12-14 17:05:37 +00:00 |
|
Charles Schlosser
|
6d3e3678b4
|
optimize equalspace packetop
|
2022-12-13 01:22:25 +00:00 |
|
Charles Schlosser
|
2004831941
|
add EqualSpaced / setEqualSpaced
|
2022-12-13 00:54:57 +00:00 |
|
Antonio Sánchez
|
03c9b4738c
|
Enable direct access for NestByValue.
|
2022-12-07 18:21:45 +00:00 |
|
Chip Kerchner
|
b59f18b4f7
|
Increase L2 and L3 cache size for Power10.
|
2022-12-07 18:20:33 +00:00 |
|
Lianhuang Li
|
d194167149
|
Fix the bug using neon instruction fmla for data type half
|
2022-12-01 17:28:57 +00:00 |
|
Pedro Caldeira
|
31ab62d347
|
Add support for Power10 (AltiVec) MMA instructions for bfloat16.
|
2022-11-30 23:33:37 +00:00 |
|
Antonio Sánchez
|
2260e11eb0
|
Fix reshape strides when input has non-zero inner stride.
|
2022-11-29 19:39:29 +00:00 |
|
Charles Schlosser
|
044f3f6234
|
Fix bug in handmade_aligned_realloc
|
2022-11-18 22:35:31 +00:00 |
|
Charles Schlosser
|
02805bd56c
|
Fix AVX2 psignbit
|
2022-11-16 13:43:11 +00:00 |
|
Chip Kerchner
|
399ce1ed63
|
Fix duplicate execution code for Power 8 Altivec in pstore_partial.
|
2022-11-16 13:41:42 +00:00 |
|
Gabriele Buondonno
|
6431dfdb50
|
Cross product for vectors of size 2. Fixes #1037
|
2022-11-15 22:39:42 +00:00 |
|
Antonio Sánchez
|
8588d8c74b
|
Correct pnegate for floating-point zero.
|
2022-11-15 18:07:23 +00:00 |
|
Antonio Sanchez
|
5eacb9e117
|
Put brackets around unsigned type names.
|
2022-11-15 09:09:45 -08:00 |
|
Antonio Sánchez
|
37e40dca85
|
Fix ambiguity in PPC for vec_splats call.
|
2022-11-14 18:58:16 +00:00 |
|
Charles Schlosser
|
9b6d624eab
|
fix neon
|
2022-11-08 20:03:01 +00:00 |
|
Rasmus Munk Larsen
|
7e398e9436
|
Add missing return keyword in psignbit for NEON.
|
2022-11-04 16:13:09 +00:00 |
|
Charles Schlosser
|
82b152dbe7
|
Add signbit function
|
2022-11-04 00:31:20 +00:00 |
|
Antonio Sanchez
|
01a31b81b2
|
Remove unused parameter name.
|
2022-11-01 15:51:25 -07:00 |
|
Antonio Sánchez
|
c5b896c5a3
|
Allow empty matrices to be resized.
|
2022-10-27 20:33:35 +00:00 |
|
Antonio Sánchez
|
886aad1361
|
Disable patan for double on PPC.
|
2022-10-27 17:56:08 +00:00 |
|
Antonio Sánchez
|
ab407b2b6e
|
Fix handmade_aligned_malloc offset computation.
|
2022-10-27 17:33:47 +00:00 |
|
Charles Schlosser
|
a226371371
|
Change handmade_aligned_malloc/realloc/free to store a 1 byte offset instead of absolute address
|
2022-10-22 22:51:31 +00:00 |
|