Mehdi Goli
|
0623791930
|
[SYCL-2020] Enabling USM support for SYCL. SYCL-1.2.1 did not have support for USM.
|
2023-05-05 17:30:36 +00:00 |
|
Chip Kerchner
|
b8208b363c
|
Specialized loadColData correctly - fix previous BF16 GEMV MR
|
2023-05-04 16:38:17 +00:00 |
|
Chip Kerchner
|
fda1373a15
|
Fix ColMajor BF16 GEMV for when vector is RowMajor
|
2023-05-03 20:12:50 +00:00 |
|
Chip Kerchner
|
6418ac0285
|
Unroll F32 to BF16 loop - 1.8X faster conversions for LLVM. Use vector pairs for GCC.
|
2023-05-01 16:54:16 +00:00 |
|
Charles Schlosser
|
c9a14f48d9
|
SSE Packet4ui has pcmp, pmin, pmax
|
2023-04-28 20:36:08 +00:00 |
|
Antonio Sánchez
|
2d0c6ad873
|
Revert "Vectorize cast"
This reverts commit eb5ff1861a4783876564a1a79573c3b9ff566863
|
2023-04-26 18:03:36 +00:00 |
|
Charles Schlosser
|
8999525c29
|
AVX2: Packet4ul has pmul, abs2
|
2023-04-26 16:22:16 +00:00 |
|
Charles Schlosser
|
eb5ff1861a
|
Vectorize cast
|
2023-04-26 02:50:13 +00:00 |
|
Charles Schlosser
|
f6cf5dca80
|
Packet4ul does not have Abs2
|
2023-04-21 19:48:01 +00:00 |
|
Chip Kerchner
|
03f646b7e3
|
New VSX version of BF16 GEMV (Power) - up to 6.7X faster
|
2023-04-21 17:06:59 +00:00 |
|
Charles Schlosser
|
29c8e3c754
|
fix pow for uint32_t, disable pmul<Packet4ul>
|
2023-04-21 05:47:56 +00:00 |
|
Rasmus Munk Larsen
|
a347dbbab2
|
Delete last few occurrences of HasHalfPacket.
|
2023-04-19 10:36:59 -07:00 |
|
Charles Schlosser
|
2b954be663
|
fix typo in sse packetmath
|
2023-04-18 18:17:41 +00:00 |
|
Rasmus Munk Larsen
|
25685c90ad
|
Fix incorrect packet type for unsigned int version of pfirst() in MSVC workaround in PacketMath.h.
|
2023-04-18 17:46:23 +00:00 |
|
Chip Kerchner
|
3f3ce214e6
|
New BF16 pcast functions and move type casting to TypeCasting.h
|
2023-04-18 02:38:38 +00:00 |
|
Pedro Gonnet
|
17b5b4de58
|
Add Packet4ui, Packet8ui, and Packet4ul to the SSE/AVX PacketMath.h headers
|
2023-04-17 23:33:59 +00:00 |
|
Chip Kerchner
|
1148f0a9ec
|
Add dynamic dispatch to BF16 GEMM (Power) and new VSX version
|
2023-04-14 22:20:42 +00:00 |
|
Rasmus Munk Larsen
|
554fe02ae3
|
Enable new AVX512 GEMM kernel by default.
|
2023-04-12 13:39:06 -07:00 |
|
b-shi
|
15fbddaf9b
|
ASAN fixes for AVX512 GEMM/TRSM
|
2023-04-04 15:54:24 -07:00 |
|
Rasmus Munk Larsen
|
df1049ddf4
|
Small packet math cleanup.
|
2023-04-04 16:14:32 +00:00 |
|
Rasmus Munk Larsen
|
c730290fa0
|
Use the correct truncating intrinsic for double->int casting.
|
2023-04-03 13:56:41 -07:00 |
|
Rasmus Munk Larsen
|
1a5dfd7c0f
|
Fix incorrect casting in AVX512DQ path.
|
2023-03-27 09:28:06 -07:00 |
|
Rasmus Munk Larsen
|
b8b8a26145
|
Add more missing vectorized casts for int on x86, and remove redundant unit tests
|
2023-03-24 16:02:00 +00:00 |
|
Rasmus Munk Larsen
|
d57a79e512
|
Optimize float->bool cast for AVX2, based on Charles Schlosser's comments.
|
2023-03-21 20:59:25 -07:00 |
|
Rasmus Munk Larsen
|
a5ae832773
|
Fix reversal of arguments to _mm256_set_m128() in pcast<Packet4d, Packet8f>.
|
2023-03-22 03:21:44 +00:00 |
|
Rasmus Munk Larsen
|
09945f2cc1
|
Optimize casting for x86_64.
|
2023-03-21 18:24:16 +00:00 |
|
Antonio Sánchez
|
2c8011c2dd
|
Fix arm builds.
|
2023-03-20 16:59:38 +00:00 |
|
Rasmus Munk Larsen
|
0488b708b4
|
Vectorize tensor.isnan() by using typed predicates.
|
2023-03-16 04:04:22 +00:00 |
|
Chip Kerchner
|
d71ac6a755
|
Fix recent PowerPC warnings and clang warning
|
2023-03-15 16:50:46 +00:00 |
|
Chip Kerchner
|
23e1541863
|
Put deadcode checks back in from previous change.
|
2023-03-14 00:57:16 +00:00 |
|
Chip Kerchner
|
6c58f0fe1f
|
Revert changes that caused bad register spillage in BF16 GEMM for LLVM (Power)
|
2023-03-13 23:36:06 +00:00 |
|
Chip Kerchner
|
9d72412385
|
Add MMA to BF16 GEMV - 5.0-6.3X faster (for Power)
|
2023-03-13 19:37:13 +00:00 |
|
Rasmus Munk Larsen
|
ee0ff0ab3a
|
Fix typo in MathFunctions.h
|
2023-03-13 15:50:40 +00:00 |
|
Rasmus Munk Larsen
|
21c49e8f8e
|
Delete mystery character from Eigen/src/Core/arch/NEON/MathFunctions.h
|
2023-03-10 23:27:24 +00:00 |
|
Antonio Sánchez
|
394aabb0a3
|
Fix failing MSVC tests due to compiler bugs.
|
2023-03-10 22:36:57 +00:00 |
|
Rasmus Munk Larsen
|
d6235d76db
|
Clean up generic packetmath specializations for various backends with the help of a macro.
|
2023-03-10 22:02:23 +00:00 |
|
Chip Kerchner
|
2b513ca2a0
|
Added partial linear access for LHS & Output - 30% faster for bfloat16 GEMM MMA (Power)
|
2023-03-02 19:22:43 +00:00 |
|
Antonio Sánchez
|
62d5cfe835
|
Fix ODR issues with Intel's AVX512 TRSM kernels.
|
2023-02-27 07:54:52 +00:00 |
|
Chip Kerchner
|
e4598fedbe
|
Fix compiler versions for certain instructions on Power.
|
2023-02-23 23:24:41 +00:00 |
|
Rasmus Munk Larsen
|
1c0a6cf228
|
Get rid of EIGEN_HAS_AVX512_MATH workaround.
|
2023-02-23 23:16:41 +00:00 |
|
Rasmus Munk Larsen
|
6bcd941ee3
|
Use pmsub in twoprod. This speeds up pow() on Skylake by ~1%.
|
2023-02-21 20:09:29 +00:00 |
|
Rasmus Munk Larsen
|
ce62177b5b
|
Vectorize atanh & add a missing definition and unit test for atan.
|
2023-02-21 03:14:05 +00:00 |
|
Charles Schlosser
|
049a144798
|
Add typed logicals
|
2023-02-18 01:23:47 +00:00 |
|
Chip Kerchner
|
e797974689
|
Add and enable Packet int divide for Power10.
|
2023-02-17 19:04:18 +00:00 |
|
Antonio Sánchez
|
a16fb889dd
|
Guard complex sqrt on old MSVC compilers.
|
2023-02-16 19:47:00 +00:00 |
|
Charles Schlosser
|
71a8e60a7a
|
Tweak pasin_float, fix psqrt_complex
|
2023-02-15 01:01:14 +00:00 |
|
Antonio Sánchez
|
384269937f
|
More NEON packetmath fixes.
|
2023-02-14 21:45:25 +00:00 |
|
Antonio Sánchez
|
2dfbf1b251
|
Fix NEON make_packet2f.
|
2023-02-14 16:52:07 +00:00 |
|
Chip Kerchner
|
4a03409569
|
Fix problem with array conversions BF16->F32 in Power.
|
2023-02-13 21:30:45 +00:00 |
|
Chip Kerchner
|
0ecae61568
|
Disable array BF16 to F32 conversions in Power
|
2023-02-10 20:06:58 +00:00 |
|