1082 Commits

Author SHA1 Message Date
Antonio Sánchez
17d57fb168 Fix up PowerPC MMA flags so it builds by default.
(cherry picked from commit 591906477bc8c8102dbefceefe10d81648865394)
2023-07-11 16:27:32 -07:00
Antonio Sánchez
6973687c70 Fix up PowerPC MMA flags so it builds by default.
(cherry picked from commit 65eeedf9646ee6efc457cc3a8f8d9030a6f83689)
2023-07-11 16:20:57 -07:00
Kevin Leonardic
daa0b70a65 Fix argument for _mm256_cvtps_ph imm parameter
(cherry picked from commit d4b05454a7b33139ce6636584550780ff15af6ed)
2023-07-10 15:30:41 -07:00
Antonio Sánchez
e6e921f0e3 Disable FP16 arithmetic for arm32.
(cherry picked from commit 7465b7651edfb58322557179658853243eb96372)
2023-07-10 15:30:41 -07:00
Antonio Sánchez
72b0759451 Fix arm builds.
(cherry picked from commit 2c8011c2dd72d6c2086b181aad8cbb6204fed5db)
2023-07-10 14:52:08 -07:00
Antonio Sánchez
99473f255b Fix failing MSVC tests due to compiler bugs.
(cherry picked from commit 394aabb0a3976d95a5c6f286d49e43bb49558cc2)
2023-07-10 14:52:08 -07:00
Antonio Sánchez
dae8c6d7ad Guard complex sqrt on old MSVC compilers.
(cherry picked from commit a16fb889dd5890b2a0788af10568f19155e6b262)
2023-07-10 14:52:07 -07:00
Antonio Sánchez
2dfdaa2abf More NEON packetmath fixes.
(cherry picked from commit 384269937f707669fb1ab65bee7e9bfca2c2dfa1)
2023-07-10 14:52:03 -07:00
Antonio Sánchez
a659b5dbb2 Fix NEON make_packet2f.
(cherry picked from commit 2dfbf1b251e7a32c140f36fc865b154b8a725bdd)
2023-07-10 14:34:09 -07:00
Antonio Sánchez
879854382c Fix MSVC arm build.
(cherry picked from commit 0a5392d6061134a4a32d0025fa154f830b83d606)
2023-07-10 14:34:09 -07:00
Martin Burchell
b26ada1e03 Fix error: unused parameter 'tmp' [-Werror,-Wunused-parameter] on clang/32-bit arm
(cherry picked from commit c54785b071e6297c062883cf43f323525ff0e6fb)
2023-07-10 14:34:09 -07:00
Antonio Sánchez
5547205092 Correct pnegate for floating-point zero.
(cherry picked from commit 8588d8c74b42eedde578af01605ecc90189bc329)
2023-07-10 14:34:04 -07:00
Matthew Sterrett
d0e2b3e58d Removed unnecessary checks for FP16C
(cherry picked from commit 39fcc89798bc54501388348a448ea0e32fa5da7d)
2023-07-07 15:21:17 -07:00
Lexi Bromfield
33a602eb37 Don't double-define Half functions on aarch64
(cherry picked from commit 66ea0c09fdd939ae2741cee1f5a9961b64d5adcd)
2023-07-07 15:21:17 -07:00
Rasmus Munk Larsen
a9490cd3c5 Fix code and unit test for a few corner cases in vectorized pow()
(cherry picked from commit 7a87ed1b6a49bd0067856dcba9ad9a3a46186220)
2023-07-07 15:21:17 -07:00
Alexander Richardson
a5469a6f0f Avoid including <sstream> with EIGEN_NO_IO
(cherry picked from commit b7668c0371054a3a938eeab32a5c10d24c1ea4fc)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
8a21df2d9c Disable f16c scalar conversions for MSVC.
(cherry picked from commit 73b2c13bf2d4c8192ce1cdf7ceeb8d098cfe6b71)
2023-07-07 15:21:12 -07:00
Antonio Sánchez
973b04f3e1 Fix AVX512 builds with MSVC.
(cherry picked from commit 9a14d91a9909cc430638ac750d323df10194b84e)
2023-07-07 15:06:18 -07:00
Antonio Sánchez
e0fe006915 Fix mixingtypes for g++-11.
(cherry picked from commit 19c39bea29e21041ceca481851b3a5c889b51d98)
2023-07-07 11:47:23 -07:00
Antonio Sánchez
995714142d Restrict GCC<6.3 maxpd workaround to only gcc.
(cherry picked from commit 4bffbe84f9125fc05bc781bf2ec87ada73ecf7f2)
2023-07-07 11:39:27 -07:00
Antonio Sánchez
730a781221 Define EIGEN_HAS_AVX512_MATH in PacketMath.
(cherry picked from commit e7f4a901ee8cbe42d37bcabefb342086235c3839)
2023-07-07 11:39:13 -07:00
Antonio Sánchez
77b2807322 Fix AVX512 math function consistency, enable for ICC.
(cherry picked from commit 96da541cba007a84979ee5e3000c13eab982d56c)
2023-07-07 11:37:49 -07:00
Antonio Sánchez
52e545324e Fix ODR violations.
(cherry picked from commit cafeadffef2a7ba41f2da5cf34c38068d74499eb)
2023-07-07 11:37:31 -07:00
Chip Kerchner
fbdaff81bd Invert rows and depth in non-vectorized portion of packing (PowerPC).
(cherry picked from commit 9cf34ee0aed25a7464e6ec14f977cfa940f48f1b)
2021-11-03 23:34:47 +00:00
Andreas Krebbel
23469c3cda ZVector: Move alignas qualifier to come first
We currently have plenty of type definitions with the alignment
qualifier coming after the type.  The compiler warns about ignoring
them:
int EIGEN_ALIGN16 ai[4];

Turn this into:
EIGEN_ALIGN16 int ai[4];


(cherry picked from commit 8faafc3aaa2b45e234cfe0bef085c1134ceffc42)
2021-11-03 23:29:10 +00:00
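A minimal sketch of the reordering described in the ZVector commit above. It uses the standard `alignas(16)` as a stand-in for `EIGEN_ALIGN16` (an assumption made to keep the example portable); the struct and helper names are hypothetical and do not come from Eigen.

```cpp
#include <cassert>
#include <cstdint>

// Alignment specifier placed first, ahead of the type, mirroring the
// "EIGEN_ALIGN16 int ai[4];" form from the commit message.
// alignas(16) stands in for the EIGEN_ALIGN16 macro (assumption).
struct AlignedInts {  // hypothetical name, for illustration only
  alignas(16) int ai[4];
};

// Returns true when the pointer sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// The member's alignment requirement propagates to the enclosing struct.
static_assert(alignof(AlignedInts) >= 16, "struct must be 16-byte aligned");
```

Declaring `AlignedInts a{};` and calling `is_aligned16(a.ai)` should then report a 16-byte-aligned array.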
Antonio Sanchez
18824d10ea Fix ZVector build.
Cross-compiled via `s390x-linux-gnu-g++`, run via qemu.  This allows the
packetmath tests to pass.


(cherry picked from commit 40bbe8a4d0eb3ec2bfd472fa30cac19e6e743b46)
2021-11-03 23:28:26 +00:00
Antonio Sanchez
943ef50a2d Disable testing of complex compound assignment operators for MSVC.
MSVC does not support specializing compound assignments for
`std::complex`, since it already specializes them (contrary to the
standard).

Trying to use one of these on device will currently lead to a
duplicate definition error.  This is still probably preferable
to no error though.  If we remove the definitions for MSVC, then
it will compile, but the kernel will fail silently.

The only proper solution would be to define our own custom `Complex`
type.

(cherry picked from commit f0f1d7938b7083800ff75fe88e15092f08a4e67e)
2021-10-11 10:00:29 -07:00
Alexander Grund
929bc0e191 Fix alias violation in BFloat16
reinterpret_cast between unrelated types is undefined behavior and leads
to misoptimizations on some platforms.
Use the safer (and faster) version via bit_cast


(cherry picked from commit b5eaa4269503f77d0aa58d2f8ed9419e1ba7784d)
2021-09-20 14:25:58 +00:00
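The BFloat16 aliasing fix above swaps `reinterpret_cast` for a bit cast. A rough sketch of that idea follows, assuming a `memcpy`-based helper; `bit_cast_sketch` and the two conversion functions are hypothetical names, and the truncating conversion is a simplification that is not necessarily the rounding Eigen uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// memcpy-based bit cast: copies the object representation instead of
// reading through a pointer to an unrelated type, so no aliasing rule
// is violated. Hypothetical helper, not Eigen's implementation.
template <typename To, typename From>
inline To bit_cast_sketch(const From& src) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));
  return dst;
}

// Simplified float -> bfloat16 bits: keep the upper 16 bits (truncation).
inline std::uint16_t float_to_bf16_truncate(float f) {
  return static_cast<std::uint16_t>(bit_cast_sketch<std::uint32_t>(f) >> 16);
}

// bfloat16 bits -> float: place the bits in the upper half of a 32-bit word.
inline float bf16_bits_to_float(std::uint16_t bits) {
  return bit_cast_sketch<float>(static_cast<std::uint32_t>(bits) << 16);
}
```

Compilers typically lower the fixed-size `memcpy` to a plain register move, which is consistent with the commit's remark that the bit-cast version is not slower.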
Antonio Sanchez
f046e326d9 Fix strict aliasing bug causing product_small failure.
Packet loading is skipped due to aliasing violation, leading to nullopt matrix
multiplication.

Fixes #2327.


(cherry picked from commit 3c724c44cff3f9e2e9e35351abff0b5c022b320d)
2021-09-19 18:06:17 +00:00
Antonio Sanchez
4ef67cbfb2 GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315).
GCC 4.8 doesn't seem to like the `g` register constraint, failing to
compile with "error: 'asm' operand requires impossible reload".

Tested `r` instead, and that seems to work, even with latest compilers.

Also fixed some minor macro issues to eliminate warnings on armv7.

Fixes #2315.


(cherry picked from commit ff07a8a63945d89301d1b29ac59d170ff9be3955)
2021-08-31 21:23:28 +00:00
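For context on the `g` versus `r` constraint change above, here is a hedged sketch of an optimization barrier built from an empty inline-asm statement with an `r` (general register) operand. It assumes a GCC- or Clang-compatible compiler and an integer operand; the function names are hypothetical and this is not Eigen's `EIGEN_OPTIMIZATION_BARRIER` definition.

```cpp
#include <cassert>

// Empty asm statement that takes x as a read-write register operand ("+r").
// The compiler must assume x may have been modified here, so it cannot
// fold or reorder computations involving x across this point.
inline void optimization_barrier_sketch(int& x) {
#if defined(__GNUC__) || defined(__clang__)
  __asm__ volatile("" : "+r"(x));
#else
  (void)x;  // No barrier on other compilers in this sketch.
#endif
}

// The sum is computed, then pinned by the barrier before being returned.
inline int add_with_barrier(int a, int b) {
  int sum = a + b;
  optimization_barrier_sketch(sum);
  return sum;
}
```

The barrier emits no instructions and leaves the value unchanged, so `add_with_barrier(2, 3)` still yields 5.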
Antonio Sanchez
c2b6df6e60 Disable cuda Eigen::half vectorization on host.
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.

All arithmetic functions are always device-only, so there is no
reason to use vectorization on the host at all.

Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.


(cherry picked from commit cc3573ab4451853774cd5c3497373d5fe8914774)
2021-08-31 21:23:11 +00:00
Antonio Sanchez
f1032255d3 Add missing PPC packet comparisons.
This is to fix the packetmath tests on the ppc pipeline.


(cherry picked from commit 2cc6ee0d2e76e88fe1476f6b0eae12edb68b1c8a)
2021-08-17 15:33:55 +00:00
Chip-Kerchner
f57dec64ef Fix unaligned loads in ploadLhs & ploadRhs for P8.
(cherry picked from commit 8dcf3e38ba9913021ce6a831836a59217e21baf2)
2021-08-17 12:48:36 +00:00
Chip-Kerchner
0b56b62f30 Reverse compare logic in F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8).
(cherry picked from commit e07227c411cb5ed5c6252b594fe841867bd19f6a)
2021-08-13 18:01:15 +00:00
Chip Kerchner
44cc96e1a1 Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+
(cherry picked from commit 66499f0f172d0758360043e9c578761c0f7d50cd)
2021-08-12 21:39:17 +00:00
ChipKerchner
13d7658c5d Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl).
(cherry picked from commit 413bc491f1721afdb9802553b13a5b7aba67ed3b)
2021-08-10 20:40:54 +00:00
Gauri Deshpande
93bff85a42 remove denormal flushing in fp32tobf16 for avx & avx512
(cherry picked from commit e6a5a594a7f3cbe2f9843d4ef57a10d478cbb818)
2021-08-09 22:15:42 +00:00
Rasmus Munk Larsen
05bab8139a Fix breakage of conj_helper in conjunction with custom types introduced in !537.
(cherry picked from commit 7b35638ddb99a0298c5d3450de506a8e8e0203d3)
2021-07-02 20:59:50 +00:00
Chip Kerchner
eebde572d9 Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow
(cherry picked from commit 91e99ec1e02100d07e35a7abb1b5c76707237219)
2021-07-01 23:32:38 +00:00
大河メタル
94e2250b36 Correct declarations for aarch64-pc-windows-msvc
(cherry picked from commit c81da59a252b3479753b2eada26ee0cf46280bd0)
2021-06-30 04:10:04 +00:00
Rasmus Munk Larsen
380d0e4916 Get rid of redundant pabs instruction in complex square root.
(cherry picked from commit 5aebbe9098f53f01c99eed67b52725397e955280)
2021-06-29 23:27:09 +00:00
Rohit Santhanam
e83af2cc24 Commit 52a5f982 broke conjhelper functionality for HIP GPUs.
This commit addresses this.


(cherry picked from commit 2d132d17365ffc84c0cc7a7da9b8f7090e94b476)
2021-06-25 19:56:18 +00:00
Rasmus Munk Larsen
413ff2b531 Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.
(cherry picked from commit bffd267d176410a517a0fe9afa6dde99c213c08a)
2021-06-25 17:13:12 +00:00
Rasmus Munk Larsen
a235ddef39 Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.
(cherry picked from commit 52a5f9821235e5a9f7e9b3e0198d45d42a1cb267)
2021-06-24 23:30:42 +00:00
Antonio Sanchez
ee4e099aa2 Remove pset, replace with ploadu.
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned.  But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.

This is causing segfaults in Google Maps.


(cherry picked from commit 12e8d57108c50d8a63605c6eb0144c838c128337)
2021-06-17 17:11:08 +00:00
Chip-Kerchner
9fc93ce31a EIGEN_STRONG_INLINE was NOT inlining in some critical areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate.
(cherry picked from commit ef1fd341a895fda883f655102f371fa8b41f2088)
2021-06-16 22:14:17 +00:00
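The inlining commit above turns on the difference between an inlining hint and a forced inline. A small sketch, assuming GCC/Clang-style and MSVC-style attributes; `SKETCH_ALWAYS_INLINE`, `madd_sketch` and `dot_sketch` are hypothetical names and are not Eigen's macro definitions.

```cpp
#include <cassert>
#include <cstddef>

// Plain `inline` is only a hint; the attribute below asks the compiler
// to inline regardless of its cost heuristics. Hypothetical macro.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE __attribute__((always_inline)) inline
#elif defined(_MSC_VER)
#define SKETCH_ALWAYS_INLINE __forceinline
#else
#define SKETCH_ALWAYS_INLINE inline
#endif

// Small helper called from a hot loop: a good candidate for forced inlining.
SKETCH_ALWAYS_INLINE float madd_sketch(float a, float b, float c) {
  return a * b + c;
}

// Dot product whose inner loop relies on madd_sketch being inlined.
inline float dot_sketch(const float* x, const float* y, std::size_t n) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) acc = madd_sketch(x[i], y[i], acc);
  return acc;
}
```

Forcing the inline changes only code generation, not results; whether it helps depends on the compiler and the call site, as the slowdown reported in the commit suggests.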
Antonio Sanchez
1374f49f28 Add missing ppc pcmp_lt_or_nan<Packet8bf>
(cherry picked from commit 9e94c5957000c38a6553552c96a7a27b1fc2860d)
2021-06-15 22:12:22 +00:00
Rasmus Munk Larsen
1cb1ffd5b2 Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled.
(cherry picked from commit fc87e2cbaa65e7e93a2c695ce5a9dc048a64a985)
2021-06-11 02:57:02 +00:00
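As a sketch of the approach named in the commit above, negative zero can be assembled from its bit pattern rather than written as a floating-point literal, so its sign does not depend on how the compiler treats signed zeros. This assumes 32-bit IEEE 754 `float`; the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Build -0.0f from its bit pattern: only the sign bit is set.
// Assumes float is a 32-bit IEEE 754 type.
inline float negative_zero_from_bits() {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
  const std::uint32_t sign_bit = 0x80000000u;
  float result;
  std::memcpy(&result, &sign_bit, sizeof(result));
  return result;
}
```

In a build without fast-math flags, the result compares equal to `0.0f` while `std::signbit` reports that the sign is set.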
Antonio Sanchez
98cf1e076f Add missing NEON ptranspose implementations.
Unified implementation using only `vzip`.


(cherry picked from commit dba753a986b527a17c8cc62474d0487aec7c2b36)
2021-05-25 19:09:50 +00:00
guoqiangqi
2f908f8255 Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242.
(cherry picked from commit 3d9051ea84a5089b277c88dac456b3b1576bfa7f)
2021-05-12 17:02:19 +00:00