Antonio Sánchez
17d57fb168
Fix up PowerPC MMA flags so it builds by default.
...
(cherry picked from commit 591906477bc8c8102dbefceefe10d81648865394)
2023-07-11 16:27:32 -07:00
Antonio Sánchez
6973687c70
Fix up PowerPC MMA flags so it builds by default.
...
(cherry picked from commit 65eeedf9646ee6efc457cc3a8f8d9030a6f83689)
2023-07-11 16:20:57 -07:00
Chip Kerchner
fbdaff81bd
Invert rows and depth in non-vectorized portion of packing (PowerPC).
...
(cherry picked from commit 9cf34ee0aed25a7464e6ec14f977cfa940f48f1b)
2021-11-03 23:34:47 +00:00
Antonio Sanchez
f1032255d3
Add missing PPC packet comparisons.
...
This is to fix the packetmath tests on the ppc pipeline.
(cherry picked from commit 2cc6ee0d2e76e88fe1476f6b0eae12edb68b1c8a)
2021-08-17 15:33:55 +00:00
Chip-Kerchner
f57dec64ef
Fix unaligned loads in ploadLhs & ploadRhs for P8.
...
(cherry picked from commit 8dcf3e38ba9913021ce6a831836a59217e21baf2)
2021-08-17 12:48:36 +00:00
Chip-Kerchner
0b56b62f30
Reverse compare logic in F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8).
...
(cherry picked from commit e07227c411cb5ed5c6252b594fe841867bd19f6a)
2021-08-13 18:01:15 +00:00
Chip Kerchner
44cc96e1a1
Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+
...
(cherry picked from commit 66499f0f172d0758360043e9c578761c0f7d50cd)
2021-08-12 21:39:17 +00:00
ChipKerchner
13d7658c5d
Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - cannot use const pointers with vec_xl).
...
(cherry picked from commit 413bc491f1721afdb9802553b13a5b7aba67ed3b)
2021-08-10 20:40:54 +00:00
Chip Kerchner
eebde572d9
Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow
...
(cherry picked from commit 91e99ec1e02100d07e35a7abb1b5c76707237219)
2021-07-01 23:32:38 +00:00
Rasmus Munk Larsen
413ff2b531
Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.
...
(cherry picked from commit bffd267d176410a517a0fe9afa6dde99c213c08a)
2021-06-25 17:13:12 +00:00
Rasmus Munk Larsen
a235ddef39
Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.
...
(cherry picked from commit 52a5f9821235e5a9f7e9b3e0198d45d42a1cb267)
2021-06-24 23:30:42 +00:00
Chip-Kerchner
9fc93ce31a
EIGEN_STRONG_INLINE was NOT inlining in some critical areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate.
...
(cherry picked from commit ef1fd341a895fda883f655102f371fa8b41f2088)
2021-06-16 22:14:17 +00:00
Antonio Sanchez
1374f49f28
Add missing ppc pcmp_lt_or_nan<Packet8bf>
...
(cherry picked from commit 9e94c5957000c38a6553552c96a7a27b1fc2860d)
2021-06-15 22:12:22 +00:00
Rasmus Munk Larsen
1cb1ffd5b2
Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled.
...
(cherry picked from commit fc87e2cbaa65e7e93a2c695ce5a9dc048a64a985)
2021-06-11 02:57:02 +00:00
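A minimal scalar sketch of the idea behind 1cb1ffd5b2, assuming a `memcpy`-based bit copy in place of Eigen's own `bit_cast` helper. The names `negative_zero` and `sign_bit_set` are hypothetical and not part of Eigen. Building -0.0 from its IEEE 754 bit pattern avoids depending on how the compiler treats a `-0.0f` literal or a negation under `-ffast-math`:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: build -0.0f from its IEEE 754 bit pattern
// (only the sign bit is set).
inline float negative_zero() {
  const std::uint32_t bits = 0x80000000u;
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Hypothetical helper: report whether the sign bit of x is set.
inline bool sign_bit_set(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  return (bits >> 31) != 0u;
}
```

The value still compares equal to `0.0f`; only the sign bit distinguishes it, which is why the check reads the bits directly.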
Chip-Kerchner
28564957ac
Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings).
...
(cherry picked from commit 06c2760bd1139711eeffa30266ead43423891698)
2021-04-21 01:05:21 +00:00
Chip Kerchner
c24bee6120
Fix address of temporary object errors in clang11.
...
This fixes the problem with taking the address of temporary objects which clang11 treats as errors.
2021-04-02 16:27:08 +00:00
Chip Kerchner
d59ef212e1
Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3).
2021-03-25 11:08:19 +00:00
Chip Kerchner
c9d4367fa4
Fix pround and add print
2021-03-15 19:07:43 +00:00
Antonio Sanchez
c65c2b31d4
Make half/bfloat16 constructor take inputs by value, fix powerpc test.
...
Since `numeric_limits<half>::max_exponent` is a static inline constant,
it cannot be directly passed by reference. This triggers a linker error
in recent versions of `g++-powerpc64le`.
Changing `half` to take inputs by value fixes this. Wrapping
`max_exponent` with `int(...)` to make an addressable integer also fixes this
and may help with other custom `Scalar` types down-the-road.
Also eliminated some compile warnings for powerpc.
2021-02-27 21:32:06 +00:00
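A small sketch of the two workarounds described in c65c2b31d4, under stated assumptions: `FakeLimits` is a hypothetical stand-in for `numeric_limits<half>`, and the `clamp_*` functions are illustrative only, not Eigen code. Before C++17, binding a `const int&` directly to a static constant member ODR-uses it and needs an out-of-class definition, which is one way such a linker error can arise:

```cpp
#include <algorithm>

// Hypothetical stand-in for numeric_limits<half>; not Eigen's definition.
struct FakeLimits {
  static constexpr int max_exponent = 16;
};

// Takes its arguments by const reference, as std::min does.
inline int clamp_exponent_by_ref(const int& e, const int& limit) {
  return std::min(e, limit);
}

// Workaround 1: take inputs by value, so no reference is bound
// to the static constant.
inline int clamp_exponent_by_value(int e, int limit) {
  return e < limit ? e : limit;
}

// Workaround 2: int(...) creates a temporary, so the reference binds
// to the copy rather than to the static member itself.
inline int clamp_with_cast(int e) {
  return clamp_exponent_by_ref(e, int(FakeLimits::max_exponent));
}

inline int clamp_with_value(int e) {
  return clamp_exponent_by_value(e, FakeLimits::max_exponent);
}
```

Both variants return the same result; the `int(...)` wrapper has the advantage of also working when the callee's signature cannot be changed.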
Chip-Kerchner
6eebe97bab
Fix clang compile when no MMA flags are set. Simplify MMA compiler detection.
2021-02-24 20:43:23 -06:00
Chip-Kerchner
c31ead8a15
Having forward template function declarations in a P10 file causes bad code in certain situations.
2021-02-24 23:43:30 +00:00
Chip-Kerchner
8523d447a1
Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins.
2021-02-24 20:49:15 +00:00
Chip-Kerchner
10c77b0ff4
Fix compilation errors with later versions of GCC and use of MMA.
2021-02-22 15:01:47 -06:00
Chip Kerchner
9b51dc7972
Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product
2021-02-17 17:49:23 +00:00
Antonio Sanchez
7ff0b7a980
Updated pfrexp implementation.
...
The original implementation fails for 0, denormals, inf, and NaN.
See #2150
2021-02-17 02:23:24 +00:00
Antonio Sanchez
4cb563a01e
Fix ldexp implementations.
...
The previous implementations produced garbage values if the exponent did
not fit within the exponent bits. See #2131 for a complete discussion,
and !375 for other possible implementations.
Here we implement the 4-factor version. See `pldexp_impl` in
`GenericPacketMathFunctions.h` for a full description.
The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>`
requires `por`.
Left as a "TODO" is to delegate to a faster version if we know the
exponent does fit within the exponent bits.
Fixes #2131 .
2021-02-10 22:45:41 +00:00
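A scalar sketch of the four-factor idea mentioned in 4cb563a01e, assuming `double` inputs; it is not the packet code in `GenericPacketMathFunctions.h`. The names `pow2_from_bits` and `ldexp_four_factor` and the clamp bound are assumptions of this sketch. The exponent is split as e = 3b + (e - 3b) with b = floor(e / 4), so every factor 2^b stays representable even when 2^e alone would not fit in the exponent bits:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: build 2^b for a double from its exponent bits.
// Valid only for -1022 <= b <= 1023 (normal range).
inline double pow2_from_bits(int b) {
  const std::uint64_t bits = static_cast<std::uint64_t>(b + 1023) << 52;
  double out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Scalar sketch: a * 2^e = a * 2^b * 2^b * 2^b * 2^(e - 3b),
// with b = floor(e / 4).
inline double ldexp_four_factor(double a, int e) {
  // Clamp so that each factor's exponent stays in the normal range
  // (the bound is an assumption of this sketch).
  if (e > 2099) e = 2099;
  if (e < -2099) e = -2099;
  // floor(e / 4) without relying on right-shifting a negative int.
  const int b = (e >= 0) ? (e / 4) : -((-e + 3) / 4);
  const double c = pow2_from_bits(b);
  double out = a * c * c * c;        // a * 2^(3b)
  out *= pow2_from_bits(e - 3 * b);  // remaining factor 2^(e - 3b)
  return out;
}
```

For example, scaling the smallest denormal by 2^2000 yields a finite result here, although 2^2000 itself is not representable as a `double`.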
Antonio Sanchez
56c8b14d87
Eliminate implicit conversions from float to double.
2021-02-01 15:31:01 -08:00
Antonio Sanchez
1615a27993
Fix altivec packetmath.
...
Allows the altivec packetmath tests to pass. There were a few issues:
- `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems
- `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead
of 0xFFFF)
- `pfrexp` needed to set the `exponent` argument.
Related to !370 , #2128
cc: @ChipKerchner @pdrocaldeira
Tested on `_BIG_ENDIAN` running on QEMU with VSX. Couldn't figure out build
flags to get it to work for little endian.
2021-01-28 18:37:09 +00:00
Chip Kerchner
1414e2212c
Fix clang compilation for AltiVec from previous check-in
2021-01-28 18:36:40 +00:00
Chip Kerchner
0784d9f87b
Fix sqrt, ldexp and frexp compilation errors.
2021-01-25 15:22:19 -06:00
Pedro Caldeira
c29935b323
Add support for dynamic dispatch of MMA instructions for POWER 10
2020-11-12 11:31:15 -03:00
Pedro Caldeira
35d149e34c
Add missing functions for Packet8bf in Altivec architecture.
...
Including new tests for bfloat16 Packets.
Fix prsqrt on GenericPacketMath.
2020-09-08 09:22:11 -05:00
Everton Constantino
6fe88a3c9d
MatrixProduct enhancements:
...
- Changes to Altivec/MatrixProduct
Adapting code to gcc 10.
Generic code style and performance enhancements.
Adding PanelMode support.
Adding stride/offset support.
Enabling float64, std::complex<float> and std::complex<double>.
Fixing lack of symm_pack.
Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr!= 1.
2020-09-02 18:21:36 -03:00
Everton Constantino
6568856275
Changing u/int8_t to un/signed char because clang does not understand it.
Implementing pcmp_eq for Packet8 and Packet16.
2020-09-02 17:02:15 -03:00
Chip Kerchner
e5886457c8
Change Packet8s and Packet8us to use vector commands on Power for pmadd, pmul and psub.
2020-08-28 19:27:32 +00:00
Pedro Caldeira
704798d1df
Add support for Bfloat16 to use vector instructions on Altivec architecture
2020-08-10 13:22:01 -05:00
Pedro Caldeira
a475bf14d4
Fix pscatter and pgather for Altivec Complex double
2020-06-16 16:41:02 -03:00
Pedro Caldeira
2d67af2d2b
Add pscatter for Packet16{u}c (int8)
2020-05-20 17:29:34 -03:00
Everton Constantino
8a7f360ec3
- Vectorizing MMA packing.
- Optimizing MMA kernel.
- Adding PacketBlock store to blas_data_mapper.
2020-05-19 19:24:11 +00:00
Pedro Caldeira
5fdc179241
Use template functions in Altivec to improve code reusability
2020-05-11 21:04:51 +00:00
Rasmus Munk Larsen
225ab040e0
Remove unused packet op "palign".
...
Clean up a compiler warning in c++03 mode in AVX512/Complex.h.
2020-05-07 17:14:26 -07:00
Pedro Caldeira
29f0917a43
Add vector instruction support for Packet16uc and Packet16c
2020-04-27 12:48:08 -03:00
Rasmus Munk Larsen
e80ec24357
Remove unused packet op "preduxp".
2020-04-23 18:17:14 +00:00
Pedro Caldeira
0c67b855d2
Add Packet8s and Packet8us to support signed/unsigned int16/short Altivec vector operations
2020-04-21 14:52:46 -03:00
Everton Constantino
deb93ed1bf
Adhere to recommended load/store intrinsics for ppc64le
2020-03-23 15:18:15 -03:00
Everton Constantino
5afdaa473a
Fixing float32's pround halfway criteria to match STL's criteria.
2020-03-21 22:30:54 -05:00
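For reference, `std::round` rounds halfway cases away from zero. A scalar sketch of one way to get that behavior from a truncation follows; this is an assumption for illustration, not necessarily what 5afdaa473a does, and `round_half_away` is a hypothetical name. It assumes `float` arithmetic is evaluated in single precision with the default round-to-nearest mode:

```cpp
#include <cmath>

// Hypothetical helper: round to nearest, halfway cases away from zero.
inline float round_half_away(float x) {
  // Largest float strictly below 0.5. Adding exactly 0.5 would push
  // values just under one half up to 1.0 after rounding of the sum.
  const float prev_half = std::nextafter(0.5f, 0.0f);
  return std::trunc(x + std::copysign(prev_half, x));
}
```

With this choice `round_half_away(0.5f)` gives `1.0f` and `round_half_away(-2.5f)` gives `-3.0f`, matching `std::round`, whereas a round-to-nearest-even instruction would return `0.0f` and `-2.0f`.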
Joel Holdsworth
232f904082
Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions
2020-03-19 17:24:06 +00:00
Everton Constantino
5a8b97b401
Switching unpacket_traits<Packet4i> to vectorizable=true.
2020-01-13 16:08:20 -03:00
Rasmus Munk Larsen
13ef08e5ac
Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h.
2019-09-27 13:56:04 -07:00
Rasmus Munk Larsen
6de5ed08d8
Add generic PacketMath implementation of the Error Function (erf).
2019-09-19 12:48:30 -07:00