If EIGEN_DONT_VECTORIZE is defined, immintrin.h is not included even if F16C is available. Trying to use F16C intrinsics thus fails.
This fixes issue #2395.
(cherry picked from commit c06c3e52a082e403e7a241350fd867e907c833dc)
- The computation ran on an uninitialized matrix, which appears to be zeroed
on Linux but may, worse, contain NaN on non-Linux systems.
- This commit fixes it by initializing to Random().
(cherry picked from commit 4284c68fbb81cb069a630ae1bf4a953ee922f6e5)
&& and || short-circuit, & and | don't. When both arguments to those
are boolean, the short-circuiting version is usually the desired one, so
clang warns on this.
Here, it is inconsequential, so switch to && and || to suppress the warning.
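As a minimal standalone illustration of the difference (not Eigen code; `touch`, `Outcome`, `bitwise_and` and `logical_and` are hypothetical names made up for this sketch), the following counts how many operands get evaluated by each operator:

```cpp
#include <cassert>

// Hypothetical helper: records that it was evaluated, then returns `value`.
inline bool touch(int& calls, bool value) {
  ++calls;
  return value;
}

struct Outcome {
  bool result;
  int calls;  // how many operands were evaluated
};

// Bitwise `&` on bools: both operands are always evaluated.
// This is the pattern clang warns about.
inline Outcome bitwise_and(bool lhs, bool rhs) {
  int calls = 0;
  const bool result = touch(calls, lhs) & touch(calls, rhs);
  return {result, calls};
}

// Logical `&&`: the right operand is skipped once the left one is false.
inline Outcome logical_and(bool lhs, bool rhs) {
  int calls = 0;
  const bool result = touch(calls, lhs) && touch(calls, rhs);
  return {result, calls};
}

inline void short_circuit_demo() {
  assert(bitwise_and(false, true).calls == 2);
  assert(logical_and(false, true).calls == 1);
}
```

When the operands have no side effects, as in the code touched by this commit, both forms give the same result and only the warning differs.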
(cherry picked from commit b17bcddbca749f621040990a3efb840046315050)
For vectorized 1-dimensional inputs that do not take the special
blocking path (e.g. `std::complex<...>`), there was an
index-out-of-bounds error causing the broadcast size to be
computed incorrectly. Here we fix this, and make other minor
cleanup changes.
Fixes #2351.
(cherry picked from commit a500da1dc089b08e2f2b3b05a2eb23194425460e)
Fixes compiler errors in expressions that look like
Eigen::Matrix<Eigen::half, 3, 1>::Random().maxCoeff()
The error comes from the code that creates the initial value for
vectorized reductions. The fix is to specify the scalar type of the
reduction's initial value.
The change is necessary for Eigen::half because unlike other types,
Eigen::half scalars cannot be implicitly created from integers.
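The underlying language rule can be sketched without Eigen. `HalfLike` below is a hypothetical stand-in for a scalar type with an explicit constructor, and `sum_coeffs` is a made-up sum reduction (not the `maxCoeff` code path itself); the point is only that the initial value has to name the scalar type:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical half-like scalar: construction from arithmetic types is
// explicit, so `HalfLike h = 0;` does not compile.
struct HalfLike {
  float value;
  HalfLike() : value(0.0f) {}
  explicit HalfLike(float v) : value(v) {}
};

inline HalfLike operator+(const HalfLike& a, const HalfLike& b) {
  return HalfLike(a.value + b.value);
}

// Generic reduction. `Scalar acc = 0;` would work for float but fail for
// HalfLike; spelling the scalar type works for both.
template <typename Scalar>
Scalar sum_coeffs(const Scalar* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  Scalar acc = Scalar(0);  // initial value with an explicit scalar type
  for (std::size_t i = 0; i < n; ++i) {
    acc = acc + data[i];
  }
  return acc;
}
```

For example, `sum_coeffs` can then be instantiated for both `float` and `HalfLike` arrays with the same source.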
(cherry picked from commit d0e3791b1a0e2db9edd5f1d1befdb2ac5a40efe0)
We currently have plenty of type definitions with the alignment
qualifier coming after the type. The compiler warns about ignoring
them:
int EIGEN_ALIGN16 ai[4];
Turn this into:
EIGEN_ALIGN16 int ai[4];
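A small standalone check of the preferred placement is sketched below. `SKETCH_ALIGN16` is a hypothetical macro that simply expands to C++11 `alignas(16)`; it is an assumption for illustration and not the actual definition of `EIGEN_ALIGN16`, which may expand to a compiler-specific attribute:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an alignment qualifier macro.
#define SKETCH_ALIGN16 alignas(16)

// True if the address is a multiple of 16 bytes.
inline bool is_aligned_16(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Declares a buffer with the qualifier placed before the type and
// returns whether the buffer really is 16-byte aligned.
inline bool aligned_buffer_demo() {
  SKETCH_ALIGN16 int ai[4] = {0, 1, 2, 3};
  assert(ai[3] == 3);
  return is_aligned_16(ai);
}
```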
(cherry picked from commit 8faafc3aaa2b45e234cfe0bef085c1134ceffc42)
Cross-compiled via `s390x-linux-gnu-g++`, run via qemu. This allows the
packetmath tests to pass.
(cherry picked from commit 40bbe8a4d0eb3ec2bfd472fa30cac19e6e743b46)
For some reason, having `take<n, numeric_list<T>>` for `n > 0` causes
g++-11 to ICE with
```
sorry, unimplemented: unexpected AST of kind nontype_argument_pack
```
It does work with other versions of gcc, and with clang.
I filed a GCC bug
[here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102999).
Technically we should never actually run into this case, since you
can't take n > 0 elements from an empty list. Commenting it out
allows our Eigen tests to pass.
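To show why the removed case is unreachable in practice, here is a simplified sketch of a `take` metafunction over a `numeric_list`. These definitions are hypothetical re-creations written for this note, not Eigen's actual ones, and they deliberately omit the `n > 0` specialization for the empty list:

```cpp
#include <type_traits>

// Hypothetical, simplified stand-ins for the metaprogramming helpers.
template <typename T, T... vs>
struct numeric_list {};

template <typename A, typename B>
struct concat;

template <typename T, T... as, T... bs>
struct concat<numeric_list<T, as...>, numeric_list<T, bs...>> {
  using type = numeric_list<T, as..., bs...>;
};

template <int n, typename List>
struct take;

// n > 0 on a non-empty list: keep the head, recurse on the tail.
template <int n, typename T, T a, T... as>
struct take<n, numeric_list<T, a, as...>> {
  using type = typename concat<
      numeric_list<T, a>,
      typename take<n - 1, numeric_list<T, as...>>::type>::type;
};

// n == 0 on a non-empty list: nothing left to take.
template <typename T, T a, T... as>
struct take<0, numeric_list<T, a, as...>> {
  using type = numeric_list<T>;
};

// n == 0 on an empty list.
template <typename T>
struct take<0, numeric_list<T>> {
  using type = numeric_list<T>;
};

static_assert(
    std::is_same<take<2, numeric_list<int, 1, 2, 3>>::type,
                 numeric_list<int, 1, 2>>::value,
    "take<2> keeps the first two values");
```

As long as `n` does not exceed the list length, the recursion always reaches one of the `n == 0` cases before the list runs out, so the empty-list case with `n > 0` is never instantiated.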
(cherry picked from commit 8f8c2ba2fe19c6c2e47bbe2fbaf87594642e523d)
MSVC does not support specializing compound assignments for
`std::complex`, since it already specializes them (contrary to the
standard).
Trying to use one of these on device will currently lead to a
duplicate definition error. This is still probably preferable
to no error though. If we remove the definitions for MSVC, then
it will compile, but the kernel will fail silently.
The only proper solution would be to define our own custom `Complex`
type.
(cherry picked from commit f0f1d7938b7083800ff75fe88e15092f08a4e67e)
The 2979 warning is yet another "calling a `__host__` function from a
`__host__ __device__` function" warning. Although we probably should eventually
address these, they are flooding the logs. Most of these are
harmless since we only call the original from the host.
In cases where these are actually called from device, an error is generated
instead anyways.
The 2977 warning is a bit strange - although the warning suggests the
`__device__` annotation is ignored, this doesn't actually seem to be
the case. Without the `__device__` declarations, the kernel actually
fails to run when attempting to construct such objects. Again,
these warnings are flooding the logs, so disabling for now.
(cherry picked from commit 86c0decc480147d109b1dd8b968bcbc509b7a2e6)
reinterpret_cast between unrelated types is undefined behavior and leads
to misoptimizations on some platforms.
Use the safer (and faster) version via bit_cast.
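For readers unfamiliar with the technique, a common way to write such a cast is a `memcpy`-based helper. The sketch below uses the hypothetical name `bit_cast_sketch` and is not Eigen's implementation; the expected bit pattern in the usage note assumes IEEE 754 single-precision floats:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copies the object representation of `src` into a value of type `To`.
// Unlike reinterpret_cast through a pointer, this does not violate
// strict aliasing, and compilers typically reduce it to a register move.
template <typename To, typename From>
To bit_cast_sketch(const From& src) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable<To>::value &&
                    std::is_trivially_copyable<From>::value,
                "both types must be trivially copyable");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));
  return dst;
}

// Round-trips a float through its 32-bit representation.
inline float round_trip(float f) {
  const std::uint32_t bits = bit_cast_sketch<std::uint32_t>(f);
  const float back = bit_cast_sketch<float>(bits);
  assert(std::memcmp(&back, &f, sizeof f) == 0);
  return back;
}
```

Under the IEEE 754 assumption, `bit_cast_sketch<std::uint32_t>(1.0f)` yields `0x3f800000`.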
(cherry picked from commit b5eaa4269503f77d0aa58d2f8ed9419e1ba7784d)
Packet loading is skipped due to aliasing violation, leading to nullopt matrix
multiplication.
Fixes #2327.
(cherry picked from commit 3c724c44cff3f9e2e9e35351abff0b5c022b320d)
The `Options` of the new `hCoeffs` vector do not necessarily match
those of the `MatrixType`, leading to build errors. Having the
`CoeffVectorType` be a template parameter relieves this restriction.
(cherry picked from commit ebd4b17d2f5ca29a5c16ebd35d54d7aeda587820)
CUDA 9 seems to require labelling defaulted constructors as
`EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are
ignored. Without these labels, the `gpu_basic` test fails to
compile, with errors about calling `__host__` functions from
`__host__ __device__` functions.
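The labelling pattern looks roughly like the sketch below. `SKETCH_DEVICE_FUNC` and `Point2` are hypothetical names for illustration; the macro is assumed to behave like `EIGEN_DEVICE_FUNC` in that it expands to `__host__ __device__` only when compiling with a CUDA compiler, so the same code also builds as plain C++:

```cpp
#include <cassert>

// Hypothetical macro: host/device annotation under CUDA, nothing otherwise.
#if defined(__CUDACC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

struct Point2 {
  float x = 0.0f;
  float y = 0.0f;

  // Defaulted special members still carry the annotation.
  SKETCH_DEVICE_FUNC Point2() = default;
  SKETCH_DEVICE_FUNC Point2(const Point2&) = default;
  SKETCH_DEVICE_FUNC Point2& operator=(const Point2&) = default;

  SKETCH_DEVICE_FUNC Point2(float x_, float y_) : x(x_), y(y_) {}
};

// Exercises the defaulted constructor and copy operations on the host.
inline Point2 copy_through_defaults(const Point2& p) {
  Point2 tmp;        // defaulted constructor
  tmp = p;           // defaulted copy assignment
  Point2 out(tmp);   // defaulted copy constructor
  assert(out.x == p.x && out.y == p.y);
  return out;
}
```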
(cherry picked from commit 998bab4b04f26552b9875acfe113e69c7adccec4)