reinterpret_cast between unrelated types is undefined behavior and leads
to misoptimizations on some platforms.
Use the safer (and faster) version via bit_cast
(cherry picked from commit b5eaa4269503f77d0aa58d2f8ed9419e1ba7784d)
Packet loading is skipped due to aliasing violation, leading to nullopt matrix
multiplication.
Fixes#2327.
(cherry picked from commit 3c724c44cff3f9e2e9e35351abff0b5c022b320d)
The `Options` of the new `hCoeffs` vector do not necessarily match
those of the `MatrixType`, leading to build errors. Having the
`CoeffVectorType` be a template parameter relieves this restriction.
(cherry picked from commit ebd4b17d2f5ca29a5c16ebd35d54d7aeda587820)
CUDA 9 seems to require labelling defaulted constructors as
`EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are
ignored. Without these labels, the `gpu_basic` test fails to
compile, with errors about calling `__host__` functions from
`__host__ __device__` functions.
(cherry picked from commit 998bab4b04f26552b9875acfe113e69c7adccec4)
Clang doesn't like !621, needs the "g" constraint back.
The "g" constraint also works for GCC >= 5.
This fixes our gitlab CI.
(cherry picked from commit 3a6296d4f198ffbcccda4303919b3b14d5e54524)
GCC 4.8 doesn't seem to like the `g` register constraint, failing to
compile with "error: 'asm' operand requires impossible reload".
Tested `r` instead, and that seems to work, even with latest compilers.
Also fixed some minor macro issues to eliminate warnings on armv7.
Fixes#2315.
(cherry picked from commit ff07a8a63945d89301d1b29ac59d170ff9be3955)
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.
All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.
Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.
(cherry picked from commit cc3573ab4451853774cd5c3497373d5fe8914774)
There were some typos that checked `EIGEN_HAS_CXX14` that should have
checked `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`, causing a mismatch
in some of the `Eigen::fix<N>` assumptions.
Also fixed the `symbolic_index` test when
`EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` is 0.
Fixes#2308
(cherry picked from commit 5db9e5c77958997856ddbccfa4a52ff22e83bef9)
In VS 2017, `std::arg` for real inputs always returns 0, even for
negative inputs. It should return `PI` for negative real values.
This seems to be fixed in VS 2019 (MSVC 1920).
(cherry picked from commit 2b410ecbefea1bf4b9d50decb946a4ebe4a73f98)
Manually constructing an unaligned object declared as aligned
invokes UB, so we cannot technically check for alignment from
within the constructor. Newer versions of clang optimize away
this check.
Removing the affected tests.
(cherry picked from commit 0c4ae56e3797cc6719a8d08a0dafad0a5139a5f9)
* This can make less compile_time if A is smaller than B. and avoid failure in compile if we get a little A and a great B.
Authored by @awoniu.
(cherry picked from commit 8ce341caf2947e4b5ac4580c20254ae7d828b009)
There seems to be a gcc 4.7 bug that incorrectly flags the current
3x3 inverse as using uninitialized memory. I'm *pretty* sure it's
a false positive, but it's hard to trigger. The same warning
does not trigger with clang or later compiler versions.
In trying to find a work-around, this implementation turns out to be
faster anyways for static-sized matrices.
```
name old cpu/op new cpu/op delta
BM_Inverse3x3<DynamicMatrix3T<float>> 423ns ± 2% 433ns ± 3% +2.32% (p=0.000 n=98+96)
BM_Inverse3x3<DynamicMatrix3T<double>> 425ns ± 2% 427ns ± 3% +0.48% (p=0.003 n=99+96)
BM_Inverse3x3<StaticMatrix3T<float>> 7.10ns ± 2% 0.80ns ± 1% -88.67% (p=0.000 n=114+112)
BM_Inverse3x3<StaticMatrix3T<double>> 7.45ns ± 2% 1.34ns ± 1% -82.01% (p=0.000 n=105+111)
BM_AliasedInverse3x3<DynamicMatrix3T<float>> 409ns ± 3% 419ns ± 3% +2.40% (p=0.000 n=100+98)
BM_AliasedInverse3x3<DynamicMatrix3T<double>> 414ns ± 3% 413ns ± 2% ~ (p=0.322 n=98+98)
BM_AliasedInverse3x3<StaticMatrix3T<float>> 7.57ns ± 1% 0.80ns ± 1% -89.37% (p=0.000 n=111+114)
BM_AliasedInverse3x3<StaticMatrix3T<double>> 9.09ns ± 1% 2.58ns ±41% -71.60% (p=0.000 n=113+116)
```
(cherry picked from commit 5ad8b9bfe2bf75620bc89467c5cc051fc2a597df)
The latest version of `mpreal` has a bug that breaks `min`/`max`.
It also breaks with the latest dev version of `mpfr`. Here we
add `FindMPREAL.cmake` which searches for the library and tests if
compilation works.
Removed our internal copy of `mpreal.h` under `unsupported/test`, as
it is out-of-sync with the latest, and similarly breaks with
the latest `mpfr`. It would be best to use the installed version
of `mpreal` anyways, since that's what we actually want to test.
Fixes#2282.
(cherry picked from commit 31f796ebef35eeadd0e26878aab3fe99ca412a45)
We were getting a lot of warnings due to nested `find_package` calls
within `Find***.cmake` files. The recommended approach is to use
[`find_dependency`](https://cmake.org/cmake/help/latest/module/CMakeFindDependencyMacro.html)
in package configuration files. I made this change for all instances.
Case mismatches between `Find<Package>.cmake` and calling
`find_package(<PACKAGE>`) also lead to warnings. Fixed for
`FindPASTIX.cmake` and `FindSCOTCH.cmake`.
`FindBLASEXT.cmake` was broken due to calling `find_package_handle_standard_args(BLAS ...)`.
The package name must match, otherwise the `find_package(BLASEXT)` falsely thinks
the package wasn't found. I changed to `BLASEXT`, but then also copied that value
to `BLAS_FOUND` for compatibility.
`FindPastix.cmake` had a typo that incorrectly added `PTSCOTCH` when looking for
the `SCOTCH` component.
`FindPTSCOTCH` incorrectly added `***-NOTFOUND` to include/library lists,
corrupting them. This led to cmake errors down-the-line.
Fixes#2288.
(cherry picked from commit 1cdec386530c6b844389b96c199e723a1e4e71c7)
This is to enable compiling with the latest trisycl. `FindTriSYCL.cmake` was
broken by commit 00f32752, which modified `add_sycl_to_target` for ComputeCPP.
This makes the corresponding modifications for trisycl to make them consistent.
Also, trisycl now requires c++17.
(cherry picked from commit 8cf6cb27baa9607cc00e5dbb42a1c31efda41b74)
The `memset` function and bitwise manipulation only apply to POD types
that do not require initialization, otherwise resulting in UB. We currently
violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and
bitwise operations are applied byte-by-byte in the generic implementations.
This is causing issues for scalar types that do require initialization
or that contain non-POD info such as pointers (#2201). We either break
them, or force specializations of these functions for custom scalars,
even if they are not vectorized.
Here we modify these functions for scalars only - instead using only
scalar operations:
- `pzero`: `Scalar(0)` for all scalars.
- `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars.
- `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars
- `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise.
For non-scalar types, the original implementations are used to maintain
compatibility and minimize the number of changes.
Fixes#2201.
(cherry picked from commit 3d98a6ef5ce0ba85acaee4ffffc53f0f21bd8fd2)
Since `std::equal_to::operator()` is not a device function, it
fails on GPU. On my device, I seem to get a silent crash in the
kernel (no reported error, but the kernel does not complete).
Replacing this with a portable version enables comparisons on device.
Addresses #2292 - would need to be cherry-picked. The 3.3 branch
also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get
fully working.
(cherry picked from commit 7880f10526a11dc5544426c54c5763de576bf285)
Details are scattered across #920, #1000, #1324, #2291.
Summary: some MSVC versions have a bug that requires omitting explicit
`operator=` definitions (leads to duplicate definition errors), and
some MSVC versions require adding explicit `operator=` definitions
(otherwise implicitly deleted errors). This mess tries to cover
all the cases encountered.
Fixes#2291.
(cherry picked from commit 9816fe59b47dc4c07967b5ee93a8e8aaa6e9c308)