reinterpret_cast between unrelated types is undefined behavior and leads
to misoptimizations on some platforms.
Use the safer (and faster) version via bit_cast
(cherry picked from commit b5eaa4269503f77d0aa58d2f8ed9419e1ba7784d)
Packet loading is skipped due to aliasing violation, leading to nullopt matrix
multiplication.
Fixes#2327.
(cherry picked from commit 3c724c44cff3f9e2e9e35351abff0b5c022b320d)
The `Options` of the new `hCoeffs` vector do not necessarily match
those of the `MatrixType`, leading to build errors. Having the
`CoeffVectorType` be a template parameter relieves this restriction.
(cherry picked from commit ebd4b17d2f5ca29a5c16ebd35d54d7aeda587820)
CUDA 9 seems to require labelling defaulted constructors as
`EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are
ignored. Without these labels, the `gpu_basic` test fails to
compile, with errors about calling `__host__` functions from
`__host__ __device__` functions.
(cherry picked from commit 998bab4b04f26552b9875acfe113e69c7adccec4)
Clang doesn't like !621, needs the "g" constraint back.
The "g" constraint also works for GCC >= 5.
This fixes our gitlab CI.
(cherry picked from commit 3a6296d4f198ffbcccda4303919b3b14d5e54524)
GCC 4.8 doesn't seem to like the `g` register constraint, failing to
compile with "error: 'asm' operand requires impossible reload".
Tested `r` instead, and that seems to work, even with latest compilers.
Also fixed some minor macro issues to eliminate warnings on armv7.
Fixes#2315.
(cherry picked from commit ff07a8a63945d89301d1b29ac59d170ff9be3955)
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.
All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.
Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.
(cherry picked from commit cc3573ab4451853774cd5c3497373d5fe8914774)
There were some typos that checked `EIGEN_HAS_CXX14` that should have
checked `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`, causing a mismatch
in some of the `Eigen::fix<N>` assumptions.
Also fixed the `symbolic_index` test when
`EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` is 0.
Fixes#2308
(cherry picked from commit 5db9e5c77958997856ddbccfa4a52ff22e83bef9)
In VS 2017, `std::arg` for real inputs always returns 0, even for
negative inputs. It should return `PI` for negative real values.
This seems to be fixed in VS 2019 (MSVC 1920).
(cherry picked from commit 2b410ecbefea1bf4b9d50decb946a4ebe4a73f98)
* This can make less compile_time if A is smaller than B. and avoid failure in compile if we get a little A and a great B.
Authored by @awoniu.
(cherry picked from commit 8ce341caf2947e4b5ac4580c20254ae7d828b009)
There seems to be a gcc 4.7 bug that incorrectly flags the current
3x3 inverse as using uninitialized memory. I'm *pretty* sure it's
a false positive, but it's hard to trigger. The same warning
does not trigger with clang or later compiler versions.
In trying to find a work-around, this implementation turns out to be
faster anyways for static-sized matrices.
```
name old cpu/op new cpu/op delta
BM_Inverse3x3<DynamicMatrix3T<float>> 423ns ± 2% 433ns ± 3% +2.32% (p=0.000 n=98+96)
BM_Inverse3x3<DynamicMatrix3T<double>> 425ns ± 2% 427ns ± 3% +0.48% (p=0.003 n=99+96)
BM_Inverse3x3<StaticMatrix3T<float>> 7.10ns ± 2% 0.80ns ± 1% -88.67% (p=0.000 n=114+112)
BM_Inverse3x3<StaticMatrix3T<double>> 7.45ns ± 2% 1.34ns ± 1% -82.01% (p=0.000 n=105+111)
BM_AliasedInverse3x3<DynamicMatrix3T<float>> 409ns ± 3% 419ns ± 3% +2.40% (p=0.000 n=100+98)
BM_AliasedInverse3x3<DynamicMatrix3T<double>> 414ns ± 3% 413ns ± 2% ~ (p=0.322 n=98+98)
BM_AliasedInverse3x3<StaticMatrix3T<float>> 7.57ns ± 1% 0.80ns ± 1% -89.37% (p=0.000 n=111+114)
BM_AliasedInverse3x3<StaticMatrix3T<double>> 9.09ns ± 1% 2.58ns ±41% -71.60% (p=0.000 n=113+116)
```
(cherry picked from commit 5ad8b9bfe2bf75620bc89467c5cc051fc2a597df)
The `memset` function and bitwise manipulation only apply to POD types
that do not require initialization, otherwise resulting in UB. We currently
violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and
bitwise operations are applied byte-by-byte in the generic implementations.
This is causing issues for scalar types that do require initialization
or that contain non-POD info such as pointers (#2201). We either break
them, or force specializations of these functions for custom scalars,
even if they are not vectorized.
Here we modify these functions for scalars only - instead using only
scalar operations:
- `pzero`: `Scalar(0)` for all scalars.
- `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars.
- `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars
- `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise.
For non-scalar types, the original implementations are used to maintain
compatibility and minimize the number of changes.
Fixes#2201.
(cherry picked from commit 3d98a6ef5ce0ba85acaee4ffffc53f0f21bd8fd2)
Since `std::equal_to::operator()` is not a device function, it
fails on GPU. On my device, I seem to get a silent crash in the
kernel (no reported error, but the kernel does not complete).
Replacing this with a portable version enables comparisons on device.
Addresses #2292 - would need to be cherry-picked. The 3.3 branch
also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get
fully working.
(cherry picked from commit 7880f10526a11dc5544426c54c5763de576bf285)
Details are scattered across #920, #1000, #1324, #2291.
Summary: some MSVC versions have a bug that requires omitting explicit
`operator=` definitions (leads to duplicate definition errors), and
some MSVC versions require adding explicit `operator=` definitions
(otherwise implicitly deleted errors). This mess tries to cover
all the cases encountered.
Fixes#2291.
(cherry picked from commit 9816fe59b47dc4c07967b5ee93a8e8aaa6e9c308)
- Move constructors can only be defaulted as NOEXCEPT if all members
have NOEXCEPT move constructors.
- gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.
(cherry picked from commit 6035da5283f12f7e6a49cda0c21696c8e5a115b7)
For empty or single-column matrices, the current `PartialPivLU`
currently dereferences a `nullptr` or accesses memory out-of-bounds.
Here we adjust the checks to avoid this.
(cherry picked from commit 154f00e9eacaec5667215784c7601b55024e2f61)
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned. But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.
This is causing segfaults in Google Maps.
(cherry picked from commit 12e8d57108c50d8a63605c6eb0144c838c128337)
MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.
Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.
Related to #2268. CMake and build passes for me after this.
(cherry picked from commit ad82d20cf649ba8c07352f947fd25766d0328df2)