There seems to be a gcc 4.7 bug that incorrectly flags the current
3x3 inverse as using uninitialized memory. I'm *pretty* sure it's
a false positive, but it's hard to trigger. The same warning
does not trigger with clang or later compiler versions.
In trying to find a work-around, this implementation turns out to be
faster anyways for static-sized matrices.
```
name old cpu/op new cpu/op delta
BM_Inverse3x3<DynamicMatrix3T<float>> 423ns ± 2% 433ns ± 3% +2.32% (p=0.000 n=98+96)
BM_Inverse3x3<DynamicMatrix3T<double>> 425ns ± 2% 427ns ± 3% +0.48% (p=0.003 n=99+96)
BM_Inverse3x3<StaticMatrix3T<float>> 7.10ns ± 2% 0.80ns ± 1% -88.67% (p=0.000 n=114+112)
BM_Inverse3x3<StaticMatrix3T<double>> 7.45ns ± 2% 1.34ns ± 1% -82.01% (p=0.000 n=105+111)
BM_AliasedInverse3x3<DynamicMatrix3T<float>> 409ns ± 3% 419ns ± 3% +2.40% (p=0.000 n=100+98)
BM_AliasedInverse3x3<DynamicMatrix3T<double>> 414ns ± 3% 413ns ± 2% ~ (p=0.322 n=98+98)
BM_AliasedInverse3x3<StaticMatrix3T<float>> 7.57ns ± 1% 0.80ns ± 1% -89.37% (p=0.000 n=111+114)
BM_AliasedInverse3x3<StaticMatrix3T<double>> 9.09ns ± 1% 2.58ns ±41% -71.60% (p=0.000 n=113+116)
```
(cherry picked from commit 5ad8b9bfe2bf75620bc89467c5cc051fc2a597df)
The `memset` function and bitwise manipulation only apply to POD types
that do not require initialization, otherwise resulting in UB. We currently
violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and
bitwise operations are applied byte-by-byte in the generic implementations.
This is causing issues for scalar types that do require initialization
or that contain non-POD info such as pointers (#2201). We either break
them, or force specializations of these functions for custom scalars,
even if they are not vectorized.
Here we modify these functions for scalars only - instead using only
scalar operations:
- `pzero`: `Scalar(0)` for all scalars.
- `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars.
- `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars
- `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise.
For non-scalar types, the original implementations are used to maintain
compatibility and minimize the number of changes.
Fixes#2201.
(cherry picked from commit 3d98a6ef5ce0ba85acaee4ffffc53f0f21bd8fd2)
Since `std::equal_to::operator()` is not a device function, it
fails on GPU. On my device, I seem to get a silent crash in the
kernel (no reported error, but the kernel does not complete).
Replacing this with a portable version enables comparisons on device.
Addresses #2292 - would need to be cherry-picked. The 3.3 branch
also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get
fully working.
(cherry picked from commit 7880f10526a11dc5544426c54c5763de576bf285)
Details are scattered across #920, #1000, #1324, #2291.
Summary: some MSVC versions have a bug that requires omitting explicit
`operator=` definitions (leads to duplicate definition errors), and
some MSVC versions require adding explicit `operator=` definitions
(otherwise implicitly deleted errors). This mess tries to cover
all the cases encountered.
Fixes#2291.
(cherry picked from commit 9816fe59b47dc4c07967b5ee93a8e8aaa6e9c308)
- Move constructors can only be defaulted as NOEXCEPT if all members
have NOEXCEPT move constructors.
- gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.
(cherry picked from commit 6035da5283f12f7e6a49cda0c21696c8e5a115b7)
For empty or single-column matrices, the current `PartialPivLU`
currently dereferences a `nullptr` or accesses memory out-of-bounds.
Here we adjust the checks to avoid this.
(cherry picked from commit 154f00e9eacaec5667215784c7601b55024e2f61)
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned. But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.
This is causing segfaults in Google Maps.
(cherry picked from commit 12e8d57108c50d8a63605c6eb0144c838c128337)
MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.
Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.
Related to #2268. CMake and build passes for me after this.
(cherry picked from commit ad82d20cf649ba8c07352f947fd25766d0328df2)
This used to work for non-class types (e.g. raw function pointers) in
Eigen 3.3. This was changed in commit 11f55b29 to optimize the
evaluator:
> `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.
though I cannot reproduce the 16 byte result. Both before the change
and after, with multiple compilers/versions, I always get a result of 40 bytes.
https://godbolt.org/z/MsjTc1PGe
This change modifies the code slightly to allow non-class types. The
final generated code is identical, and the expression remains 40 bytes
for the `abs2` sample case.
Fixes#2251
(cherry picked from commit ebb300d0b4340104dcade3afa656a57da2b7660c)
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other). Calling the base class swap (i.e in DenseBase) results in
assertions errors or memory corruption.
(cherry picked from commit 391094c50743f28f9174f455661f650bf07e0177)
The boost library unfortunately specializes `conj` for various types and
assumes the original two-template-parameter version. This changes
restores the second parameter. This also restores ABI compatibility.
The specialization for `std::complex` is because `std::conj` is not
a device function. For custom complex scalar types, users should provide
their own `conj` implementation.
We may consider removing the unnecessary second parameter in the future - but
this will require modifying boost as well.
Fixes#2112.
(cherry picked from commit c0eb5f89a406243f71eae0b705eba4437d9f8565)
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.
Related to !477, which uncovered the issue.
(cherry picked from commit 90e9a33e1ce3e4e7663dd67e6c1f225afaf5c206)
The original produced NaNs when dividing 0/b for subnormal b.
The `complex_divide_stable` was changed to use the more common
Smith's algorithm.
(cherry picked from commit 1c013be2cc6a999268be2f25575cd6a07bd52c45)
Fixes#2229.
For dynamic matrices with fixed-sized storage, only copy/swap
elements that have been set. Otherwise, this leads to inefficient
copying, and potential UB for non-initialized elements.
(cherry picked from commit d213a0bcea2344aa3f6c9856da9f5b2a26ccec25)