MSVC does not support specializing compound assignments for
`std::complex`, since it already specializes them (contrary to the
standard).
Trying to use one of these on device will currently lead to a
duplicate definition error. This is still probably preferable
to no error though. If we remove the definitions for MSVC, then
it will compile, but the kernel will fail silently.
The only proper solution would be to define our own custom `Complex`
type.
(cherry picked from commit f0f1d7938b7083800ff75fe88e15092f08a4e67e)
There were some typos that checked `EIGEN_HAS_CXX14` that should have
checked `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`, causing a mismatch
in some of the `Eigen::fix<N>` assumptions.
Also fixed the `symbolic_index` test when
`EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` is 0.
Fixes#2308
(cherry picked from commit 5db9e5c77958997856ddbccfa4a52ff22e83bef9)
Manually constructing an unaligned object declared as aligned
invokes UB, so we cannot technically check for alignment from
within the constructor. Newer versions of clang optimize away
this check.
Removing the affected tests.
(cherry picked from commit 0c4ae56e3797cc6719a8d08a0dafad0a5139a5f9)
The `memset` function and bitwise manipulation only apply to POD types
that do not require initialization, otherwise resulting in UB. We currently
violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and
bitwise operations are applied byte-by-byte in the generic implementations.
This is causing issues for scalar types that do require initialization
or that contain non-POD info such as pointers (#2201). We either break
them, or force specializations of these functions for custom scalars,
even if they are not vectorized.
Here we modify these functions for scalars only - instead using only
scalar operations:
- `pzero`: `Scalar(0)` for all scalars.
- `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars.
- `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars
- `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise.
For non-scalar types, the original implementations are used to maintain
compatibility and minimize the number of changes.
Fixes#2201.
(cherry picked from commit 3d98a6ef5ce0ba85acaee4ffffc53f0f21bd8fd2)
Since `std::equal_to::operator()` is not a device function, it
fails on GPU. On my device, I seem to get a silent crash in the
kernel (no reported error, but the kernel does not complete).
Replacing this with a portable version enables comparisons on device.
Addresses #2292 - would need to be cherry-picked. The 3.3 branch
also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get
fully working.
(cherry picked from commit 7880f10526a11dc5544426c54c5763de576bf285)
For empty or single-column matrices, the current `PartialPivLU`
currently dereferences a `nullptr` or accesses memory out-of-bounds.
Here we adjust the checks to avoid this.
(cherry picked from commit 154f00e9eacaec5667215784c7601b55024e2f61)
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other). Calling the base class swap (i.e in DenseBase) results in
assertions errors or memory corruption.
(cherry picked from commit 391094c50743f28f9174f455661f650bf07e0177)
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.
Related to !477, which uncovered the issue.
(cherry picked from commit 90e9a33e1ce3e4e7663dd67e6c1f225afaf5c206)
Fixes#2229.
For dynamic matrices with fixed-sized storage, only copy/swap
elements that have been set. Otherwise, this leads to inefficient
copying, and potential UB for non-initialized elements.
(cherry picked from commit d213a0bcea2344aa3f6c9856da9f5b2a26ccec25)
The namespace declaration for googlehash is a configurable macro that
can be disabled. In particular, it is disabled within google, causing
compile errors since `dense_hash_map`/`sparse_hash_map` are then in
the global namespace instead of in `::google`.
Here we play a bit of gynastics to allow for both `google::*_hash_map`
and `*_hash_map`, while limiting namespace polution. Symbols within
the `::google` namespace are imported into `Eigen::google`.
We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is
fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be
defined.
Clang-tidy complains that full specializations in headers can cause ODR
violations. Marked these as `inline` to fix.
It also complains about renaming arguments in specializations. Set the
argument names to match.
Some CUDA/HIP constants fail on device with `constexpr` since they
internally rely on non-constexpr functions, e.g.
```
\#define CUDART_INF_F __int_as_float(0x7f800000)
```
This fails for cuda-clang (though passes with nvcc). These constants are
currently used by `device::numeric_limits`. For portability, we
need to remove `constexpr` from the affected functions.
For C++11 or higher, we should be able to rely on the `std::numeric_limits`
versions anyways, since the methods themselves are now `constexpr`, so
should be supported on device (clang/hipcc natively, nvcc with
`--expr-relaxed-constexpr`).
`g_called` is not used in subtest 7, so was generating a
`-Wunneeded-internal-declaration` warnings. Here we silence
it by initializing the static variable.
The original fails with nvcc+msvc - there's a static order of initialization
issue leading to registered tests being cleared. The test then fails on
```
VERIFY(EigenTest::all().size()>0);
```
since `EigenTest` no longer contains any tests. The singleton pattern
fixes this.
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase where possible. Also replaced some other `numeric_limits`
usages in affected tests with the `NumTraits` equivalent.
The previous MR !443 failed for c++03 due to lack of `constexpr`.
Because of this, we need to keep around the `std::numeric_limits`
version in enum expressions until the switch to c++11.
Fixes#2148
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase. Also replaced some other `numeric_limits` usages in
affected tests with the `NumTraits` equivalent.
Fixes#2148
NVCC does not understand `__forceinline`, so we need to use `inline`
when compiling for GPU.
ICC specializes `std::complex` operators for `float` and `double`
by default, which cannot be used on device and conflict with Eigen's
workaround in CUDA/Complex.h. This can be prevented by defining
`_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`.
Added this define to the tests and to `Eigen/Core`, but this will
not work if the user includes `<complex>` before `<Eigen/Core>`.
ICC also seems to generate a duplicate `Map` symbol in
`PlainObjectBase`:
```
error: "Map" has already been declared in the current scope
static ConstMapType Map(const Scalar *data)
```
I tracked this down to `friend class Eigen::Map`. Putting the `friend`
statements at the bottom of the class seems to resolve this issue.
Fixes#2180
The original swap approach leads to potential undefined behavior (reading
uninitialized memory) and results in unnecessary copying of data for static
storage.
Here we pass down the move assignment to the underlying storage. Static
storage does a one-way copy, dynamic storage does a swap.
Modified the tests to no longer read from the moved-from matrix/tensor,
since that can lead to UB. Added a test to ensure we do not access
uninitialized memory in a move.
Fixes: #2119
The macro `__cplusplus` is not defined correctly in MSVC unless building
with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the
specified c++ standard version number.
Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version
number both for MSVC and otherwise. This simplifies checks for supported
features.
Also replaced most instances of standard version checking via `__cplusplus`
with the existing `EIGEN_COMP_CXXVER` macro for better clarity.
Fixes: #2170
This is a new version of !423, which failed for MSVC.
Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to
prevent operations involving `X` from crossing that barrier. Should
work on most `GNUC` compatible compilers (MSVC doesn't seem to need
this). This is a modified version adapted from what was used in
`psincos_float` and tested on more platforms
(see #1674, https://godbolt.org/z/73ezTG).
Modified `rint` to use the barrier to prevent the add/subtract rounding
trick from being optimized away.
Also fixed an edge case for large inputs that get bumped up a power of two
and ends up rounding away more than just the fractional part. If we are
over `2^digits` then just return the input. This edge case was missed in
the test since the test was comparing approximate equality, which was still
satisfied. Adding a strict equality option catches it.