MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.
Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.
Related to #2268. CMake and build passes for me after this.
(cherry picked from commit ad82d20cf649ba8c07352f947fd25766d0328df2)
When using Eigen for gpu, these simplify portability. If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.
(cherry picked from commit 514977f31b1c00b233969f12321a25d859dd1efa)
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.
(cherry picked from commit 6aec83263d32c29f6c5623b9716ec7e367693078)
As the first line of the version is empty it crashes,
so delete first line if it is empty
(cherry picked from commit 001a57519a7aa909d3bf0cd8c6ec8a9cd19d9c70)
This used to work for non-class types (e.g. raw function pointers) in
Eigen 3.3. This was changed in commit 11f55b29 to optimize the
evaluator:
> `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.
though I cannot reproduce the 16 byte result. Both before the change
and after, with multiple compilers/versions, I always get a result of 40 bytes.
https://godbolt.org/z/MsjTc1PGe
This change modifies the code slightly to allow non-class types. The
final generated code is identical, and the expression remains 40 bytes
for the `abs2` sample case.
Fixes#2251
(cherry picked from commit ebb300d0b4340104dcade3afa656a57da2b7660c)
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other). Calling the base class swap (i.e in DenseBase) results in
assertions errors or memory corruption.
(cherry picked from commit 391094c50743f28f9174f455661f650bf07e0177)
The boost library unfortunately specializes `conj` for various types and
assumes the original two-template-parameter version. This changes
restores the second parameter. This also restores ABI compatibility.
The specialization for `std::complex` is because `std::conj` is not
a device function. For custom complex scalar types, users should provide
their own `conj` implementation.
We may consider removing the unnecessary second parameter in the future - but
this will require modifying boost as well.
Fixes#2112.
(cherry picked from commit c0eb5f89a406243f71eae0b705eba4437d9f8565)
Made a class and singleton to encapsulate initialization and retrieval of
device properties.
Related to !481, which already changed the API to address a static
linkage issue.
(cherry picked from commit 0eba8a1fe3e0fa78f0e6760c0e1265817491845d)
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.
Related to !477, which uncovered the issue.
(cherry picked from commit 90e9a33e1ce3e4e7663dd67e6c1f225afaf5c206)
Time-dependence prevents tests from being repeatable. This has long
been an issue with debugging the tensor tests. Removing this will allow
future tests to be repeatable in the usual way.
Also, the recently added macros in !476 are causing headaches across different
platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple
ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE`
are sometimes defined with values, sometimes defined as empty, and sometimes
not defined at all when they probably should be. This is leading to
multiple build breakages.
The simplest approach is to generate a seed via
`Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a
hash based on the current thread ID (since `rand()` isn't supported
on GPU).
Fixes#1602.
(cherry picked from commit e3b7f59659689015aa254ed67c48d870831f086f)
The original produced NaNs when dividing 0/b for subnormal b.
The `complex_divide_stable` was changed to use the more common
Smith's algorithm.
(cherry picked from commit 1c013be2cc6a999268be2f25575cd6a07bd52c45)
m_deviceProperties and m_devicePropInitialized are defined as global
statics which will define multiple copies which can cause issues if
initializeDeviceProp() is called in one translation unit and then
m_deviceProperties is used in a different translation unit. Added
inline functions getDeviceProperties() and getDevicePropInitialized()
which defines those variables as static locals. As per the C++ standard
7.1.2/4, a static local declared in an inline function always refers
to the same object, so this should be safer. Credit to Sun Chenggen
for this fix.
This fixes issue #1475.
(cherry picked from commit 3804ca0d905a0a03357db50abc7468f5f90abc98)
`TensorRandom` currently relies on BSD `random()`, which is not always
available. The [linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html)
gives the glibc condition:
```
_XOPEN_SOURCE >= 500
|| /* Glibc since 2.19: */ _DEFAULT_SOURCE
|| /* Glibc <= 2.19: */ _SVID_SOURCE || _BSD_SOURCE
```
In particular, this was failing to compile for MinGW via msys2. If not
available, we fall back to using `rand()`.
(cherry picked from commit 045c0609b5c059974104f29dad91bcc3828e91ac)
Fixes#2229.
For dynamic matrices with fixed-sized storage, only copy/swap
elements that have been set. Otherwise, this leads to inefficient
copying, and potential UB for non-initialized elements.
(cherry picked from commit d213a0bcea2344aa3f6c9856da9f5b2a26ccec25)
Wrong shuffle was used. Need to interleave low/high halves with a
`permute` instruction.
Fixes#2215.
(cherry picked from commit 1d79c68ba0507574d893780e60b982f07d210261)
The namespace declaration for googlehash is a configurable macro that
can be disabled. In particular, it is disabled within google, causing
compile errors since `dense_hash_map`/`sparse_hash_map` are then in
the global namespace instead of in `::google`.
Here we play a bit of gynastics to allow for both `google::*_hash_map`
and `*_hash_map`, while limiting namespace polution. Symbols within
the `::google` namespace are imported into `Eigen::google`.
We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is
fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be
defined.
The issue was discovered when the GPU scan unit test was run and resulted in a segmentation fault.
The segmantation fault occurred because the unit test allocated GPU memory and passed a pointer to that memory to the computation that it presumed would execute on the GPU.
But because of the issue, the computation was scheduled to execute on the CPU so a situation was constructed where the CPU attempted to access a GPU memory location.
The fix expands the GPU specific ScanLauncher specialization to handle cases where vectorization is enabled.
Previously, the GPU specialization is chosen only if Vectorization is not used.
Should have been 0.5 to widen the bounds, since this is inverse
precision. Setting to 0.5, however, leads to many more failing
tests at Google, so reverting to 1 for now.