11476 Commits

Author SHA1 Message Date
Antonio Sanchez
5e75331b9f Fix checking of version number for mingw.
MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.

Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.

Related to #2268.  CMake and build passes for me after this.


(cherry picked from commit ad82d20cf649ba8c07352f947fd25766d0328df2)
2021-06-12 00:02:26 +00:00
Antonio Sanchez
b5fc69bdd8 Add ability to permanently enable HIP/CUDA gpu* defines.
When using Eigen for gpu, these simplify portability.  If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.


(cherry picked from commit 514977f31b1c00b233969f12321a25d859dd1efa)
2021-06-11 17:48:37 +00:00
Antonio Sanchez
4b683b65df Allow custom TENSOR_CONTRACTION_DISPATCH macro.
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.


(cherry picked from commit 6aec83263d32c29f6c5623b9716ec7e367693078)
2021-06-11 17:19:29 +00:00
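The `#ifndef` guard pattern described in this commit can be sketched as follows. This is only an illustration under assumed names: `MYLIB_CONTRACTION_DISPATCH`, `default_dispatch`, `custom_dispatch`, and `contract` are hypothetical, and Eigen's actual macro takes different arguments.

```cpp
#include <cassert>

// Hypothetical dispatch routines standing in for real contraction kernels.
inline int default_dispatch(int n) { return n; }
inline int custom_dispatch(int n) { return 2 * n; }

// --- User code: define the macro BEFORE the library header is included. ---
#define MYLIB_CONTRACTION_DISPATCH(n) custom_dispatch(n)

// --- Library header: only supply the default if the user has not. ---
#ifndef MYLIB_CONTRACTION_DISPATCH
#define MYLIB_CONTRACTION_DISPATCH(n) default_dispatch(n)
#endif

// Library entry point that routes through whichever dispatch is active.
inline int contract(int n) { return MYLIB_CONTRACTION_DISPATCH(n); }
```

Because the guard only checks whether the macro is already defined, removing the user-side `#define` silently restores the default behavior.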
Rasmus Munk Larsen
1cb1ffd5b2 Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled.
(cherry picked from commit fc87e2cbaa65e7e93a2c695ce5a9dc048a64a985)
2021-06-11 02:57:02 +00:00
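A minimal sketch of the idea, assuming IEEE-754 32-bit `float`: build negative zero from its bit pattern instead of writing the literal `-0.0f`, so the sign does not depend on how the compiler treats signed zeros. The helper name `negative_zero_f` is hypothetical, and `std::memcpy` stands in here for Eigen's own `bit_cast` utility.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns -0.0f by copying the raw bit pattern.
inline float negative_zero_f() {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "this sketch assumes a 32-bit float");
  const std::uint32_t bits = 0x80000000u;  // only the sign bit is set
  float value;
  std::memcpy(&value, &bits, sizeof(value));  // portable bit cast
  return value;
}
```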
Rasmus Munk Larsen
4b502a7215 Fix c++20 warnings about using enums in arithmetic expressions.
(cherry picked from commit f64b2954c711b7846ae6ae228c5f14bd8dd56ec4)
2021-06-11 02:35:19 +00:00
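C++20 deprecates arithmetic between operands of two different enumeration types. One common way to avoid the warning, shown here as an assumption about the general technique rather than the exact change in this commit, is to cast each enumerator to an integer first. The enum and function names are hypothetical.

```cpp
#include <cassert>

// Two unrelated unscoped enums holding compile-time sizes (hypothetical).
enum RowInfo { RowsAtCompileTime = 4 };
enum ColInfo { ColsAtCompileTime = 3 };

// Writing `RowsAtCompileTime * ColsAtCompileTime` directly mixes two enum
// types in one arithmetic expression, which C++20 deprecates. Casting each
// operand to int keeps the same value without the warning.
constexpr int size_at_compile_time() {
  return static_cast<int>(RowsAtCompileTime) *
         static_cast<int>(ColsAtCompileTime);
}
```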
Nicolas Cornu
85868564df Fix parsing of version for nvhpc
The first line of the version output is empty, which crashes the parser,
so delete the first line if it is empty.


(cherry picked from commit 001a57519a7aa909d3bf0cd8c6ec8a9cd19d9c70)
2021-06-10 18:50:22 +00:00
Rohit Santhanam
cbb6ae6296 Removed dead code from GPU float16 unit test.
(cherry picked from commit c8d40a7bf1915015c991b108cf2cd6a32138fdc8)
2021-06-10 17:16:47 +00:00
Cyril Kaiser
573570b6c9 Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor.
(cherry picked from commit 91cd67f057f90101cf858d63916ee56a58511b0d)
2021-05-26 19:45:25 +00:00
Antonio Sanchez
98cf1e076f Add missing NEON ptranspose implementations.
Unified implementation using only `vzip`.


(cherry picked from commit dba753a986b527a17c8cc62474d0487aec7c2b36)
2021-05-25 19:09:50 +00:00
Antonio Sanchez
ee2a8f7139 Modify Unary/Binary/TernaryOp evaluators to work for non-class types.
This used to work for non-class types (e.g. raw function pointers) in
Eigen 3.3.  This was changed in commit 11f55b29 to optimize the
evaluator:

> `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.

though I cannot reproduce the 16 byte result.  Both before the change
and after, with multiple compilers/versions, I always get a result of 40 bytes.

https://godbolt.org/z/MsjTc1PGe

This change modifies the code slightly to allow non-class types.  The
final generated code is identical, and the expression remains 40 bytes
for the `abs2` sample case.

Fixes #2251


(cherry picked from commit ebb300d0b4340104dcade3afa656a57da2b7660c)
2021-05-25 18:19:53 +00:00
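To illustrate why non-class functors need care: a wrapper cannot inherit from a raw function pointer, but it can always hold one as a member. The sketch below shows that general idea only; `UnaryHolder` and `square` are hypothetical names, and this is not a copy of Eigen's evaluator code.

```cpp
#include <cassert>

// Hypothetical wrapper that stores the functor by value as a member.
// Unlike `struct Holder : Func {}`, this compiles when Func is a raw
// function pointer as well as when it is a class type.
template <typename Func>
struct UnaryHolder {
  Func func;
  explicit UnaryHolder(Func f) : func(f) {}

  template <typename T>
  auto apply(const T& x) const -> decltype(func(x)) {
    return func(x);
  }
};

// A plain function used through a raw function pointer.
inline float square(float v) { return v * v; }
```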
Jakub Lichman
3835046309 predux_half_dowto4 test extended to all applicable packets
(cherry picked from commit 12471fcb5d59f969c60a9b78727624dc91e5c04e)
2021-05-21 16:58:16 +00:00
Steve Bronder
4fbd01cd4b Adds macro for checking if C++14 variable templates are supported
(cherry picked from commit 17200570239f23b2f0d3b434bc0269c46c409791)
2021-05-21 16:43:30 +00:00
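A feature check of this kind can be sketched with the standard feature-test macro `__cpp_variable_templates`. The macro name `MY_HAS_VARIABLE_TEMPLATES` and the `pi_v`/`pi` helpers are hypothetical, and Eigen's actual check may use different conditions (for example, compiler version tests for toolchains that lack feature-test macros).

```cpp
#include <cassert>

// Hypothetical detection macro built on the standard feature-test macro.
#if defined(__cpp_variable_templates) && __cpp_variable_templates >= 201304L
#define MY_HAS_VARIABLE_TEMPLATES 1
#else
#define MY_HAS_VARIABLE_TEMPLATES 0
#endif

#if MY_HAS_VARIABLE_TEMPLATES
// Only declared when the compiler supports C++14 variable templates.
template <typename T>
constexpr T pi_v = T(3.14159265358979323846L);
#endif

// Function form that works either way and uses the variable template
// when it is available.
template <typename T>
constexpr T pi() {
#if MY_HAS_VARIABLE_TEMPLATES
  return pi_v<T>;
#else
  return T(3.14159265358979323846L);
#endif
}
```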
Niall Murphy
a883a8797c Use derived object type in conservative_resize_like_impl
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other)). Calling the base class swap (i.e. in DenseBase) results in
assertion errors or memory corruption.


(cherry picked from commit 391094c50743f28f9174f455661f650bf07e0177)
2021-05-20 23:43:57 +00:00
Jakub Lichman
0bd9e9bc45 ptranspose test for non-square kernels added
(cherry picked from commit 8877f8d9b2631301ba070d645cdc3fc9b9f764f5)
2021-05-20 19:27:20 +00:00
Guoqiang QI
77c66e368c Ensure all generated matrices for inverse_4x4 tests are invertible. This fixes #2248.
(cherry picked from commit 3e006bfd31e4389e8c5718c30409cddb65a73b04)
2021-05-13 15:03:47 +00:00
guoqiangqi
2f908f8255 Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242.
(cherry picked from commit 3d9051ea84a5089b277c88dac456b3b1576bfa7f)
2021-05-12 17:02:19 +00:00
Nathan Luehr
82f13830e6 Fix calls to device functions from host code
(cherry picked from commit 972cf0c28a8d2ee0808c1277dea2c5c206591ce6)
2021-05-12 17:01:45 +00:00
Nathan Luehr
d1825cbb68 Device implementation of log for std::complex types.
(cherry picked from commit 7e6a1c129c201db4eff46f4dd68acdc7e935eaf2)
2021-05-11 22:31:53 +00:00
Nathan Luehr
d9288f078d Fix ambiguity due to argument dependent lookup.
(cherry picked from commit 6753f0f197e7b8a8019e82e7b144ac0281d6a7f1)
2021-05-11 22:00:36 +00:00
Rohit Santhanam
85ebd6aff8 Fix for issue where numext::imag and numext::real are used before they are defined.
(cherry picked from commit 39ec31c0adbdde6b8cda36b3415e9cc2af20dab6)
2021-05-10 20:14:10 +00:00
Antonio Sanchez
2947c0cc84 Restore ABI compatibility for conj with 3.3, fix conflict with boost.
The boost library unfortunately specializes `conj` for various types and
assumes the original two-template-parameter version.  This change
restores the second parameter.  This also restores ABI compatibility.

The specialization for `std::complex` is because `std::conj` is not
a device function. For custom complex scalar types, users should provide
their own `conj` implementation.

We may consider removing the unnecessary second parameter in the future - but
this will require modifying boost as well.

Fixes #2112.


(cherry picked from commit c0eb5f89a406243f71eae0b705eba4437d9f8565)
2021-05-07 18:38:23 +00:00
Antonio Sanchez
25424f4cf1 Clean up gpu device properties.
Made a class and singleton to encapsulate initialization and retrieval of
device properties.

Related to !481, which already changed the API to address a static
linkage issue.


(cherry picked from commit 0eba8a1fe3e0fa78f0e6760c0e1265817491845d)
2021-05-07 18:13:40 +00:00
Antonio Sanchez
42acbd5700 Fix numext::arg return type.
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.

Related to !477, which uncovered the issue.


(cherry picked from commit 90e9a33e1ce3e4e7663dd67e6c1f225afaf5c206)
2021-05-07 17:52:07 +00:00
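The return-type issue can be illustrated with a small wrapper around `std::arg`: the phase angle of a complex number is a real value, so the wrapper should return the underlying real type. `real_type` and `my_arg` are hypothetical names used for this sketch; they are not Eigen's `numext` implementation.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <type_traits>

// Hypothetical trait mapping a scalar to its real counterpart.
template <typename T>
struct real_type { using type = T; };
template <typename T>
struct real_type<std::complex<T>> { using type = T; };

// Returns the phase angle as a REAL value, even for complex input.
// Declaring the return type as `Scalar` would be wrong for complex scalars.
template <typename Scalar>
typename real_type<Scalar>::type my_arg(const Scalar& x) {
  return std::arg(x);
}
```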
Christoph Hertzberg
9e0dc8f09b Revert addition of unused paddsub<Packet2cf>. This fixes #2242
(cherry picked from commit 722ca0b665666f3af579002ad752541d7319d1b6)
2021-05-07 16:23:03 +00:00
Antonio Sanchez
da19f7a910 Simplify TensorRandom and remove time-dependence.
Time-dependence prevents tests from being repeatable. This has long
been an issue with debugging the tensor tests. Removing this will allow
future tests to be repeatable in the usual way.

Also, the recently added macros in !476 are causing headaches across different
platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple
ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE`
are sometimes defined with values, sometimes defined as empty, and sometimes
not defined at all when they probably should be.  This is leading to
multiple build breakages.

The simplest approach is to generate a seed via
`Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a
hash based on the current thread ID (since `rand()` isn't supported
on GPU).

Fixes #1602.


(cherry picked from commit e3b7f59659689015aa254ed67c48d870831f086f)
2021-05-05 23:37:48 +00:00
Antonio Sanchez
fc2cc10842 Better CUDA complex division.
The original produced NaNs when dividing 0/b for subnormal b.
The `complex_divide_stable` was changed to use the more common
Smith's algorithm.


(cherry picked from commit 1c013be2cc6a999268be2f25575cd6a07bd52c45)
2021-04-29 17:58:45 +00:00
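For reference, Smith's algorithm can be sketched as below. It scales by the ratio of the smaller to the larger component of the divisor, which avoids forming `c*c + d*d`. The function names are hypothetical; `divide_naive` is the textbook formula shown only for contrast and is not necessarily what the original code used.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <limits>

// Smith's algorithm for (a + bi) / (c + di).
template <typename T>
std::complex<T> divide_smith(const std::complex<T>& x,
                             const std::complex<T>& y) {
  const T a = x.real(), b = x.imag(), c = y.real(), d = y.imag();
  if (std::abs(c) >= std::abs(d)) {
    const T r = d / c;        // |r| <= 1
    const T den = c + d * r;  // equals (c*c + d*d) / c
    return std::complex<T>((a + b * r) / den, (b - a * r) / den);
  }
  const T r = c / d;          // |r| < 1
  const T den = c * r + d;    // equals (c*c + d*d) / d
  return std::complex<T>((a * r + b) / den, (b * r - a) / den);
}

// Textbook formula: the squared norm underflows to 0 for subnormal divisors.
template <typename T>
std::complex<T> divide_naive(const std::complex<T>& x,
                             const std::complex<T>& y) {
  const T a = x.real(), b = x.imag(), c = y.real(), d = y.imag();
  const T norm = c * c + d * d;
  return std::complex<T>((a * c + b * d) / norm, (b * c - a * d) / norm);
}
```

With a subnormal real divisor and a zero numerator, the textbook formula evaluates `0 / 0`, while the Smith variant divides by the subnormal itself and returns zero.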
Antonio Sanchez
a33855f6ee Add missing pcmp_lt_or_nan for NEON Packet4bf.
(cherry picked from commit 172db7bfc32def5ed0f885287e352b63dd5cd767)
2021-04-27 21:15:08 +00:00
Theo Fletcher
83df5df61b Added complex matrix unit tests for SelfAdjointEigenSolve
(cherry picked from commit 2ced0cc233fff6ef16c4d098b03aeeb69ff7c509)
2021-04-26 19:18:53 +00:00
Jakub Lichman
ac3c5aad31 Tests added and AVX512 bug fixed for pcmp_lt_or_nan
(cherry picked from commit d87648a6bea315645b893c3815ca8c6bb00ec5d2)
2021-04-26 18:07:55 +00:00
Jakub Lichman
63abb10000 Tests for pcmp_lt and pcmp_le added
(cherry picked from commit 1115f5462ecaa84d3c60479f7e23a530a1a415d2)
2021-04-23 19:52:23 +00:00
Turing Eret
baf601a0e3 Fix for issue with static global variables in TensorDeviceGpu.h
m_deviceProperties and m_devicePropInitialized are defined as global
statics which will define multiple copies which can cause issues if
initializeDeviceProp() is called in one translation unit and then
m_deviceProperties is used in a different translation unit. Added
inline functions getDeviceProperties() and getDevicePropInitialized()
which define those variables as static locals. As per the C++ standard
7.1.2/4, a static local declared in an inline function always refers
to the same object, so this should be safer. Credit to Sun Chenggen
for this fix.

This fixes issue #1475.


(cherry picked from commit 3804ca0d905a0a03357db50abc7468f5f90abc98)
2021-04-23 19:06:16 +00:00
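The pattern described above can be sketched as follows. A namespace-scope `static` variable in a header gives every translation unit its own copy, whereas a static local inside an `inline` function is a single shared object. `DeviceProperties`, `device_properties`, and `initialize_device_properties` are hypothetical stand-ins, not the names used in `TensorDeviceGpu.h`.

```cpp
#include <cassert>

// Hypothetical stand-in for the cached GPU device information.
struct DeviceProperties {
  int device_count = 0;
  bool initialized = false;
};

// Header-safe accessor: the static local refers to the same object in
// every translation unit that includes this header.
inline DeviceProperties& device_properties() {
  static DeviceProperties props;
  return props;
}

// Initializes the shared state once; later calls are no-ops.
inline void initialize_device_properties(int device_count) {
  DeviceProperties& props = device_properties();
  if (!props.initialized) {
    props.device_count = device_count;
    props.initialized = true;
  }
}
```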
Antonio Sanchez
587a691516 Check existence of BSD random before use.
`TensorRandom` currently relies on BSD `random()`, which is not always
available.  The [linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html)
gives the glibc condition:
```
_XOPEN_SOURCE >= 500
               || /* Glibc since 2.19: */ _DEFAULT_SOURCE
               || /* Glibc <= 2.19: */ _SVID_SOURCE || _BSD_SOURCE
```
In particular, this was failing to compile for MinGW via msys2. If not
available, we fall back to using `rand()`.


(cherry picked from commit 045c0609b5c059974104f29dad91bcc3828e91ac)
2021-04-23 00:35:05 +00:00
Antonio Sanchez
8830d66c02 DenseStorage safely copy/swap.
Fixes #2229.

For dynamic matrices with fixed-sized storage, only copy/swap
elements that have been set.  Otherwise, this leads to inefficient
copying, and potential UB for non-initialized elements.


(cherry picked from commit d213a0bcea2344aa3f6c9856da9f5b2a26ccec25)
2021-04-22 21:05:50 +00:00
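The idea of copying and swapping only the elements in use can be sketched with a small fixed-capacity container. `BoundedVector` is a hypothetical type written for this illustration; it is not Eigen's `DenseStorage`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Fixed-capacity buffer with a runtime size (hypothetical sketch).
template <typename T, std::size_t Capacity>
class BoundedVector {
 public:
  explicit BoundedVector(std::size_t size = 0) : size_(size) {
    assert(size <= Capacity);
    std::fill(data_, data_ + size_, T());  // initialize only the used part
  }
  // Copy only the elements that are in use, never the unused tail.
  BoundedVector(const BoundedVector& other) : size_(other.size_) {
    std::copy(other.data_, other.data_ + other.size_, data_);
  }
  BoundedVector& operator=(const BoundedVector& other) {
    if (this != &other) {
      size_ = other.size_;
      std::copy(other.data_, other.data_ + other.size_, data_);
    }
    return *this;
  }
  // Swap the common prefix, then move the longer side's tail across.
  void swap(BoundedVector& other) {
    const std::size_t common = std::min(size_, other.size_);
    std::swap_ranges(data_, data_ + common, other.data_);
    if (size_ > other.size_) {
      std::copy(data_ + common, data_ + size_, other.data_ + common);
    } else {
      std::copy(other.data_ + common, other.data_ + other.size_,
                data_ + common);
    }
    std::swap(size_, other.size_);
  }
  std::size_t size() const { return size_; }
  T& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
  const T& operator[](std::size_t i) const { assert(i < size_); return data_[i]; }

 private:
  T data_[Capacity];  // elements at index >= size_ are never read
  std::size_t size_;
};
```

Elements beyond the current size are never read, so the unused part of the buffer can stay uninitialized without being touched by a copy or swap.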
Rasmus Munk Larsen
54425a39b2 Make vectorized compute_inverse_size4 compile with AVX.
(cherry picked from commit 85a76a16ea835fcfa7d4c185a338ae2aef9a272a)
2021-04-22 17:25:25 +00:00
Jakub Lichman
34d0be9ec1 Compilation of basicbenchmark fixed
(cherry picked from commit d72c794ccd21637ba56dec0dd8bd0cffef7bc47e)
2021-04-21 12:09:42 +02:00
Jakub Lichman
42a8bdd4d7 HasExp added for AVX512 Packet8d
(cherry picked from commit 2b1dfd1ba0638e57a50d2f401412e0893064c354)
2021-04-21 12:09:21 +02:00
Chip-Kerchner
28564957ac Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings).
(cherry picked from commit 06c2760bd1139711eeffa30266ead43423891698)
2021-04-21 01:05:21 +00:00
Antonio Sanchez
ab7fe215f9 Fix ldexp for AVX512 (#2215)
Wrong shuffle was used.  Need to interleave low/high halves with a
`permute` instruction.

Fixes #2215.


(cherry picked from commit 1d79c68ba0507574d893780e60b982f07d210261)
2021-04-20 20:52:26 +00:00
David Tellenbach
1f4c0311cd Bump to 3.3.91 (3.4-rc1) 3.4-rc1 2021-04-18 23:43:12 +02:00
David Tellenbach
3e819d83bf Before 3.4 branch before-3.4 2021-04-18 23:36:14 +02:00
Antonio Sanchez
69adf26aa3 Modify googlehash use to account for namespace issues.
The namespace declaration for googlehash is a configurable macro that
can be disabled.  In particular, it is disabled within google, causing
compile errors since `dense_hash_map`/`sparse_hash_map` are then in
the global namespace instead of in `::google`.

Here we play a bit of gymnastics to allow for both `google::*_hash_map`
and `*_hash_map`, while limiting namespace pollution.  Symbols within
the `::google` namespace are imported into `Eigen::google`.

We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is
fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be
defined.
2021-04-12 19:00:39 -07:00
Christoph Hertzberg
9357feedc7 Avoid using uninitialized inputs and if available, use slightly more efficient movsd instruction for pset1<Packet2cf>. 2021-04-13 01:36:59 +02:00
Rasmus Munk Larsen
a2c0542010 Fix typo in TensorDimensions.h 2021-04-12 18:59:56 +00:00
Rohit Santhanam
dfd6720d82 Fix for float16 GPU unit test. 2021-04-12 10:19:06 +00:00
Christoph Hertzberg
1e1c8a735c Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect C++ version for std::result_of and std::invoke_result.
Fixes #2209
2021-04-12 01:26:15 +00:00
Jens Wehner
f6fc66aa75 fixed doxygen for unsupported iterative solver module 2021-04-11 16:26:14 +00:00
Christoph Hertzberg
d58678069c Make iterators default constructible and assignable, by making... 2021-04-09 17:03:28 +00:00
Rohit Santhanam
2859db0220 This fixes an issue where the compiler was not choosing the GPU specific specialization of ScanLauncher.
The issue was discovered when the GPU scan unit test was run and resulted in a segmentation fault.

The segmentation fault occurred because the unit test allocated GPU memory and passed a pointer to that memory to the computation that it presumed would execute on the GPU.

But because of the issue, the computation was scheduled to execute on the CPU so a situation was constructed where the CPU attempted to access a GPU memory location.

The fix expands the GPU specific ScanLauncher specialization to handle cases where vectorization is enabled.

Previously, the GPU specialization was chosen only if vectorization was not used.
2021-04-08 15:14:48 +00:00
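The selection problem can be illustrated with a reduced template sketch. When the device-specific specialization fixes the vectorization flag to `false`, any instantiation with the flag set to `true` falls back to the primary (CPU) template. `Launcher`, `CpuDevice`, and `GpuDevice` here are hypothetical simplifications, not Eigen's `ScanLauncher` signature.

```cpp
#include <cassert>

struct CpuDevice {};
struct GpuDevice {};

// Primary template: the generic (CPU) path.
template <typename Device, bool Vectorize>
struct Launcher {
  static bool runs_on_gpu() { return false; }
};

// A specialization written as `template <> struct Launcher<GpuDevice, false>`
// would miss Launcher<GpuDevice, true>. Leaving Vectorize as a parameter
// makes the GPU path match both cases.
template <bool Vectorize>
struct Launcher<GpuDevice, Vectorize> {
  static bool runs_on_gpu() { return true; }
};
```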
Antonio Sanchez
fcb5106c6e Scaled epsilon the wrong way.
Should have been 0.5 to widen the bounds, since this is inverse
precision.  Setting to 0.5, however, leads to many more failing
tests at Google, so reverting to 1 for now.
2021-04-07 15:08:39 -07:00
Christoph Hertzberg
6197ce1a35 Replace -2147483648 by -0.0f or -0.0 constants (this should fix #2189).
Also, remove unnecessary `pgather` operations.
2021-04-07 11:25:27 +00:00