2954 Commits

Author SHA1 Message Date
Antonio Sanchez
ac561cd038 Reduce tensor_contract_gpu test.
The original test times out after 60 minutes on Windows, even when
setting flags to optimize for speed.  Reducing the number of
contractions performed from 3600->27 for subtests 8,9 allow the
two to run in just over a minute each.

(cherry picked from commit be9e7d205f38e3e8effdfdded88817b371673930)
2023-07-11 11:27:31 -07:00
Antonio Sanchez
554982beef Disable Tree reduction for GPU.
For moderately sized inputs, running the Tree reduction quickly
fills/overflows the GPU thread stack space, leading to memory errors.
This was happening in the `cxx11_tensor_complex_gpu` test, for example.
Disabling tree reduction on GPU fixes this.

(cherry picked from commit 24ebb37f38287d65c0e0b60c714e39faffeb5b94)
2023-07-10 16:09:30 -07:00
Antonio Sanchez
89a71f3126 Fix gpu special function tests.
Some checks used incorrect values, partly from copy-paste errors,
partly from the change in behaviour introduced in !398.

Modified results to match scipy, simplified tests by updating
`VERIFY_IS_CWISE_APPROX` to work for scalars.

(cherry picked from commit 701f5d1c91c770e558c7760da14ff3365757e275)
2023-07-10 15:57:08 -07:00
Antonio Sanchez
a605d6b996 Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS
Also add a missing space for clang.

(cherry picked from commit 846d34384af80b80793d32257a7f917eeece41d4)
2023-07-10 15:30:41 -07:00
Antonio Sanchez
dfcd6de20a Clean up CUDA CMake files.
- Unify test/CMakeLists.txt and unsupported/test/CMakeLists.txt
- Added `EIGEN_CUDA_FLAGS` that are appended to the set of flags passed
to the cuda compiler (nvcc or clang).

The latter is to support passing custom flags (e.g. `-arch=` to nvcc,
or to disable cuda-specific warnings).

(cherry picked from commit 7b00e8b186a7679b0f46be742809a55d07d4efe8)
2023-07-10 15:30:41 -07:00
Antonio Sánchez
26b8fabd80 Return NaN in ndtri for values outside valid input range.
(cherry picked from commit 1f79a6078fb77da47069c8aec23c4e309fb982e2)
2023-07-10 14:52:08 -07:00
Rasmus Munk Larsen
f296720d7d Make sure we return +/-1 above the clamping point for Erf().
(cherry picked from commit b378014fef017a829fb42c7fad15f3764bfb8ef9)
2023-07-10 14:52:08 -07:00
Rasmus Munk Larsen
d4c24eca96 Don't crash on empty tensor contraction.
(cherry picked from commit b0f877f8e01e90a5b0f3a79d46ea234899f8b499)
2023-07-10 14:52:08 -07:00
Antonio Sánchez
72b0759451 Fix arm builds.
(cherry picked from commit 2c8011c2dd72d6c2086b181aad8cbb6204fed5db)
2023-07-10 14:52:08 -07:00
Chip Kerchner
8f1b6198c2 Fix epsilon and dummy_precision values in long double for double doubles. Prevented some algorithms from converging on PPC.
(cherry picked from commit 54459214a1b9c67df04bc529474fca1ec9f4c84f)
2023-07-10 14:52:08 -07:00
Antonio Sánchez
669dc8fadf Eliminate bool bitwise warnings.
(cherry picked from commit b8e93bf589fa66da404c66c48dc512b3e7484713)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
ea57f9b78f AutoDiff depends on Core, so include appropriate header.
(cherry picked from commit e1165dbf9a16527ab085bec2749b02096fa1b584)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
f55a112cb1 Fix ODR violations.
(cherry picked from commit bb51d9f4fa3cf1114348b9180640d6da7d3964f9)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
a11bdf3965 Skip f16/bf16 bessel specializations on AVX512 if unavailable.
(cherry picked from commit 8ed3b9dcd6dd2e58ec0ad27438d09a90c72e549a)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
80c5b8b3c3 Fix ambiguous comparisons for c++20 (again again)
(cherry picked from commit 8c2e0e3cb8c6ddcd828d6f1d2062a243c0dc9948)
2023-07-07 15:21:17 -07:00
Antonio Sánchez
af912a7b5c Fix MSVC+CUDA issues.
(cherry picked from commit 5ed7a86ae96d411c450fb190f5a725f38f2aea9d)
2023-07-07 15:21:17 -07:00
Antonio Sanchez
ac78f84b72 Eliminate trace unused warning.
(cherry picked from commit 9bc9992dd37e0379be888186a234b7641af306f7)
2023-07-07 15:06:18 -07:00
Antonio Sánchez
b158fcaa74 Fix edge-case in zeta for large inputs.
(cherry picked from commit 9296bb4b933973365d19b4b71e7d2b205d00a1ad)
2023-07-07 15:06:18 -07:00
Antonio Sánchez
b30a2a527e Remove poor non-convergence checks in NonLinearOptimization.
(cherry picked from commit d819a33bf64c4fce95c55f8e44a68b486f064a79)
2023-07-07 11:50:25 -07:00
Antonio Sanchez
bc1b354b32 Adjust tolerance of matrix_power test for MSVC.
(cherry picked from commit 1c2690ed248327539f7a248ddb12e1690da81b68)
2023-07-07 11:50:02 -07:00
Antonio Sánchez
36be6747e0 Modify test expression to avoid numerical differences (#2402).
(cherry picked from commit ae86a146b1ac9a49bf72e485254c08d237fd094a)
2023-07-07 11:45:56 -07:00
Antonio Sanchez
21e0ad056e Fix ODR failures in TensorRandom.
(cherry picked from commit bded5028a5bd112181b94b2a246ac2c20e671c2f)
2023-07-07 11:43:03 -07:00
Antonio Sánchez
52e545324e Fix ODR violations.
(cherry picked from commit cafeadffef2a7ba41f2da5cf34c38068d74499eb)
2023-07-07 11:37:31 -07:00
Antonio Sánchez
f3aaba8705 Revert "Replace call to FixedDimensions() with a singleton instance of"
This reverts commit 19e6496ce0c52fef33265bca54285ba77b2155be

(cherry picked from commit f7b31f864c0dec7872038cab79f6e677de2ecc71)
2022-04-10 15:34:11 +00:00
Antonio Sanchez
7e3bc4177e Fix tensor broadcast off-by-one error.
Caught by JAX unit tests.  Triggered if broadcast is smaller than packet
size.


(cherry picked from commit ffb78e23a1b3bc232a07773144cfa5fa1759852d)
2021-11-16 18:41:25 +00:00
Nico
71320af66a Fix -Wbitwise-instead-of-logical clang warning
& and | short-circuit, && and || don't. When both arguments to those
are boolean, the short-circuiting version is usually the desired one, so
clang warns on this.

Here, it is inconsequential, so switch to && and || to suppress the warning.

(cherry picked from commit b17bcddbca749f621040990a3efb840046315050)
2021-11-03 23:32:57 +00:00
Antonio Sanchez
0ab1f8ec03 Fix broadcasting oob error.
For vectorized 1-dimensional inputs that do not take the special
blocking path (e.g. `std::complex<...>`), there was an
index-out-of-bounds error causing the broadcast size to be
computed incorrectly.  Here we fix this, and make other minor
cleanup changes.

Fixes #2351.


(cherry picked from commit a500da1dc089b08e2f2b3b05a2eb23194425460e)
2021-11-03 23:30:47 +00:00
Antonio Sanchez
f9b2e92040 Remove bad "take" impl that causes g++-11 crash.
For some reason, having `take<n, numeric_list<T>>` for `n > 0` causes
g++-11 to ICE with
```
sorry, unimplemented: unexpected AST of kind nontype_argument_pack
```
It does work with other versions of gcc, and with clang.
I filed a GCC bug
[here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102999).

Technically we should never actually run into this case, since you
can't take n > 0 elements from an empty list.  Commenting it out
allows our Eigen tests to pass.


(cherry picked from commit 8f8c2ba2fe19c6c2e47bbe2fbaf87594642e523d)
2021-11-03 23:26:34 +00:00
Maxiwell S. Garcia
b8cf1ed753 Rename 'vec_all_nan' of cxx11_tensor_expr test because this symbol is used by altivec.h
(cherry picked from commit 09fc0f97b53e22d8fef94acf0fbfeed3717ab906)
2021-09-01 17:26:59 +00:00
Antonio Sanchez
c2b6df6e60 Disable cuda Eigen::half vectorization on host.
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.

All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.

Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.


(cherry picked from commit cc3573ab4451853774cd5c3497373d5fe8914774)
2021-08-31 21:23:11 +00:00
jenswehner
338924602d added includes for unordered_map
(cherry picked from commit e3e74001f7c4bf95f0dde572e8a08c5b2918a3ab)
2021-08-10 16:10:03 +00:00
Antonio Sanchez
46ecdcd745 Fix MPReal detection and support.
The latest version of `mpreal` has a bug that breaks `min`/`max`.
It also breaks with the latest dev version of `mpfr`. Here we
add `FindMPREAL.cmake` which searches for the library and tests if
compilation works.

Removed our internal copy of `mpreal.h` under `unsupported/test`, as
it is out-of-sync with the latest, and similarly breaks with
the latest `mpfr`.  It would be best to use the installed version
of `mpreal` anyways, since that's what we actually want to test.

Fixes #2282.


(cherry picked from commit 31f796ebef35eeadd0e26878aab3fe99ca412a45)
2021-08-03 18:13:12 +00:00
Antonio Sanchez
bb33880e57 Fix TriSycl CMake files.
This is to enable compiling with the latest trisycl. `FindTriSYCL.cmake` was
broken by commit 00f32752, which modified `add_sycl_to_target` for ComputeCPP.
This makes the corresponding modifications for trisycl to make them consistent.

Also, trisycl now requires c++17.


(cherry picked from commit 8cf6cb27baa9607cc00e5dbb42a1c31efda41b74)
2021-08-03 17:25:17 +00:00
Alexander Karatarakis
c334eece44 _DerType -> DerivativeType as underscore-followed-by-caps is a reserved identifier
(cherry picked from commit f357283d3128a6253af09705155ce4f9f113e3c8)
2021-07-29 18:18:47 +00:00
Jonas Harsch
5a3c9eddb4 Removed superfluous boolean degenerate in TensorMorphing.h.
(cherry picked from commit e9c9a3130b7307a240335aa527a6d4c5fb2ee471)
2021-07-08 18:34:10 +00:00
Antonio Sanchez
84955d109f Fix Tensor documentation page.
The extra [TOC] tag is generating a huge floating duplicated
table-of-contents, which obscures the majority of the page
(see bottom of https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html).
Remove it.

Also, headers do not support markup (see
[doxygen bug](https://github.com/doxygen/doxygen/issues/7467)), so
backticks like
```
```
end up generating titles that looks like
```
Constructor <tt>Tensor<double,2></tt>
```
Removing backticks for now.  To generate proper formatted headers, we
must directly use html instead of markdown, i.e.
```
<h2>Constructor <code>Tensor&lt;double,2&gt;</code></h2>
```
which is ugly.

Fixes #2254.


(cherry picked from commit f5a9873bbb5488bcba3e37f92b4ec09a8db76081)
2021-07-07 17:18:20 +00:00
Jonas Harsch
601814b575 Don't crash when attempting to shuffle an empty tensor.
(cherry picked from commit aab747021be5ed1a1e9667243d884eb72003599d)
2021-07-02 21:08:38 +00:00
Antonio Sanchez
8190739f12 Fix compile issues for gcc 4.8.
- Move constructors can only be defaulted as NOEXCEPT if all members
have NOEXCEPT move constructors.
- gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.


(cherry picked from commit 6035da5283f12f7e6a49cda0c21696c8e5a115b7)
2021-07-01 23:18:10 +00:00
Antonio Sanchez
d82d915047 Modify tensor argmin/argmax to always return first occurence.
As written, depending on multithreading/gpu, the returned index from
`argmin`/`argmax` is not currently stable.  Here we modify the functors
to always keep the first occurence (i.e. if the value is equal to the
current min/max, then keep the one with the smallest index).

This is otherwise causing unpredictable results in some TF tests.


(cherry picked from commit 3a087ccb99b454dc34484333e608e836e7032213)
2021-06-29 23:28:37 +00:00
Antonio Sanchez
a2040ef796 Rewrite balancer to avoid overflows.
The previous balancer overflowed for large row/column norms.
Modified to prevent that.

Fixes #2273.


(cherry picked from commit e9ab4278b7aba6f279c964d99ae5a312d12ab04b)
2021-06-21 18:14:53 +00:00
Antonio Sanchez
2d6eaaf687 Fix placement of permanent GPU defines.
(cherry picked from commit 954879183b1e008d7f0fefb97e48a925c4e3fb16)
2021-06-15 19:18:20 +00:00
Rasmus Munk Larsen
47722a66f2 Fix more enum arithmetic.
(cherry picked from commit 13fb5ab92c3226f7b9be20882b0418d53516d35a)
2021-06-15 16:40:35 +00:00
Antonio Sanchez
b5fc69bdd8 Add ability to permanently enable HIP/CUDA gpu* defines.
When using Eigen for gpu, these simplify portability.  If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.


(cherry picked from commit 514977f31b1c00b233969f12321a25d859dd1efa)
2021-06-11 17:48:37 +00:00
Antonio Sanchez
4b683b65df Allow custom TENSOR_CONTRACTION_DISPATCH macro.
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.


(cherry picked from commit 6aec83263d32c29f6c5623b9716ec7e367693078)
2021-06-11 17:19:29 +00:00
Rohit Santhanam
cbb6ae6296 Removed dead code from GPU float16 unit test.
(cherry picked from commit c8d40a7bf1915015c991b108cf2cd6a32138fdc8)
2021-06-10 17:16:47 +00:00
Nathan Luehr
82f13830e6 Fix calls to device functions from host code
(cherry picked from commit 972cf0c28a8d2ee0808c1277dea2c5c206591ce6)
2021-05-12 17:01:45 +00:00
Antonio Sanchez
25424f4cf1 Clean up gpu device properties.
Made a class and singleton to encapsulate initialization and retrieval of
device properties.

Related to !481, which already changed the API to address a static
linkage issue.


(cherry picked from commit 0eba8a1fe3e0fa78f0e6760c0e1265817491845d)
2021-05-07 18:13:40 +00:00
Antonio Sanchez
da19f7a910 Simplify TensorRandom and remove time-dependence.
Time-dependence prevents tests from being repeatable. This has long
been an issue with debugging the tensor tests. Removing this will allow
future tests to be repeatable in the usual way.

Also, the recently added macros in !476 are causing headaches across different
platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple
ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE`
are sometimes defined with values, sometimes defined as empty, and sometimes
not defined at all when they probably should be.  This is leading to
multiple build breakages.

The simplest approach is to generate a seed via
`Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a
hash based on the current thread ID (since `rand()` isn't supported
on GPU).

Fixes #1602.


(cherry picked from commit e3b7f59659689015aa254ed67c48d870831f086f)
2021-05-05 23:37:48 +00:00
Turing Eret
baf601a0e3 Fix for issue with static global variables in TensorDeviceGpu.h
m_deviceProperties and m_devicePropInitialized are defined as global
statics which will define multiple copies which can cause issues if
initializeDeviceProp() is called in one translation unit and then
m_deviceProperties is used in a different translation unit. Added
inline functions getDeviceProperties() and getDevicePropInitialized()
which defines those variables as static locals. As per the C++ standard
7.1.2/4, a static local declared in an inline function always refers
to the same object, so this should be safer. Credit to Sun Chenggen
for this fix.

This fixes issue #1475.


(cherry picked from commit 3804ca0d905a0a03357db50abc7468f5f90abc98)
2021-04-23 19:06:16 +00:00
Antonio Sanchez
587a691516 Check existence of BSD random before use.
`TensorRandom` currently relies on BSD `random()`, which is not always
available.  The [linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html)
gives the glibc condition:
```
_XOPEN_SOURCE >= 500
               || /* Glibc since 2.19: */ _DEFAULT_SOURCE
	       || /* Glibc <= 2.19: */ _SVID_SOURCE ||  _BSD_SOURCE
```
In particular, this was failing to compile for MinGW via msys2. If not
available, we fall back to using `rand()`.


(cherry picked from commit 045c0609b5c059974104f29dad91bcc3828e91ac)
2021-04-23 00:35:05 +00:00