Rasmus Munk Larsen
7b5a8b6bc5
Improve plog: 20% speedup for float + handle denormals
2022-01-05 23:40:31 +00:00
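Handling denormals in a log means recovering the exponent when the float's exponent field is zero. The sketch below shows one common approach. It is an assumption, not code taken from the commit: scale denormal inputs by 2^23 so they become normal, then subtract 23 from the exponent. `frexp_float` and `log_via_frexp` are hypothetical names, and `std::log` on the mantissa stands in for the polynomial that a vectorized `plog` would evaluate.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: split a finite x > 0 into m * 2^e with m in [0.5, 1),
// the same convention as std::frexp, using bit manipulation only.
float frexp_float(float x, int* e) {
  int bias_fix = 0;
  if (x < std::numeric_limits<float>::min()) {
    // Denormal input: the exponent field is 0. Scaling by 2^23 is exact and
    // makes the value normal; the scaling is undone in the exponent below.
    x *= 8388608.0f;  // 2^23
    bias_fix = -23;
  }
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const int biased = static_cast<int>((bits >> 23) & 0xFFu);
  *e = biased - 126 + bias_fix;                 // exponent for m in [0.5, 1)
  bits = (bits & 0x807FFFFFu) | (126u << 23);   // keep sign/mantissa, set m's exponent to 2^-1
  float m;
  std::memcpy(&m, &bits, sizeof m);
  return m;
}

// log(x) = log(m) + e * ln(2). std::log(m) stands in for a polynomial.
float log_via_frexp(float x) {
  int e;
  const float m = frexp_float(x, &e);
  return std::log(m) + static_cast<float>(e) * 0.69314718f;
}
```

For `std::numeric_limits<float>::denorm_min()` the helper returns 0.5 with exponent -148, which matches `std::frexp`. Without the rescaling step, a denormal would be read as having biased exponent 0, and the mantissa would be misinterpreted.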
Andrew Johnson
a491c7f898
Allow specifying inner & outer stride for CwiseUnaryView - fixes #2398
2022-01-05 19:24:46 +00:00
Erik Schultheis
9210e71fb3
ensure that Eigen::internal::size is not found by ADL, rename to ssize and...
2022-01-05 00:46:09 +00:00
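As background for the ADL fix above, here is a minimal sketch of how an unqualified call can land on an `internal` helper. The namespace `lib::internal`, `Tag`, `Box`, and `user_size` are all hypothetical. The key point is that argument-dependent lookup also searches the namespaces of a class template's type arguments.

```cpp
#include <cassert>
#include <cstddef>

namespace lib {
namespace internal {
struct Tag {};  // hypothetical type living in an "internal" namespace

// Hypothetical internal helper that happens to be named `size`.
template <typename T>
std::ptrdiff_t size(const T&) { return -1; }
}  // namespace internal
}  // namespace lib

// A user-defined container in the global namespace.
template <typename T>
struct Box {
  std::size_t size() const { return 3; }
};

// Ordinary lookup finds no `size` here. At instantiation, however, ADL
// searches the namespaces associated with Box<lib::internal::Tag>, which
// include lib::internal. The call therefore silently resolves to
// lib::internal::size and returns -1.
template <typename C>
std::ptrdiff_t user_size(const C& c) { return size(c); }
```

Suppose the user had also declared their own generic `size(const C&)`. The same call would then be ambiguous. General ways to avoid both outcomes are to qualify internal calls, to rename the helper, or to make it a function object, since ADL only considers functions.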
Lingzhu Xiang
7244a74ab0
Add bounds checking to Eigen serializer
2022-01-03 17:00:24 +08:00
Shiva Ghose
a4098ac676
Fix duplicate include guard *ALTIVEC_H -> *ZVECTOR_H
Some header guards were repeated between the `AltiVec` and `ZVector` packages.
This could cause a problem if (for whatever reason) someone attempts to include
headers for both architectures, because the second header's contents would be skipped.
2021-12-31 08:43:24 +00:00
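The following self-contained illustration simulates both headers inline. The guard macro names are hypothetical, not Eigen's actual guards. Once one header defines a guard, any other header that reuses the same guard is skipped without any diagnostic.

```cpp
#include <cassert>

// Simulated "AltiVec" header.
#ifndef EIGEN_EXAMPLE_PACKET_MATH_ALTIVEC_H
#define EIGEN_EXAMPLE_PACKET_MATH_ALTIVEC_H
#define ALTIVEC_BODY_SEEN 1
#endif

// Simulated "ZVector" header with a copy-pasted *ALTIVEC_H guard:
// the macro is already defined, so this whole body is dropped.
#ifndef EIGEN_EXAMPLE_PACKET_MATH_ALTIVEC_H
#define EIGEN_EXAMPLE_PACKET_MATH_ALTIVEC_H
#define ZVECTOR_BODY_SEEN_WITH_DUPLICATE_GUARD 1
#endif

// The same header with its own *ZVECTOR_H guard is processed normally.
#ifndef EIGEN_EXAMPLE_PACKET_MATH_ZVECTOR_H
#define EIGEN_EXAMPLE_PACKET_MATH_ZVECTOR_H
#define ZVECTOR_BODY_SEEN_WITH_UNIQUE_GUARD 1
#endif

#ifdef ZVECTOR_BODY_SEEN_WITH_DUPLICATE_GUARD
constexpr bool duplicate_guard_body_seen = true;
#else
constexpr bool duplicate_guard_body_seen = false;
#endif

#ifdef ZVECTOR_BODY_SEEN_WITH_UNIQUE_GUARD
constexpr bool unique_guard_body_seen = true;
#else
constexpr bool unique_guard_body_seen = false;
#endif
```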
David Tellenbach
d705eb5f86
Revert "Select AVX2 even if the data size is not a multiple of 8"
Tests are failing for AVX and NEON.
This reverts commit eb85b97339e3791d533592bac20999b1b3ebca09.
2021-12-28 23:57:06 +01:00
Rasmus Munk Larsen
8eab7b6886
Improve exp<float>(): Don't flush denormal results; +4% speedup.
1. Speed up exp(x) by reducing the polynomial approximant from degree 7 to
degree 6. With exactly representable coefficients computed by the Sollya tool,
this still gives a maximum relative error of 1 ulp, i.e. faithfully rounded, for
arguments where exp(x) is a normalized float. This change results in a speedup
of about 4% for AVX2.
2. Extend the range where exp(x) returns a non-zero result from ~[-88;88] to
~[-104;88], i.e. return denormalized values for large negative arguments instead
of zero. Compared to exp<double>(x), the denormalized results gradually decrease
in accuracy down to 0.033 relative error for arguments around x = -104, where
exp(x) is ~std::numeric_limits<float>::denorm_min(). This is expected and acceptable.
2021-12-28 15:00:19 +00:00
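The scalar sketch below combines the two ideas in this message. It reduces x = n*ln2 + r, evaluates a degree-6 polynomial in r, and scales by 2^n in two steps. The two-step scaling lets results below `std::numeric_limits<float>::min()` round to denormals instead of being flushed to zero. This is not Eigen's `pexp`. The polynomial is a plain Taylor series rather than the Sollya-fitted coefficients, so its accuracy is a few ulp rather than 1 ulp. All helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Build 2^e by writing the biased exponent field directly.
// Only valid for normal exponents, e in [-126, 127].
float pow2_normal(int e) {
  std::uint32_t bits = static_cast<std::uint32_t>(e + 127) << 23;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Multiply x by 2^e in two steps. A single factor built from the exponent
// bits cannot go below 2^-126. Splitting e lets the final product round
// to a denormal instead of becoming zero.
float ldexp_split(float x, int e) {
  const int e1 = e / 2;
  const int e2 = e - e1;
  return x * pow2_normal(e1) * pow2_normal(e2);
}

// exp(x) = 2^n * exp(r), with x = n*ln2 + r and |r| <= ~ln2/2.
float exp_sketch(float x) {
  if (std::isnan(x)) return x;
  if (x > 88.8f) return std::numeric_limits<float>::infinity();
  if (x < -104.0f) return 0.0f;  // below denorm_min/2; also maps -inf to 0
  // Cody-Waite split of ln2: ln2_hi has few significant bits, so n * ln2_hi
  // is exact for the n values reachable here.
  const float ln2_hi = 0.693359375f;
  const float ln2_lo = -2.12194440e-4f;
  const float n = std::nearbyint(x * 1.44269504f);  // round(x / ln2)
  const float r = (x - n * ln2_hi) - n * ln2_lo;
  // Degree-6 Taylor polynomial for exp(r), evaluated in Horner form.
  const float p = 1.0f + r * (1.0f + r * (0.5f + r * (1.0f / 6 + r * (1.0f / 24 +
                  r * (1.0f / 120 + r * (1.0f / 720))))));
  return ldexp_split(p, static_cast<int>(n));
}
```

The exponent split matters because n reaches about -150 near x = -104. That is well below the smallest exponent a single bit-constructed power of two can hold.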
David Tellenbach
c06c3e52a0
Include immintrin.h if F16C is available and vectorization is disabled
If EIGEN_DONT_VECTORIZE is defined, immintrin.h is not included even if F16C is available. Trying to use F16C intrinsics thus fails.
This fixes issue #2395.
2021-12-25 19:51:42 +00:00
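Here is a minimal sketch of the include logic the fix calls for. The `immintrin.h` include depends only on the compiler's `__F16C__` macro, so defining `EIGEN_DONT_VECTORIZE` no longer hides the F16C intrinsics. Outside Eigen, the `EIGEN_DONT_VECTORIZE` define below has no effect; it only mirrors the reported configuration. `f16c_roundtrip_ok` is a hypothetical helper. To exercise the intrinsic path, build with `-mf16c` (GCC/Clang on x86).

```cpp
#include <cassert>

// Mirrors the reported configuration: vectorization disabled.
#define EIGEN_DONT_VECTORIZE

// The include depends on F16C availability alone. It does not depend on
// whether vectorization is enabled, because scalar half<->float conversions
// can still use the F16C intrinsics.
#if defined(__F16C__)
#include <immintrin.h>
#endif

// Round-trips a value through IEEE half precision when F16C is available.
// Returns false if this translation unit was built without F16C.
bool f16c_roundtrip_ok() {
#if defined(__F16C__)
  // 1.5f is exactly representable in half precision; rounding mode 0 = nearest.
  return _cvtsh_ss(_cvtss_sh(1.5f, 0)) == 1.5f;
#else
  return false;
#endif
}
```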
Erik Schultheis
f7a056bf04
Small fixes
This MR fixes a bunch of smaller issues, making the following changes:
* Template parameters in the documentation are documented with `\tparam` instead
of `\param`
* Superfluous semicolon warnings fixed
* Fixed the type of literals used to initialize float variables
2021-12-21 16:46:09 +00:00
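The sketch below illustrates the first and last bullets using a hypothetical `halve` function. In Doxygen, `\tparam` documents template parameters and `\param` documents function parameters. Separately, an `f` suffix keeps a literal in `float` instead of promoting the surrounding arithmetic to `double`.

```cpp
#include <cassert>
#include <type_traits>

/** \brief Returns half of \p x.
 *  \tparam Scalar the scalar type (a template parameter)
 *  \param  x      the value to halve (a function parameter)
 */
template <typename Scalar>
Scalar halve(Scalar x) {
  return x * Scalar(0.5);  // literal converted to Scalar: no promotion to double
}

// A double literal promotes float arithmetic to double ...
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value, "float * double is double");
// ... while an f-suffixed literal keeps it in float.
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value, "float * float is float");
```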
Erik Schultheis
dee6428a71
fixed clang warnings about alignment change and floating point precision
2021-12-18 17:18:16 +00:00
Kolja Brix
d0b4b75fbb
Simplify logical_xor()
2021-12-16 20:20:47 +00:00
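The diff is not shown here. As an assumption about the kind of simplification the subject refers to: the logical XOR of two truth values reduces to a single inequality comparison. `logical_xor` below is a hypothetical free function, not Eigen's implementation.

```cpp
#include <cassert>

// True when exactly one operand is non-zero. Comparing the two truth values
// with != replaces the longer (a || b) && !(a && b) formulation.
template <typename Scalar>
bool logical_xor(const Scalar& a, const Scalar& b) {
  return (a != Scalar(0)) != (b != Scalar(0));
}
```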
Erik Schultheis
e939c06b0e
Small speed-up in row-major sparse dense product
2021-12-15 18:46:25 +00:00
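For context, here is a minimal sketch of a row-major (CSR) sparse matrix times dense vector product. The struct and function names are hypothetical, and this is not Eigen's kernel. Each output entry is an independent dot product over one row. The inner loop can therefore accumulate into a local variable and store once, which is a common micro-optimization for this layout. The commit's specific change is not described in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major (CSR) storage: the non-zeros of row i are
// values[row_ptr[i] .. row_ptr[i + 1]), with column indices in col_idx.
struct CsrMatrix {
  std::size_t rows, cols;
  std::vector<std::size_t> row_ptr;  // size rows + 1
  std::vector<std::size_t> col_idx;  // size nnz
  std::vector<double> values;        // size nnz
};

// y = A * x, one dot product per row.
std::vector<double> multiply(const CsrMatrix& a, const std::vector<double>& x) {
  std::vector<double> y(a.rows, 0.0);
  for (std::size_t i = 0; i < a.rows; ++i) {
    double sum = 0.0;  // local accumulator, written to y[i] once
    for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
      sum += a.values[k] * x[a.col_idx[k]];
    y[i] = sum;
  }
  return y;
}
```

For example, the matrix [[1, 0, 2], [0, 3, 0]] is stored as `row_ptr = {0, 2, 3}`, `col_idx = {0, 2, 1}`, `values = {1, 2, 3}`.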
Erik Schultheis
c20e908ebc
turn some macros into constexpr functions
2021-12-10 19:27:01 +00:00
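The log does not list the affected macros, so the names below (`EXAMPLE_MIN_SIZE`, `min_size`) are hypothetical. They illustrate why such conversions help. A `constexpr` function is type-checked and evaluates each argument exactly once. It also remains usable in constant expressions such as template arguments.

```cpp
#include <cassert>

// Macro form: untyped, and an argument can be evaluated twice.
#define EXAMPLE_MIN_SIZE(a, b) ((a) <= (b) ? (a) : (b))

// constexpr function form: typed, single evaluation, still compile-time capable.
constexpr int min_size(int a, int b) { return a <= b ? a : b; }

template <int N>
struct FixedBuffer { int data[N]; };

static_assert(min_size(3, 4) == 3, "usable in static_assert");
static_assert(sizeof(FixedBuffer<min_size(2, 8)>) == 2 * sizeof(int), "usable as a template argument");

int macro_increments() {
  int i = 0;
  int r = EXAMPLE_MIN_SIZE(++i, 10);  // expands to ((++i) <= (10) ? (++i) : (10))
  (void)r;
  return i;  // ++i ran twice
}

int function_increments() {
  int i = 0;
  int r = min_size(++i, 10);
  (void)r;
  return i;  // ++i ran once
}
```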
Rasmus Munk Larsen
f04fd8b168
Make sure exp(-Inf) is zero for vectorized expressions. This fixes #2385.
2021-12-08 17:57:23 +00:00
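The sketch below shows why a vectorized exp needs special handling for -Inf. It assumes the usual range reduction x = n*ln2 + r. For x = -Inf, n is also -Inf, and r becomes -Inf - (-Inf) = NaN. A branch-free remedy is to clamp the input before the reduction and then select 0 for inputs below the cutoff. Here, SIMD lanes are simulated with scalars, `std::exp` stands in for the polynomial path, and the function names are hypothetical. The cutoff of -104 follows the exp<float>() commit message.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Range reduction without special-casing: for x = -inf, n = -inf, and
// x - n * ln2 evaluates -inf - (-inf), which is NaN.
float naive_reduce(float x) {
  const float ln2 = 0.69314718f;
  const float n = std::floor(x * 1.44269504f + 0.5f);
  return x - n * ln2;
}

// Clamp-then-select, in the style of a masked SIMD implementation.
float exp_with_select(float x) {
  const float cutoff = -104.0f;
  const float clamped = x < cutoff ? cutoff : x;  // finite for -inf; NaN passes through
  const float value = std::exp(clamped);          // stand-in for the polynomial path
  return x < cutoff ? 0.0f : value;               // select 0 for masked lanes
}
```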
Erik Schultheis
cc11e240ac
Some further cleanup
2021-12-06 18:01:15 +00:00
Rasmus Munk Larsen
3ffefcb95c
Only include <atomic> if needed.
2021-12-02 23:55:25 +00:00
Erik Schultheis
d60f7fa518
Improved lapacke binding code for HouseholderQR and PartialPivLU
2021-12-02 00:10:58 +00:00
Erik Schultheis
ec2fd0f7ed
Require recent GCC and MSVC and remove EIGEN_HAS_CXX14
and some other feature test macros
2021-12-01 00:48:34 +00:00
Rasmus Munk Larsen
085c2fc5d5
Revert "Update SVD Module to allow specifying computation options with a...
2021-11-30 18:45:54 +00:00
Erik Schultheis
4dd126c630
fixed Cholesky with 0-sized matrix (cf. #785)
2021-11-30 17:17:41 +00:00
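The log only references the issue, so the following is a generic sketch rather than Eigen's LLT. It is a row-major, column-by-column Cholesky factorization in which every element access sits inside a loop bounded by n. A 0x0 input therefore simply succeeds with an empty factor. `cholesky_in_place` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// In-place Cholesky A = L * L^T of a symmetric positive-definite n x n matrix
// stored row-major in `a`. On success the lower triangle holds L (the upper
// triangle is left untouched). Returns false on a non-positive pivot.
// For n == 0 no loop body runs and the function returns true.
bool cholesky_in_place(std::vector<double>& a, std::size_t n) {
  for (std::size_t j = 0; j < n; ++j) {
    double d = a[j * n + j];
    for (std::size_t k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
    if (!(d > 0.0)) return false;  // also rejects NaN pivots
    const double ljj = std::sqrt(d);
    a[j * n + j] = ljj;
    for (std::size_t i = j + 1; i < n; ++i) {
      double s = a[i * n + j];
      for (std::size_t k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
      a[i * n + j] = s / ljj;
    }
  }
  return true;
}
```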
Rohit Santhanam
4d3e50036f
Fix for HIP compilation breakage in selfAdjoint and triangular view classes.
2021-11-30 14:00:59 +00:00
Erik Schultheis
63abb35dfd
SFINAE'ing away non-const overloads if selfAdjoint/triangular view is not referring to an lvalue
2021-11-29 22:51:26 +00:00
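Here is a minimal sketch of the SFINAE technique named above, using hypothetical `View` and `Storage` types. A `const` viewed expression stands in for one that does not refer to an lvalue. For such a view, the non-const `get()` overload is removed during overload resolution, so only the read-only accessor remains.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct Storage { int value = 7; };

template <typename Xpr>
class View {
 public:
  explicit View(Xpr& x) : x_(x) {}

  // Read-only accessor, always available.
  const int& get() const { return x_.value; }

  // Writable accessor, removed by SFINAE when Xpr is const.
  template <typename X = Xpr,
            typename std::enable_if<!std::is_const<X>::value, int>::type = 0>
  int& get() { return x_.value; }

 private:
  Xpr& x_;
};

// True if calling get() on a non-const V yields a writable int&.
template <typename V>
constexpr bool get_is_writable() {
  return std::is_same<decltype(std::declval<V&>().get()), int&>::value;
}

static_assert(get_is_writable<View<Storage>>(), "mutable expression keeps the non-const overload");
static_assert(!get_is_writable<View<const Storage>>(), "const expression loses the non-const overload");
```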
Jakub Gałecki
1b8dce564a
bugfix: issue #2375
2021-11-29 22:26:15 +00:00
Francesco Mazzoli
eb85b97339
Select AVX2 even if the data size is not a multiple of 8
2021-11-29 21:13:24 +00:00
Arthur
eef33946b7
Update SVD Module to allow specifying computation options with a template parameter. Resolves #2051
2021-11-29 20:50:46 +00:00
Erik Schultheis
f33a31b823
removed EIGEN_HAS_CXX11_* and redundant EIGEN_COMP_CXXVER checks
2021-11-29 19:18:57 +00:00
Rohit Santhanam
9d3ffb3fbf
Fix for HIP compilation failure in DenseBase.
2021-11-28 15:59:30 +00:00
David Tellenbach
08da52eb85
Remove DenseBase::nonZeros() which just calls DenseBase::size()
Fixes #2382.
2021-11-27 14:31:00 +00:00
Ali Can Demiralp
96e537d6fd
Add EIGEN_DEVICE_FUNC to DenseBase::hasNaN() and DenseBase::allFinite().
2021-11-27 11:27:52 +00:00
Erik Schultheis
b8b6566f0f
Currently, the binding of LLT to Lapacke is done using a large macro. This factors out a large part of the functionality of the macro and implements it explicitly.
2021-11-25 16:11:25 +00:00
Erik Schultheis
ec4efbd696
remove EIGEN_HAS_CXX11
2021-11-24 20:08:49 +00:00
Rasmus Munk Larsen
5137a5157a
Make numeric_limits members constexpr as per the newer C++ standards.
Author: majnemer@google.com
2021-11-19 15:58:36 +00:00
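The sketch below shows what constexpr members buy, using a hypothetical `Fixed16` type instead of Eigen's own scalar types. When its members are `constexpr`, a user specialization of `std::numeric_limits` can appear in `static_assert`s and other constant expressions, just like the standard library's own specializations.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical 16-bit fixed-point type, for illustration only.
struct Fixed16 {
  std::int16_t raw;
};

namespace std {
// Specializing std::numeric_limits for a program-defined type is permitted.
template <>
class numeric_limits<Fixed16> {
 public:
  static constexpr bool is_specialized = true;
  static constexpr bool is_signed = true;
  static constexpr int digits = 15;
  static constexpr Fixed16 min() noexcept { return Fixed16{INT16_MIN}; }
  static constexpr Fixed16 max() noexcept { return Fixed16{INT16_MAX}; }
};
}  // namespace std

static_assert(std::numeric_limits<Fixed16>::max().raw == 32767, "evaluated at compile time");
static_assert(std::numeric_limits<Fixed16>::digits == 15, "usable in constant expressions");
```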
Erik Schultheis
7e586635ba
don't use deprecated MappedSparseMatrix
2021-11-19 15:58:04 +00:00
Erik Schultheis
b0fb5417d3
Fixed Sparse-Sparse Product in case of mixed StorageIndex types
2021-11-18 18:33:31 +00:00
Pablo Speciale
d04edff570
Update Umeyama.h: `src_var` is only used when `with_scaling == true`.
Therefore, the actual computation can be avoided when `with_scaling == false`.
2021-11-16 17:58:22 +00:00
Rasmus Munk Larsen
2b9297196c
Update Transform.h to make `transform_construct_from_matrix`
and `transform_take_affine_part` callable from device code. Fixes #2377.
2021-11-16 00:58:30 +00:00
Erik Schultheis
ca9c848679
use consistent `StorageIndex` types in `SparseMatrix::Map` and `SparseMatrix::TransposedSparseMatrix`
2021-11-15 22:18:26 +00:00
Erik Schultheis
13954c4440
moved pruning code to SparseVector.h
2021-11-15 22:16:01 +00:00
Nathan Luehr
da79095923
Convert diag pragmas to nv_diag.
2021-11-15 03:42:42 +00:00
Erik Schultheis
532cc73f39
fix a typo
2021-11-13 13:11:06 +02:00
Gengxin Xie
5c642950a5
Bug Fix: correct the bug that `EIGEN_HAS_FP16_C` is not defined if the compiler isn't clang
2021-11-04 22:13:01 +00:00
Gilad
0d73440fb2
Documentation of Quaternion constructor from MatrixBase
2021-11-04 16:21:26 +00:00
Xinle Liu
478a1bdda6
Fix total deflation issue in BDCSVD, when & only when M is already diagonal.
2021-11-02 16:53:55 +00:00
Chip Kerchner
9cf34ee0ae
Invert rows and depth in non-vectorized portion of packing (PowerPC).
2021-10-28 21:59:41 +00:00
Ilya Tokar
e1cb6369b0
Add AVX vector path to float2half/half2float
Makes e.g. matrix multiplication 2x faster:
name        old cpu/op   new cpu/op   delta
BM_convers  181ms ± 1%   62ms ± 9%    -65.82%  (p=0.016 n=4+5)
Tested on all possible input values (not adding tests, since they
take a long time).
2021-10-28 13:59:01 -04:00
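The AVX code itself is not reproduced here. Instead, this is a scalar float-to-half conversion with round-to-nearest-even, which any vectorized path must match bit for bit. It follows the widely used bit-manipulation formulation by Fabian Giesen (`float_to_half_fast3_rtne`); the function name below is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// float -> IEEE binary16 bits, round-to-nearest-even.
std::uint16_t float_to_half_rtne(float f) {
  std::uint32_t x;
  std::memcpy(&x, &f, sizeof x);
  const std::uint32_t sign = x & 0x80000000u;
  x ^= sign;  // work with |f|

  const std::uint32_t f32_infinity = 255u << 23;
  const std::uint32_t f16_overflow = (127u + 16u) << 23;  // 2^16: no finite half at or above
  const std::uint32_t denorm_magic = ((127u - 15u) + (23u - 10u) + 1u) << 23;  // 0.5f

  std::uint16_t out;
  if (x >= f16_overflow) {
    // Out of half range becomes Inf; NaN becomes a quiet NaN.
    out = (x > f32_infinity) ? 0x7E00 : 0x7C00;
  } else if (x < (113u << 23)) {
    // Below 2^-14, the result is a half subnormal or zero. Adding 0.5f shifts
    // the bits into place, and the FPU's own round-to-nearest-even rounds them.
    float fx, magic;
    std::memcpy(&fx, &x, sizeof fx);
    std::memcpy(&magic, &denorm_magic, sizeof magic);
    fx += magic;
    std::uint32_t bits;
    std::memcpy(&bits, &fx, sizeof bits);
    out = static_cast<std::uint16_t>(bits - denorm_magic);
  } else {
    // Normal half: rebias the exponent, then round the 13 dropped mantissa
    // bits to nearest-even by adding 0xFFF plus the lowest kept bit.
    // A carry out of the mantissa correctly rounds up to the next exponent or Inf.
    const std::uint32_t mant_odd = (x >> 13) & 1u;
    x += (static_cast<std::uint32_t>(15 - 127) << 23) + 0xFFFu;
    x += mant_odd;
    out = static_cast<std::uint16_t>(x >> 13);
  }
  return static_cast<std::uint16_t>(out | (sign >> 16));
}
```

A half has only 65536 bit patterns, so half-to-float can be checked exhaustively in very little time. Float-to-half has 2^32 inputs, which makes an exhaustive check of that direction comparatively slow.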
Antonio Sanchez
03d4cbb307
Fix min/max nan-propagation for scalar "other".
Copied input type from `EIGEN_MAKE_CWISE_BINARY_OP`.
Fixes #2362.
2021-10-28 09:28:29 -07:00
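The NaN-propagation policy has to be explicit because `std::max(a, b)` returns `(a < b) ? b : a`. With a NaN operand, its result therefore depends on argument order. Below is a minimal sketch of the two order-independent policies, with hypothetical function names. Whichever policy is selected should apply the same way when the other operand is a scalar, which is the case this fix addresses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Propagate NaN from either operand, regardless of argument order.
float max_propagate_nan(float a, float b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  return a < b ? b : a;
}

// Prefer numbers: NaN is returned only if both operands are NaN.
// std::fmax already has these semantics.
float max_propagate_numbers(float a, float b) { return std::fmax(a, b); }
```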
Antonio Sanchez
e559701981
Fix compile issue for gcc 4.8
2021-10-28 08:23:19 -07:00
Rohit Santhanam
48e40b22bf
Preliminary HIP bfloat16 GPU support.
2021-10-27 18:36:45 +00:00
Antonio Sanchez
40bbe8a4d0
Fix ZVector build.
Cross-compiled via `s390x-linux-gnu-g++`, run via qemu. This allows the
packetmath tests to pass.
2021-10-27 16:30:15 +00:00
Alex Druinsky
6bb6a6bf53
Vectorize fp16 tanh and logistic functions on Neon
Activates vectorization of the Eigen::half versions of the tanh and
logistic functions when they run on Neon. Both functions convert their
inputs to float before computing the output, and as a result of this
commit, the conversions and the computation in float are vectorized.
2021-10-27 16:09:16 +00:00
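Here is the widen-compute-narrow scheme described above as a generic sketch. `apply_widened` (a hypothetical name) converts a buffer to the wide type, evaluates the function there, and converts back, using simple loops that a compiler can vectorize. Standard C++ before C++23 has no portable half type, so the example uses `float` as the narrow type and `double` as the wide one. With `Eigen::half` and `float`, the structure would be the same.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Apply f to every element of `data` by widening to Wide, computing in Wide,
// and narrowing back to Narrow.
template <typename Narrow, typename Wide, typename F>
void apply_widened(std::vector<Narrow>& data, F f) {
  std::vector<Wide> tmp(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) tmp[i] = static_cast<Wide>(data[i]);
  for (std::size_t i = 0; i < tmp.size(); ++i) tmp[i] = f(tmp[i]);
  for (std::size_t i = 0; i < data.size(); ++i) data[i] = static_cast<Narrow>(tmp[i]);
}
```

For example, `apply_widened<float, double>(v, [](double x) { return 1.0 / (1.0 + std::exp(-x)); })` computes the logistic function of each element of `v` in double precision. It then stores the results back as `float`.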