eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-22 13:01:06 +08:00

Author	SHA1	Message	Date
Erik Schultheis	9210e71fb3	ensure that eigen::internal::size is not found by ADL, rename to ssize and...	2022-01-05 00:46:09 +00:00
Lingzhu Xiang	7244a74ab0	Add bounds checking to Eigen serializer	2022-01-03 17:00:24 +08:00
Shiva Ghose	a4098ac676	Fix duplicate include guard ALTIVEC_H -> ZVECTOR_H Some header guards were repeated between the `AltiVec` and `ZVector` packages. This could cause a problem if (for whatever reason) someone attempts to include headers for both architectures.	2021-12-31 08:43:24 +00:00
David Tellenbach	d705eb5f86	Revert "Select AVX2 even if the data size is not a multiple of 8" Tests are failing for AVX and NEON. This reverts commit eb85b97339e3791d533592bac20999b1b3ebca09.	2021-12-28 23:57:06 +01:00
Rasmus Munk Larsen	8eab7b6886	Improve exp<float>(): Don't flush denormal results +4% speedup. 1. Speed up exp(x) by reducing the polynomial approximant from degree 7 to degree 6. With exactly representable coefficients computed by the Sollya tool, this still gives a maximum relative error of 1 ulp, i.e. faithfully rounded, for arguments where exp(x) is a normalized float. This change results in a speedup of about 4% for AVX2. 2. Extend the range where exp(x) returns a non-zero result from ~[-88;88] to ~[-104;88], i.e. return denormalized values for large negative arguments instead of zero. Compared to exp<double>(x) the denormalized results gradually decrease in accuracy down to 0.033 relative error for arguments around x = -104 where exp(x) is ~std::numeric_limits<float>::denorm_min(). This is expected and acceptable.	2021-12-28 15:00:19 +00:00
David Tellenbach	c06c3e52a0	Include immintrin.h if F16C is available and vectorization is disabled If EIGEN_DONT_VECTORIZE is defined, immintrin.h is not included even if F16C is available. Trying to use F16C intrinsics thus fails. This fixes issue #2395.	2021-12-25 19:51:42 +00:00
Erik Schultheis	f7a056bf04	Small fixes This MR fixes a bunch of smaller issues, making the following changes: * Template parameters in the documentation are documented with `\tparam` instead of `\param` * Superfluous semicolon warnings fixed * Fixed the type of literals used to initialize float variables	2021-12-21 16:46:09 +00:00
Erik Schultheis	dee6428a71	fixed clang warnings about alignment change and floating point precision	2021-12-18 17:18:16 +00:00
Kolja Brix	d0b4b75fbb	Simplify logical_xor()	2021-12-16 20:20:47 +00:00
Erik Schultheis	c20e908ebc	turn some macros into constexpr functions	2021-12-10 19:27:01 +00:00
Rasmus Munk Larsen	f04fd8b168	Make sure exp(-Inf) is zero for vectorized expressions. This fixes #2385 .	2021-12-08 17:57:23 +00:00
Erik Schultheis	cc11e240ac	Some further cleanup	2021-12-06 18:01:15 +00:00
Rasmus Munk Larsen	3ffefcb95c	Only include <atomic> if needed.	2021-12-02 23:55:25 +00:00
Erik Schultheis	ec2fd0f7ed	Require recent GCC and MSVC and removed `EIGEN_HAS_CXX14` and some other feature test macros	2021-12-01 00:48:34 +00:00
Rasmus Munk Larsen	085c2fc5d5	Revert "Update SVD Module to allow specifying computation options with a...	2021-11-30 18:45:54 +00:00
Rohit Santhanam	4d3e50036f	Fix for HIP compilation breakage in selfAdjoint and triangular view classes.	2021-11-30 14:00:59 +00:00
Erik Schultheis	63abb35dfd	SFINAE'ing away non-const overloads if selfAdjoint/triangular view is not referring to an lvalue	2021-11-29 22:51:26 +00:00
Jakub Gałecki	1b8dce564a	bugfix: issue #2375	2021-11-29 22:26:15 +00:00
Francesco Mazzoli	eb85b97339	Select AVX2 even if the data size is not a multiple of 8	2021-11-29 21:13:24 +00:00
Arthur	eef33946b7	Update SVD Module to allow specifying computation options with a template parameter. Resolves #2051	2021-11-29 20:50:46 +00:00
Erik Schultheis	f33a31b823	removed EIGEN_HAS_CXX11_* and redundant EIGEN_COMP_CXXVER checks	2021-11-29 19:18:57 +00:00
Rohit Santhanam	9d3ffb3fbf	Fix for HIP compilation failure in DenseBase.	2021-11-28 15:59:30 +00:00
David Tellenbach	08da52eb85	Remove DenseBase::nonZeros() which just calls DenseBase::size() Fixes #2382.	2021-11-27 14:31:00 +00:00
Ali Can Demiralp	96e537d6fd	Add EIGEN_DEVICE_FUNC to DenseBase::hasNaN() and DenseBase::allFinite().	2021-11-27 11:27:52 +00:00
Erik Schultheis	ec4efbd696	remove EIGEN_HAS_CXX11	2021-11-24 20:08:49 +00:00
Rasmus Munk Larsen	5137a5157a	Make numeric_limits members constexpr as per the newer C++ standards. Author: majnemer@google.com	2021-11-19 15:58:36 +00:00
Nathan Luehr	da79095923	Convert diag pragmas to nv_diag.	2021-11-15 03:42:42 +00:00
Erik Schultheis	532cc73f39	fix a typo	2021-11-13 13:11:06 +02:00
Gengxin Xie	5c642950a5	Bug Fix: correct the bug that won't define EIGEN_HAS_FP16_C if the compiler isn't clang	2021-11-04 22:13:01 +00:00
Chip Kerchner	9cf34ee0ae	Invert rows and depth in non-vectorized portion of packing (PowerPC).	2021-10-28 21:59:41 +00:00
Ilya Tokar	e1cb6369b0	Add AVX vector path to float2half/half2float Makes e. g. matrix multiplication 2x faster: name old cpu/op new cpu/op delta BM_convers 181ms ± 1% 62ms ± 9% -65.82% (p=0.016 n=4+5) Tested on all possible input values (not adding tests, since they take a long time).	2021-10-28 13:59:01 -04:00
Antonio Sanchez	e559701981	Fix compile issue for gcc 4.8	2021-10-28 08:23:19 -07:00
Rohit Santhanam	48e40b22bf	Preliminary HIP bfloat16 GPU support.	2021-10-27 18:36:45 +00:00
Antonio Sanchez	40bbe8a4d0	Fix ZVector build. Cross-compiled via `s390x-linux-gnu-g++`, run via qemu. This allows the packetmath tests to pass.	2021-10-27 16:30:15 +00:00
Alex Druinsky	6bb6a6bf53	Vectorize fp16 tanh and logistic functions on Neon Activates vectorization of the Eigen::half versions of the tanh and logistic functions when they run on Neon. Both functions convert their inputs to float before computing the output, and as a result of this commit, the conversions and the computation in float are vectorized.	2021-10-27 16:09:16 +00:00
Andreas Krebbel	8faafc3aaa	ZVector: Move alignas qualifier to come first We currently have plenty of type definitions with the alignment qualifier coming after the type. The compiler warns about ignoring them: int EIGEN_ALIGN16 ai[4]; Turn this into: EIGEN_ALIGN16 int ai[4];	2021-10-26 15:33:47 +02:00
Alex Druinsky	d0e3791b1a	Fix vectorized reductions for Eigen::half Fixes compiler errors in expressions that look like Eigen::Matrix<Eigen::half, 3, 1>::Random().maxCoeff() The error comes from the code that creates the initial value for vectorized reductions. The fix is to specify the scalar type of the reduction's initial value. The change is necessary for Eigen::half because unlike other types, Eigen::half scalars cannot be implicitly created from integers.	2021-10-25 14:44:33 -07:00
Yann Billeter	6c3206152a	fix(CommaInitializer): pass dims at compile-time	2021-10-25 19:53:38 +00:00
Antonio Sanchez	0578feaabc	Remove const from visitor return type. This seems to interfere with `pload`/`ploadu`, since `pload<const Packet**>` are not defined. This should unbreak the arm/ppc builds.	2021-10-25 19:09:50 +00:00
Lennart Steffen	163f11e24a	Included note on inner stride for compile-time vectors. See https://gitlab.com/libeigen/eigen/-/issues/2355#note_711078126	2021-10-22 09:46:43 +00:00
Antonio Sanchez	b86e013321	Revert bit_cast to use memcpy for CUDA. To elide the memcpy, we need to first load the `src` value into registers by making a local copy. This avoids the need to resort to potential UB by using `reinterpret_cast`. This change doesn't seem to affect CPU (at least not with gcc/clang). With optimizations on, the copy is also elided.	2021-10-21 08:14:11 -07:00
Antonio Sanchez	45e67a6fda	Use reinterpret_cast on GPU for bit_cast. This seems to be the recommended approach for doing type punning in CUDA. See for example - https://stackoverflow.com/questions/47037104/cuda-type-punning-memcpy-vs-ub-union - https://developer.nvidia.com/blog/faster-parallel-reductions-kepler/ (the latter puns a double to an `int2`). The issue is that for CUDA, the `memcpy` is not elided, and ends up being an expensive operation. We already have similar `reinterpret_cast`s across the Eigen codebase for GPU (as does TensorFlow).	2021-10-20 21:34:40 +00:00
Antonio Sanchez	95bb645e92	Fix MSVC+NVCC EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR compilation. Looks like we need to update the `EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR` for newer versions of MSVC as well when compiling with NVCC. Fixes build issues for VS 2017.	2021-10-20 19:38:14 +00:00
Antonio Sanchez	fd5f48e465	Fix tuple compilation for VS2017. VS2017 doesn't like deducing alias types, leading to a bunch of compile errors for functions involving the `tuple` alias. Replacing with `TupleImpl` seems to solve this, allowing the test to compile/pass.	2021-10-20 19:18:34 +00:00
Antonio Sanchez	d0d34524a1	Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h The `Complex.h` file applies equally to HIP/CUDA, so placing under the generic `GPU` folder. The `TensorReductionCuda.h` has already been deprecated, now removing for the next Eigen version.	2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen	f2c9c2d2f7	Vectorize Visitor.h.	2021-10-20 16:58:01 +00:00
Antonio Sanchez	f0f1d7938b	Disable testing of complex compound assignment operators for MSVC. MSVC does not support specializing compound assignments for `std::complex`, since it already specializes them (contrary to the standard). Trying to use one of these on device will currently lead to a duplicate definition error. This is still probably preferable to no error though. If we remove the definitions for MSVC, then it will compile, but the kernel will fail silently. The only proper solution would be to define our own custom `Complex` type.	2021-09-27 15:15:11 -07:00
Antonio Sanchez	21640612be	Disable more CUDA warnings. For cuda 9.2 and 11.4, they changed the numbers again. Fixes #2331.	2021-09-24 21:31:14 -07:00
Antonio Sanchez	e9e90892fe	Disable another device warning	2021-09-23 13:43:18 -07:00
Antonio Sanchez	86c0decc48	Disable more NVCC warnings. The 2979 warning is yet another "calling a `__host__` function from a `__host__ __device__` function". Although we probably should eventually address these, they are flooding the logs. Most of these are harmless since we only call the original from the host. In cases where these are actually called from device, an error is generated instead anyways. The 2977 warning is a bit strange - although the warning suggests the `__device__` annotation is ignored, this doesn't actually seem to be the case. Without the `__device__` declarations, the kernel actually fails to run when attempting to construct such objects. Again, these warnings are flooding the logs, so disabling for now.	2021-09-23 10:52:39 -07:00

4922 Commits
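
Note on commit b86e013321 (Revert bit_cast to use memcpy for CUDA): the message describes loading `src` into a local copy before the `memcpy` so that the compiler can elide the copy, instead of type punning through `reinterpret_cast`. The block below is a minimal sketch of that idea in plain C++, not Eigen's actual code. The name `bit_cast_sketch` and the static assertions are assumptions made for illustration, and Eigen's device-function annotations are omitted so the example compiles without Eigen or CUDA.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper for illustration only; not Eigen's implementation.
// Reinterprets the object representation of `src` as a value of type Tgt.
template <typename Tgt, typename Src>
Tgt bit_cast_sketch(const Src& src) {
  static_assert(sizeof(Tgt) == sizeof(Src),
                "source and target must have the same size");
  static_assert(std::is_trivially_copyable<Src>::value,
                "source must be trivially copyable");
  static_assert(std::is_trivially_copyable<Tgt>::value &&
                    std::is_default_constructible<Tgt>::value,
                "target must be trivially copyable and default constructible");

  // Local copy of the source: per the commit message, this puts the value
  // in registers first, which is what lets the memcpy be elided.
  Src staged = src;

  Tgt tgt;
  // memcpy is the well-defined way to copy an object representation;
  // it avoids the potential UB of punning via reinterpret_cast.
  std::memcpy(&tgt, &staged, sizeof(Tgt));
  return tgt;
}
```

For example, on a platform with IEEE 754 single-precision floats, `bit_cast_sketch<std::uint32_t>(1.0f)` yields `0x3f800000`, and casting that value back to `float` returns `1.0f`.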