eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-18 19:11:30 +08:00

Author	SHA1	Message	Date
aaraujom	d49ede4dc4	Add AVX512 s/dgemm optimizations for compute kernel (2nd try)	2022-05-28 02:00:21 +00:00
Antonio Sánchez	9b9496ad98	Revert "Add AVX512 optimizations for matrix multiply" This reverts commit 25db0b4a824ba9a092bbb514fbada51bf9d37a18	2022-05-13 18:50:33 +00:00
aaraujom	25db0b4a82	Add AVX512 optimizations for matrix multiply	2022-05-12 23:41:19 +00:00
Shi, Brian	fc1d888415	Remove AVX512VL dependency in trsm	2022-04-14 12:44:24 -07:00
Antonio Sánchez	07db964bde	Restrict new AVX512 trsm to AVX512VL, rename files for consistency.	2022-04-14 16:58:32 +00:00
b-shi	518fc321cb	AVX512 Optimizations for Triangular Solve	2022-03-16 18:04:50 +00:00
Erik Schultheis	cc11e240ac	Some further cleanup	2021-12-06 18:01:15 +00:00
Rasmus Munk Larsen	3ffefcb95c	Only include <atomic> if needed.	2021-12-02 23:55:25 +00:00
Erik Schultheis	ec2fd0f7ed	Require recent GCC and MSCV and removed `EIGEN_HAS_CXX14` and some other feature test macros	2021-12-01 00:48:34 +00:00
Erik Schultheis	ec4efbd696	remove EIGEN_HAS_CXX11	2021-11-24 20:08:49 +00:00
Alex Druinsky	6bb6a6bf53	Vectorize fp16 tanh and logistic functions on Neon Activates vectorization of the Eigen::half versions of the tanh and logistic functions when they run on Neon. Both functions convert their inputs to float before computing the output, and as a result of this commit, the conversions and the computation in float are vectorized.	2021-10-27 16:09:16 +00:00
Antonio Sanchez	d0d34524a1	Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h The `Complex.h` file applies equally to HIP/CUDA, so placing under the generic `GPU` folder. The `TensorReductionCuda.h` has already been deprecated, now removing for the next Eigen version.	2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Antonio Sanchez	fcd73b4884	Add a simple serialization mechanism. The `Serializer<T>` class implements a binary serialization that can write to (`serialize`) and read from (`deserialize`) a byte buffer. Also added convenience routines for serializing a list of arguments. This will mainly be for testing, specifically to transfer data to and from the GPU.	2021-09-08 09:38:59 -07:00
Adam Kallai	1415817d8d	win: include intrin header in Windows on ARM intrin header is needed for _BitScanReverse and _BitScanReverse64	2021-08-31 10:57:34 +02:00
Antonio Sanchez	d24f9f9b55	Fix NVCC+ICC issues. NVCC does not understand `__forceinline`, so we need to use `inline` when compiling for GPU. ICC specializes `std::complex` operators for `float` and `double` by default, which cannot be used on device and conflict with Eigen's workaround in CUDA/Complex.h. This can be prevented by defining `_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`. Added this define to the tests and to `Eigen/Core`, but this will not work if the user includes `<complex>` before `<Eigen/Core>`. ICC also seems to generate a duplicate `Map` symbol in `PlainObjectBase`: ``` error: "Map" has already been declared in the current scope static ConstMapType Map(const Scalar *data) ``` I tracked this down to `friend class Eigen::Map`. Putting the `friend` statements at the bottom of the class seems to resolve this issue. Fixes #2180	2021-03-15 18:42:04 +00:00
Antonio Sanchez	f85038b7f3	Fix excessive GEBP register spilling for 32-bit NEON. Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM, leading to excessive 16-byte register spills, slowing down basic f32 matrix multiplication by approx 50%. By specializing `gebp_traits`, we can eliminate the register spills. Volatile inline ASM both acts as a barrier to prevent reordering and enforces strict register use. In a simple f32 matrix multiply example, this modification reduces 16-byte spills from 109 instances to zero, leading to a 1.5x speed increase (search for `16-byte Spill` in the assembly in https://godbolt.org/z/chsPbE). This is a replacement of !379. See there for further discussion. Also moved `gebp_traits` specializations for NEON to `Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside other NEON-specific code. Fixes #2138.	2021-02-03 09:01:48 -08:00
David Tellenbach	65e2169c45	Add support for Arm SVE This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions that are supported by Eigen, SVE types are inherently sizeless. For the use in Eigen we fix their size at compile-time (note that this is not necessary in general, SVE is length agnostic). During compilation the flag `-msve-vector-bits=N` has to be set where `N` is a power of two in the range of `128`to `2048`, indicating the length of an SVE vector. Since SVE is rather young, we decided to disable it by default even if it would be available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`. This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements. This MR is joint work with Miguel Tairum <miguel.tairum@arm.com>.	2021-01-21 21:11:57 +00:00
David Tellenbach	4091f6b25c	Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD	2020-10-09 02:05:05 +02:00
Everton Constantino	6fe88a3c9d	MatrixProuct enhancements: - Changes to Altivec/MatrixProduct Adapting code to gcc 10. Generic code style and performance enhancements. Adding PanelMode support. Adding stride/offset support. Enabling float64, std::complex and std::complex. Fixing lack of symm_pack. Enabling mixedtypes. - Adding std::complex tests to blasutil. - Adding an implementation of storePacketBlock when Incr!= 1.	2020-09-02 18:21:36 -03:00
Teng Lu	386d809bde	Support BFloat16 in Eigen	2020-06-20 19:16:24 +00:00
Sebastien Boisvert	463ec86648	Fix #1757 : remove the word 'suicide'	2020-06-11 00:56:54 +00:00
Gael Guennebaud	99b7f7cb9c	Fix #556 : warnings with mingw	2020-05-31 00:39:44 +02:00
Yong Tang	8e1df5b082	Fix incorrect usage of `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC` This PR tries to fix an incorrect usage of `if defined(EIGEN_ARCH_PPC)` in `Eigen/Core` header. In `Eigen/src/Core/util/Macros.h`, EIGEN_ARCH_PPC was explicitly defined as either 0 or 1. As a result `if defined(EIGEN_ARCH_PPC)` will always be true. This causes issues when building on non PPC platform and `MatrixProduct.h` is not available. This fix changes `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC`. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2020-05-28 05:53:44 -07:00
Everton Constantino	8a7f360ec3	- Vectorizing MMA packing. - Optimizing MMA kernel. - Adding PacketBlock store to blas_data_mapper.	2020-05-19 19:24:11 +00:00
mehdi-goli	d3e81db6c5	Eigen moved the `scanLauncehr` function inside the internal namespace. This commit applies the following changes: - Moving the `scamLauncher` specialization inside internal namespace to fix compiler crash on TensorScan for SYCL backend. - Replacing `SYCL/sycl.hpp` to `CL/sycl.hpp` in order to follow SYCL 1.2.1 standard. - minor fixes: commenting out an unused variable to avoid compiler warnings.	2020-05-11 16:10:33 +01:00
Tobias Bosch	f0ce88cff7	Include <sstream> explicitly, and don't rely on the implicit include via <complex>. This implicit dependency does no longer exist in a recent llbm release (sha 78be61871704).	2020-02-24 23:09:36 +00:00
Gael Guennebaud	17226100c5	Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions. Another solution would have been to make pshift* fully generic template functions with partial specialization which is always a mess in c++03.	2019-09-06 09:26:04 +02:00
Gael Guennebaud	55b63d4ea3	Fix compilation without vector engine available (e.g., x86 with SSE disabled): -> ppolevl is required by ndtri even for the scalar path	2019-09-05 18:16:46 +02:00
Rasmus Munk Larsen	f6c51d9209	Fix missing header inclusion and colliding definitions for half type casting, which broke build with -march=native on Haswell/Skylake.	2019-08-30 14:03:29 -07:00
Rasmus Munk Larsen	b021cdea6d	Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.	2019-08-27 11:30:31 -07:00
Mehdi Goli	16a56b2ddd	[SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL. * Adding SYCL memory model * Enabling/Disabling SYCL backend in Core * Supporting Vectorization	2019-06-27 12:25:09 +01:00
Gael Guennebaud	c53eececb0	Implement AVX512 vectorization of std::complex<float/double>	2018-12-06 15:58:06 +01:00
Gael Guennebaud	3fba59ea59	temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though!	2018-12-06 00:13:26 +01:00
Gael Guennebaud	f91500d303	Fix pandnot order in AVX512	2018-11-30 14:32:06 +01:00
Gael Guennebaud	aa6097395b	Add missing SSE/AVX type-casting in AVX512 mode	2018-11-28 16:09:08 +01:00
Gael Guennebaud	b131a4db24	bug #1631 : fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.	2018-11-27 23:45:00 +01:00
Gael Guennebaud	fa7fd61eda	Unify SSE/AVX psin functions. It is based on the SSE version which is much more accurate, though very slightly slower. This changeset also includes the following required changes: - add packet-float to packet-int type traits - add packet float<->int reinterpret casts - add faster pselect for AVX based on blendv	2018-11-27 22:41:51 +01:00
Christian von Schultz	4a40b3785d	Collapsed revision (based on pull request PR-325) * Support compiling without IO streams Add the preprocessor definition EIGEN_NO_IO which, if defined, disables all use of the IO streams part of the standard library.	2018-10-22 21:14:40 +02:00
Gael Guennebaud	1dd1f8e454	bug #65 : add vectorization of partial reductions along the outer-dimension, for instance: colmajor_mat.rowwise().mean()	2018-10-09 23:36:50 +02:00
Gael Guennebaud	b0c66adfb1	bug #231 : initial implementation of STL iterators for dense expressions	2018-10-01 23:21:37 +02:00
Gael Guennebaud	a488d59787	merge with default Eigen	2018-09-21 11:51:49 +02:00
Mehdi Goli	01358300d5	Creating separate SYCL required PR for uncontroversial files.	2018-08-03 16:59:15 +01:00
Rasmus Munk Larsen	2ebcb911b2	Add pcast packet op for NEON.	2018-07-26 14:28:48 -07:00
Alexey Frunze	1f523e7304	Add MIPS changes missing from previous merge.	2018-07-18 12:27:50 -07:00
Gael Guennebaud	308725c3c9	More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA	2018-07-18 13:51:36 +02:00
Gael Guennebaud	86d9c0255c	Forward declaring std::array does not work with all std libs, so let's just include <array>	2018-07-13 13:06:44 +02:00
Gael Guennebaud	006e18e52b	Cleanup the mess in Eigen/Core by moving CUDA/HIP stuff at more appropriate places (Macros.h), and alignment/vectorization logic is now in util/ConfigureVectorization.h	2018-07-12 16:57:41 +02:00
Deven Desai	876f392c39	Updates corresponding to the latest round of PR feedback The major changes are 1. Moving CUDA/PacketMath.h to GPU/PacketMath.h 2. Moving CUDA/MathFunctions.h to GPU/MathFunction.h 3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h The above three changes effectively enable the Eigen "Packet" layer for the HIP platform 4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic") 5. Updating the "EIGEN_DEVICE_FUNC" marking in some places The change has been tested on the HIP and CUDA platforms.	2018-07-11 10:39:54 -04:00
Deven Desai	38807a2575	merging updates from upstream	2018-07-11 09:17:33 -04:00

1 2 3 4 5 ...

440 Commits