eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-07-31 09:12:02 +08:00

Author	SHA1	Message	Date
Antonio Sanchez	3580a38298	Use native _Float16 for AVX512FP16 and update vectorization. This allows us to do faster native scalar operations. Also updated half/quarter packets to use the native type if available. Benchmark improvement:
```
Comparing ./2910_without_float16 to ./2910_with_float16
Benchmark                                   Time      CPU    Time Old    Time New     CPU Old     CPU New
---------------------------------------------------------------------------------------------------------
BM_CalcMat<float>/10000/768/500          -0.0041  -0.0040    58276392    58039442    58273420    58039582
BM_CalcMat<_Float16>/10000/768/500       +0.0073  +0.0073   642506339   647214446   642481384   647188303
BM_CalcMat<Eigen::half>/10000/768/500    -0.3170  -0.3170    92511115    63182101    92506771    63179258
BM_CalcVec<float>/10000/768/500          +0.0022  +0.0022     5198157     5209469     5197913     5209334
BM_CalcVec<_Float16>/10000/768/500       +0.0025  +0.0026    10133324    10159111    10132641    10158507
BM_CalcVec<Eigen::half>/10000/768/500    -0.7760  -0.7760    45337937    10156952    45336532    10156389
OVERALL_GEOMEAN                          -0.2677  -0.2677           0           0           0           0
```
Fixes #2910.	2025-03-18 10:46:32 -07:00
Antonio Sánchez	b1e74b1ccd	Fix all the doxygen warnings.	2025-02-01 00:00:31 +00:00
Pengzhou0810	e986838464	Add LoongArch64 architecture LSX support (build/test).	2025-01-20 18:37:44 +00:00
Charles Schlosser	8ad4344ca7	optimize setConstant, setZero	2024-11-22 03:39:19 +00:00
Charles Schlosser	fb477b8be1	Better dot products	2024-09-10 21:02:31 +00:00
Rasmus Munk Larsen	1dbc7581ec	Include <thread> for std::this_thread::yield().	2024-08-14 17:44:14 +00:00
Tobias Wood	5a9f66fb35	Fix Thread tests	2024-05-24 16:50:14 +00:00
Charles Schlosser	99adca8b34	Incorporate Threadpool in Eigen Core	2024-05-20 23:42:51 +00:00
Charles Schlosser	e63d9f6ccb	Fix random again	2024-03-29 21:49:27 +00:00
Cheng Wang	2c6b61c006	Add half and quarter vector support to HVX architecture	2024-01-22 21:23:21 +00:00
Tobias Wood	f38e16c193	Apply clang-format	2023-11-29 11:12:48 +00:00
Rasmus Munk Larsen	76e8c04553	Generalize parallel GEMM implementation in Core to work with ThreadPool in addition to OpenMP.	2023-11-10 17:42:30 +00:00
cheng wang	66e8f38891	Add architecture definition files for Qualcomm Hexagon Vector Extension (HVX)	2023-08-01 17:47:57 +00:00
Charles Schlosser	59b3ef5409	Partially Vectorize Cast	2023-06-09 16:54:31 +00:00
Charles Schlosser	fbf7189bd5	Fix cuda compilation	2023-05-08 16:15:47 +00:00
Mehdi Goli	0623791930	[SYCL-2020] Enabling USM support for SYCL. SYCL-1.2.1 did not have support for USM.	2023-05-05 17:30:36 +00:00
Antonio Sánchez	2d0c6ad873	Revert "Vectorize cast". This reverts commit eb5ff1861a4783876564a1a79573c3b9ff566863.	2023-04-26 18:03:36 +00:00
Charles Schlosser	eb5ff1861a	Vectorize cast	2023-04-26 02:50:13 +00:00
Chip Kerchner	3f3ce214e6	New BF16 pcast functions and move type casting to TypeCasting.h	2023-04-18 02:38:38 +00:00
Charles Schlosser	1ce8b25825	Vectorize any() / all()	2023-03-06 23:54:02 +00:00
Antonio Sánchez	3f7e775715	Add IWYU export pragmas to top-level headers.	2023-02-08 17:40:31 +00:00
Charles Schlosser	2a90653395	fix lapacke config	2023-02-03 16:40:08 +00:00
Antonio Sánchez	08c961e837	Add custom ODR-safe assert.	2023-01-20 17:38:13 +00:00
Sean McBride	d70b4864d9	issue #2581 : review and cleanup of compiler version checks	2023-01-17 18:58:34 +00:00
Mehdi Goli	b523120687	[SYCL-2020 Support] Enabling Intel DPCPP Compiler support to Eigen	2023-01-16 07:04:08 +00:00
Alexander Richardson	37de432907	Avoid using std::raise() for divide by zero	2022-12-14 20:06:16 +00:00
Rasmus Munk Larsen	7b2901e2aa	Add vectorized integer division for int32 with AVX512, AVX or SSE.	2022-09-21 00:27:23 +00:00
Thomas Gloor	ec9c7163a3	Feature/skew symmetric matrix3	2022-09-08 20:44:40 +00:00
Matthew Sterrett	7a3b667c43	Add support for AVX512-FP16 for vectorizing half precision math	2022-08-17 18:15:21 +00:00
Chip Kerchner	9e0afe0f02	Fix non-VSX PowerPC build	2022-08-08 18:18:17 +00:00
Alexander Richardson	b7668c0371	Avoid including <sstream> with EIGEN_NO_IO	2022-07-29 18:02:51 +00:00
aaraujom	d49ede4dc4	Add AVX512 s/dgemm optimizations for compute kernel (2nd try)	2022-05-28 02:00:21 +00:00
Antonio Sánchez	9b9496ad98	Revert "Add AVX512 optimizations for matrix multiply". This reverts commit 25db0b4a824ba9a092bbb514fbada51bf9d37a18.	2022-05-13 18:50:33 +00:00
aaraujom	25db0b4a82	Add AVX512 optimizations for matrix multiply	2022-05-12 23:41:19 +00:00
Shi, Brian	fc1d888415	Remove AVX512VL dependency in trsm	2022-04-14 12:44:24 -07:00
Antonio Sánchez	07db964bde	Restrict new AVX512 trsm to AVX512VL, rename files for consistency.	2022-04-14 16:58:32 +00:00
b-shi	518fc321cb	AVX512 Optimizations for Triangular Solve	2022-03-16 18:04:50 +00:00
Erik Schultheis	cc11e240ac	Some further cleanup	2021-12-06 18:01:15 +00:00
Rasmus Munk Larsen	3ffefcb95c	Only include <atomic> if needed.	2021-12-02 23:55:25 +00:00
Erik Schultheis	ec2fd0f7ed	Require recent GCC and MSVC and removed `EIGEN_HAS_CXX14` and some other feature test macros	2021-12-01 00:48:34 +00:00
Erik Schultheis	ec4efbd696	remove EIGEN_HAS_CXX11	2021-11-24 20:08:49 +00:00
Alex Druinsky	6bb6a6bf53	Vectorize fp16 tanh and logistic functions on Neon. Activates vectorization of the Eigen::half versions of the tanh and logistic functions when they run on Neon. Both functions convert their inputs to float before computing the output, and as a result of this commit, the conversions and the computation in float are vectorized.	2021-10-27 16:09:16 +00:00
Antonio Sanchez	d0d34524a1	Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h. The `Complex.h` file applies equally to HIP/CUDA, so it is placed under the generic `GPU` folder. `TensorReductionCuda.h` has already been deprecated and is now removed for the next Eigen version.	2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Antonio Sanchez	fcd73b4884	Add a simple serialization mechanism. The `Serializer<T>` class implements a binary serialization that can write to (`serialize`) and read from (`deserialize`) a byte buffer. Also added convenience routines for serializing a list of arguments. This will mainly be for testing, specifically to transfer data to and from the GPU.	2021-09-08 09:38:59 -07:00
Adam Kallai	1415817d8d	win: include intrin header in Windows on ARM. The intrin header is needed for _BitScanReverse and _BitScanReverse64.	2021-08-31 10:57:34 +02:00
Antonio Sanchez	d24f9f9b55	Fix NVCC+ICC issues. NVCC does not understand `__forceinline`, so we need to use `inline` when compiling for GPU. ICC specializes `std::complex` operators for `float` and `double` by default, which cannot be used on device and conflict with Eigen's workaround in CUDA/Complex.h. This can be prevented by defining `_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`. Added this define to the tests and to `Eigen/Core`, but this will not work if the user includes `<complex>` before `<Eigen/Core>`. ICC also seems to generate a duplicate `Map` symbol in `PlainObjectBase`:
```
error: "Map" has already been declared in the current scope
  static ConstMapType Map(const Scalar *data)
```
I tracked this down to `friend class Eigen::Map`. Putting the `friend` statements at the bottom of the class seems to resolve this issue. Fixes #2180	2021-03-15 18:42:04 +00:00
Antonio Sanchez	f85038b7f3	Fix excessive GEBP register spilling for 32-bit NEON. Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM, leading to excessive 16-byte register spills, slowing down basic f32 matrix multiplication by approx 50%. By specializing `gebp_traits`, we can eliminate the register spills. Volatile inline ASM both acts as a barrier to prevent reordering and enforces strict register use. In a simple f32 matrix multiply example, this modification reduces 16-byte spills from 109 instances to zero, leading to a 1.5x speed increase (search for `16-byte Spill` in the assembly in https://godbolt.org/z/chsPbE). This is a replacement of !379. See there for further discussion. Also moved `gebp_traits` specializations for NEON to `Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside other NEON-specific code. Fixes #2138.	2021-02-03 09:01:48 -08:00
David Tellenbach	65e2169c45	Add support for Arm SVE. This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions that are supported by Eigen, SVE types are inherently sizeless. For the use in Eigen we fix their size at compile-time (note that this is not necessary in general, SVE is length agnostic). During compilation the flag `-msve-vector-bits=N` has to be set where `N` is a power of two in the range of `128` to `2048`, indicating the length of an SVE vector. Since SVE is rather young, we decided to disable it by default even if it would be available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`. This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements. This MR is joint work with Miguel Tairum <miguel.tairum@arm.com>.	2021-01-21 21:11:57 +00:00
David Tellenbach	4091f6b25c	Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD	2020-10-09 02:05:05 +02:00

471 Commits