eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-07-22 04:44:25 +08:00

Author	SHA1	Message	Date
Charles Schlosser	d0b490ee09	Optimize maxCoeff and friends	2025-06-06 14:55:49 +00:00
Charles Schlosser	4fdf87bbf5	clean up intel packet reductions	2025-05-30 19:18:07 +00:00
Antonio Sánchez	70f2aead9a	Use native _Float16 for AVX512FP16 and update vectorization.	2025-03-19 19:55:26 +00:00
Antonio Sánchez	b1e74b1ccd	Fix all the doxygen warnings.	2025-02-01 00:00:31 +00:00
Pengzhou0810	e986838464	Add LoongArch64 architecture LSX support (build/test).	2025-01-20 18:37:44 +00:00
Charles Schlosser	8ad4344ca7	optimize setConstant, setZero	2024-11-22 03:39:19 +00:00
Charles Schlosser	fb477b8be1	Better dot products	2024-09-10 21:02:31 +00:00
Rasmus Munk Larsen	1dbc7581ec	Include <thread> for std::this_thread::yield().	2024-08-14 17:44:14 +00:00
Tobias Wood	5a9f66fb35	Fix Thread tests	2024-05-24 16:50:14 +00:00
Charles Schlosser	99adca8b34	Incorporate Threadpool in Eigen Core	2024-05-20 23:42:51 +00:00
Charles Schlosser	e63d9f6ccb	Fix random again	2024-03-29 21:49:27 +00:00
Cheng Wang	2c6b61c006	Add half and quarter vector support to HVX architecture	2024-01-22 21:23:21 +00:00
Tobias Wood	f38e16c193	Apply clang-format	2023-11-29 11:12:48 +00:00
Rasmus Munk Larsen	76e8c04553	Generalize parallel GEMM implementation in Core to work with ThreadPool in addition to OpenMP.	2023-11-10 17:42:30 +00:00
cheng wang	66e8f38891	Add architecture definition files for Qualcomm Hexagon Vector Extension (HVX)	2023-08-01 17:47:57 +00:00
Charles Schlosser	59b3ef5409	Partially Vectorize Cast	2023-06-09 16:54:31 +00:00
Charles Schlosser	fbf7189bd5	Fix cuda compilation	2023-05-08 16:15:47 +00:00
Mehdi Goli	0623791930	[SYCL-2020] Enabling USM support for SYCL. SYCL-1.2.1 did not have support for USM.	2023-05-05 17:30:36 +00:00
Antonio Sánchez	2d0c6ad873	Revert "Vectorize cast" This reverts commit eb5ff1861a4783876564a1a79573c3b9ff566863	2023-04-26 18:03:36 +00:00
Charles Schlosser	eb5ff1861a	Vectorize cast	2023-04-26 02:50:13 +00:00
Chip Kerchner	3f3ce214e6	New BF16 pcast functions and move type casting to TypeCasting.h	2023-04-18 02:38:38 +00:00
Charles Schlosser	1ce8b25825	Vectorize any() / all()	2023-03-06 23:54:02 +00:00
Antonio Sánchez	3f7e775715	Add IWYU export pragmas to top-level headers.	2023-02-08 17:40:31 +00:00
Charles Schlosser	2a90653395	fix lapacke config	2023-02-03 16:40:08 +00:00
Antonio Sánchez	08c961e837	Add custom ODR-safe assert.	2023-01-20 17:38:13 +00:00
Sean McBride	d70b4864d9	issue #2581 : review and cleanup of compiler version checks	2023-01-17 18:58:34 +00:00
Mehdi Goli	b523120687	[SYCL-2020 Support] Enabling Intel DPCPP Compiler support to Eigen	2023-01-16 07:04:08 +00:00
Alexander Richardson	37de432907	Avoid using std::raise() for divide by zero	2022-12-14 20:06:16 +00:00
Rasmus Munk Larsen	7b2901e2aa	Add vectorized integer division for int32 with AVX512, AVX or SSE.	2022-09-21 00:27:23 +00:00
Thomas Gloor	ec9c7163a3	Feature/skew symmetric matrix3	2022-09-08 20:44:40 +00:00
Matthew Sterrett	7a3b667c43	Add support for AVX512-FP16 for vectorizing half precision math	2022-08-17 18:15:21 +00:00
Chip Kerchner	9e0afe0f02	Fix non-VSX PowerPC build	2022-08-08 18:18:17 +00:00
Alexander Richardson	b7668c0371	Avoid including <sstream> with EIGEN_NO_IO	2022-07-29 18:02:51 +00:00
aaraujom	d49ede4dc4	Add AVX512 s/dgemm optimizations for compute kernel (2nd try)	2022-05-28 02:00:21 +00:00
Antonio Sánchez	9b9496ad98	Revert "Add AVX512 optimizations for matrix multiply" This reverts commit 25db0b4a824ba9a092bbb514fbada51bf9d37a18	2022-05-13 18:50:33 +00:00
aaraujom	25db0b4a82	Add AVX512 optimizations for matrix multiply	2022-05-12 23:41:19 +00:00
Shi, Brian	fc1d888415	Remove AVX512VL dependency in trsm	2022-04-14 12:44:24 -07:00
Antonio Sánchez	07db964bde	Restrict new AVX512 trsm to AVX512VL, rename files for consistency.	2022-04-14 16:58:32 +00:00
b-shi	518fc321cb	AVX512 Optimizations for Triangular Solve	2022-03-16 18:04:50 +00:00
Erik Schultheis	cc11e240ac	Some further cleanup	2021-12-06 18:01:15 +00:00
Rasmus Munk Larsen	3ffefcb95c	Only include <atomic> if needed.	2021-12-02 23:55:25 +00:00
Erik Schultheis	ec2fd0f7ed	Require recent GCC and MSVC and removed `EIGEN_HAS_CXX14` and some other feature test macros	2021-12-01 00:48:34 +00:00
Erik Schultheis	ec4efbd696	remove EIGEN_HAS_CXX11	2021-11-24 20:08:49 +00:00
Alex Druinsky	6bb6a6bf53	Vectorize fp16 tanh and logistic functions on Neon Activates vectorization of the Eigen::half versions of the tanh and logistic functions when they run on Neon. Both functions convert their inputs to float before computing the output, and as a result of this commit, the conversions and the computation in float are vectorized.	2021-10-27 16:09:16 +00:00
Antonio Sanchez	d0d34524a1	Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h The `Complex.h` file applies equally to HIP/CUDA, so placing under the generic `GPU` folder. The `TensorReductionCuda.h` has already been deprecated, now removing for the next Eigen version.	2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Antonio Sanchez	fcd73b4884	Add a simple serialization mechanism. The `Serializer<T>` class implements a binary serialization that can write to (`serialize`) and read from (`deserialize`) a byte buffer. Also added convenience routines for serializing a list of arguments. This will mainly be for testing, specifically to transfer data to and from the GPU.	2021-09-08 09:38:59 -07:00
Adam Kallai	1415817d8d	win: include intrin header in Windows on ARM intrin header is needed for _BitScanReverse and _BitScanReverse64	2021-08-31 10:57:34 +02:00
Antonio Sanchez	d24f9f9b55	Fix NVCC+ICC issues. NVCC does not understand `__forceinline`, so we need to use `inline` when compiling for GPU. ICC specializes `std::complex` operators for `float` and `double` by default, which cannot be used on device and conflict with Eigen's workaround in CUDA/Complex.h. This can be prevented by defining `_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`. Added this define to the tests and to `Eigen/Core`, but this will not work if the user includes `<complex>` before `<Eigen/Core>`. ICC also seems to generate a duplicate `Map` symbol in `PlainObjectBase`: `error: "Map" has already been declared in the current scope` `static ConstMapType Map(const Scalar *data)` I tracked this down to `friend class Eigen::Map`. Putting the `friend` statements at the bottom of the class seems to resolve this issue. Fixes #2180	2021-03-15 18:42:04 +00:00
Antonio Sanchez	f85038b7f3	Fix excessive GEBP register spilling for 32-bit NEON. Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM, leading to excessive 16-byte register spills, slowing down basic f32 matrix multiplication by approx 50%. By specializing `gebp_traits`, we can eliminate the register spills. Volatile inline ASM both acts as a barrier to prevent reordering and enforces strict register use. In a simple f32 matrix multiply example, this modification reduces 16-byte spills from 109 instances to zero, leading to a 1.5x speed increase (search for `16-byte Spill` in the assembly in https://godbolt.org/z/chsPbE). This is a replacement of !379. See there for further discussion. Also moved `gebp_traits` specializations for NEON to `Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside other NEON-specific code. Fixes #2138.	2021-02-03 09:01:48 -08:00

473 Commits