473 Commits

Author SHA1 Message Date
Charles Schlosser
d0b490ee09 Optimize maxCoeff and friends 2025-06-06 14:55:49 +00:00
Charles Schlosser
4fdf87bbf5 clean up intel packet reductions 2025-05-30 19:18:07 +00:00
Antonio Sánchez
70f2aead9a Use native _Float16 for AVX512FP16 and update vectorization. 2025-03-19 19:55:26 +00:00
Antonio Sánchez
b1e74b1ccd Fix all the doxygen warnings. 2025-02-01 00:00:31 +00:00
Pengzhou0810
e986838464 Add LoongArch64 architecture LSX support.(build/test ) 2025-01-20 18:37:44 +00:00
Charles Schlosser
8ad4344ca7 optimize setConstant, setZero 2024-11-22 03:39:19 +00:00
Charles Schlosser
fb477b8be1 Better dot products 2024-09-10 21:02:31 +00:00
Rasmus Munk Larsen
1dbc7581ec Include <thread> for std::this_thread::yield(). 2024-08-14 17:44:14 +00:00
Tobias Wood
5a9f66fb35 Fix Thread tests 2024-05-24 16:50:14 +00:00
Charles Schlosser
99adca8b34 Incorporate Threadpool in Eigen Core 2024-05-20 23:42:51 +00:00
Charles Schlosser
e63d9f6ccb Fix random again 2024-03-29 21:49:27 +00:00
Cheng Wang
2c6b61c006 Add half and quarter vector support to HVX architecture 2024-01-22 21:23:21 +00:00
Tobias Wood
f38e16c193 Apply clang-format 2023-11-29 11:12:48 +00:00
Rasmus Munk Larsen
76e8c04553 Generalize parallel GEMM implementation in Core to work with ThreadPool in addition to OpenMP. 2023-11-10 17:42:30 +00:00
cheng wang
66e8f38891 Add architecture definition files for Qualcomm Hexagon Vector Extension (HVX) 2023-08-01 17:47:57 +00:00
Charles Schlosser
59b3ef5409 Partially Vectorize Cast 2023-06-09 16:54:31 +00:00
Charles Schlosser
fbf7189bd5 Fix cuda compilation 2023-05-08 16:15:47 +00:00
Mehdi Goli
0623791930 [SYCL-2020] Enabling USM support for SYCL. SYCL-1.2.1 did not have support for USM. 2023-05-05 17:30:36 +00:00
Antonio Sánchez
2d0c6ad873 Revert "Vectorize cast"
This reverts commit eb5ff1861a4783876564a1a79573c3b9ff566863
2023-04-26 18:03:36 +00:00
Charles Schlosser
eb5ff1861a Vectorize cast 2023-04-26 02:50:13 +00:00
Chip Kerchner
3f3ce214e6 New BF16 pcast functions and move type casting to TypeCasting.h 2023-04-18 02:38:38 +00:00
Charles Schlosser
1ce8b25825 Vectorize any() / all() 2023-03-06 23:54:02 +00:00
Antonio Sánchez
3f7e775715 Add IWYU export pragmas to top-level headers. 2023-02-08 17:40:31 +00:00
Charles Schlosser
2a90653395 fix lapacke config 2023-02-03 16:40:08 +00:00
Antonio Sánchez
08c961e837 Add custom ODR-safe assert. 2023-01-20 17:38:13 +00:00
Sean McBride
d70b4864d9 issue #2581: review and cleanup of compiler version checks 2023-01-17 18:58:34 +00:00
Mehdi Goli
b523120687 [SYCL-2020 Support] Enabling Intel DPCPP Compiler support to Eigen 2023-01-16 07:04:08 +00:00
Alexander Richardson
37de432907 Avoid using std::raise() for divide by zero 2022-12-14 20:06:16 +00:00
Rasmus Munk Larsen
7b2901e2aa Add vectorized integer division for int32 with AVX512, AVX or SSE. 2022-09-21 00:27:23 +00:00
Thomas Gloor
ec9c7163a3 Feature/skew symmetric matrix3 2022-09-08 20:44:40 +00:00
Matthew Sterrett
7a3b667c43 Add support for AVX512-FP16 for vectorizing half precision math 2022-08-17 18:15:21 +00:00
Chip Kerchner
9e0afe0f02 Fix non-VSX PowerPC build 2022-08-08 18:18:17 +00:00
Alexander Richardson
b7668c0371 Avoid including <sstream> with EIGEN_NO_IO 2022-07-29 18:02:51 +00:00
aaraujom
d49ede4dc4 Add AVX512 s/dgemm optimizations for compute kernel (2nd try) 2022-05-28 02:00:21 +00:00
Antonio Sánchez
9b9496ad98 Revert "Add AVX512 optimizations for matrix multiply"
This reverts commit 25db0b4a824ba9a092bbb514fbada51bf9d37a18
2022-05-13 18:50:33 +00:00
aaraujom
25db0b4a82 Add AVX512 optimizations for matrix multiply 2022-05-12 23:41:19 +00:00
Shi, Brian
fc1d888415 Remove AVX512VL dependency in trsm 2022-04-14 12:44:24 -07:00
Antonio Sánchez
07db964bde Restrict new AVX512 trsm to AVX512VL, rename files for consistency. 2022-04-14 16:58:32 +00:00
b-shi
518fc321cb AVX512 Optimizations for Triangular Solve 2022-03-16 18:04:50 +00:00
Erik Schultheis
cc11e240ac Some further cleanup 2021-12-06 18:01:15 +00:00
Rasmus Munk Larsen
3ffefcb95c Only include <atomic> if needed. 2021-12-02 23:55:25 +00:00
Erik Schultheis
ec2fd0f7ed Require recent GCC and MSCV and removed EIGEN_HAS_CXX14 and some other feature test macros 2021-12-01 00:48:34 +00:00
Erik Schultheis
ec4efbd696 remove EIGEN_HAS_CXX11 2021-11-24 20:08:49 +00:00
Alex Druinsky
6bb6a6bf53 Vectorize fp16 tanh and logistic functions on Neon
Activates vectorization of the Eigen::half versions of the tanh and
logistic functions when they run on Neon. Both functions convert their
inputs to float before computing the output, and as a result of this
commit, the conversions and the computation in float are vectorized.
2021-10-27 16:09:16 +00:00
Antonio Sanchez
d0d34524a1 Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h
The `Complex.h` file applies equally to HIP/CUDA, so placing under the
generic `GPU` folder.

The `TensorReductionCuda.h` has already been deprecated, now removing
for the next Eigen version.
2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen
d7d0bf832d Issue an error in case of direct inclusion of internal headers. 2021-09-10 19:12:26 +00:00
Antonio Sanchez
fcd73b4884 Add a simple serialization mechanism.
The `Serializer<T>` class implements a binary serialization that
can write to (`serialize`) and read from (`deserialize`) a byte
buffer.  Also added convenience routines for serializing
a list of arguments.

This will mainly be for testing, specifically to transfer data to
and from the GPU.
2021-09-08 09:38:59 -07:00
Adam Kallai
1415817d8d win: include intrin header in Windows on ARM
intrin header is needed for _BitScanReverse and
_BitScanReverse64
2021-08-31 10:57:34 +02:00
Antonio Sanchez
d24f9f9b55 Fix NVCC+ICC issues.
NVCC does not understand `__forceinline`, so we need to use `inline`
when compiling for GPU.

ICC specializes `std::complex` operators for `float` and `double`
by default, which cannot be used on device and conflict with Eigen's
workaround in CUDA/Complex.h.  This can be prevented by defining
`_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`.
Added this define to the tests and to `Eigen/Core`, but this will
not work if the user includes `<complex>` before `<Eigen/Core>`.

ICC also seems to generate a duplicate `Map` symbol in
`PlainObjectBase`:
```
error: "Map" has already been declared in the current scope
  static ConstMapType Map(const Scalar *data)

```
I tracked this down to `friend class Eigen::Map`.  Putting the `friend`
statements at the bottom of the class seems to resolve this issue.

Fixes #2180
2021-03-15 18:42:04 +00:00
Antonio Sanchez
f85038b7f3 Fix excessive GEBP register spilling for 32-bit NEON.
Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM,
leading to excessive 16-byte register spills, slowing down basic f32
matrix multiplication by approx 50%.

By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline ASM both acts as a barrier to prevent reordering and
enforces strict register use. In a simple f32 matrix multiply example,
this modification reduces 16-byte spills from 109 instances to zero,
leading to a 1.5x speed increase (search for `16-byte Spill` in the
assembly in https://godbolt.org/z/chsPbE).

This is a replacement of !379.  See there for further discussion.

Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.

Fixes #2138.
2021-02-03 09:01:48 -08:00