Frédéric BRIOL
2a3465102a
Refactor code to use constexpr for data() functions.
2024-09-23 16:43:53 +00:00
Tobias Wood
f38e16c193
Apply clang-format
2023-11-29 11:12:48 +00:00
Antonio Sánchez
6e4d5d4832
Add IWYU private pragmas to internal headers.
2023-08-21 16:25:22 +00:00
Chip Kerchner
7769eb1b2e
Fix problems with recent changes and Tensorflow in Power
2023-07-26 16:24:58 +00:00
Colin Broderick
8f9b8e3630
Replaced all instances of internal::(U)IntPtr with std::(u)intptr_t. Remove ICC workaround.
2023-03-21 16:50:23 +00:00
Antonio Sánchez
17ae83a966
Fix bugs exposed by enabling GPU asserts.
2023-01-27 21:43:00 +00:00
Antonio Sánchez
2e61c0c6b4
Add missing EIGEN_DEVICE_FUNC in a few places when called by asserts.
2023-01-15 02:06:17 +00:00
Chip Kerchner
d20fe21ae4
Improve performance for Power10 MMA bfloat16 GEMM
2023-01-06 23:08:37 +00:00
Pedro Caldeira
31ab62d347
Add support for Power10 (AltiVec) MMA instructions for bfloat16.
2022-11-30 23:33:37 +00:00
Rasmus Munk Larsen
273e0c884e
Revert "Add constexpr, test for C++14 constexpr."
2022-09-16 21:14:29 +00:00
Tobias Schlüter
133498c329
Add constexpr, test for C++14 constexpr.
2022-09-07 03:42:34 +00:00
Chip Kerchner
ce60a7be83
Partial Packet support for GEMM real-only (PowerPC). Also fix compilation warnings & errors for some conditions in new API.
2022-08-03 18:15:19 +00:00
Chip Kerchner
84cf3ff18d
Add pload_partial, pstore_partial (and unaligned versions), pgather_partial, pscatter_partial, loadPacketPartial and storePacketPartial.
2022-06-27 19:18:00 +00:00
aaraujom
d49ede4dc4
Add AVX512 s/dgemm optimizations for compute kernel (2nd try)
2022-05-28 02:00:21 +00:00
Eisuke Kawashima
ac5c83a3f5
unset executable flag
2022-05-22 22:47:43 +09:00
Antonio Sánchez
9b9496ad98
Revert "Add AVX512 optimizations for matrix multiply"
This reverts commit 25db0b4a824ba9a092bbb514fbada51bf9d37a18
2022-05-13 18:50:33 +00:00
aaraujom
25db0b4a82
Add AVX512 optimizations for matrix multiply
2022-05-12 23:41:19 +00:00
Chip Kerchner
403fa33409
Performance improvements in GEMM for Power
2022-04-05 12:18:53 +00:00
Erik Schultheis
421cbf0866
Replace Eigen type metaprogramming with corresponding std types and make use of alias templates
2022-03-16 16:43:40 +00:00
Chip Kerchner
708fd6d136
Add MMA and performance improvements for VSX in GEMV for PowerPC.
2022-01-13 13:23:18 +00:00
Kolja Brix
8d81a2339c
Reduce usage of reserved names
2022-01-10 20:53:29 +00:00
Rasmus Munk Larsen
d7d0bf832d
Issue an error in case of direct inclusion of internal headers.
2021-09-10 19:12:26 +00:00
Alexander Karatarakis
4ba872bd75
Avoid leading underscore followed by cap in template identifiers
2021-08-04 22:41:52 +00:00
Rohit Santhanam
beea14a18f
Enable extract et al. for HIP GPU.
2021-07-09 14:58:07 +00:00
Rasmus Munk Larsen
52a5f98212
Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.
2021-06-24 15:47:48 -07:00
Christoph Hertzberg
12dda34b15
Eliminate boolean product warnings by factoring out a `combine_scalar_factors` helper function.
2021-01-05 18:15:30 +00:00
Everton Constantino
6fe88a3c9d
MatrixProduct enhancements:
- Changes to Altivec/MatrixProduct
Adapting code to gcc 10.
Generic code style and performance enhancements.
Adding PanelMode support.
Adding stride/offset support.
Enabling float64, std::complex<float> and std::complex<double>.
Fixing lack of symm_pack.
Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr!= 1.
2020-09-02 18:21:36 -03:00
Everton Constantino
8a7f360ec3
- Vectorizing MMA packing.
- Optimizing MMA kernel.
- Adding PacketBlock store to blas_data_mapper.
2020-05-19 19:24:11 +00:00
Gael Guennebaud
ea0d5dc956
bug #1741 : fix C.noalias() = A*C; with C.innerStride()!=1
2019-09-10 16:25:24 +02:00
Christoph Hertzberg
a1646fc960
Commas at the end of enumerator lists are not allowed in C++03
2019-02-19 14:32:25 +01:00
Gael Guennebaud
512b74aaa1
GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product.
Before only s*A*B was caught which was both inconsistent with GEMM, sub-optimal,
and could even lead to compilation errors (https://stackoverflow.com/questions/54738495).
2019-02-18 11:47:54 +01:00
Gael Guennebaud
71496b0e25
Fix gebp kernel for real+complex in case only reals are vectorized (e.g., AVX512).
This commit also removes "half-packet" from data-mappers: it was not used and conceptually broken anyways.
2018-09-20 17:01:24 +02:00
Andrea Bocci
f7124b3e46
Extend CUDA support to matrix inversion and selfadjointeigensolver
2018-06-11 18:33:24 +02:00
Gael Guennebaud
3abc827354
Clean debugging code
2016-12-05 12:59:32 +01:00
Gael Guennebaud
6a5fe86098
Complete rewrite of column-major-matrix * vector product to deliver higher performance on modern CPUs.
The previous code has been optimized for Intel core2 for which unaligned loads/stores were prohibitively expensive.
This new version exhibits much higher instruction independence (better pipelining) and explicitly leverages FMA.
According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast.
Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching.
We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panels of 4 columns is probably not optimal anymore).
2016-12-03 21:14:14 +01:00
Gael Guennebaud
0decc31aa8
Add generic implementation of conj_helper for custom complex types.
2016-08-29 09:42:29 +02:00
Gael Guennebaud
e47a8928ec
Fix compilation in check_for_aliasing due to ambiguous specializations
2016-08-23 16:19:10 +02:00
Gael Guennebaud
6a3c451c1c
Permits call to explicit ctor.
2016-07-18 12:02:20 +02:00
Gael Guennebaud
1004c4df99
Cleanup unused functors.
2016-06-14 15:27:28 +02:00
Gael Guennebaud
64fcfd314f
Implement scalar multiples and division by a scalar as a binary-expression with a constant expression.
This slightly complexifies the type of the expressions and implies that we now have to distinguish between scalar*expr and expr*scalar to catch scalar-multiple expression (e.g., see BlasUtil.h), but this brings several advantages:
- it makes it clear on which side the scalar is applied,
- it clearly reflects that we are dealing with a binary-expression,
- the complexity of the type is hidden through macros defined at the end of Macros.h,
- distinguishing between "scalar op expr" and "expr op scalar" is important to support non-commutative fields (like quaternions),
- "scalar op expr" is now fully equivalent to "ConstantExpr(scalar) op expr"
- scalar_multiple_op, scalar_quotient1_op and scalar_quotient2_op are not used anymore in officially supported modules (still used in Tensor)
2016-06-14 11:26:57 +02:00
Gael Guennebaud
27f0434233
Introduce internal's UIntPtr and IntPtr types for pointer to integer conversions.
This fixes "conversion from pointer to same-sized integral type" warnings by ICC.
Ideally, we would use the std::[u]intptr_t types all the time, but since they are C99/C++11 only,
let's be safe.
2016-05-26 10:52:12 +02:00
Benoit Steiner
bbdabbb379
Made the blas utils usable from within a cuda kernel
2016-01-11 17:26:56 -08:00
Gael Guennebaud
2afdef6a54
Generalize first_aligned to take the requested alignment as a template parameter, and add a first_default_aligned variant calling first_aligned with the requirement of the largest packet for the given scalar type.
2015-08-06 17:52:01 +02:00
Gael Guennebaud
793e4c6d77
bug #923 : fix EIGEN_USE_BLAS mode
2015-06-23 11:13:24 +02:00
Gael Guennebaud
d6b2f300db
Fix MSVC compilation: aligned type must be passed by reference
2015-03-19 17:28:32 +01:00
Gael Guennebaud
20cac72b82
Packet must be passed by const reference and not by value to avoid alignment issue.
2015-02-17 22:58:32 +01:00
Benoit Steiner
c739102ef9
Pulled the latest changes from the trunk
2015-02-06 05:25:03 -08:00
Benoit Steiner
2dde63499c
Generalized the matrix vector product code.
2014-10-31 16:33:51 -07:00
Benoit Steiner
b7271dffb5
Generalized the gebp apis
2014-10-02 16:51:57 -07:00
Christoph Hertzberg
36448c9e28
Make constructors explicit if they could lead to unintended implicit conversion
2014-09-23 14:28:23 +02:00