Benoit Steiner
|
95a430a2ca
|
Vector primitives for CUDA
|
2014-10-03 19:45:19 -07:00 |
|
Benoit Steiner
|
10a79ca3a3
|
Merged latest updates from the Eigen trunk.
|
2014-09-15 09:18:16 -07:00 |
|
Benoit Steiner
|
16047c8d4a
|
Pulled in the latest changes from the Eigen trunk
|
2014-08-13 22:25:29 -07:00 |
|
Jitse Niesen
|
25bceefb4e
|
Replace asm by __asm__ (bug #873)
|
2014-09-06 11:47:24 +01:00 |
|
Gael Guennebaud
|
0369db12af
|
bug #871: fix compilation on ARM/Neon regarding __has_builtin usage
|
2014-09-01 10:52:58 +02:00 |
|
Konstantinos Margaritis
|
2c625ec9ba
|
Simplified some Altivec constants: reuse existing constants and avoid loading from RAM, especially in the case of p16uc_COMPLEX_TRANSPOSE*
|
2014-07-22 20:46:03 +00:00 |
|
Konstantinos Margaritis
|
0a945687b7
|
Added HasDiv=1 to Altivec PacketMath.h; the vectorization_logic test now passes.
Added comments to the constants indicating their actual values.
|
2014-07-15 11:02:51 +00:00 |
|
Christoph Hertzberg
|
d1460d9278
|
stride must be DenseIndex not int
|
2014-07-10 16:23:20 +02:00 |
|
Gael Guennebaud
|
b47ef1431f
|
Fix many long to int implicit conversions
|
2014-07-08 16:47:11 +02:00 |
|
Benoit Steiner
|
4304c73542
|
Pulled latest updates from the Eigen main trunk.
|
2014-06-10 10:23:32 -07:00 |
|
Benoit Steiner
|
8c8ae2d819
|
Fixed a typo
|
2014-06-07 11:24:38 -07:00 |
|
Benoit Steiner
|
29aebf96e6
|
Created the pblend packet primitive and implemented it using SSE and AVX instructions.
|
2014-06-06 20:18:44 -07:00 |
|
Gael Guennebaud
|
d67aa1549b
|
Add missing add_subdirectory directive
|
2014-05-03 10:46:11 +02:00 |
|
Gael Guennebaud
|
450d0c3de0
|
Make sure that calls to broadcast4 are 16 bytes aligned
|
2014-04-25 22:25:48 +02:00 |
|
Gael Guennebaud
|
2dbfd83424
|
Implement pbroadcast4 on altivec
|
2014-04-25 02:46:57 -07:00 |
|
Gael Guennebaud
|
4def7b1fa5
|
Fix ptranspose overload prototypes for NEON
|
2014-04-25 11:15:13 +02:00 |
|
Gael Guennebaud
|
3d8d0f6269
|
Enable vectorization of pack_rhs with a column-major RHS.
Rename and generalize Kernel<*> to PacketBlock<*,N>.
|
2014-04-25 10:56:18 +02:00 |
|
Gael Guennebaud
|
b0e19db1cf
|
Enable fused madd for Altivec
|
2014-04-24 23:17:18 +02:00 |
|
Gael Guennebaud
|
8d85ce88e1
|
Implement ptranspose on altivec and fix pgather/pscatter
|
2014-04-24 05:47:53 -07:00 |
|
Benoit Steiner
|
4eb92e5647
|
Fixed the NEON implementation of predux_max<Packet4i>.
|
2014-04-23 18:23:07 -07:00 |
|
Benoit Steiner
|
ccb4dec719
|
Created a NEON version of the ptranspose packet primitives
|
2014-04-23 18:22:10 -07:00 |
|
Gael Guennebaud
|
82b09fcb91
|
Add Altivec implementation of pgather/pscatter (not tested)
|
2014-04-23 13:09:26 +02:00 |
|
Gael Guennebaud
|
934ce93886
|
merge with default branch
|
2014-04-22 17:00:38 +02:00 |
|
Gael Guennebaud
|
5c5231ab71
|
Work around gcc's default ABI not being able to distinguish between vector types of different sizes.
|
2014-04-22 16:03:19 +02:00 |
|
Gael Guennebaud
|
1388f4f9fd
|
Fix typo (was working with clang!)
|
2014-04-18 11:43:13 +02:00 |
|
Gael Guennebaud
|
2c3c95990d
|
merge
|
2014-04-17 22:50:49 +02:00 |
|
Benoit Steiner
|
6d6df90c9a
|
Implemented the pgather/pscatter packet primitives for the arm/NEON architecture
|
2014-04-17 12:28:01 -07:00 |
|
Gael Guennebaud
|
9746396d1b
|
Optimize AVX pset1 for complexes and ploaddup
|
2014-04-17 20:51:04 +02:00 |
|
Gael Guennebaud
|
0fa8290366
|
Optimize ploaddup for AVX
|
2014-04-17 16:02:27 +02:00 |
|
Gael Guennebaud
|
d5a795f673
|
New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell.
This changeset also introduces new vector functions: ploadquad and predux4.
|
2014-04-16 17:05:11 +02:00 |
|
Benoit Steiner
|
feaf7c7e6d
|
Optimized SSE unaligned loads and stores when compiling a 64-bit target with a recent version of gcc (e.g. gcc 4.8).
|
2014-04-14 10:44:17 -07:00 |
|
Benoit Steiner
|
8044b00a7f
|
bug #782: Workaround for gcc <= 4.4 compilation error on the NEON PacketMath code.
|
2014-04-03 23:41:47 +02:00 |
|
Gael Guennebaud
|
1c0728043a
|
Workaround alignment warnings
|
2014-03-30 22:43:47 +02:00 |
|
Gael Guennebaud
|
10aa14592a
|
Add a mechanism to recursively access half-size packet types
|
2014-03-28 10:18:04 +01:00 |
|
Benoit Steiner
|
51e85c936d
|
Merged latest changes from parent.
|
2014-03-27 18:32:15 -07:00 |
|
Benoit Steiner
|
8a94cb3edd
|
Implemented the SSE version of the gather and scatter packet primitives.
|
2014-03-27 18:29:01 -07:00 |
|
Benoit Steiner
|
7f3162f707
|
Implemented the AVX version of the gather and scatter packet primitives.
|
2014-03-27 17:42:25 -07:00 |
|
Gael Guennebaud
|
58fe2fc2b2
|
enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...)
|
2014-03-27 23:38:50 +01:00 |
|
Benoit Steiner
|
c4902a3d01
|
Implemented the AVX version of the ptranspose packet primitive.
|
2014-03-27 09:34:51 -07:00 |
|
Gael Guennebaud
|
052aedd394
|
Implement pcplxflip, palign, predux and the like for AVX/complexes
|
2014-03-27 14:47:00 +01:00 |
|
Benoit Steiner
|
a419cea4a0
|
Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions.
Implemented the primitive using SSE instructions.
|
2014-03-26 19:03:07 -07:00 |
|
Benoit Steiner
|
e45a6bed45
|
Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible.
|
2014-03-26 15:58:13 -07:00 |
|
Benoit Steiner
|
cc73164aa8
|
Merged latest updates from the parent branch
|
2014-03-26 15:23:59 -07:00 |
|
Benoit Steiner
|
a078f442a3
|
Vectorized the multiplication and division of complex numbers using AVX instructions.
|
2014-03-26 15:11:18 -07:00 |
|
Benoit Steiner
|
cf1a7bfbe1
|
Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives.
Silenced a few compilation warnings.
|
2014-03-26 12:03:31 -07:00 |
|
Gael Guennebaud
|
bc401eb6fa
|
Implement new 1 packet x 8 gebp kernel
|
2014-03-26 18:53:00 +01:00 |
|
Gael Guennebaud
|
b286a1e75c
|
add pbroadcast2/4 generic intrinsics
|
2014-03-26 16:46:36 +01:00 |
|
Benoit Steiner
|
6bf3cc2732
|
Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf>
|
2014-03-25 09:00:43 -07:00 |
|
Benoit Steiner
|
7ae9b0805d
|
Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives.
|
2014-03-24 13:33:40 -07:00 |
|
Benoit Steiner
|
db7d49efbb
|
Added support for FMA instructions
|
2014-02-24 13:45:32 -08:00 |
|