Gael Guennebaud
4def7b1fa5
Fix ptranspose overload prototypes for NEON
2014-04-25 11:15:13 +02:00
Gael Guennebaud
3d8d0f6269
Enable vectorization of pack_rhs with a column-major RHS.
...
Rename and generalize Kernel<*> to PacketBlock<*,N>.
2014-04-25 10:56:18 +02:00
Gael Guennebaud
b0e19db1cf
Enable fused madd for Altivec
2014-04-24 23:17:18 +02:00
Gael Guennebaud
8d85ce88e1
Implement ptranspose on altivec and fix pgather/pscatter
2014-04-24 05:47:53 -07:00
Benoit Steiner
4eb92e5647
Fixed the NEON implementation of predux_max<Packet4i>.
2014-04-23 18:23:07 -07:00
Benoit Steiner
ccb4dec719
Created a NEON version of the ptranspose packet primitives
2014-04-23 18:22:10 -07:00
Gael Guennebaud
82b09fcb91
Add Altivec implementation of pgather/pscatter (not tested)
2014-04-23 13:09:26 +02:00
Gael Guennebaud
934ce93886
merge with default branch
2014-04-22 17:00:38 +02:00
Gael Guennebaud
5c5231ab71
Workaround gcc's default ABI not being able to distinghish between vector types of different sizes.
2014-04-22 16:03:19 +02:00
Gael Guennebaud
1388f4f9fd
Fix typo (was working with clang\!)
2014-04-18 11:43:13 +02:00
Gael Guennebaud
2c3c95990d
merge
2014-04-17 22:50:49 +02:00
Benoit Steiner
6d6df90c9a
Implemented the pgather/pscatter packet primitives for the arm/NEON architecture
2014-04-17 12:28:01 -07:00
Gael Guennebaud
9746396d1b
Optimize AVX pset1 for complexes and ploaddup
2014-04-17 20:51:04 +02:00
Gael Guennebaud
0fa8290366
Optimize ploaddup for AVX
2014-04-17 16:02:27 +02:00
Gael Guennebaud
d5a795f673
New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speeup on Haswell.
...
This changeset also introduce new vector functions: ploadquad and predux4.
2014-04-16 17:05:11 +02:00
Benoit Steiner
feaf7c7e6d
Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (ie gcc 4.8).
2014-04-14 10:44:17 -07:00
Benoit Steiner
8044b00a7f
bug #782 : Workaround for gcc <= 4.4 compilation error on the NEON PacketMath code.
2014-04-03 23:41:47 +02:00
Gael Guennebaud
1c0728043a
Workaround alignment warnings
2014-03-30 22:43:47 +02:00
Gael Guennebaud
10aa14592a
Add a mechanism to recursively access to half-size packet types
2014-03-28 10:18:04 +01:00
Benoit Steiner
51e85c936d
Merged latest changes from parent.
2014-03-27 18:32:15 -07:00
Benoit Steiner
8a94cb3edd
Implemented the SSE version of the gather and scatter packet primitives.
2014-03-27 18:29:01 -07:00
Benoit Steiner
7f3162f707
Implemented the AVX version of the gather and scatter packet primitives.
2014-03-27 17:42:25 -07:00
Gael Guennebaud
58fe2fc2b2
enforce the use of vfmadd231ps for pmadd (gcc and clang stupidely generates the other fmadd variants plus some register moves...)
2014-03-27 23:38:50 +01:00
Benoit Steiner
c4902a3d01
Implemented the AVX version of the ptranspose packet primitive.
2014-03-27 09:34:51 -07:00
Gael Guennebaud
052aedd394
Implement pcplflip, palign, predux and the likes from AVC/complexes
2014-03-27 14:47:00 +01:00
Benoit Steiner
a419cea4a0
Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions.
...
Implemented the primitive using SSE instructions.
2014-03-26 19:03:07 -07:00
Benoit Steiner
e45a6bed45
Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible.
2014-03-26 15:58:13 -07:00
Benoit Steiner
cc73164aa8
Merged latest updates from the parent branch
2014-03-26 15:23:59 -07:00
Benoit Steiner
a078f442a3
Vectorized the multiplication and division of complex numbers using AVX instructions.
2014-03-26 15:11:18 -07:00
Benoit Steiner
cf1a7bfbe1
Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives.
...
Silenced a few compilation warnings.
2014-03-26 12:03:31 -07:00
Gael Guennebaud
bc401eb6fa
Implement new 1 packet x 8 gebp kernel
2014-03-26 18:53:00 +01:00
Gael Guennebaud
b286a1e75c
add pbroadcast2/4 generic intrinsics
2014-03-26 16:46:36 +01:00
Benoit Steiner
6bf3cc2732
Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf>
2014-03-25 09:00:43 -07:00
Benoit Steiner
7ae9b0805d
Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives.
2014-03-24 13:33:40 -07:00
Benoit Steiner
db7d49efbb
Added support for FMA instructions
2014-02-24 13:45:32 -08:00
Benoit Steiner
64a85800bd
Added support for AVX to Eigen.
2014-01-29 11:43:05 -08:00
Gael Guennebaud
a7621809fe
Remove useless register keyword, and optimize predux_min/max for SSE4
2014-01-25 16:54:13 +01:00
Gael Guennebaud
01fd880424
Revert previous change and introduce a new workaround regarding gcc generating a shufps instruction instead of the more efficient pshufd instruction.
...
The trick consists in introducing a new pload1 function to be used in low level product kernels for which bug #203 does not apply.
Indeed, it turned out that using inline assembly prevents gcc of doing a good job at instructtion reordering.
2014-03-20 16:03:46 +01:00
Gael Guennebaud
c39a3fa7a1
Makes gcc to generate a pshufd instruction for pset1
2014-03-20 10:14:26 +01:00
Gael Guennebaud
19521c83b8
bug #677 : fix usage of pld instrinsics for ccomplexes
2013-11-02 12:10:48 +01:00
Gael Guennebaud
6dc0e59b1e
Fix bug #677 : compilation issue on arm64 which does not have the PLD instruction
2013-10-31 13:52:43 +01:00
Gael Guennebaud
9f3f42d66a
fix a few "dead stores" warnings
2013-10-26 13:59:02 +02:00
Gael Guennebaud
4612a1cd87
Fix ploaddup and lin-spaced with AltiVec.
2013-09-10 16:13:59 +02:00
Gael Guennebaud
c47010e3d2
typo
2013-08-19 16:10:00 +02:00
Gael Guennebaud
d4dd6aaed2
Fix bug #642 : add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled
2013-08-19 16:02:27 +02:00
Simon Pilgrim
fab0235369
Fix bug #590 : NEON Duplicate lane load
2013-06-23 14:13:21 +02:00
Gael Guennebaud
9f11f80db1
Make psqrt works with numeric_limits<float>::min
2013-06-14 10:55:05 +02:00
Jeff Dean
d5fa5001a7
Fix bug #613 : psqrt was incorrect for small numbers
2013-06-13 18:17:27 +02:00
Gael Guennebaud
62670c83a0
Fix bug #314 : move remaining math functions from internal to numext namespace
2013-06-10 23:40:56 +02:00
Simon Pilgrim
ca67c60150
Fix bug #591 : minor optimization in NEON vectorization support
2013-06-10 15:59:03 +02:00