Gael Guennebaud
fe25f3b8e3
FMA has been wrongly disabled
2015-02-10 23:11:35 +01:00
Benoit Jacob
0f21613698
bug #936 , patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD
2015-01-30 17:44:26 -05:00
Benoit Jacob
340b8afb14
bug #936 , patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,
...
because this is what they are about. "Fused" means "no intermediate rounding
between the mul and the add, only one rounding at the end". Instead,
what we are concerned about here is whether a temporary register is needed,
i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
But a true fused mul-add is only available on VFPv4: VFMA.
2015-01-31 14:15:57 -05:00
Gael Guennebaud
ee06f78679
Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.
2014-11-04 21:58:52 +01:00
Jitse Niesen
25bceefb4e
Replace asm by __asm__ (bug #873 )
2014-09-06 11:47:24 +01:00
Gael Guennebaud
b47ef1431f
Fix many long to int implicit conversions
2014-07-08 16:47:11 +02:00
Gael Guennebaud
3d8d0f6269
Enable vectorization of pack_rhs with a column-major RHS.
...
Rename and generalize Kernel<*> to PacketBlock<*,N>.
2014-04-25 10:56:18 +02:00
Gael Guennebaud
b0e19db1cf
Enable fused madd for Altivec
2014-04-24 23:17:18 +02:00
Gael Guennebaud
9746396d1b
Optimize AVX pset1 for complexes and ploaddup
2014-04-17 20:51:04 +02:00
Gael Guennebaud
0fa8290366
Optimize ploaddup for AVX
2014-04-17 16:02:27 +02:00
Gael Guennebaud
d5a795f673
New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speeup on Haswell.
...
This changeset also introduce new vector functions: ploadquad and predux4.
2014-04-16 17:05:11 +02:00
Gael Guennebaud
1c0728043a
Workaround alignment warnings
2014-03-30 22:43:47 +02:00
Gael Guennebaud
10aa14592a
Add a mechanism to recursively access to half-size packet types
2014-03-28 10:18:04 +01:00
Benoit Steiner
51e85c936d
Merged latest changes from parent.
2014-03-27 18:32:15 -07:00
Benoit Steiner
7f3162f707
Implemented the AVX version of the gather and scatter packet primitives.
2014-03-27 17:42:25 -07:00
Gael Guennebaud
58fe2fc2b2
enforce the use of vfmadd231ps for pmadd (gcc and clang stupidely generates the other fmadd variants plus some register moves...)
2014-03-27 23:38:50 +01:00
Benoit Steiner
c4902a3d01
Implemented the AVX version of the ptranspose packet primitive.
2014-03-27 09:34:51 -07:00
Benoit Steiner
e45a6bed45
Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible.
2014-03-26 15:58:13 -07:00
Benoit Steiner
7ae9b0805d
Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives.
2014-03-24 13:33:40 -07:00
Benoit Steiner
db7d49efbb
Added support for FMA instructions
2014-02-24 13:45:32 -08:00
Benoit Steiner
64a85800bd
Added support for AVX to Eigen.
2014-01-29 11:43:05 -08:00