21 Commits

Author SHA1 Message Date
Gael Guennebaud
fe25f3b8e3 FMA has been wrongly disabled 2015-02-10 23:11:35 +01:00
Benoit Jacob
0f21613698 bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD 2015-01-30 17:44:26 -05:00
Benoit Jacob
340b8afb14 bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,
because this is what they are about. "Fused" means "no intermediate rounding
between the mul and the add, only one rounding at the end". Instead,
what we are concerned about here is whether a temporary register is needed,
i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
But a true fused mul-add is only available on VFPv4: VFMA.
2015-01-31 14:15:57 -05:00
Gael Guennebaud
ee06f78679 Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively. 2014-11-04 21:58:52 +01:00
Jitse Niesen
25bceefb4e Replace asm by __asm__ (bug #873) 2014-09-06 11:47:24 +01:00
Gael Guennebaud
b47ef1431f Fix many long to int implicit conversions 2014-07-08 16:47:11 +02:00
Gael Guennebaud
3d8d0f6269 Enable vectorization of pack_rhs with a column-major RHS.
Rename and generalize Kernel<*> to PacketBlock<*,N>.
2014-04-25 10:56:18 +02:00
Gael Guennebaud
b0e19db1cf Enable fused madd for Altivec 2014-04-24 23:17:18 +02:00
Gael Guennebaud
9746396d1b Optimize AVX pset1 for complexes and ploaddup 2014-04-17 20:51:04 +02:00
Gael Guennebaud
0fa8290366 Optimize ploaddup for AVX 2014-04-17 16:02:27 +02:00
Gael Guennebaud
d5a795f673 New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speeup on Haswell.
This changeset also introduce new vector functions: ploadquad and predux4.
2014-04-16 17:05:11 +02:00
Gael Guennebaud
1c0728043a Workaround alignment warnings 2014-03-30 22:43:47 +02:00
Gael Guennebaud
10aa14592a Add a mechanism to recursively access to half-size packet types 2014-03-28 10:18:04 +01:00
Benoit Steiner
51e85c936d Merged latest changes from parent. 2014-03-27 18:32:15 -07:00
Benoit Steiner
7f3162f707 Implemented the AVX version of the gather and scatter packet primitives. 2014-03-27 17:42:25 -07:00
Gael Guennebaud
58fe2fc2b2 enforce the use of vfmadd231ps for pmadd (gcc and clang stupidely generates the other fmadd variants plus some register moves...) 2014-03-27 23:38:50 +01:00
Benoit Steiner
c4902a3d01 Implemented the AVX version of the ptranspose packet primitive. 2014-03-27 09:34:51 -07:00
Benoit Steiner
e45a6bed45 Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible. 2014-03-26 15:58:13 -07:00
Benoit Steiner
7ae9b0805d Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives. 2014-03-24 13:33:40 -07:00
Benoit Steiner
db7d49efbb Added support for FMA instructions 2014-02-24 13:45:32 -08:00
Benoit Steiner
64a85800bd Added support for AVX to Eigen. 2014-01-29 11:43:05 -08:00