Gael Guennebaud
772e59d475
bug #1360 : fix sign issue with pmull on altivec
...
(grafted from 8c0e70150433e8fe50c980ff629a9f80162eaf92)
2016-12-18 22:13:19 +00:00
Konstantinos Margaritis
9f7caa7e7d
minor fixes for big endian altivec/vsx
2016-07-10 07:05:10 -03:00
Konstantinos Margaritis
be107e387b
fix compilation with clang 3.9, fix performance with pset1, use vector operators instead of intrinsics in some cases
2016-06-23 10:19:05 -03:00
Konstantinos Margaritis
b410d46482
mostly cleanups and modernizing code
2016-06-19 16:12:52 -03:00
Konstantinos Margaritis
8ed26120c8
bring Altivec/VSX to a better state, implement some of the missing functions
2016-04-28 14:32:42 -03:00
Doug Kwan
5c9ee73eb9
Implement plog and pexp for AltiVec.
2015-07-30 11:12:42 -07:00
Gael Guennebaud
6245591349
Fix prototype of plset and generalize linspace functor.
2015-08-07 19:27:59 +02:00
Gael Guennebaud
ce57dbd937
Let unpacket_traits<> expose the required alignment and make use of it everywhere
2015-08-07 10:44:01 +02:00
Gael Guennebaud
45cbb0bbb1
The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index
2015-02-16 15:05:41 +01:00
Benoit Jacob
0f21613698
bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD
2015-01-30 17:44:26 -05:00
Benoit Jacob
340b8afb14
bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,
...
because this is what they are about. "Fused" means "no intermediate rounding
between the mul and the add, only one rounding at the end". Instead,
what we are concerned about here is whether a temporary register is needed,
i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
But a true fused mul-add is only available on VFPv4: VFMA.
2015-01-31 14:15:57 -05:00
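The fused versus single-instruction distinction drawn in the commit message above can be illustrated with a minimal sketch. It is not Eigen code: the helper names `mul_then_add` and `fused_mul_add` are hypothetical, and the sketch assumes IEEE 754 doubles with the default round-to-nearest mode. It contrasts a multiply-add whose product is rounded before the addition with `std::fma`, which rounds only once at the end.

```cpp
#include <cmath>

// Hypothetical helper (not part of Eigen): multiply, round the product
// to double, then add. The volatile store forces the intermediate
// rounding and keeps the compiler from contracting this into an FMA.
inline double mul_then_add(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Hypothetical helper (not part of Eigen): true fused multiply-add,
// with a single rounding applied to the exact value of a * b + c.
inline double fused_mul_add(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 as a double. Under the stated assumptions `mul_then_add` therefore returns 0, while `fused_mul_add` returns -2^-54. Whether the unfused version takes one instruction or two is a separate question, which is the point of the rename.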
Benoit Jacob
9f99f61e69
bug #936, patch 1/3: some cleanup and renaming for consistency.
2015-01-30 17:43:56 -05:00
Konstantinos Margaritis
9d3c69952b
fixed to make big-endian VSX work as well
2014-10-01 09:43:56 +00:00
Konstantinos Margaritis
de38ff2499
prefetches are no-ops on VSX, actually disable the prefetch trait
2014-09-21 11:56:07 +00:00
Konstantinos Margaritis
56408504e4
fix compile error on big endian altivec
2014-09-21 13:59:30 +03:00
Konstantinos Margaritis
974fe38ca3
prefetches are no-ops on VSX
2014-09-21 11:24:30 +00:00
Konstantinos Margaritis
c0205ca4af
VSX supports vec_div, implement where appropriate (float/doubles)
2014-09-21 08:12:22 +00:00
Konstantinos Margaritis
10f8aabb61
VSX port passes packetmath_[1-5] tests!
2014-09-20 22:31:31 +00:00
Konstantinos Margaritis
60663a510a
32-bit floats/ints, 64-bit doubles pass packetmath tests, complex 32/64-bit remaining
2014-09-19 21:05:01 +00:00
Konstantinos Margaritis
470aa15c35
First time it compiles, but fails to pass the tests.
2014-09-09 16:58:48 +00:00
Konstantinos Margaritis
7ff266e3ce
Initial VSX commit
2014-08-29 20:03:49 +00:00
Konstantinos Margaritis
0a945687b7
Added HasDiv=1 to Altivec PacketMath.h, now vectorization_logic test passes.
...
Added comments to the constants, indicative of the actual values
2014-07-15 11:02:51 +00:00
Gael Guennebaud
b47ef1431f
Fix many long to int implicit conversions
2014-07-08 16:47:11 +02:00
Gael Guennebaud
2dbfd83424
Implement pbroadcast4 on altivec
2014-04-25 02:46:57 -07:00
Gael Guennebaud
3d8d0f6269
Enable vectorization of pack_rhs with a column-major RHS.
...
Rename and generalize Kernel<*> to PacketBlock<*,N>.
2014-04-25 10:56:18 +02:00
Gael Guennebaud
b0e19db1cf
Enable fused madd for Altivec
2014-04-24 23:17:18 +02:00
Gael Guennebaud
8d85ce88e1
Implement ptranspose on altivec and fix pgather/pscatter
2014-04-24 05:47:53 -07:00
Gael Guennebaud
82b09fcb91
Add Altivec implementation of pgather/pscatter (not tested)
2014-04-23 13:09:26 +02:00
Gael Guennebaud
d5a795f673
New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell.
...
This changeset also introduces new vector functions: ploadquad and predux4.
2014-04-16 17:05:11 +02:00
Gael Guennebaud
10aa14592a
Add a mechanism to recursively access half-size packet types
2014-03-28 10:18:04 +01:00
Gael Guennebaud
4612a1cd87
Fix ploaddup and lin-spaced with AltiVec.
2013-09-10 16:13:59 +02:00
Gael Guennebaud
b3adc4face
Add missing pconj specializations
2013-05-17 17:25:29 +02:00
Benoit Jacob
69124cfca2
Automatic relicensing to MPL2 using Keir's script. Manual fixup follows.
2012-07-13 14:42:47 -04:00
Jitse Niesen
3c412183b2
Get rid of include directives inside namespace blocks (bug #339).
2012-04-15 11:06:28 +01:00
Gael Guennebaud
9c86ee2695
fix static inline versus inline static issues (the former is the correct order)
2012-01-31 12:58:52 +01:00
Thomas Capricelli
883219041f
better fix for gcc 4.6.0 / ptrdiff_t, as suggested by Benoit
2011-05-05 18:48:18 +02:00
Thomas Capricelli
a18a1be42d
Fix compilation with gcc-4.6.0, patch provided by Anton Gladky <gladky.anton@gmail.com>,
...
working on debian packaging.
2011-05-05 00:44:24 +02:00
Gael Guennebaud
bb9a465c5a
fix AltiVec ploaddup
2011-02-24 00:23:50 +03:00
Gael Guennebaud
955c099eb5
implement ploaddup for altivec and add respective unit test
2011-02-23 18:20:55 +03:00
Jitse Niesen
e2d46eac42
Remove all references to EIGEN_TUNE_CPU_CACHE_SIZE.
...
This macro is no longer used as of revision 0212eec23f4cb64e8426bf32568156df302f8fcf.
2011-02-04 22:33:53 +01:00
Benoit Jacob
4716040703
bug #86 : use internal:: namespace instead of ei_ prefix
2010-10-25 10:15:22 -04:00
Gael Guennebaud
ff96c94043
mixing types in product step 2:
...
* pload* and pset1 are now templated on the packet type
* gemv routines are now embedded into a structure with
a consistent API with respect to gemm
* some configurations of vector * matrix and matrix * matrix work fine,
some need more work...
2010-07-11 15:48:30 +02:00
Gael Guennebaud
4161b8be67
sync
2010-07-10 22:58:51 +02:00
Konstantinos Margaritis
642cc27eb1
forgot to commit ei_p4f_FORWARD;
2010-07-09 18:08:18 +03:00
Gael Guennebaud
300a226ffa
scalars fitting in a single packet requires more work, step 1
...
* add an Alignable trait
* update LinearVectorization assignment
2010-07-08 14:27:47 +02:00
Gael Guennebaud
b0896382a3
s/IsVectorized/Vectorizable
2010-07-07 11:10:46 +02:00
Gael Guennebaud
bfa606d16f
* add an IsVectorized mechanism (instead of packet-size>1...)
...
* vectorize complex<double>
2010-07-06 23:36:00 +02:00
Konstantinos Margaritis
cf3616b2c0
AltiVec signed integer pmadd removed, proved to be 2x slower than the scalar trait(!).
2010-06-28 21:24:55 +03:00
Gael Guennebaud
88cd6885be
Add a proof-of-concept API to configure the blocking parameters at runtime.
...
After validation of the final API I'll update the other products to use it.
2010-06-07 16:35:25 +02:00
Konstantinos Margaritis
9337f371d2
(proper commit this time)
...
replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function.
Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h.
Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch().
NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
2010-04-24 00:58:44 +03:00
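The prefetch wrapper described in the commit message above can be sketched roughly as follows. This is an illustration only, not the actual `ei_prefetch()` implementation: the names `prefetch_hint` and `sum_with_prefetch` are hypothetical, the look-ahead distance of 16 elements is an arbitrary assumption, and the sketch tests for GCC/Clang explicitly (the commit describes a `!_MSC_VER` check), falling back to a no-op elsewhere.

```cpp
#include <cstddef>

// Hypothetical wrapper (not the actual ei_prefetch): issue a prefetch
// hint where the compiler provides __builtin_prefetch, else do nothing.
inline void prefetch_hint(const void* addr) {
#if defined(__GNUC__) || defined(__clang__)
  __builtin_prefetch(addr);
#else
  (void)addr;  // no-op fallback; a prefetch is only a hint
#endif
}

// Hypothetical usage: sum an array while hinting at data a fixed
// distance ahead. The distance (16 elements) is an arbitrary choice.
inline long sum_with_prefetch(const int* data, std::size_t n) {
  long total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (i + 16 < n) prefetch_hint(data + i + 16);
    total += data[i];
  }
  return total;
}
```

Because a prefetch never changes program results, the wrapper can compile to nothing on a platform without affecting correctness; any gain, such as the NEON figure quoted in the commit message, is purely a matter of timing.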