61 Commits

Author SHA1 Message Date
Konstantinos Margaritis
974fe38ca3 prefetch are noops on VSX 2014-09-21 11:24:30 +00:00
Konstantinos Margaritis
c0205ca4af VSX supports vec_div, implement where appropriate (float/doubles) 2014-09-21 08:12:22 +00:00
Konstantinos Margaritis
10f8aabb61 VSX port passes packetmath_[1-5] tests! 2014-09-20 22:31:31 +00:00
Konstantinos Margaritis
60663a510a 32-bit floats/ints, 64-bit doubles pass packetmath tests, complex 32/64-bit remaining 2014-09-19 21:05:01 +00:00
Konstantinos Margaritis
470aa15c35 First time it compiles, but fails to pass the tests. 2014-09-09 16:58:48 +00:00
Konstantinos Margaritis
7ff266e3ce Initial VSX commit 2014-08-29 20:03:49 +00:00
Konstantinos Margaritis
0a945687b7 Added HasDiv=1 to Altivec PacketMath.h, now vectorization_logic test passes.
Added comments to the constants, indicative of the actual values
2014-07-15 11:02:51 +00:00
Gael Guennebaud
b47ef1431f Fix many long to int implicit conversions 2014-07-08 16:47:11 +02:00
Gael Guennebaud
2dbfd83424 Implement pbroadcast4 on altivec 2014-04-25 02:46:57 -07:00
Gael Guennebaud
3d8d0f6269 Enable vectorization of pack_rhs with a column-major RHS.
Rename and generalize Kernel<*> to PacketBlock<*,N>.
2014-04-25 10:56:18 +02:00
Gael Guennebaud
b0e19db1cf Enable fused madd for Altivec 2014-04-24 23:17:18 +02:00
Gael Guennebaud
8d85ce88e1 Implement ptranspose on altivec and fix pgather/pscatter 2014-04-24 05:47:53 -07:00
Gael Guennebaud
82b09fcb91 Add Altivec implementation of pgather/pscatter (not tested) 2014-04-23 13:09:26 +02:00
Gael Guennebaud
d5a795f673 New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell.
This changeset also introduces new vector functions: ploadquad and predux4.
2014-04-16 17:05:11 +02:00
Gael Guennebaud
10aa14592a Add a mechanism to recursively access half-size packet types 2014-03-28 10:18:04 +01:00
Gael Guennebaud
4612a1cd87 Fix ploaddup and lin-spaced with AltiVec. 2013-09-10 16:13:59 +02:00
Gael Guennebaud
b3adc4face Add missing pconj specializations 2013-05-17 17:25:29 +02:00
Benoit Jacob
69124cfca2 Automatic relicensing to MPL2 using Keir's script. Manual fixup follows. 2012-07-13 14:42:47 -04:00
Jitse Niesen
3c412183b2 Get rid of include directives inside namespace blocks (bug #339). 2012-04-15 11:06:28 +01:00
Gael Guennebaud
9c86ee2695 fix static inline versus inline static issues (the former is the correct order) 2012-01-31 12:58:52 +01:00
Thomas Capricelli
883219041f better fix for gcc 4.6.0 / ptrdiff_t, as suggested by Benoit 2011-05-05 18:48:18 +02:00
Thomas Capricelli
a18a1be42d Fix compilation with gcc-4.6.0, patch provided by Anton Gladky <gladky.anton@gmail.com>,
working on debian packaging.
2011-05-05 00:44:24 +02:00
Gael Guennebaud
bb9a465c5a fix AltiVec ploaddup 2011-02-24 00:23:50 +03:00
Gael Guennebaud
955c099eb5 implement ploaddup for altivec and add respective unit test 2011-02-23 18:20:55 +03:00
Jitse Niesen
e2d46eac42 Remove all references to EIGEN_TUNE_CPU_CACHE_SIZE.
This macro is no longer used as of revision 0212eec23f4cb64e8426bf32568156df302f8fcf.
2011-02-04 22:33:53 +01:00
Benoit Jacob
4716040703 bug #86 : use internal:: namespace instead of ei_ prefix 2010-10-25 10:15:22 -04:00
Gael Guennebaud
ff96c94043 mixing types in product step 2:
* pload* and pset1 are now templated on the packet type
* gemv routines are now embedded into a structure with
  a consistent API with respect to gemm
* some configurations of vector * matrix and matrix * matrix work fine,
  some need more work...
2010-07-11 15:48:30 +02:00
Gael Guennebaud
4161b8be67 sync 2010-07-10 22:58:51 +02:00
Konstantinos Margaritis
642cc27eb1 forgot to commit ei_p4f_FORWARD; 2010-07-09 18:08:18 +03:00
Gael Guennebaud
300a226ffa scalars fitting in a single packet requires more work, step 1
* add an Alignable trait
* update LinearVectorization assignment
2010-07-08 14:27:47 +02:00
Gael Guennebaud
b0896382a3 s/IsVectorized/Vectorizable 2010-07-07 11:10:46 +02:00
Gael Guennebaud
bfa606d16f * add an IsVectorized mechanism (instead of packet-size>1...)
* vectorize complex<double>
2010-07-06 23:36:00 +02:00
Konstantinos Margaritis
cf3616b2c0 AltiVec signed integer pmadd removed, proved to be 2x slower than the scalar trait(!). 2010-06-28 21:24:55 +03:00
Gael Guennebaud
88cd6885be Add a proof-of-concept API to configure the blocking parameters at runtime.
After validation of the final API I'll update the other products to use it.
2010-06-07 16:35:25 +02:00
Konstantinos Margaritis
9337f371d2 (proper commit this time)
replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function.
Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h.
Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch().
NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
2010-04-24 00:58:44 +03:00
Konstantinos Margaritis
5acf46bd12 Backed out changeset 6972c140f737874d88da0e225c7c27b4563a4518 2010-04-24 00:57:10 +03:00
oem
6972c140f7 replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function.
Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h.
Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch().
NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
2010-04-24 00:44:14 +03:00
Gael Guennebaud
afd7ee759b fix copy pasted comment 2010-03-05 21:35:11 +01:00
Konstantinos Margaritis
273b236f72 Altivec brought up to date. Most tests pass and performance is better than before too! 2010-03-05 22:28:49 +02:00
Konstantinos Margaritis
112c550b4a Added initial NEON support. Most tests pass; however, we had to use some hackish workarounds
as gcc on ARM (both the CodeSourcery 4.4.1 used and the experimental 4.5) fails to
ensure proper alignment with __attribute__((aligned(16))). This has to be
fixed upstream to remove the workarounds.
2010-03-03 11:25:41 -06:00
Benoit Jacob
d41577819b we were already aligning to 16 byte boundary fixed-size objects that are multiple of 16 bytes;
now we also align to 8-byte boundary fixed-size objects that are multiple of 8 bytes.
That's only useful for now for double, not e.g. for Vector2f, but that didn't seem to hurt. Am I missing something? Do you prefer that we don't align Vector2f at all?
Also, improvements in test_unalignedassert.
2009-10-05 10:11:11 -04:00
Benoit Jacob
6347b1db5b remove sentence "Eigen itself is part of the KDE project."
it never made very precise sense. but now does it still make any?
2009-05-22 20:25:33 +02:00
Gael Guennebaud
17860e578c add SSE2 versions of sin, cos, log, exp using code from Julien
Pommier. They are for float only, and they return exactly the same
result as the standard versions in about 90% of the cases. Otherwise the max error
is below 1e-7. However, for very large values (>1e3) the accuracy of sin and cos
slightly decreases. They are about 3 or 4 times faster than 4 calls to their respective
standard versions. So, is it ok to enable them by default in their respective functors ?
2009-03-25 12:26:13 +00:00
Konstantinos A. Margaritis
fe00e864a1 ei_pnegate implemented for AltiVec 2009-03-20 17:26:50 +00:00
Gael Guennebaud
fbf415c547 add vectorization of unary operator-() (the AltiVec version is probably
broken)
2009-03-20 10:03:24 +00:00
Gael Guennebaud
3f80c68be5 add the vectorization of abs 2009-03-09 18:40:09 +00:00
Konstantinos A. Margaritis
349557db9a no reason for 3 vec_mins, 2 are enough apparently in ei_predux_min 2009-02-12 22:03:30 +00:00
Konstantinos A. Margaritis
ad2bf14dbb modified ei_predux_min/max to actually use altivec instructions 2009-02-12 21:58:44 +00:00
Gael Guennebaud
51c991af45 * exit Sum.h, exit Prod.h, welcome vectorization of redux() !
* add vectorization for minCoeff and maxCoeff
2009-02-12 15:18:59 +00:00
Gael Guennebaud
7954f7709a add ei_predux_mul for AltiVec 2009-02-10 18:26:59 +00:00