Gael Guennebaud
d5a795f673
New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speeup on Haswell.
...
This changeset also introduce new vector functions: ploadquad and predux4.
2014-04-16 17:05:11 +02:00
Gael Guennebaud
10aa14592a
Add a mechanism to recursively access to half-size packet types
2014-03-28 10:18:04 +01:00
Gael Guennebaud
6dc0e59b1e
Fix bug #677 : compilation issue on arm64 which does not have the PLD instruction
2013-10-31 13:52:43 +01:00
Simon Pilgrim
fab0235369
Fix bug #590 : NEON Duplicate lane load
2013-06-23 14:13:21 +02:00
Gael Guennebaud
b3adc4face
Add missing pconj specializations
2013-05-17 17:25:29 +02:00
Benoit Jacob
69124cfca2
Automatic relicensing to MPL2 using Keirs script. Manual fixup follows.
2012-07-13 14:42:47 -04:00
Konstantinos Margaritis
d878cf2227
fix typo
2012-07-04 11:28:59 +03:00
Konstantinos Margaritis
f737536744
fix NEON port, use vget_lane_*() instead of temporary variables (saves extra
...
load/store), following advice by Josh Bleecher Snyder <josharian@gmail.com>.
Also implement pmadd() using vmla instead of nested padd/pmul.
2012-07-04 11:12:02 +03:00
kmargar
97cdf6ce9e
ARM NEON supports multiply-accumulate instruction vmla, use that in pmadd().
2012-05-28 14:55:23 +03:00
Jitse Niesen
3c412183b2
Get rid of include directives inside namespace blocks (bug #339 ).
2012-04-15 11:06:28 +01:00
Marton Danoczy
f422668d39
Patches to support ARM NEON with Clang 3.0 and LLVM-GCC
2011-11-04 16:37:10 +01:00
Gael Guennebaud
f2837aebc4
NEON: fix plset
2011-05-18 21:12:08 +02:00
Gael Guennebaud
85c137ccd4
NEON: fix ploaddup
2011-05-18 08:15:47 +02:00
Gael Guennebaud
659c97ee49
gcc 4.4 also defines float32_t as a special type
2011-02-22 10:04:09 +01:00
Gael Guennebaud
c5c8efa575
workaround gcc 4.2 and 4.3 compilation issue with NEON
2011-02-07 16:41:21 +01:00
Jitse Niesen
e2d46eac42
Remove all references to EIGEN_TUNE_CPU_CACHE_SIZE.
...
This macro is no longer used as of revision 0212eec23f4cb64e8426bf32568156df302f8fcf
.
2011-02-04 22:33:53 +01:00
Konstantinos Margaritis
e05c79cbd8
Fixed NEON compilation errors, changed float-abi back to softfp (which is the most used right now).
...
Some complex tests appear to segfault, needs a more careful look.
2010-12-10 20:27:46 +02:00
Benoit Jacob
4716040703
bug #86 : use internal:: namespace instead of ei_ prefix
2010-10-25 10:15:22 -04:00
Gael Guennebaud
ced1a45f82
add NEON ploaddup and pcplxflip functions
2010-07-20 14:24:01 +02:00
Gael Guennebaud
ff96c94043
mixing types in product step 2:
...
* pload* and pset1 are now templated on the packet type
* gemv routines are now embeded into a structure with
a consistent API with respect to gemm
* some configurations of vector * matrix and matrix * matrix works fine,
some need more work...
2010-07-11 15:48:30 +02:00
Gael Guennebaud
4161b8be67
sync
2010-07-10 22:58:51 +02:00
Konstantinos Margaritis
6ad3f1ab1f
Added NEON/Complex.h, ~3.5x faster than scalar std::complex<float>
...
minor fix in AltiVec Complex.h
2010-07-10 00:09:29 +03:00
Gael Guennebaud
300a226ffa
scalars fitting in a single packet requires more work, step 1
...
* add a, Alignable trait
* update LinearVectorization assignment
2010-07-08 14:27:47 +02:00
Gael Guennebaud
b0896382a3
s/IsVectorized/Vectorizable
2010-07-07 11:10:46 +02:00
Gael Guennebaud
bfa606d16f
* add a IsVectorized mechanism (instead of packet-size>1...)
...
* vectorize complex<double>
2010-07-06 23:36:00 +02:00
Gael Guennebaud
28e64b0da3
email change
2010-06-24 23:21:58 +02:00
Gael Guennebaud
88cd6885be
Add a proof concept API to configure the blocking parameters at runtime.
...
After validation of the final API I'll update the other products to use it.
2010-06-07 16:35:25 +02:00
Konstantinos Margaritis
9337f371d2
(proper commit this time)
...
replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function.
Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h.
Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch().
NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
2010-04-24 00:58:44 +03:00
Konstantinos Margaritis
5acf46bd12
Backed out changeset 6972c140f737874d88da0e225c7c27b4563a4518
2010-04-24 00:57:10 +03:00
oem
6972c140f7
replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function.
...
Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h.
Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch().
NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
2010-04-24 00:44:14 +03:00
Gael Guennebaud
ea8cad5151
make the number of registers easier to configure per architectures
2010-03-04 18:58:12 +01:00
Gael Guennebaud
8ed1ef4469
add a minor FIXME
2010-03-04 18:30:28 +01:00
Konstantinos Margaritis
112c550b4a
Added initial NEON support, most tests pass however we had to use some hackish workarounds
...
as gcc on ARM (both CodeSourcery 4.4.1 used and experimental 4.5) fail to
ensure proper alignment with __attribute__((aligned(16))). This has to be
fixed upstream to remove the workarounds.
2010-03-03 11:25:41 -06:00