Gael Guennebaud
f8aae7a908
* _mm_loaddup_pd is slow
...
* optimize SSE ei_ploaddup<Packet4f>
2010-07-19 15:43:27 +02:00
Gael Guennebaud
cd0e5dca9b
wip: extend the gebp kernel to optimize complex and mixed products
2010-07-19 08:50:59 +02:00
Gael Guennebaud
36d9b51a44
optimize non fused MADD, and add a flatten attribute macro to enforce
...
inlining within a function
2010-07-13 15:16:34 +02:00
Gael Guennebaud
f8678272a4
mixing types step 3:
...
- improve support of colmajor by vector and matrix - matrix
- now all configurations are well handled, but the perf are not always very good
2010-07-11 23:57:23 +02:00
Gael Guennebaud
ff96c94043
mixing types in product step 2:
...
* pload* and pset1 are now templated on the packet type
* gemv routines are now embeded into a structure with
a consistent API with respect to gemm
* some configurations of vector * matrix and matrix * matrix works fine,
some need more work...
2010-07-11 15:48:30 +02:00
Gael Guennebaud
4161b8be67
sync
2010-07-10 22:58:51 +02:00
Benoit Jacob
6dcd373b9d
let ei_pset1 use _mm_loaddup_pd. Not a significant speed improvement, but also not a speed regression, and replaces 3 instructions by 1 single instruction.
2010-07-09 18:51:17 -04:00
Konstantinos Margaritis
6ad3f1ab1f
Added NEON/Complex.h, ~3.5x faster than scalar std::complex<float>
...
minor fix in AltiVec Complex.h
2010-07-10 00:09:29 +03:00
Gael Guennebaud
96f9015807
disable MSVC optimization when the underlying compiler is ICC
2010-07-09 19:33:43 +02:00
Konstantinos Margaritis
642cc27eb1
forgot to commit ei_p4f_FORWARD;
2010-07-09 18:08:18 +03:00
Konstantinos Margaritis
d9e134c73c
Altivec port of Complex.h.
...
Note: For some reason g++ 4.4 is >200% slower than g++ 4.3 on altivec code.
The same benchmark (bench_gemm) was tested, on the same hardware/OS (G4/Debian testing),
with same CFLAGS. With some code reorganizing I managed to get some minor gain
on 4.4, but I just could not reach 4.3 speed. This is most likely a bug, but I'm waiting
to see if it's fixed on 4.5. I'll look into this a bit more.
2010-07-09 17:54:41 +03:00
Gael Guennebaud
b1a17dbfe4
fix a few weird issues with gcc 4.3 32bits and complex<float>
2010-07-09 08:27:58 +02:00
Gael Guennebaud
300a226ffa
scalars fitting in a single packet requires more work, step 1
...
* add a, Alignable trait
* update LinearVectorization assignment
2010-07-08 14:27:47 +02:00
Gael Guennebaud
2066ed91de
enabling aligned loads/store for complex<double> is much more tricky,
...
so the temporary fix is to always perform unaligned load/store
2010-07-07 22:50:19 +02:00
Gael Guennebaud
d89925e6de
an attempt to fix wrong unaligned store
2010-07-07 22:35:06 +02:00
Gael Guennebaud
31a36aa9c4
support for real * complex matrix product - step 1 (works for some special cases)
2010-07-07 19:49:09 +02:00
Gael Guennebaud
a2415388ef
optimized conjugate products for SSE3
2010-07-07 16:37:20 +02:00
Gael Guennebaud
65257f6b29
optimize for SSE3 => significant speed up !!
2010-07-07 15:34:46 +02:00
Gael Guennebaud
dd18b22f0b
optimize pmul for complex<double>
2010-07-07 15:29:04 +02:00
Gael Guennebaud
e07c0f6bb5
cleanning
2010-07-07 11:41:29 +02:00
Gael Guennebaud
b0896382a3
s/IsVectorized/Vectorizable
2010-07-07 11:10:46 +02:00
Gael Guennebaud
f8d3b4c060
fix mixing types in DiagonalProduct
2010-07-07 09:43:29 +02:00
Gael Guennebaud
bfa606d16f
* add a IsVectorized mechanism (instead of packet-size>1...)
...
* vectorize complex<double>
2010-07-06 23:36:00 +02:00
Gael Guennebaud
d6454788d9
add support for vectorized conjugated products
2010-07-06 19:10:24 +02:00
Gael Guennebaud
c69a226192
* extend the Has* packet traits and makes all functor use it
...
* extend the packing routines to support conjugation
2010-07-05 23:27:54 +02:00
Gael Guennebaud
e1eccfad3f
add intitial support for the vectorization of complex<float>
2010-07-05 16:18:09 +02:00
Konstantinos Margaritis
cf3616b2c0
AltiVec signed integer pmadd removed, proved to be 2x slower than the scalar trait(!).
2010-06-28 21:24:55 +03:00
Gael Guennebaud
75b6d2b2f8
fix very annoying warning (gcc 4.3): type qualifiers ignored on function return type
2010-06-25 13:20:34 +02:00
Gael Guennebaud
28e64b0da3
email change
2010-06-24 23:21:58 +02:00
Benoit Jacob
f0a6d56f07
fix linking errors with multiply defined functions
2010-06-18 09:01:34 -04:00
Benoit Jacob
134ca4acb3
packet math functions:
...
- take const Packet& args like the other packet funcs
- SSE specializations: make them be actual template specializations
2010-06-15 08:29:21 -04:00
Gael Guennebaud
88cd6885be
Add a proof concept API to configure the blocking parameters at runtime.
...
After validation of the final API I'll update the other products to use it.
2010-06-07 16:35:25 +02:00
Konstantinos Margaritis
9337f371d2
(proper commit this time)
...
replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function.
Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h.
Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch().
NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
2010-04-24 00:58:44 +03:00
Konstantinos Margaritis
5acf46bd12
Backed out changeset 6972c140f737874d88da0e225c7c27b4563a4518
2010-04-24 00:57:10 +03:00
oem
6972c140f7
replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function.
...
Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h.
Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch().
NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
2010-04-24 00:44:14 +03:00
Hauke Heibel
9d6afdeb22
ei_psqrt fix for zero input
2010-04-01 15:10:52 +02:00
Gael Guennebaud
9a3b00c040
add missing cmake directives for arch/Default
2010-03-08 22:15:35 +01:00
Hauke Heibel
9fe040ad29
Reintroduced the if-clause for MSVC ei_ploadu via _loadu_.
2010-03-07 14:05:26 +01:00
Gael Guennebaud
afd7ee759b
fix copy pasted comment
2010-03-05 21:35:11 +01:00
Konstantinos Margaritis
273b236f72
Altivec brought up to date. Most tests pass and performance is better than before too!
2010-03-05 22:28:49 +02:00
Gael Guennebaud
7e2683dc39
merge
2010-03-04 18:59:56 +01:00
Gael Guennebaud
ea8cad5151
make the number of registers easier to configure per architectures
2010-03-04 18:58:12 +01:00
Gael Guennebaud
cefd9b8888
merge with default branch
2010-03-04 18:47:52 +01:00
Gael Guennebaud
8ed1ef4469
add a minor FIXME
2010-03-04 18:30:28 +01:00
Gael Guennebaud
7dd81aad74
factorize default performance related settings to a single file
...
included after the architecture specific files such that they
can be adapted by each platform.
2010-03-03 18:47:58 +01:00
Konstantinos Margaritis
112c550b4a
Added initial NEON support, most tests pass however we had to use some hackish workarounds
...
as gcc on ARM (both CodeSourcery 4.4.1 used and experimental 4.5) fail to
ensure proper alignment with __attribute__((aligned(16))). This has to be
fixed upstream to remove the workarounds.
2010-03-03 11:25:41 -06:00
Thomas Capricelli
0f3d69b65e
Provide "eigen" defines to decide which instruction set is used
...
(sse3, ssse3 and sse4), independantly from the compiler.
Only those defines should be used in other places, and the user can
rely on those to know which sets are used.
2010-02-24 21:43:30 +01:00
Gael Guennebaud
eb905500b6
significant speedup in the matrix-matrix products
2010-02-23 13:06:49 +01:00
Hauke Heibel
4365a48748
Added an ei_linspaced_op to create linearly spaced vectors.
...
Added setLinSpaced/LinSpaced functionality to DenseBase.
Improved vectorized assignment - overcomes MSVC optimization issues.
CwiseNullaryOp is now requiring functors to offer 1D and 2D operators.
Adapted existing functors to the new CwiseNullaryOp requirements.
Added ei_plset to create packages as [a, a+1, ..., a+size].
Added more nullaray unit tests.
2010-01-26 19:42:17 +01:00
Hauke Heibel
325da2ea3c
Fixed conservativeResize.
...
Fixed multiple overloads for operator=.
Removed debug output.
2010-01-11 13:57:50 +01:00