Gael Guennebaud
dd18b22f0b
optimize pmul for complex<double>
2010-07-07 15:29:04 +02:00
Gael Guennebaud
e07c0f6bb5
cleanning
2010-07-07 11:41:29 +02:00
Gael Guennebaud
b0896382a3
s/IsVectorized/Vectorizable
2010-07-07 11:10:46 +02:00
Gael Guennebaud
f8d3b4c060
fix mixing types in DiagonalProduct
2010-07-07 09:43:29 +02:00
Gael Guennebaud
bfa606d16f
* add a IsVectorized mechanism (instead of packet-size>1...)
...
* vectorize complex<double>
2010-07-06 23:36:00 +02:00
Gael Guennebaud
d6454788d9
add support for vectorized conjugated products
2010-07-06 19:10:24 +02:00
Gael Guennebaud
c69a226192
* extend the Has* packet traits and makes all functor use it
...
* extend the packing routines to support conjugation
2010-07-05 23:27:54 +02:00
Gael Guennebaud
e1eccfad3f
add intitial support for the vectorization of complex<float>
2010-07-05 16:18:09 +02:00
Konstantinos Margaritis
cf3616b2c0
AltiVec signed integer pmadd removed, proved to be 2x slower than the scalar trait(!).
2010-06-28 21:24:55 +03:00
Gael Guennebaud
75b6d2b2f8
fix very annoying warning (gcc 4.3): type qualifiers ignored on function return type
2010-06-25 13:20:34 +02:00
Gael Guennebaud
28e64b0da3
email change
2010-06-24 23:21:58 +02:00
Benoit Jacob
f0a6d56f07
fix linking errors with multiply defined functions
2010-06-18 09:01:34 -04:00
Benoit Jacob
134ca4acb3
packet math functions:
...
- take const Packet& args like the other packet funcs
- SSE specializations: make them be actual template specializations
2010-06-15 08:29:21 -04:00
Gael Guennebaud
88cd6885be
Add a proof concept API to configure the blocking parameters at runtime.
...
After validation of the final API I'll update the other products to use it.
2010-06-07 16:35:25 +02:00
Konstantinos Margaritis
9337f371d2
(proper commit this time)
...
replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function.
Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h.
Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch().
NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
2010-04-24 00:58:44 +03:00
Konstantinos Margaritis
5acf46bd12
Backed out changeset 6972c140f737874d88da0e225c7c27b4563a4518
2010-04-24 00:57:10 +03:00
oem
6972c140f7
replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function.
...
Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h.
Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch().
NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
2010-04-24 00:44:14 +03:00
Hauke Heibel
9d6afdeb22
ei_psqrt fix for zero input
2010-04-01 15:10:52 +02:00
Gael Guennebaud
9a3b00c040
add missing cmake directives for arch/Default
2010-03-08 22:15:35 +01:00
Hauke Heibel
9fe040ad29
Reintroduced the if-clause for MSVC ei_ploadu via _loadu_.
2010-03-07 14:05:26 +01:00
Gael Guennebaud
afd7ee759b
fix copy pasted comment
2010-03-05 21:35:11 +01:00
Konstantinos Margaritis
273b236f72
Altivec brought up to date. Most tests pass and performance is better than before too!
2010-03-05 22:28:49 +02:00
Gael Guennebaud
7e2683dc39
merge
2010-03-04 18:59:56 +01:00
Gael Guennebaud
ea8cad5151
make the number of registers easier to configure per architectures
2010-03-04 18:58:12 +01:00
Gael Guennebaud
cefd9b8888
merge with default branch
2010-03-04 18:47:52 +01:00
Gael Guennebaud
8ed1ef4469
add a minor FIXME
2010-03-04 18:30:28 +01:00
Gael Guennebaud
7dd81aad74
factorize default performance related settings to a single file
...
included after the architecture specific files such that they
can be adapted by each platform.
2010-03-03 18:47:58 +01:00
Konstantinos Margaritis
112c550b4a
Added initial NEON support, most tests pass however we had to use some hackish workarounds
...
as gcc on ARM (both CodeSourcery 4.4.1 used and experimental 4.5) fail to
ensure proper alignment with __attribute__((aligned(16))). This has to be
fixed upstream to remove the workarounds.
2010-03-03 11:25:41 -06:00
Thomas Capricelli
0f3d69b65e
Provide "eigen" defines to decide which instruction set is used
...
(sse3, ssse3 and sse4), independantly from the compiler.
Only those defines should be used in other places, and the user can
rely on those to know which sets are used.
2010-02-24 21:43:30 +01:00
Gael Guennebaud
eb905500b6
significant speedup in the matrix-matrix products
2010-02-23 13:06:49 +01:00
Hauke Heibel
4365a48748
Added an ei_linspaced_op to create linearly spaced vectors.
...
Added setLinSpaced/LinSpaced functionality to DenseBase.
Improved vectorized assignment - overcomes MSVC optimization issues.
CwiseNullaryOp is now requiring functors to offer 1D and 2D operators.
Adapted existing functors to the new CwiseNullaryOp requirements.
Added ei_plset to create packages as [a, a+1, ..., a+size].
Added more nullaray unit tests.
2010-01-26 19:42:17 +01:00
Hauke Heibel
325da2ea3c
Fixed conservativeResize.
...
Fixed multiple overloads for operator=.
Removed debug output.
2010-01-11 13:57:50 +01:00
Gael Guennebaud
eaaba30cac
merge with default branch
2009-12-22 22:51:08 +01:00
Gael Guennebaud
6db6774c46
* fix aliasing checks when the lhs is also transposed. At the same time,
...
significantly simplify the code of these checks while extending them
to catch much more expressions!
* move the enabling/disabling of vectorized sin/cos to the architecture traits
2009-12-16 11:41:16 +01:00
Hauke Heibel
3ea1f97f69
Suppressed the warning for missing assignment generators (forgot that in the last submission).
...
Commented Quake3's fast inverser sqrt in SSE's MathFunction header.
2009-12-15 08:09:14 +01:00
Benoit Jacob
684d76eba3
add SSE4 support, start with integer multiplication
2009-11-24 15:12:43 -05:00
Gael Guennebaud
eb8f450071
Hey, finally the copyCoeff stuff is not only used to implement swap anymore :)
...
Add an internal pseudo expression allowing to optimize operators like +=, *= using
the copyCoeff stuff.
This allows to easily enforce aligned load for the destination matrix everywhere.
2009-11-20 15:39:38 +01:00
Benoit Jacob
92749eed11
* merge
...
* remove a ctor in QuaternionBase as it gives a strange error with GCC 4.4.2.
2009-11-09 09:08:03 -05:00
Hauke Heibel
3979f6d8aa
Let's try to stick to the original code, thus activate the fix of #62 only for 64 bit builds.
2009-11-04 15:49:22 +01:00
Hauke Heibel
e2170b9f7e
Direct access of the packet structs fixes bug #62 and doe not seem to
...
influence compiler optimization.
2009-11-04 15:38:11 +01:00
Benoit Jacob
d41577819b
we were already aligning to 16 byte boundary fixed-size objects that are multiple of 16 bytes;
...
now we also align to 8byte boundary fixed-size objects that are multiple of 8 bytes.
That's only useful for now for double, not e.g. for Vector2f, but that didn't seem to hurt. Am I missing something? Do you prefer that we don't align Vector2f at all?
Also, improvements in test_unalignedassert.
2009-10-05 10:11:11 -04:00
Gael Guennebaud
5ba7fe3bee
clean the commented asm instructions because now I'm sure
...
the previous fix is ok
2009-09-17 23:34:00 +02:00
Gael Guennebaud
9395326e44
fix #53 : performance regression, hopefully I did not resurected another
...
perf. issue...
2009-09-17 23:18:21 +02:00
Gael Guennebaud
ef55e7f4ce
make custom asm directive volatile
2009-08-09 23:09:46 +02:00
Gael Guennebaud
d1dc088ef0
* implement a second level of micro blocking (faster for small sizes)
...
* workaround GCC bad implementation of _mm_set1_p*
2009-08-07 11:09:34 +02:00
Gael Guennebaud
1a1b2e9f27
finally directly calling the low-level products is faster
2009-07-10 10:41:26 +02:00
Benoit Jacob
fc9000f23e
only disable the inline ASM if we're NEITHER gcc nor icc. right ??
2009-06-26 05:32:21 +02:00
Gael Guennebaud
a44f7cf440
re-enable the fast unaligned loads for gcc and icc using inline assembly
...
(this allows to avoid incompatible pointer casts and to specify the dependency to the data explicitely)
2009-06-24 10:48:36 +02:00
Gael Guennebaud
aa17b5b514
use the slower unaligned load intrinsics in ei_ploadu because GCC mess up with my tricks
2009-06-23 23:28:34 +02:00
Benoit Jacob
6347b1db5b
remove sentence "Eigen itself is part of the KDE project."
...
it never made very precise sense. but now does it still make any?
2009-05-22 20:25:33 +02:00