Konstantinos Margaritis
273b236f72
Altivec brought up to date. Most tests pass and performance is better than before too!
2010-03-05 22:28:49 +02:00
Gael Guennebaud
7e2683dc39
merge
2010-03-04 18:59:56 +01:00
Gael Guennebaud
ea8cad5151
make the number of registers easier to configure per architecture
2010-03-04 18:58:12 +01:00
Gael Guennebaud
cefd9b8888
merge with default branch
2010-03-04 18:47:52 +01:00
Gael Guennebaud
8ed1ef4469
add a minor FIXME
2010-03-04 18:30:28 +01:00
Gael Guennebaud
7dd81aad74
factorize default performance related settings to a single file
...
included after the architecture specific files such that they
can be adapted by each platform.
2010-03-03 18:47:58 +01:00
Konstantinos Margaritis
112c550b4a
Added initial NEON support; most tests pass, however we had to use some hackish workarounds
...
as gcc on ARM (both the CodeSourcery 4.4.1 we used and the experimental 4.5) fails to
ensure proper alignment with __attribute__((aligned(16))). This has to be
fixed upstream to remove the workarounds.
2010-03-03 11:25:41 -06:00
Thomas Capricelli
0f3d69b65e
Provide "eigen" defines to decide which instruction set is used
...
(sse3, ssse3 and sse4), independently from the compiler.
Only those defines should be used in other places, and the user can
rely on those to know which sets are used.
2010-02-24 21:43:30 +01:00
Gael Guennebaud
eb905500b6
significant speedup in the matrix-matrix products
2010-02-23 13:06:49 +01:00
Hauke Heibel
4365a48748
Added an ei_linspaced_op to create linearly spaced vectors.
...
Added setLinSpaced/LinSpaced functionality to DenseBase.
Improved vectorized assignment - overcomes MSVC optimization issues.
CwiseNullaryOp now requires functors to offer 1D and 2D operators.
Adapted existing functors to the new CwiseNullaryOp requirements.
Added ei_plset to create packets as [a, a+1, ..., a+size].
Added more nullary unit tests.
2010-01-26 19:42:17 +01:00
Hauke Heibel
325da2ea3c
Fixed conservativeResize.
...
Fixed multiple overloads for operator=.
Removed debug output.
2010-01-11 13:57:50 +01:00
Gael Guennebaud
eaaba30cac
merge with default branch
2009-12-22 22:51:08 +01:00
Gael Guennebaud
6db6774c46
* fix aliasing checks when the lhs is also transposed. At the same time,
...
significantly simplify the code of these checks while extending them
to catch many more expressions!
* move the enabling/disabling of vectorized sin/cos to the architecture traits
2009-12-16 11:41:16 +01:00
Hauke Heibel
3ea1f97f69
Suppressed the warning for missing assignment generators (forgot that in the last submission).
...
Commented Quake3's fast inverse sqrt in SSE's MathFunction header.
2009-12-15 08:09:14 +01:00
Benoit Jacob
684d76eba3
add SSE4 support, start with integer multiplication
2009-11-24 15:12:43 -05:00
Gael Guennebaud
eb8f450071
Hey, finally the copyCoeff stuff is not only used to implement swap anymore :)
...
Add an internal pseudo expression that allows operators like += and *= to be optimized using
the copyCoeff stuff.
This makes it easy to enforce aligned loads for the destination matrix everywhere.
2009-11-20 15:39:38 +01:00
Benoit Jacob
92749eed11
* merge
...
* remove a ctor in QuaternionBase as it gives a strange error with GCC 4.4.2.
2009-11-09 09:08:03 -05:00
Hauke Heibel
3979f6d8aa
Let's try to stick to the original code, thus activate the fix of #62 only for 64 bit builds.
2009-11-04 15:49:22 +01:00
Hauke Heibel
e2170b9f7e
Direct access of the packet structs fixes bug #62 and does not seem to
...
influence compiler optimization.
2009-11-04 15:38:11 +01:00
Benoit Jacob
d41577819b
we were already aligning to a 16-byte boundary fixed-size objects that are a multiple of 16 bytes;
...
now we also align to an 8-byte boundary fixed-size objects that are a multiple of 8 bytes.
That's only useful for now for double, not e.g. for Vector2f, but that didn't seem to hurt. Am I missing something? Do you prefer that we don't align Vector2f at all?
Also, improvements in test_unalignedassert.
2009-10-05 10:11:11 -04:00
Gael Guennebaud
5ba7fe3bee
clean the commented asm instructions because now I'm sure
...
the previous fix is ok
2009-09-17 23:34:00 +02:00
Gael Guennebaud
9395326e44
fix #53 : performance regression, hopefully I did not resurrect another
...
perf. issue...
2009-09-17 23:18:21 +02:00
Gael Guennebaud
ef55e7f4ce
make custom asm directive volatile
2009-08-09 23:09:46 +02:00
Gael Guennebaud
d1dc088ef0
* implement a second level of micro blocking (faster for small sizes)
...
* workaround GCC bad implementation of _mm_set1_p*
2009-08-07 11:09:34 +02:00
Gael Guennebaud
1a1b2e9f27
finally directly calling the low-level products is faster
2009-07-10 10:41:26 +02:00
Benoit Jacob
fc9000f23e
only disable the inline ASM if we're NEITHER gcc nor icc. right ??
2009-06-26 05:32:21 +02:00
Gael Guennebaud
a44f7cf440
re-enable the fast unaligned loads for gcc and icc using inline assembly
...
(this avoids incompatible pointer casts and specifies the dependency on the data explicitly)
2009-06-24 10:48:36 +02:00
Gael Guennebaud
aa17b5b514
use the slower unaligned load intrinsics in ei_ploadu because GCC messes up my tricks
2009-06-23 23:28:34 +02:00
Benoit Jacob
6347b1db5b
remove sentence "Eigen itself is part of the KDE project."
...
it never made very precise sense. but now does it still make any?
2009-05-22 20:25:33 +02:00
Gael Guennebaud
1e286464ab
* compilation fixes for gcc 3.3
...
* test Part::swap
2009-05-06 08:43:38 +00:00
Benoit Jacob
b60571a193
fix warnings with unused static functions
2009-05-04 12:49:56 +00:00
Gael Guennebaud
c7bb7436f9
make the ei_p* math functions overloads instead of template
...
specializations
2009-04-22 21:35:50 +00:00
Benoit Jacob
0c99de5a17
more patches from Hauke Heibel: compilation/warning fixes from VC++
2009-04-09 17:19:17 +00:00
Gael Guennebaud
e8329f9f45
relicence Julien Pommier's SSE code to Eigen's licenses
2009-04-09 06:03:51 +00:00
Benoit Jacob
502bf4a81d
* fix the binary bloat issue, Rohit's idea was the good one
...
* a few dox fixes (alloc routines do return 0 on error) and forgot to update version number in CMakeLists
2009-04-06 13:33:42 +00:00
Gael Guennebaud
49fc1e3e84
add vectorization of sqrt for float
2009-03-27 14:41:46 +00:00
Gael Guennebaud
a22ef7e1f3
for some reason passing the argument by const reference killed the perf
...
(in the packet versions of sin, cos, exp, log), so let's pass them by
value. Also, improve the perf of ei_plog by reducing dependencies.
2009-03-25 18:33:36 +00:00
Gael Guennebaud
17860e578c
add SSE2 versions of sin, cos, log, exp using code from Julien
...
Pommier. They are for float only, and they return exactly the same
result as the standard versions in about 90% of the cases. Otherwise the max error
is below 1e-7. However, for very large values (>1e3) the accuracy of sin and cos
slightly decreases. They are about 3 or 4 times faster than 4 calls to their respective
standard versions. So, is it ok to enable them by default in their respective functors ?
2009-03-25 12:26:13 +00:00
Konstantinos A. Margaritis
fe00e864a1
ei_pnegate implemented for AltiVec
2009-03-20 17:26:50 +00:00
Gael Guennebaud
fbf415c547
add vectorization of unary operator-() (the AltiVec version is probably
...
broken)
2009-03-20 10:03:24 +00:00
Gael Guennebaud
3f80c68be5
add the vectorization of abs
2009-03-09 18:40:09 +00:00
Gael Guennebaud
7718a8ed83
slight optimization of SSE base integer mul (thanks to Rohit Garg)
2009-03-08 10:14:07 +00:00
Gael Guennebaud
3288e9e168
add much faster versions of unaligned stores (and slightly faster
...
unaligned loads)
2009-03-03 14:01:30 +00:00
Laurent Montel
2d6d14a3d3
Add COMPONENT Devel
2009-02-23 07:50:56 +00:00
Konstantinos A. Margaritis
349557db9a
no reason for 3 vec_mins, 2 are enough apparently in ei_predux_min
2009-02-12 22:03:30 +00:00
Konstantinos A. Margaritis
ad2bf14dbb
modified ei_predux_min/max to actually use altivec instructions
2009-02-12 21:58:44 +00:00
Gael Guennebaud
51c991af45
* exit Sum.h, exit Prod.h, welcome vectorization of redux() !
...
* add vectorization for minCoeff and maxCoeff
2009-02-12 15:18:59 +00:00
Gael Guennebaud
7954f7709a
add ei_predux_mul for AltiVec
2009-02-10 18:26:59 +00:00
Gael Guennebaud
cbbc6d940b
* add ei_predux_mul internal function
...
* apply Ricard Marxer's prod() patch with fixes for the vectorized path
2009-02-10 18:06:05 +00:00
Konstantinos A. Margaritis
15e40b1099
fixed preserve_mask definition for AltiVec (needed __vector keyword)
2009-02-08 18:43:57 +00:00