eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-21 20:41:06 +08:00

Author	SHA1	Message	Date
Konstantinos Margaritis	273b236f72	Altivec brought up to date. Most tests pass and performance is better than before too!	2010-03-05 22:28:49 +02:00
Gael Guennebaud	7e2683dc39	merge	2010-03-04 18:59:56 +01:00
Gael Guennebaud	ea8cad5151	make the number of registers easier to configure per architectures	2010-03-04 18:58:12 +01:00
Gael Guennebaud	cefd9b8888	merge with default branch	2010-03-04 18:47:52 +01:00
Gael Guennebaud	8ed1ef4469	add a minor FIXME	2010-03-04 18:30:28 +01:00
Gael Guennebaud	7dd81aad74	factorize default performance related settings to a single file included after the architecture specific files such that they can be adapted by each platform.	2010-03-03 18:47:58 +01:00
Konstantinos Margaritis	112c550b4a	Added initial NEON support, most tests pass however we had to use some hackish workarounds as gcc on ARM (both CodeSourcery 4.4.1 used and experimental 4.5) fail to ensure proper alignment with __attribute__((aligned(16))). This has to be fixed upstream to remove the workarounds.	2010-03-03 11:25:41 -06:00
Thomas Capricelli	0f3d69b65e	Provide "eigen" defines to decide which instruction set is used (sse3, ssse3 and sse4), independently of the compiler. Only those defines should be used in other places, and the user can rely on those to know which sets are used.	2010-02-24 21:43:30 +01:00
Gael Guennebaud	eb905500b6	significant speedup in the matrix-matrix products	2010-02-23 13:06:49 +01:00
Hauke Heibel	4365a48748	Added an ei_linspaced_op to create linearly spaced vectors. Added setLinSpaced/LinSpaced functionality to DenseBase. Improved vectorized assignment - overcomes MSVC optimization issues. CwiseNullaryOp is now requiring functors to offer 1D and 2D operators. Adapted existing functors to the new CwiseNullaryOp requirements. Added ei_plset to create packages as [a, a+1, ..., a+size]. Added more nullary unit tests.	2010-01-26 19:42:17 +01:00
Hauke Heibel	325da2ea3c	Fixed conservativeResize. Fixed multiple overloads for operator=. Removed debug output.	2010-01-11 13:57:50 +01:00
Gael Guennebaud	eaaba30cac	merge with default branch	2009-12-22 22:51:08 +01:00
Gael Guennebaud	6db6774c46	* fix aliasing checks when the lhs is also transposed. At the same time, significantly simplify the code of these checks while extending them to catch much more expressions! * move the enabling/disabling of vectorized sin/cos to the architecture traits	2009-12-16 11:41:16 +01:00
Hauke Heibel	3ea1f97f69	Suppressed the warning for missing assignment generators (forgot that in the last submission). Commented Quake3's fast inverse sqrt in SSE's MathFunction header.	2009-12-15 08:09:14 +01:00
Benoit Jacob	684d76eba3	add SSE4 support, start with integer multiplication	2009-11-24 15:12:43 -05:00
Gael Guennebaud	eb8f450071	Hey, finally the copyCoeff stuff is not only used to implement swap anymore :) Add an internal pseudo expression allowing to optimize operators like +=, *= using the copyCoeff stuff. This allows to easily enforce aligned load for the destination matrix everywhere.	2009-11-20 15:39:38 +01:00
Benoit Jacob	92749eed11	* merge * remove a ctor in QuaternionBase as it gives a strange error with GCC 4.4.2.	2009-11-09 09:08:03 -05:00
Hauke Heibel	3979f6d8aa	Let's try to stick to the original code, thus activate the fix of #62 only for 64 bit builds.	2009-11-04 15:49:22 +01:00
Hauke Heibel	e2170b9f7e	Direct access of the packet structs fixes bug #62 and does not seem to influence compiler optimization.	2009-11-04 15:38:11 +01:00
Benoit Jacob	d41577819b	we were already aligning to 16 byte boundary fixed-size objects that are multiple of 16 bytes; now we also align to 8byte boundary fixed-size objects that are multiple of 8 bytes. That's only useful for now for double, not e.g. for Vector2f, but that didn't seem to hurt. Am I missing something? Do you prefer that we don't align Vector2f at all? Also, improvements in test_unalignedassert.	2009-10-05 10:11:11 -04:00
Gael Guennebaud	5ba7fe3bee	clean the commented asm instructions because now I'm sure the previous fix is ok	2009-09-17 23:34:00 +02:00
Gael Guennebaud	9395326e44	fix #53 : performance regression, hopefully I did not resurrect another perf. issue...	2009-09-17 23:18:21 +02:00
Gael Guennebaud	ef55e7f4ce	make custom asm directive volatile	2009-08-09 23:09:46 +02:00
Gael Guennebaud	d1dc088ef0	* implement a second level of micro blocking (faster for small sizes) * workaround GCC bad implementation of _mm_set1_p*	2009-08-07 11:09:34 +02:00
Gael Guennebaud	1a1b2e9f27	finally directly calling the low-level products is faster	2009-07-10 10:41:26 +02:00
Benoit Jacob	fc9000f23e	only disable the inline ASM if we're NEITHER gcc nor icc. right ??	2009-06-26 05:32:21 +02:00
Gael Guennebaud	a44f7cf440	re-enable the fast unaligned loads for gcc and icc using inline assembly (this allows to avoid incompatible pointer casts and to specify the dependency to the data explicitly)	2009-06-24 10:48:36 +02:00
Gael Guennebaud	aa17b5b514	use the slower unaligned load intrinsics in ei_ploadu because GCC messes up with my tricks	2009-06-23 23:28:34 +02:00
Benoit Jacob	6347b1db5b	remove sentence "Eigen itself is part of the KDE project." it never made very precise sense. but now does it still make any?	2009-05-22 20:25:33 +02:00
Gael Guennebaud	1e286464ab	* compilation fixes for gcc 3.3 * test Part::swap	2009-05-06 08:43:38 +00:00
Benoit Jacob	b60571a193	fix warnings with unused static functions	2009-05-04 12:49:56 +00:00
Gael Guennebaud	c7bb7436f9	make the ei_p* math functions overloads instead of template specializations	2009-04-22 21:35:50 +00:00
Benoit Jacob	0c99de5a17	more patches from Hauke Heibel: compilation/warning fixes from VC++	2009-04-09 17:19:17 +00:00
Gael Guennebaud	e8329f9f45	relicence Julien Pommier's SSE code to Eigen's licenses	2009-04-09 06:03:51 +00:00
Benoit Jacob	502bf4a81d	* fix the binary bloat issue, Rohit's idea was the good one * a few dox fixes (alloc routines do return 0 on error) and forgot to update version number in CMakeLists	2009-04-06 13:33:42 +00:00
Gael Guennebaud	49fc1e3e84	add vectorization of sqrt for float	2009-03-27 14:41:46 +00:00
Gael Guennebaud	a22ef7e1f3	for some reason passing the argument by const reference killed the perf (in the packet version of sin, cos, exp, log), so let's pass them by value. Also, improve the perf of ei_plog by reducing dependencies.	2009-03-25 18:33:36 +00:00
Gael Guennebaud	17860e578c	add SSE2 versions of sin, cos, log, exp using code from Julien Pommier. They are for float only, and they return exactly the same result as the standard versions in about 90% of the cases. Otherwise the max error is below 1e-7. However, for very large values (>1e3) the accuracy of sin and cos slightly decreases. They are about 3 or 4 times faster than 4 calls to their respective standard versions. So, is it ok to enable them by default in their respective functors ?	2009-03-25 12:26:13 +00:00
Konstantinos A. Margaritis	fe00e864a1	ei_pnegate implemented for AltiVec	2009-03-20 17:26:50 +00:00
Gael Guennebaud	fbf415c547	add vectorization of unary operator-() (the AltiVec version is probably broken)	2009-03-20 10:03:24 +00:00
Gael Guennebaud	3f80c68be5	add the vectorization of abs	2009-03-09 18:40:09 +00:00
Gael Guennebaud	7718a8ed83	slight optimization of SSE base integer mul (thanks to Rohit Garg)	2009-03-08 10:14:07 +00:00
Gael Guennebaud	3288e9e168	add much faster versions of unaligned stores (and slightly faster unaligned loads)	2009-03-03 14:01:30 +00:00
Laurent Montel	2d6d14a3d3	Add COMPONENT Devel	2009-02-23 07:50:56 +00:00
Konstantinos A. Margaritis	349557db9a	no reason for 3 vec_mins, 2 are enough apparently in ei_predux_min	2009-02-12 22:03:30 +00:00
Konstantinos A. Margaritis	ad2bf14dbb	modified ei_predux_min/max to actually use altivec instructions	2009-02-12 21:58:44 +00:00
Gael Guennebaud	51c991af45	* exit Sum.h, exit Prod.h, welcome vectorization of redux() ! * add vectorization for minCoeff and maxCoeff	2009-02-12 15:18:59 +00:00
Gael Guennebaud	7954f7709a	add ei_predux_mul for AltiVec	2009-02-10 18:26:59 +00:00
Gael Guennebaud	cbbc6d940b	* add ei_predux_mul internal function * apply Ricard Marxer's prod() patch with fixes for the vectorized path	2009-02-10 18:06:05 +00:00
Konstantinos A. Margaritis	15e40b1099	fixed preserve_mask definition for AltiVec (needed __vector keyword)	2009-02-08 18:43:57 +00:00

125 Commits