eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-06-04 18:54:00 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	b3151bca40	Implement pmadd for float and double to make it consistent with the vectorized path when FMA is available.	2016-08-23 14:24:08 +02:00
Benoit Jacob	40a16282c7	Remove now-unused protate PacketMath func	2016-05-24 11:01:18 -04:00
Benoit Steiner	bfb3fcd94f	Optimized implementation of the tanh function for SSE	2016-02-10 08:52:30 -08:00
Gael Guennebaud	c2bf2f56ef	Remove custom unaligned loads for SSE. They were only useful for core2 CPU.	2016-02-08 14:29:12 +01:00
Gael Guennebaud	ae87f094eb	Fix "," in non SSE4 mode	2015-11-05 12:08:36 +01:00
Alexandre Avenel	d46e2c10a6	Add round, ceil and floor for SSE4.1/AVX (Bug #70 )	2015-11-01 10:49:27 +01:00
Gael Guennebaud	6163db814c	bug #1085 : workaround gcc default ABI issue	2015-10-10 22:38:55 +02:00
Gael Guennebaud	f047ecc36a	_mm_hadd_epi32 is for SSSE3 only (and not SSE3)	2015-10-07 15:48:35 +02:00
Gael Guennebaud	2c676ddb40	Handle various TODOs in SSE vectorization (remove splitted storeu, enable SSE3 integer vectorization, plus minor tweaks)	2015-10-06 15:43:27 +02:00
Gael Guennebaud	6245591349	Fix prototype of plset and generalize linspace functor.	2015-08-07 19:27:59 +02:00
Gael Guennebaud	ce57dbd937	Let unpacket_traits<> exposes the required alignment and make use of it everywhere	2015-08-07 10:44:01 +02:00
Benoit Steiner	f41b1f1666	Added support for fast reciprocal square root computation.	2015-02-26 09:42:41 -08:00
Benoit Jacob	9bd8a4bab5	bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).	2015-02-18 15:03:35 -05:00
Gael Guennebaud	159fb181c2	Disable __m128* wrappers when compiling with AVX and -fabi-version=4	2015-02-17 16:27:20 +01:00
Gael Guennebaud	91ab2489dd	Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI)	2015-02-17 16:08:07 +01:00
Gael Guennebaud	45cbb0bbb1	The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index	2015-02-16 15:05:41 +01:00
Gael Guennebaud	0918c51e60	merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper	2015-02-12 21:48:41 +01:00
Gael Guennebaud	fe25f3b8e3	FMA has been wrongly disabled	2015-02-10 23:11:35 +01:00
Benoit Steiner	c739102ef9	Pulled the latest changes from the trunk	2015-02-06 05:25:03 -08:00
Benoit Jacob	0f21613698	bug #936 , patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD	2015-01-30 17:44:26 -05:00
Benoit Jacob	340b8afb14	bug #936 , patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA.	2015-01-31 14:15:57 -05:00
Gael Guennebaud	ee06f78679	Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.	2014-11-04 21:58:52 +01:00
Benoit Steiner	16047c8d4a	Pulled in the latest changes from the Eigen trunk	2014-08-13 22:25:29 -07:00
Gael Guennebaud	b47ef1431f	Fix many long to int implicit conversions	2014-07-08 16:47:11 +02:00
Benoit Steiner	29aebf96e6	Created the pblend packet primitive and implemented it using SSE and AVX instructions.	2014-06-06 20:18:44 -07:00
Gael Guennebaud	450d0c3de0	Make sure that calls to broadcast4 are 16 bytes aligned	2014-04-25 22:25:48 +02:00
Gael Guennebaud	3d8d0f6269	Enable vectorization of pack_rhs with a column-major RHS. Rename and generalize Kernel<> to PacketBlock<,N>.	2014-04-25 10:56:18 +02:00
Gael Guennebaud	b0e19db1cf	Enable fused madd for Altivec	2014-04-24 23:17:18 +02:00
Gael Guennebaud	5c5231ab71	Workaround gcc's default ABI not being able to distinguish between vector types of different sizes.	2014-04-22 16:03:19 +02:00
Gael Guennebaud	d5a795f673	New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4.	2014-04-16 17:05:11 +02:00
Benoit Steiner	feaf7c7e6d	Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (ie gcc 4.8).	2014-04-14 10:44:17 -07:00
Gael Guennebaud	10aa14592a	Add a mechanism to recursively access to half-size packet types	2014-03-28 10:18:04 +01:00
Benoit Steiner	8a94cb3edd	Implemented the SSE version of the gather and scatter packet primitives.	2014-03-27 18:29:01 -07:00
Benoit Steiner	a419cea4a0	Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions.	2014-03-26 19:03:07 -07:00
Benoit Steiner	cc73164aa8	Merged latest updates from the parent branch	2014-03-26 15:23:59 -07:00
Gael Guennebaud	bc401eb6fa	Implement new 1 packet x 8 gebp kernel	2014-03-26 18:53:00 +01:00
Gael Guennebaud	b286a1e75c	add pbroadcast2/4 generic intrinsics	2014-03-26 16:46:36 +01:00
Benoit Steiner	db7d49efbb	Added support for FMA instructions	2014-02-24 13:45:32 -08:00
Benoit Steiner	64a85800bd	Added support for AVX to Eigen.	2014-01-29 11:43:05 -08:00
Gael Guennebaud	a7621809fe	Remove useless register keyword, and optimize predux_min/max for SSE4	2014-01-25 16:54:13 +01:00
Gael Guennebaud	01fd880424	Revert previous change and introduce a new workaround regarding gcc generating a shufps instruction instead of the more efficient pshufd instruction. The trick consists in introducing a new pload1 function to be used in low level product kernels for which bug #203 does not apply. Indeed, it turned out that using inline assembly prevents gcc from doing a good job at instruction reordering.	2014-03-20 16:03:46 +01:00
Gael Guennebaud	c39a3fa7a1	Makes gcc to generate a pshufd instruction for pset1	2014-03-20 10:14:26 +01:00
Gael Guennebaud	d4dd6aaed2	Fix bug #642 : add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled	2013-08-19 16:02:27 +02:00
Gael Guennebaud	b3adc4face	Add missing pconj specializations	2013-05-17 17:25:29 +02:00
Gael Guennebaud	d63712163c	Add SSE4 min/max for integers	2013-03-20 18:28:40 +01:00
Gael Guennebaud	e8aa1f00c5	add SSE pexp function for double, make use of _mm_floor_p* for pexp with SSE4.1	2012-07-27 23:40:04 +02:00
Benoit Jacob	69124cfca2	Automatic relicensing to MPL2 using Keirs script. Manual fixup follows.	2012-07-13 14:42:47 -04:00
Jitse Niesen	3c412183b2	Get rid of include directives inside namespace blocks (bug #339 ).	2012-04-15 11:06:28 +01:00
Gael Guennebaud	634fedaf68	proper C++ casting	2012-01-31 18:56:25 +01:00
Gael Guennebaud	9c86ee2695	fix static inline versus inline static issues (the former is the correct order)	2012-01-31 12:58:52 +01:00

130 Commits