eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-18 02:51:30 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	4def7b1fa5	Fix ptranspose overload prototypes for NEON	2014-04-25 11:15:13 +02:00
Gael Guennebaud	3d8d0f6269	Enable vectorization of pack_rhs with a column-major RHS. Rename and generalize Kernel<> to PacketBlock<,N>.	2014-04-25 10:56:18 +02:00
Gael Guennebaud	b0e19db1cf	Enable fused madd for Altivec	2014-04-24 23:17:18 +02:00
Gael Guennebaud	8d85ce88e1	Implement ptranspose on altivec and fix pgather/pscatter	2014-04-24 05:47:53 -07:00
Benoit Steiner	4eb92e5647	Fixed the NEON implementation of predux_max<Packet4i>.	2014-04-23 18:23:07 -07:00
Benoit Steiner	ccb4dec719	Created a NEON version of the ptranspose packet primitives	2014-04-23 18:22:10 -07:00
Gael Guennebaud	82b09fcb91	Add Altivec implementation of pgather/pscatter (not tested)	2014-04-23 13:09:26 +02:00
Gael Guennebaud	934ce93886	merge with default branch	2014-04-22 17:00:38 +02:00
Gael Guennebaud	5c5231ab71	Workaround gcc's default ABI not being able to distinguish between vector types of different sizes.	2014-04-22 16:03:19 +02:00
Gael Guennebaud	1388f4f9fd	Fix typo (was working with clang!)	2014-04-18 11:43:13 +02:00
Gael Guennebaud	2c3c95990d	merge	2014-04-17 22:50:49 +02:00
Benoit Steiner	6d6df90c9a	Implemented the pgather/pscatter packet primitives for the arm/NEON architecture	2014-04-17 12:28:01 -07:00
Gael Guennebaud	9746396d1b	Optimize AVX pset1 for complexes and ploaddup	2014-04-17 20:51:04 +02:00
Gael Guennebaud	0fa8290366	Optimize ploaddup for AVX	2014-04-17 16:02:27 +02:00
Gael Guennebaud	d5a795f673	New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4.	2014-04-16 17:05:11 +02:00
Benoit Steiner	feaf7c7e6d	Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (ie gcc 4.8).	2014-04-14 10:44:17 -07:00
Benoit Steiner	8044b00a7f	bug #782 : Workaround for gcc <= 4.4 compilation error on the NEON PacketMath code.	2014-04-03 23:41:47 +02:00
Gael Guennebaud	1c0728043a	Workaround alignment warnings	2014-03-30 22:43:47 +02:00
Gael Guennebaud	10aa14592a	Add a mechanism to recursively access to half-size packet types	2014-03-28 10:18:04 +01:00
Benoit Steiner	51e85c936d	Merged latest changes from parent.	2014-03-27 18:32:15 -07:00
Benoit Steiner	8a94cb3edd	Implemented the SSE version of the gather and scatter packet primitives.	2014-03-27 18:29:01 -07:00
Benoit Steiner	7f3162f707	Implemented the AVX version of the gather and scatter packet primitives.	2014-03-27 17:42:25 -07:00
Gael Guennebaud	58fe2fc2b2	enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...)	2014-03-27 23:38:50 +01:00
Benoit Steiner	c4902a3d01	Implemented the AVX version of the ptranspose packet primitive.	2014-03-27 09:34:51 -07:00
Gael Guennebaud	052aedd394	Implement pcplflip, palign, predux and the likes from AVC/complexes	2014-03-27 14:47:00 +01:00
Benoit Steiner	a419cea4a0	Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions.	2014-03-26 19:03:07 -07:00
Benoit Steiner	e45a6bed45	Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible.	2014-03-26 15:58:13 -07:00
Benoit Steiner	cc73164aa8	Merged latest updates from the parent branch	2014-03-26 15:23:59 -07:00
Benoit Steiner	a078f442a3	Vectorized the multiplication and division of complex numbers using AVX instructions.	2014-03-26 15:11:18 -07:00
Benoit Steiner	cf1a7bfbe1	Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives. Silenced a few compilation warnings.	2014-03-26 12:03:31 -07:00
Gael Guennebaud	bc401eb6fa	Implement new 1 packet x 8 gebp kernel	2014-03-26 18:53:00 +01:00
Gael Guennebaud	b286a1e75c	add pbroadcast2/4 generic intrinsics	2014-03-26 16:46:36 +01:00
Benoit Steiner	6bf3cc2732	Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf>	2014-03-25 09:00:43 -07:00
Benoit Steiner	7ae9b0805d	Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives.	2014-03-24 13:33:40 -07:00
Benoit Steiner	db7d49efbb	Added support for FMA instructions	2014-02-24 13:45:32 -08:00
Benoit Steiner	64a85800bd	Added support for AVX to Eigen.	2014-01-29 11:43:05 -08:00
Gael Guennebaud	a7621809fe	Remove useless register keyword, and optimize predux_min/max for SSE4	2014-01-25 16:54:13 +01:00
Gael Guennebaud	01fd880424	Revert previous change and introduce a new workaround regarding gcc generating a shufps instruction instead of the more efficient pshufd instruction. The trick consists in introducing a new pload1 function to be used in low level product kernels for which bug #203 does not apply. Indeed, it turned out that using inline assembly prevents gcc from doing a good job at instruction reordering.	2014-03-20 16:03:46 +01:00
Gael Guennebaud	c39a3fa7a1	Makes gcc to generate a pshufd instruction for pset1	2014-03-20 10:14:26 +01:00
Gael Guennebaud	19521c83b8	bug #677 : fix usage of pld intrinsics for complexes	2013-11-02 12:10:48 +01:00
Gael Guennebaud	6dc0e59b1e	Fix bug #677 : compilation issue on arm64 which does not have the PLD instruction	2013-10-31 13:52:43 +01:00
Gael Guennebaud	9f3f42d66a	fix a few "dead stores" warnings	2013-10-26 13:59:02 +02:00
Gael Guennebaud	4612a1cd87	Fix ploaddup and lin-spaced with AltiVec.	2013-09-10 16:13:59 +02:00
Gael Guennebaud	c47010e3d2	typo	2013-08-19 16:10:00 +02:00
Gael Guennebaud	d4dd6aaed2	Fix bug #642 : add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled	2013-08-19 16:02:27 +02:00
Simon Pilgrim	fab0235369	Fix bug #590 : NEON Duplicate lane load	2013-06-23 14:13:21 +02:00
Gael Guennebaud	9f11f80db1	Make psqrt works with numeric_limits<float>::min	2013-06-14 10:55:05 +02:00
Jeff Dean	d5fa5001a7	Fix bug #613 : psqrt was incorrect for small numbers	2013-06-13 18:17:27 +02:00
Gael Guennebaud	62670c83a0	Fix bug #314 : move remaining math functions from internal to numext namespace	2013-06-10 23:40:56 +02:00
Simon Pilgrim	ca67c60150	Fix bug #591 : minor optimization in NEON vectorization support	2013-06-10 15:59:03 +02:00

1082 Commits