eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-07-05 20:55:12 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	cbbf88c4d7	Use int32_t instead of int in NEON code. Some platforms with 16 bytes int supports ARM NEON.	2017-02-17 14:39:02 +01:00
Benoit Jacob	751e097c57	Use 32 registers on ARM64	2016-12-19 13:44:46 -05:00
Konstantinos Margaritis	ef05463fcf	Merged kmargar/eigen/tip into default, Altivec/VSX port should be working ok now.	2016-07-10 16:11:46 +03:00
Gael Guennebaud	0028049380	bug #1240 : Remove any assumption on NEON vector types.	2016-06-09 23:08:11 +02:00
Sean Templeton	bd21243821	Fix compile errors initializing packets on ARM DS-5 5.20 The ARM DS-5 5.20 compiler fails compiling with the following errors: "src/Core/arch/NEON/PacketMath.h", line 113: Error: #146: too many initializer values Packet4f countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3); ^ "src/Core/arch/NEON/PacketMath.h", line 118: Error: #146: too many initializer values Packet4i countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3); ^ "src/Core/arch/NEON/Complex.h", line 30: Error: #146: too many initializer values static uint32x4_t p4ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET4(0x00000000, 0x80000000, 0x00000000, 0x80000000); ^ "src/Core/arch/NEON/Complex.h", line 31: Error: #146: too many initializer values static uint32x2_t p2ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET2(0x00000000, 0x80000000); ^ The vectors are implemented as two doubles, hence the too many initializer values error. Changed the code to use intrinsic load functions which all compilers implementing NEON should have.	2016-06-03 10:51:35 -05:00
Benoit Jacob	40a16282c7	Remove now-unused protate PacketMath func	2016-05-24 11:01:18 -04:00
Konstantinos Margaritis	950158f6d1	add name to copyrights	2016-04-28 14:32:11 -03:00
Benoit Jacob	cd2b667ac8	Add references to filed LLVM bugs	2016-04-08 08:12:47 -04:00
Benoit Jacob	158fea0f5e	bug #1190 - Don't trust __ARM_FEATURE_FMA on Clang/ARM	2016-04-04 16:42:40 -04:00
Benoit Jacob	03f2997a11	bug #1191 - Prevent Clang/ARM from rewriting VMLA into VMUL+VADD	2016-04-04 16:41:47 -04:00
Benoit Jacob	01b5333e44	bug #1186 - vreinterpretq_u64_f64 fails to build on Android/Aarch64/Clang toolchain	2016-03-30 11:02:33 -04:00
Gael Guennebaud	6245591349	Fix prototype of plset and generalize linspace functor.	2015-08-07 19:27:59 +02:00
Gael Guennebaud	ce57dbd937	Let unpacket_traits<> exposes the required alignment and make use of it everywhere	2015-08-07 10:44:01 +02:00
Benoit Jacob	c88e1abaf3	also uninitialized here, see previous cset	2015-05-15 11:34:57 -04:00
Benoit Jacob	807793ec3b	Fix uninitialized var warning. The compiler was clearing the register anyway, so this does not change resulting code	2015-05-15 11:15:53 -04:00
Benoit Jacob	f839099512	Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics.	2015-03-03 09:35:22 -05:00
Benoit Jacob	1ec0f4fadf	HalfPacket also needed to be disabled for double, on ARMv8.	2015-03-02 16:08:54 -05:00
Benoit Jacob	2fc3b484d7	remove trailing comma	2015-02-27 11:37:45 -05:00
Benoit Jacob	33669348c4	Disable Packet2f/2i halfpacket support in NEON. I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.	2015-02-27 11:35:37 -05:00
Benoit Jacob	9bd8a4bab5	bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).	2015-02-18 15:03:35 -05:00
Gael Guennebaud	45cbb0bbb1	The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index	2015-02-16 15:05:41 +01:00
Benoit Steiner	cc5d7ff523	Added vectorized implementation of the exponential function for ARM/NEON	2015-02-10 14:02:38 -08:00
Benoit Jacob	5ef95fabee	bug #936 , patch 3/3: Properly detect FMA support on ARM (requires VFPv4) and use it instead of MLA when available, because it's both more accurate, and faster.	2015-01-30 17:45:03 -05:00
Benoit Jacob	0f21613698	bug #936 , patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD	2015-01-30 17:44:26 -05:00
Benoit Jacob	340b8afb14	bug #936 , patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA.	2015-01-31 14:15:57 -05:00
Benoit Jacob	9f99f61e69	bug #936 , patch 1/3: some cleanup and renaming for consistency.	2015-01-30 17:43:56 -05:00
Gael Guennebaud	ae4644cc68	bug #907 , ARM64: workaround ICE in xcode/clang	2015-01-13 10:03:00 +01:00
Gael Guennebaud	36f7c1337f	bug #907 , ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang	2015-01-13 09:57:37 +01:00
Gael Guennebaud	63974bcb88	Big 907: workaround some missing intrinsics in current NDK's gcc version (ARM64)	2015-01-07 09:44:25 +01:00
Gael Guennebaud	79f4a59ed9	bug #907 : fix compilation with ARM64	2015-01-07 09:41:56 +01:00
Gael Guennebaud	ee06f78679	Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.	2014-11-04 21:58:52 +01:00
Konstantinos Margaritis	fae4fd7a26	Added ARMv8 support	2014-10-22 07:39:49 +00:00
Konstantinos Margaritis	b508619392	working 64-bit support in PacketMath.h, Complex.h needed	2014-10-21 18:10:33 +00:00
Jitse Niesen	25bceefb4e	Replace asm by __asm__ (bug #873 )	2014-09-06 11:47:24 +01:00
Gael Guennebaud	0369db12af	bug #871 : fix compilation on ARM/Neon regarding __has_builtin usage	2014-09-01 10:52:58 +02:00
Gael Guennebaud	b47ef1431f	Fix many long to int implicit conversions	2014-07-08 16:47:11 +02:00
Gael Guennebaud	4def7b1fa5	Fix ptranspose overload prototypes for NEON	2014-04-25 11:15:13 +02:00
Gael Guennebaud	3d8d0f6269	Enable vectorization of pack_rhs with a column-major RHS. Rename and generalize Kernel<> to PacketBlock<,N>.	2014-04-25 10:56:18 +02:00
Benoit Steiner	4eb92e5647	Fixed the NEON implementation of predux_max<Packet4i>.	2014-04-23 18:23:07 -07:00
Benoit Steiner	ccb4dec719	Created a NEON version of the ptranspose packet primitives	2014-04-23 18:22:10 -07:00
Gael Guennebaud	934ce93886	merge with default branch	2014-04-22 17:00:38 +02:00
Benoit Steiner	6d6df90c9a	Implemented the pgather/pscatter packet primitives for the arm/NEON architecture	2014-04-17 12:28:01 -07:00
Gael Guennebaud	d5a795f673	New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speeup on Haswell. This changeset also introduce new vector functions: ploadquad and predux4.	2014-04-16 17:05:11 +02:00
Benoit Steiner	8044b00a7f	bug #782 : Workaround for gcc <= 4.4 compilation error on the NEON PacketMath code.	2014-04-03 23:41:47 +02:00
Gael Guennebaud	10aa14592a	Add a mechanism to recursively access to half-size packet types	2014-03-28 10:18:04 +01:00
Gael Guennebaud	6dc0e59b1e	Fix bug #677 : compilation issue on arm64 which does not have the PLD instruction	2013-10-31 13:52:43 +01:00
Simon Pilgrim	fab0235369	Fix bug #590 : NEON Duplicate lane load	2013-06-23 14:13:21 +02:00
Gael Guennebaud	b3adc4face	Add missing pconj specializations	2013-05-17 17:25:29 +02:00
Benoit Jacob	69124cfca2	Automatic relicensing to MPL2 using Keirs script. Manual fixup follows.	2012-07-13 14:42:47 -04:00
Konstantinos Margaritis	d878cf2227	fix typo	2012-07-04 11:28:59 +03:00

1 2

76 Commits