eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-15 09:31:29 +08:00

Author	SHA1	Message	Date
Gustavo Lima Chaves	e763fcd09e	Introducing "vectorized" byte on unpacket_traits structs. This is a preparation for a change on gebp_traits, where a new template argument will be introduced to dictate the packet size, so it won't be bound to the current/max packet size only anymore. By having packet types defined early on gebp_traits, one has now to act on packet types, not scalars anymore, for the enum values defined on that class. One approach for reaching the vectorizable/size properties one needs there could be getting the packet's scalar again with unpacket_traits<>, then the size/Vectorizable enum entries from packet_traits<>. It turns out guards like "#ifndef EIGEN_VECTORIZE_AVX512" at AVX/PacketMath.h will hide smaller packet variations of packet_traits<> for some types (and it makes sense to keep that). In other words, one can't go back to the scalar and create a new PacketType, as this will always lead to the maximum packet type for the architecture. The less costly/invasive solution for that, thus, is to add the vectorizable info on every unpacket_traits struct as well.	2018-12-19 14:24:44 -08:00
Gael Guennebaud	c785464430	Add packet sin and cos to Altivec/VSX and NEON	2018-11-30 16:21:33 +01:00
Gael Guennebaud	b131a4db24	bug #1631 : fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.	2018-11-27 23:45:00 +01:00
Gael Guennebaud	4a347a0054	Unify NEON's pexp with generic implementation	2018-11-26 22:15:44 +01:00
Gael Guennebaud	c24e98e6a8	Unify NEON's plog with generic implementation	2018-11-26 15:02:16 +01:00
Gael Guennebaud	e8ca5166a9	bug #1428 : attempt to make NEON vectorization compilable by MSVC. The workaround is to wrap NEON packet types to make them different C++ types.	2018-04-24 11:19:49 +02:00
Christoph Hertzberg	11ddac57e5	Merged in guillaume_michel/eigen (pull request PR-334) - Add support for NEON plog PacketMath function	2017-10-23 13:22:22 +00:00
Gael Guennebaud	1d59ca2458	Fix compilation with gcc 4.3 and ARM NEON	2017-06-09 13:20:52 +02:00
Benoit Jacob	61160a21d2	ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32.	2017-03-15 06:57:25 -04:00
Gael Guennebaud	cbbf88c4d7	Use int32_t instead of int in NEON code. Some platforms with 16 bytes int support ARM NEON.	2017-02-17 14:39:02 +01:00
Benoit Jacob	751e097c57	Use 32 registers on ARM64	2016-12-19 13:44:46 -05:00
Konstantinos Margaritis	ef05463fcf	Merged kmargar/eigen/tip into default, Altivec/VSX port should be working ok now.	2016-07-10 16:11:46 +03:00
Gael Guennebaud	0028049380	bug #1240 : Remove any assumption on NEON vector types.	2016-06-09 23:08:11 +02:00
Sean Templeton	bd21243821	Fix compile errors initializing packets on ARM DS-5 5.20. The ARM DS-5 5.20 compiler fails compiling with the following errors: "src/Core/arch/NEON/PacketMath.h", line 113: Error: #146: too many initializer values Packet4f countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3); ^ "src/Core/arch/NEON/PacketMath.h", line 118: Error: #146: too many initializer values Packet4i countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3); ^ "src/Core/arch/NEON/Complex.h", line 30: Error: #146: too many initializer values static uint32x4_t p4ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET4(0x00000000, 0x80000000, 0x00000000, 0x80000000); ^ "src/Core/arch/NEON/Complex.h", line 31: Error: #146: too many initializer values static uint32x2_t p2ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET2(0x00000000, 0x80000000); ^ The vectors are implemented as two doubles, hence the too many initializer values error. Changed the code to use intrinsic load functions which all compilers implementing NEON should have.	2016-06-03 10:51:35 -05:00
Benoit Jacob	40a16282c7	Remove now-unused protate PacketMath func	2016-05-24 11:01:18 -04:00
Konstantinos Margaritis	950158f6d1	add name to copyrights	2016-04-28 14:32:11 -03:00
Benoit Jacob	cd2b667ac8	Add references to filed LLVM bugs	2016-04-08 08:12:47 -04:00
Benoit Jacob	158fea0f5e	bug #1190 - Don't trust __ARM_FEATURE_FMA on Clang/ARM	2016-04-04 16:42:40 -04:00
Benoit Jacob	03f2997a11	bug #1191 - Prevent Clang/ARM from rewriting VMLA into VMUL+VADD	2016-04-04 16:41:47 -04:00
Benoit Jacob	01b5333e44	bug #1186 - vreinterpretq_u64_f64 fails to build on Android/Aarch64/Clang toolchain	2016-03-30 11:02:33 -04:00
Gael Guennebaud	6245591349	Fix prototype of plset and generalize linspace functor.	2015-08-07 19:27:59 +02:00
Gael Guennebaud	ce57dbd937	Let unpacket_traits<> expose the required alignment and make use of it everywhere	2015-08-07 10:44:01 +02:00
Benoit Jacob	c88e1abaf3	also uninitialized here, see previous cset	2015-05-15 11:34:57 -04:00
Benoit Jacob	807793ec3b	Fix uninitialized var warning. The compiler was clearing the register anyway, so this does not change resulting code	2015-05-15 11:15:53 -04:00
Benoit Jacob	f839099512	Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics.	2015-03-03 09:35:22 -05:00
Benoit Jacob	1ec0f4fadf	HalfPacket also needed to be disabled for double, on ARMv8.	2015-03-02 16:08:54 -05:00
Benoit Jacob	2fc3b484d7	remove trailing comma	2015-02-27 11:37:45 -05:00
Benoit Jacob	33669348c4	Disable Packet2f/2i halfpacket support in NEON. I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.	2015-02-27 11:35:37 -05:00
Benoit Jacob	9bd8a4bab5	bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path. This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).	2015-02-18 15:03:35 -05:00
Gael Guennebaud	45cbb0bbb1	The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index	2015-02-16 15:05:41 +01:00
Benoit Steiner	cc5d7ff523	Added vectorized implementation of the exponential function for ARM/NEON	2015-02-10 14:02:38 -08:00
Benoit Jacob	5ef95fabee	bug #936 , patch 3/3: Properly detect FMA support on ARM (requires VFPv4) and use it instead of MLA when available, because it's both more accurate, and faster.	2015-01-30 17:45:03 -05:00
Benoit Jacob	0f21613698	bug #936 , patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD	2015-01-30 17:44:26 -05:00
Benoit Jacob	340b8afb14	bug #936 , patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA.	2015-01-31 14:15:57 -05:00
Benoit Jacob	9f99f61e69	bug #936 , patch 1/3: some cleanup and renaming for consistency.	2015-01-30 17:43:56 -05:00
Gael Guennebaud	ae4644cc68	bug #907 , ARM64: workaround ICE in xcode/clang	2015-01-13 10:03:00 +01:00
Gael Guennebaud	36f7c1337f	bug #907 , ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang	2015-01-13 09:57:37 +01:00
Gael Guennebaud	63974bcb88	bug #907 : workaround some missing intrinsics in current NDK's gcc version (ARM64)	2015-01-07 09:44:25 +01:00
Gael Guennebaud	79f4a59ed9	bug #907 : fix compilation with ARM64	2015-01-07 09:41:56 +01:00
Gael Guennebaud	ee06f78679	Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.	2014-11-04 21:58:52 +01:00
Konstantinos Margaritis	fae4fd7a26	Added ARMv8 support	2014-10-22 07:39:49 +00:00
Konstantinos Margaritis	b508619392	working 64-bit support in PacketMath.h, Complex.h needed	2014-10-21 18:10:33 +00:00
Jitse Niesen	25bceefb4e	Replace asm by __asm__ (bug #873 )	2014-09-06 11:47:24 +01:00
Gael Guennebaud	0369db12af	bug #871 : fix compilation on ARM/Neon regarding __has_builtin usage	2014-09-01 10:52:58 +02:00
Gael Guennebaud	b47ef1431f	Fix many long to int implicit conversions	2014-07-08 16:47:11 +02:00
Gael Guennebaud	4def7b1fa5	Fix ptranspose overload prototypes for NEON	2014-04-25 11:15:13 +02:00
Gael Guennebaud	3d8d0f6269	Enable vectorization of pack_rhs with a column-major RHS. Rename and generalize Kernel<> to PacketBlock<,N>.	2014-04-25 10:56:18 +02:00
Benoit Steiner	4eb92e5647	Fixed the NEON implementation of predux_max<Packet4i>.	2014-04-23 18:23:07 -07:00
Benoit Steiner	ccb4dec719	Created a NEON version of the ptranspose packet primitives	2014-04-23 18:22:10 -07:00
Gael Guennebaud	934ce93886	merge with default branch	2014-04-22 17:00:38 +02:00
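
Commit 340b8afb14 above separates a "single-instruction" multiply-add from a truly "fused" one, where fused means the product is not rounded before the add. The sketch below illustrates that rounding difference in portable C++ using std::fma. It is not Eigen's code and uses no NEON intrinsics; the function names mul_then_add, fused_mul_add and check_rounding_difference are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Multiply, round the product to float, then add (two roundings in total).
// The volatile store forces the intermediate rounding and keeps the
// compiler from contracting the expression into a fused instruction.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c evaluated with one rounding at the end.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Inputs chosen so the exact product a * a = 1 + 2^-11 + 2^-24 is not
// representable in float and rounds to 1 + 2^-11.
void check_rounding_difference() {
    const float a = 1.0f + std::ldexp(1.0f, -12);
    const float c = -(1.0f + std::ldexp(1.0f, -11));

    // The rounded product cancels exactly against c.
    assert(mul_then_add(a, a, c) == 0.0f);

    // The fused form keeps the 2^-24 term that the rounding discarded.
    assert(fused_mul_add(a, a, c) == std::ldexp(1.0f, -24));
}
```

Calling check_rounding_difference() from a test or from main exercises both paths. The example assumes IEEE 754 single precision with the default round-to-nearest mode and no fast-math style compiler flags.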


85 Commits