eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-22 13:01:06 +08:00

Author	SHA1	Message	Date
Benoit Steiner	51be91f15e	Added support for CUDA architectures that don't support 3.5 capabilities	2015-12-21 08:42:58 -08:00
Benoit Steiner	6d777e1bc7	Fixed a typo.	2015-12-18 19:25:50 -08:00
Gael Guennebaud	3abd8470ca	bug #1140 : remove custom definition and use of _mm256_setr_m128	2015-12-18 14:18:59 +01:00
Gael Guennebaud	ca39b1546e	Merged in ebrevdo/eigen (pull request PR-148) Add special functions to eigen: lgamma, erf, erfc.	2015-12-11 11:52:09 +01:00
Benoit Steiner	9a415fb1e2	Preliminary support for AVX512	2015-12-10 15:34:57 -08:00
Gael Guennebaud	7ad1aaec1d	bug #1103 : fix neon vectorization of pmul(Packet1cd,Packet1cd)	2015-12-10 16:06:33 +01:00
Eugene Brevdo	fa4f933c0f	Add special functions to Eigen: lgamma, erf, erfc. Includes CUDA support and unit tests.	2015-12-07 15:24:49 -08:00
Gael Guennebaud	ae87f094eb	Fix "," in non SSE4 mode	2015-11-05 12:08:36 +01:00
Gael Guennebaud	90323f1751	Fix AVX round/ceil/floor, and fix respective unit test	2015-11-04 22:15:57 +01:00
Gael Guennebaud	3dd24bdf99	Merged in aavenel/eigen (pull request PR-142) Add round, ceil and floor for SSE4.1/AVX (Bug #70)	2015-11-04 18:26:38 +01:00
Benoit Steiner	36cd6daaae	Made the CUDA implementation of ploadt_ro compatible with cuda implementations older than 3.5	2015-11-03 16:36:30 -08:00
Alexandre Avenel	d46e2c10a6	Add round, ceil and floor for SSE4.1/AVX (Bug #70 )	2015-11-01 10:49:27 +01:00
Gael Guennebaud	6163db814c	bug #1085 : workaround gcc default ABI issue	2015-10-10 22:38:55 +02:00
Gael Guennebaud	f047ecc36a	_mm_hadd_epi32 is for SSSE3 only (and not SSE3)	2015-10-07 15:48:35 +02:00
Gael Guennebaud	2c676ddb40	Handle various TODOs in SSE vectorization (remove split storeu, enable SSE3 integer vectorization, plus minor tweaks)	2015-10-06 15:43:27 +02:00
Gael Guennebaud	75861f6650	bug #1069 : fix AVX support on MSVC (use of non portable C-style cast)	2015-09-28 10:08:26 +02:00
Benoit Steiner	98f8f0db9a	Added support for predux_mul for CUDA devices	2015-09-08 15:37:25 -07:00
Doug Kwan	5c9ee73eb9	Implement plog and pexp for AltiVec.	2015-07-30 11:12:42 -07:00
Gael Guennebaud	6245591349	Fix prototype of plset and generalize linspace functor.	2015-08-07 19:27:59 +02:00
Gael Guennebaud	e68c7b8368	Include SSE packetmath when AVX is enabled, and enable AVX's sine function only in fast-math mode (as SSE)	2015-08-07 17:40:39 +02:00
Gael Guennebaud	ce57dbd937	Let unpacket_traits<> expose the required alignment and make use of it everywhere	2015-08-07 10:44:01 +02:00
Gael Guennebaud	9a2447b0c9	Fix shadow warnings triggered by clang	2015-06-09 09:11:12 +02:00
Benoit Jacob	051d5325cc	Abandon blocking size lookup table approach. Not performing as well in real world as in microbenchmark.	2015-05-19 11:03:59 -04:00
Benoit Jacob	c88e1abaf3	also uninitialized here, see previous cset	2015-05-15 11:34:57 -04:00
Benoit Jacob	807793ec3b	Fix uninitialized var warning. The compiler was clearing the register anyway, so this does not change resulting code	2015-05-15 11:15:53 -04:00
Konstantinos Margaritis	dd698e6680	Merged in doug_kwan/eigen (pull request PR-103) Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of	2015-05-05 20:50:14 +03:00
Benoit Steiner	1dded10cb7	Added a double-precision implementation of the exp() function for AVX.	2015-05-04 10:42:51 -07:00
Benoit Steiner	d3f7915aeb	Pulled latest update from the eigen main codebase	2015-03-24 13:12:14 -07:00
Benoit Steiner	abdbe8562e	Fixed the CUDA packet primitives	2015-03-24 10:45:46 -07:00
Benoit Jacob	dc04f12967	use unsigned short instead of uint16_t which doesn't exist in c++98	2015-03-17 10:31:45 -04:00
Benoit Jacob	35c3a8bb84	Update Nexus 5 lookup table from combining now 2 runs of the benchmark, using the analyze-blocking-sizes partition tool. Gives better worst-case performance.	2015-03-16 11:05:51 -04:00
Benoit Jacob	02babb9c0f	Provide an empirical lookup table for blocking sizes measured on a Nexus 5. Only for float, only for Android on ARM 32bit for now.	2015-03-15 18:13:12 -04:00
Doug Kwan	657407227e	Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of doubles instead of swapping the doubles.	2015-03-11 15:13:37 -07:00
Benoit Steiner	0196141938	Fixed the optimized AVX implementation of the fast rsqrt function	2015-03-02 13:49:39 -08:00
Benoit Steiner	4fd7f47692	Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined.	2015-03-02 09:38:47 -08:00
Benoit Steiner	306fceccbe	Pulled latest updates from trunk	2015-02-27 13:05:26 -08:00
Benoit Steiner	05089aba75	Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts	2015-02-27 09:27:30 -08:00
Benoit Steiner	573b377110	Added support for vectorized type casting of tensors	2015-02-27 08:46:04 -08:00
Benoit Jacob	2fc3b484d7	remove trailing comma	2015-02-27 11:37:45 -05:00
Benoit Jacob	33669348c4	Disable Packet2f/2i halfpacket support in NEON. I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.	2015-02-27 11:35:37 -05:00
Benoit Steiner	f41b1f1666	Added support for fast reciprocal square root computation.	2015-02-26 09:42:41 -08:00
Benoit Steiner	7765039f1c	Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device.	2015-02-19 21:22:51 -08:00
Benoit Jacob	9bd8a4bab5	bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path. This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).	2015-02-18 15:03:35 -05:00
Benoit Jacob	ccc1277a42	must also disable complex<double> when disabling double vectorization	2015-03-03 10:17:05 -05:00
Benoit Jacob	f839099512	Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics.	2015-03-03 09:35:22 -05:00
Benoit Jacob	1ec0f4fadf	HalfPacket also needed to be disabled for double, on ARMv8.	2015-03-02 16:08:54 -05:00
Gael Guennebaud	6f4adc9e94	Add missing install directives for arch/CUDA	2015-02-18 11:40:06 +01:00
Gael Guennebaud	eb563049f7	Remove some dead stores.	2015-02-18 11:26:48 +01:00
Gael Guennebaud	159fb181c2	Disable __m128* wrappers when compiling with AVX and -fabi-version=4	2015-02-17 16:27:20 +01:00
Gael Guennebaud	91ab2489dd	Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI)	2015-02-17 16:08:07 +01:00


1082 Commits