Benoit Steiner
01f7918788
Pulled latest fixes
2015-02-06 05:30:20 -08:00
Gael Guennebaud
b50ffaddf2
merge
2015-02-06 14:27:12 +01:00
Gael Guennebaud
74e460b995
Fix symmetric product
2015-02-06 14:26:24 +01:00
Benoit Steiner
c739102ef9
Pulled the latest changes from the trunk
2015-02-06 05:25:03 -08:00
Benoit Steiner
dcb2a8b184
Added the EIGEN_HAS_CONSTEXPR define
Gate the tensor index list code based on the value of EIGEN_HAS_CONSTEXPR
2015-02-06 02:51:59 -08:00
Benoit Jacob
5ef95fabee
bug #936 , patch 3/3: Properly detect FMA support on ARM (requires VFPv4)
and use it instead of MLA when available, because it's both more accurate,
and faster.
2015-01-30 17:45:03 -05:00
Benoit Jacob
0f21613698
bug #936 , patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD
2015-01-30 17:44:26 -05:00
Benoit Jacob
340b8afb14
bug #936 , patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,
because this is what they are about. "Fused" means "no intermediate rounding
between the mul and the add, only one rounding at the end". Instead,
what we are concerned about here is whether a temporary register is needed,
i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
But a true fused mul-add is only available on VFPv4: VFMA.
2015-01-31 14:15:57 -05:00
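The distinction drawn in this commit message between a "fused" and a merely "single-instruction" multiply-add can be illustrated in portable C++. The following is a minimal sketch, not code from the repository: the function names `mul_then_add` and `fused_mul_add` are hypothetical, and it uses `std::fma` on doubles rather than the NEON VMLA/VFMA instructions the message refers to. It assumes IEEE 754 double precision with round-to-nearest.

```cpp
#include <cmath>

// Multiply, then add, as two separately rounded operations.
// The volatile temporary keeps the compiler from contracting the
// expression into one fused instruction on targets that provide it.
inline double mul_then_add(double a, double b, double c) {
  volatile double product = a * b;  // first rounding happens here
  return product + c;               // second rounding happens here
}

// True fused multiply-add: a * b + c with a single rounding at the end.
inline double fused_mul_add(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

For example, with a = b = 1 + 2^-30 and c = -(1 + 2^-29), the exact value of a * b + c is 2^-60. The intermediate rounding in `mul_then_add` discards the 2^-60 term of the product, so the result is 0, whereas `fused_mul_add` returns 2^-60.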
Benoit Jacob
9f99f61e69
bug #936 , patch 1/3: some cleanup and renaming for consistency.
2015-01-30 17:43:56 -05:00
Benoit Jacob
759bd92a85
bug #935 : Add asm comments in GEBP kernels to work around a bug
in both GCC and Clang on ARM/NEON, whereby they spill registers,
severely harming performance. The reason why the asm comments
make a difference is that they prevent the compiler from
reordering code across these boundaries, which has the effect
of extending the lifetime of local variables and increasing
register pressure on this register-tight code.
2015-01-30 17:27:56 -05:00
Gael Guennebaud
c6eb84aabc
Enable vectorization of transposeInPlace for PacketSize x PacketSize matrices
2015-01-26 17:09:01 +01:00
Gael Guennebaud
e1f1091fde
Add support for dense ?= diagonal
2015-01-24 10:32:49 +01:00
Gael Guennebaud
279786e987
Fix missing evaluator in outer-product
2015-01-13 10:25:50 +01:00
Gael Guennebaud
ae4644cc68
bug #907 , ARM64: workaround ICE in xcode/clang
2015-01-13 10:03:00 +01:00
Gael Guennebaud
36f7c1337f
bug #907 , ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang
2015-01-13 09:57:37 +01:00
Gael Guennebaud
63974bcb88
Bug 907: workaround some missing intrinsics in current NDK's gcc version (ARM64)
2015-01-07 09:44:25 +01:00
Gael Guennebaud
79f4a59ed9
bug #907 : fix compilation with ARM64
2015-01-07 09:41:56 +01:00
Benoit Steiner
9f98650d0a
Ensured that contractions that can be reduced to a matrix vector product work correctly even when the input coefficients aren't aligned.
2015-01-06 09:29:13 -08:00
Gael Guennebaud
f5f6e2c6f4
bug #921 : fix utilization of bitwise operation on enums in first_aligned
2014-12-19 14:41:59 +01:00
Gael Guennebaud
25c7d9164f
bug #920 : fix MSVC 2015 compilation issues
2014-12-18 22:58:15 +01:00
Gael Guennebaud
7dad5f797e
bug #821 : workaround MSVC 2013 issue with using Base::Base::operator=
2014-12-16 13:33:43 +01:00
Gael Guennebaud
4371911861
Remove useless and non standard numext::atanh2 function.
2014-12-08 16:44:34 +01:00
Gael Guennebaud
bea36925db
bug #876 : implement a portable log1p function
2014-12-08 16:26:53 +01:00
Gael Guennebaud
80ed5bd90c
Workaround various "returning reference to temporary" warnings.
2014-12-05 12:49:30 +01:00
Gael Guennebaud
775f7e5fbb
bug #697 : make sure empty classes are at the end in case of multiple inheritance
2014-12-02 14:40:19 +01:00
Gael Guennebaud
a819fa148d
Fix MSVC compilation issue
2014-12-02 14:35:31 +01:00
Gael Guennebaud
b1f9f603a0
Simplify return type of diagonal(Index) (and ease compiler job)
2014-11-28 14:39:47 +01:00
Benoit Steiner
509e4ddc02
Added reduction packet primitives for CUDA
2014-11-19 10:34:11 -08:00
Gael Guennebaud
722916e19d
bug #903 : clean swap API regarding extra enable_if parameters, and add failtests for swap
2014-11-06 09:25:26 +01:00
Gael Guennebaud
c6fefe5d8e
Bug 853: replace enable_if in Ref<> ctor by static assertions and add failtests for Ref<>
2014-11-05 16:15:17 +01:00
Gael Guennebaud
ee06f78679
Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.
2014-11-04 21:58:52 +01:00
Benoit Steiner
2dde63499c
Generalized the matrix vector product code.
2014-10-31 16:33:51 -07:00
Benoit Steiner
bc99c5f7db
fixed some potential alignment issues.
2014-10-30 18:09:53 -07:00
Benoit Steiner
1946cc4478
Added missing packet primitives for CUDA.
2014-10-30 17:52:32 -07:00
Christoph Hertzberg
883168ed94
Make select CUDA compatible (comparison operators aren't yet, so no test case yet)
2014-10-30 20:16:16 +01:00
Christoph Hertzberg
e5f134006b
EIGEN_UNUSED_VARIABLE works better than casting to void. Make this also usable from CUDA code
2014-10-30 19:59:09 +01:00
Gael Guennebaud
21c0a2ce0c
Move D&C SVD to official SVD module.
2014-10-29 11:29:33 +01:00
Christoph Hertzberg
e2e7ba9f85
bug #898 : add inline hint to const_cast_ptr
2014-10-28 14:49:44 +01:00
Konstantinos Margaritis
fcb3573d17
Merged eigen/eigen into default
2014-10-22 10:42:18 +03:00
Konstantinos Margaritis
fae4fd7a26
Added ARMv8 support
2014-10-22 07:39:49 +00:00
Christoph Hertzberg
cf09c5f687
Prevent the CUDA error "calling a __host__ function from a __host__ __device__ function is not allowed".
2014-10-21 20:40:09 +02:00
Konstantinos Margaritis
b508619392
working 64-bit support in PacketMath.h, Complex.h needed
2014-10-21 18:10:33 +00:00
Gael Guennebaud
fe57b2f963
bug #701 : work around (min) and (max) blocking ADL by introducing numext::mini and numext::maxi internal functions and an EIGEN_NOT_A_MACRO macro.
2014-10-20 15:55:32 +02:00
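The problem this commit describes can be sketched as follows. Writing `(min)(x, y)` protects the call from a function-like `min` macro, but the parentheses also disable argument-dependent lookup (ADL), so a user-defined `min` overload is never found. Placing an empty macro between the name and the argument list avoids macro expansion while keeping ADL. This is a hypothetical illustration, not the library's actual code: `SKETCH_NOT_A_MACRO`, `sketch::mini`, and `user::Scalar` are names invented here.

```cpp
#include <algorithm>

namespace user {
// A custom scalar type with its own min overload, reachable only via ADL.
struct Scalar {
  double v;
  bool via_adl;  // set by user::min so callers can see which overload ran
};
inline Scalar min(const Scalar& a, const Scalar& b) {
  Scalar r = (a.v < b.v) ? a : b;
  r.via_adl = true;
  return r;
}
}  // namespace user

// Simulates a platform header that defines min as a function-like macro.
#define min(a, b) (((a) < (b)) ? (a) : (b))

// Expands to nothing. Because the token following `min` is then not `(`,
// the preprocessor does not treat `min` as a macro invocation.
#define SKETCH_NOT_A_MACRO

namespace sketch {
template <typename T>
T mini(const T& x, const T& y) {
  using std::min;                        // fallback for built-in types
  return min SKETCH_NOT_A_MACRO (x, y);  // unqualified call, so ADL still applies
}
}  // namespace sketch
```

With this sketch, `sketch::mini(3, 5)` resolves to `std::min`, while calling `sketch::mini` on two `user::Scalar` values picks `user::min` through ADL even though the `min` macro is in effect.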
Christoph Hertzberg
84aaa03182
Addendum to bug #859 : pexp(NaN) for double did not return NaN, also, plog(NaN) did not return NaN.
psqrt(NaN) and psqrt(-1) shall return NaN if EIGEN_FAST_MATH==0
2014-10-20 13:13:43 +02:00
Gael Guennebaud
aa5f79206f
Fix bug #859 : pexp(NaN) returned Inf instead of NaN
2014-10-20 11:38:51 +02:00
Gael Guennebaud
8472e697ca
Add lapack interface to JacobiSVD and BDCSVD
2014-10-17 15:31:11 +02:00
Benoit Steiner
bfdd9f3ac9
Made the blocking computation aware of the l3 cache
Also optimized the blocking parameters to take into account the number of threads used for a computation
2014-10-15 15:32:59 -07:00
Benoit Steiner
99d75235a9
Misc improvements and cleanups
2014-10-13 17:02:09 -07:00
Christoph Hertzberg
d3f52debc6
Make cuda_basic test compile again by adding lots of EIGEN_DEVICE_FUNC.
Although the test passes now, there might still be some missing.
2014-10-13 17:18:26 +02:00
Gael Guennebaud
48d537f59f
Fix indentation
2014-10-09 23:35:26 +02:00