Deanna Hood
e1d6e6c972
Make cube, inverse and abs2 free-functions
2015-03-17 06:25:24 +10:00
Benoit Jacob
eb6929cb19
fix bug in maxsize calculation, which would cause products of size > 2048 to address the lookup table out of bounds
2015-03-16 16:15:47 -04:00
Deanna Hood
fef4e071d7
Rename isinf to isInf
2015-03-17 05:58:47 +10:00
Deanna Hood
46cf9cda32
Add isfinite array support as isFinite
2015-03-17 04:33:12 +10:00
Deanna Hood
1d76ceab55
Remove floor, ceil, round for complex numbers
2015-03-17 02:36:07 +10:00
Deanna Hood
717b7954ce
Update cost of coeff-wise arg call
2015-03-17 02:11:57 +10:00
Deanna Hood
fb68b149cb
Rename isnan to isNaN
2015-03-17 02:04:42 +10:00
Benoit Jacob
35c3a8bb84
Update the Nexus 5 lookup table by combining 2 runs of the benchmark, using the analyze-blocking-sizes partition tool. This gives better worst-case performance.
2015-03-16 11:05:51 -04:00
Benoit Jacob
e274607d7f
fix compilation with GCC 4.8
2015-03-16 10:48:27 -04:00
Benoit Jacob
151b8b95c6
Fix bug in case where EIGEN_TEST_SPECIFIC_BLOCKING_SIZE is defined but false
2015-03-15 19:10:51 -04:00
Benoit Jacob
02babb9c0f
Provide an empirical lookup table for blocking sizes measured on a Nexus 5. Only for float, only for Android on ARM 32bit for now.
2015-03-15 18:13:12 -04:00
Benoit Jacob
3589a9c115
actual_panel_rows computation should always be resilient to parameters not consistent with the known L1 cache size, see comment
2015-03-15 18:12:18 -04:00
Benoit Jacob
1dd3d89818
Fix an unused-var warning
2015-03-15 18:07:19 -04:00
Benoit Jacob
e56aabf205
Refactor computeProductBlockingSizes to make room for the possibility of using lookup tables
2015-03-15 18:05:12 -04:00
Benoit Jacob
488c15615a
Organize our default cache sizes a little, and use a saner default L1 outside of x86 (10% faster on Nexus 5)
2015-03-13 14:51:26 -07:00
Gael Guennebaud
1330f8bbd1
bug #973 , improve AVX support by enabling vectorization of Vector4i-like types, and enforcing alignment of Vector4f/Vector2d-like types to preserve compatibility with SSE and future Eigen versions that will vectorize them with AVX enabled.
2015-03-13 21:15:50 +01:00
Gael Guennebaud
d99ab35f9e
Fix internal::random(x,y) for integer types. The previous implementation could return y+1. The new implementation uses rejection sampling to get unbiased behavior.
2015-03-13 21:12:46 +01:00
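The rejection-sampling idea named in the entry above can be sketched as follows. This is a minimal illustration, not Eigen's `internal::random`: the helper name `random_in_range` and the choice of `std::mt19937` as the generator are assumptions made for the example. Raw draws that fall in the incomplete final block of values are discarded, so every integer in the closed interval `[lo, hi]` is equally likely and `hi + 1` can never be returned.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper (not Eigen's internal::random): returns an integer drawn
// uniformly from the closed interval [lo, hi], assuming lo <= hi.
inline int random_in_range(std::mt19937& gen, int lo, int hi) {
  // std::mt19937 yields 32-bit values, i.e. 2^32 equally likely outcomes.
  const std::uint64_t outcomes = std::uint64_t(1) << 32;
  // Number of integers in [lo, hi]; computed in 64 bits so it cannot overflow.
  const std::uint64_t count =
      static_cast<std::uint64_t>(static_cast<std::int64_t>(hi) - lo + 1);
  // Largest multiple of count that does not exceed 2^32. Draws at or above it
  // would make the low residues slightly more likely, so they are rejected.
  const std::uint64_t limit = outcomes - outcomes % count;
  std::uint64_t r;
  do {
    r = gen();
  } while (r >= limit);
  return static_cast<int>(static_cast<std::int64_t>(lo) +
                          static_cast<std::int64_t>(r % count));
}
```

A plain `lo + gen() % count` would be biased whenever `count` does not divide 2^32; the retry loop removes that bias at the cost of an occasional extra draw.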
Gael Guennebaud
8580eb6808
bug #949 : add static assertion for incompatible scalar types in dense end-user decompositions.
2015-03-13 21:06:20 +01:00
Gael Guennebaud
a9df28c95b
SparseMatrix::insert: switch to a fully uncompressed mode if sequential insertion is not possible (otherwise an arbitrarily large amount of memory was preallocated in some cases)
2015-03-13 21:00:21 +01:00
Gael Guennebaud
5ffe29cb9f
Bound pre-allocation to the maximal size representable by StorageIndex and throw bad_alloc if that's not possible.
2015-03-13 20:57:33 +01:00
Gael Guennebaud
2f6f8bf31c
Add missing coeff/coeffRef members to Block<sparse>, and extend unit tests.
2015-03-13 16:24:40 +01:00
Doug Kwan
657407227e
Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of
...
doubles instead of swapping the doubles.
2015-03-11 15:13:37 -07:00
Deanna Hood
f89fcefa79
Add array support for the hyperbolic trigonometric functions from std
2015-03-11 13:13:30 +10:00
Deanna Hood
a5e49976f5
Add log10 array support
2015-03-11 08:56:42 +10:00
Deanna Hood
19a71056ae
Allow calling of square(array) in addition to array.square()
2015-03-11 06:59:28 +10:00
Deanna Hood
31fdd67756
Additional unary coeff-wise functors (e.g. isnan, round, arg)
2015-03-11 06:39:23 +10:00
Gael Guennebaud
fd78874888
Fix compilation of iterative solvers with dense matrices
2015-03-09 21:31:03 +01:00
Gael Guennebaud
d4317a85e8
Add typedefs for return types of SparseMatrixBase::selfadjointView
2015-03-09 21:29:46 +01:00
Gael Guennebaud
9e885fb766
Add unit tests for CG and sparse-LLT for long int as storage-index
2015-03-09 14:33:15 +01:00
Gael Guennebaud
224a1fe4c6
bug #963 : make IncompleteLUT compatible with non-default storage index types.
2015-03-09 13:55:20 +01:00
Gael Guennebaud
0ee391863e
Avoid underflow when blocking sizes are tuned manually.
2015-03-06 21:51:09 +01:00
Gael Guennebaud
14a5f135a3
bug #969 : work around ambiguous calls to Ref using enable_if.
2015-03-06 17:51:31 +01:00
Gael Guennebaud
87681e508f
bug #978 : early return for vanishing products
2015-03-06 16:11:22 +01:00
Gael Guennebaud
cd3bbffa73
Improve blocking heuristic: if the lhs fits within L1, then block on the rhs in L1 (this keeps the packed rhs in L1)
2015-03-06 14:31:39 +01:00
Gael Guennebaud
58740ce4c6
Improve product kernel: replace the previous dynamic loop swapping strategy by a more general one:
...
It consists in increasing the actual number of rows of lhs's micro horizontal panel for small depth such that L1 cache is fully exploited.
2015-03-06 10:30:35 +01:00
Gael Guennebaud
4c8b95d5c5
Rename LSCG to LeastSquaresConjugateGradient
2015-03-05 10:16:32 +01:00
Gael Guennebaud
7550107028
Product optimization: implement a dynamic loop-swapping strategy to improve memory accesses to the destination matrix in the case of K-rank-update like products, i.e., for products of the kind: "large x small" * "small x large"
2015-03-05 10:03:46 +01:00
Gael Guennebaud
2dc968e453
bug #824 : improve accuracy of Quaternion::angularDistance using atan2 instead of acos.
2015-03-04 17:03:13 +01:00
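The accuracy point in the entry above can be illustrated with a standalone sketch. It assumes unit quaternions and uses a hypothetical `Quat` struct rather than Eigen's `Quaternion` class. The angle is recovered from both the vector part and the scalar part of `a * conjugate(b)`, which avoids the loss of precision that `acos` suffers when its argument is close to 1 (that is, for small angles).

```cpp
#include <cmath>

// Hypothetical stand-in for a unit quaternion (w is the scalar part).
struct Quat { double w, x, y, z; };

// Angle in radians, in [0, pi], between the rotations represented by
// the unit quaternions a and b.
inline double angular_distance(const Quat& a, const Quat& b) {
  // d = a * conjugate(b): scalar part dw, vector part (dx, dy, dz).
  const double dw = a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;
  const double dx = b.w * a.x - a.w * b.x - (a.y * b.z - a.z * b.y);
  const double dy = b.w * a.y - a.w * b.y - (a.z * b.x - a.x * b.z);
  const double dz = b.w * a.z - a.w * b.z - (a.x * b.y - a.y * b.x);
  const double vec_norm = std::sqrt(dx * dx + dy * dy + dz * dz);
  // |dw| treats q and -q as the same rotation and selects the shorter angle.
  return 2.0 * std::atan2(vec_norm, std::abs(dw));
}
```

For example, the identity and a 90 degree rotation about the z axis, `{cos(pi/4), 0, 0, sin(pi/4)}`, are pi/2 apart under this measure.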
Benoit Steiner
0196141938
Fixed the optimized AVX implementation of the fast rsqrt function
2015-03-02 13:49:39 -08:00
Benoit Steiner
4fd7f47692
Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined.
2015-03-02 09:38:47 -08:00
Benoit Steiner
fb53384b0f
Improved the default implementation of prsqrt
2015-02-28 01:51:26 -08:00
Benoit Steiner
306fceccbe
Pulled latest updates from trunk
2015-02-27 13:05:26 -08:00
Benoit Steiner
2386fc8528
Added support for 32bit indices on a per-tensor/tensor-expression basis. This enables us to use 32bit indices to evaluate expressions on GPU faster while keeping the ability to use 64 bit indices to manipulate large tensors on CPU in the same binary.
2015-02-27 12:57:13 -08:00
Benoit Jacob
6466fa63be
Reimplement the selection between rotating and non-rotating kernels
...
using templates instead of macros and if()'s.
That was needed to fix the build of unit tests on ARM, which I had
broken. My bad for not testing earlier.
2015-02-27 15:30:10 -05:00
Benoit Steiner
05089aba75
Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts
2015-02-27 09:27:30 -08:00
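The consistency issue in the entry above comes down to two possible conversions. In C++, a scalar `static_cast<int>` from a floating point value truncates toward zero, so a vectorized cast that rounded to nearest would disagree with it on most non-integer inputs. The two helper names below are hypothetical and only contrast the behaviours; they are not Eigen functions.

```cpp
#include <cmath>

// Truncating conversion: drops the fractional part (rounds toward zero).
// This is what a scalar static_cast<int> does.
inline int cast_truncating(float x) { return static_cast<int>(x); }

// Round-to-nearest conversion, shown only for contrast.
inline int cast_rounding(float x) { return static_cast<int>(std::lround(x)); }
```

With these definitions, `2.7f` converts to 2 when truncated and 3 when rounded, and `-2.7f` converts to -2 and -3 respectively.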
Benoit Steiner
573b377110
Added support for vectorized type casting of tensors
2015-02-27 08:46:04 -08:00
Benoit Jacob
2fc3b484d7
remove trailing comma
2015-02-27 11:37:45 -05:00
Benoit Jacob
33669348c4
Disable Packet2f/2i halfpacket support in NEON.
...
I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented,
and code trying to use halfpackets just fails to compile on NEON, as it tries to use the
default implementation of pload/pstore and the types don't match.
2015-02-27 11:35:37 -05:00
Benoit Jacob
b7fc8746e0
Replace a static assert by a runtime one, fixes the build of unit tests on ARM
...
Also safely assert in the non-implemented path that should never be taken in practice,
and would return wrong results.
2015-02-27 10:01:59 -05:00
Benoit Steiner
f41b1f1666
Added support for fast reciprocal square root computation.
2015-02-26 09:42:41 -08:00