Benoit Steiner
c0f2cb016e
Extended support for Tensors:
...
* Added ability to map a region of the memory to a tensor
* Added basic support for unary and binary coefficient wise expressions, such as addition or square root
* Provided an emulation layer to make it possible to compile the code with compilers (such as nvcc) that don't support cxx11.
2014-04-28 10:32:27 -07:00
Gael Guennebaud
450d0c3de0
Make sure that calls to broadcast4 are 16 bytes aligned
2014-04-25 22:25:48 +02:00
Gael Guennebaud
f9d2f3903e
Product kernel: skip loop on columns if there is no remaining rows
2014-04-25 16:54:30 +02:00
Gael Guennebaud
6f64b0b487
Fix sizeof unit test
2014-04-25 14:05:54 +02:00
Gael Guennebaud
c20e3641de
Fix for mixed products
2014-04-25 13:22:34 +02:00
Gael Guennebaud
2dbfd83424
Implement pbroadcast4 on altivec
2014-04-25 02:46:57 -07:00
Gael Guennebaud
c9788d55b9
Disable 3pX4 kernel on Altivec: despite this platform has 32 registers, this version seems significantly slower.
2014-04-25 11:48:22 +02:00
Gael Guennebaud
4def7b1fa5
Fix ptranspose overload prototypes for NEON
2014-04-25 11:15:13 +02:00
Gael Guennebaud
c79bd4b64b
Minor optimizations in product kernel:
...
- use pbroadcast4 (helpful when AVX is not available)
- process all remaining rows at once (significant speedup for small matrices)
2014-04-25 11:06:03 +02:00
Gael Guennebaud
3d8d0f6269
Enable vectorization of pack_rhs with a column-major RHS.
...
Rename and generalize Kernel<*> to PacketBlock<*,N>.
2014-04-25 10:56:18 +02:00
Gael Guennebaud
b0e19db1cf
Enable fused madd for Altivec
2014-04-24 23:17:18 +02:00
Gael Guennebaud
8d85ce88e1
Implement ptranspose on altivec and fix pgather/pscatter
2014-04-24 05:47:53 -07:00
Benoit Steiner
4eb92e5647
Fixed the NEON implementation of predux_max<Packet4i>.
2014-04-23 18:23:07 -07:00
Benoit Steiner
ccb4dec719
Created a NEON version of the ptranspose packet primitives
2014-04-23 18:22:10 -07:00
Gael Guennebaud
82b09fcb91
Add Altivec implementation of pgather/pscatter (not tested)
2014-04-23 13:09:26 +02:00
Gael Guennebaud
ecbd67a15a
Fix EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT macro
2014-04-22 17:03:57 +02:00
Gael Guennebaud
934ce93886
merge with default branch
2014-04-22 17:00:38 +02:00
Gael Guennebaud
5c5231ab71
Workaround gcc's default ABI not being able to distinghish between vector types of different sizes.
2014-04-22 16:03:19 +02:00
Gael Guennebaud
2606abed53
Fix 128bit packet size assumptions in unit tests.
2014-04-18 21:14:40 +02:00
Gael Guennebaud
a7d20038df
Fix alignment assertion.
2014-04-18 17:06:31 +02:00
Gael Guennebaud
3454b4e5f1
Fix calls to lazy products (lazy product does not like matrices with 0 length)
2014-04-18 17:06:03 +02:00
Gael Guennebaud
94684721bd
Smarter block size computation
2014-04-18 15:35:34 +02:00
Gael Guennebaud
1388f4f9fd
Fix typo (was working with clang\!)
2014-04-18 11:43:13 +02:00
Gael Guennebaud
6d665d446b
Fixes for fixed sizes and non vectorizable types
2014-04-17 23:26:34 +02:00
Gael Guennebaud
2c3c95990d
merge
2014-04-17 22:50:49 +02:00
Benoit Steiner
6d6df90c9a
Implemented the pgather/pscatter packet primitives for the arm/NEON architecture
2014-04-17 12:28:01 -07:00
Gael Guennebaud
9746396d1b
Optimize AVX pset1 for complexes and ploaddup
2014-04-17 20:51:04 +02:00
Gael Guennebaud
11fbdcbc38
Fix and optimize mixed products
2014-04-17 16:04:30 +02:00
Gael Guennebaud
0fa8290366
Optimize ploaddup for AVX
2014-04-17 16:02:27 +02:00
Gael Guennebaud
d936ddc3d1
Fallback to lazy products for very small ones.
2014-04-16 23:15:42 +02:00
Gael Guennebaud
de8336a9bc
Enable alloca on MAC OSX
2014-04-16 23:14:58 +02:00
Gael Guennebaud
d5a795f673
New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speeup on Haswell.
...
This changeset also introduce new vector functions: ploadquad and predux4.
2014-04-16 17:05:11 +02:00
Mark Borgerding
e0dbb68c2f
Check IMKL version for compatibility with Eigen
2014-04-15 13:57:03 -04:00
Gael Guennebaud
3c66bb136b
bug #793 : detect NaN and INF in EigenSolver instead of aborting with an assert.
2014-04-14 22:00:27 +02:00
Gael Guennebaud
7098e6d976
Add isfinite overload for complexes.
2014-04-14 21:57:49 +02:00
Benoit Steiner
feaf7c7e6d
Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (ie gcc 4.8).
2014-04-14 10:44:17 -07:00
Gael Guennebaud
148acf8e4f
bug #790 : fix overflow in real_2x2_jacobi_svd
2014-04-14 13:52:16 +02:00
Gael Guennebaud
0587db8bf5
bug #793 : fix overflow in EigenSolver and add respective regression unit test
2014-04-14 11:43:08 +02:00
Benoit Steiner
1b333c89c9
Updated my previous fix to avoid introducing a compilation warning on ARM platforms.
2014-04-10 17:43:13 -07:00
Benoit Steiner
a1fcf599fa
Silenced a compilation warning produced by nvcc.
2014-04-10 11:19:37 -07:00
Jitse Niesen
a91a7a1964
doc: Add references to Cholesky methods in SelfAdjointView.
2014-04-07 14:14:48 +01:00
Benoit Steiner
b446ff037e
Deleted some dead code.
2014-04-04 14:12:24 -07:00
Christoph Hertzberg
096af59799
Fix bug #784 : Assert if assigning a product to a triangularView does not match the size.
2014-04-04 17:48:37 +02:00
Benoit Steiner
8044b00a7f
bug #782 : Workaround for gcc <= 4.4 compilation error on the NEON PacketMath code.
2014-04-03 23:41:47 +02:00
Gael Guennebaud
8d0441052e
Finally, prefetching seems to help getting more stable performance
2014-03-31 10:42:19 +02:00
Gael Guennebaud
1c0728043a
Workaround alignment warnings
2014-03-30 22:43:47 +02:00
Gael Guennebaud
e497a27ddc
Optimize gebp kernel:
...
1 - increase peeling level along the depth dimention (+5% for large matrices, i.e., >1000)
2 - improve pipelining when dealing with latest rows of the lhs
2014-03-30 21:57:05 +02:00
Benoit Steiner
ad59ade116
Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices which size is not a multiple of the packet size.
2014-03-28 12:11:23 -07:00
Gael Guennebaud
10aa14592a
Add a mechanism to recursively access to half-size packet types
2014-03-28 10:18:04 +01:00
Gael Guennebaud
8d2bb2c20d
merge with default branch
2014-03-28 09:24:18 +01:00