Benoit Steiner
|
ad59ade116
|
Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size.
|
2014-03-28 12:11:23 -07:00 |
|
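Note on the entry above: loop peeling splits a loop into a part that processes whole packets and a remainder for the leftover elements. The following is a minimal sketch in plain C++, not the Eigen kernel; the function name dot_with_peeling and the constant kPacketSize (a 4-float packet, as with SSE) are assumptions made for illustration. The commit vectorizes the remainder as well, whereas this sketch keeps it scalar to show where the peeled part sits.

```cpp
#include <cstddef>

// Hypothetical packet width: 4 floats, as with one SSE register.
constexpr std::size_t kPacketSize = 4;

// Dot product whose loop is split into a full-packet part and a peeled remainder.
float dot_with_peeling(const float* a, const float* b, std::size_t n) {
    // Largest multiple of kPacketSize that fits in n.
    const std::size_t vec_end = n - n % kPacketSize;

    // Main loop: one accumulator lane per packet element, mimicking a SIMD register.
    float acc[kPacketSize] = {0.f, 0.f, 0.f, 0.f};
    for (std::size_t i = 0; i < vec_end; i += kPacketSize)
        for (std::size_t k = 0; k < kPacketSize; ++k)
            acc[k] += a[i + k] * b[i + k];

    // Horizontal reduction of the lanes.
    float sum = acc[0] + acc[1] + acc[2] + acc[3];

    // Peeled remainder: the n % kPacketSize elements that do not fill a packet.
    for (std::size_t i = vec_end; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

With n = 6, the main loop covers elements 0 to 3 and the peeled loop covers elements 4 and 5.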
Gael Guennebaud
|
10aa14592a
|
Add a mechanism to recursively access half-size packet types
|
2014-03-28 10:18:04 +01:00 |
|
Gael Guennebaud
|
8d2bb2c20d
|
merge with default branch
|
2014-03-28 09:24:18 +01:00 |
|
Gael Guennebaud
|
c94fde118a
|
Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization)
|
2014-03-28 09:11:06 +01:00 |
|
Benoit Steiner
|
51e85c936d
|
Merged latest changes from parent.
|
2014-03-27 18:32:15 -07:00 |
|
Benoit Steiner
|
8a94cb3edd
|
Implemented the SSE version of the gather and scatter packet primitives.
|
2014-03-27 18:29:01 -07:00 |
|
Benoit Steiner
|
7f3162f707
|
Implemented the AVX version of the gather and scatter packet primitives.
|
2014-03-27 17:42:25 -07:00 |
|
Benoit Steiner
|
ee86679096
|
Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel.
|
2014-03-27 16:03:03 -07:00 |
|
Gael Guennebaud
|
58fe2fc2b2
|
enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...)
|
2014-03-27 23:38:50 +01:00 |
|
Benoit Steiner
|
729363114f
|
Fixed compilation error when FMA instructions are enabled.
|
2014-03-27 11:20:41 -07:00 |
|
Benoit Steiner
|
1697d7a179
|
Silenced "unused variable" warnings when compiling with FMA.
|
2014-03-27 11:00:47 -07:00 |
|
Benoit Steiner
|
3e1fe8e416
|
Vectorized the packing of a col-major matrix used as the right hand side argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used; however, this doesn't seem to impact performance.
|
2014-03-27 10:38:41 -07:00 |
|
Benoit Steiner
|
b776458ccb
|
Vectorized the packing of a row-major matrix used as the left hand side argument in a matrix-matrix product.
|
2014-03-27 10:02:24 -07:00 |
|
Benoit Steiner
|
c4902a3d01
|
Implemented the AVX version of the ptranspose packet primitive.
|
2014-03-27 09:34:51 -07:00 |
|
Gael Guennebaud
|
052aedd394
|
Implement pcplflip, palign, predux and the like for AVX/complexes
|
2014-03-27 14:47:00 +01:00 |
|
Gael Guennebaud
|
fb03b56647
|
Fix warning
|
2014-03-27 11:38:35 +01:00 |
|
Benoit Steiner
|
a419cea4a0
|
Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions.
Implemented the primitive using SSE instructions.
|
2014-03-26 19:03:07 -07:00 |
|
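Note on the entry above: transposing an array of N packets of N words each amounts to an in-place N x N transpose across registers. The following is a minimal scalar sketch for N = 4, not the SSE implementation from the commit; Packet4 and transpose_block are hypothetical names, and std::array stands in for a SIMD register.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Stand-in for a 4-float SIMD register (assumption for illustration).
using Packet4 = std::array<float, 4>;

// Transposes a block of 4 packets in place: word j of packet i
// is exchanged with word i of packet j.
void transpose_block(std::array<Packet4, 4>& block) {
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = i + 1; j < 4; ++j)
            std::swap(block[i][j], block[j][i]);
}
```

After the call, packet i holds what used to be word i of every input packet, which is the layout a packing routine needs when it reads a matrix in one storage order and writes it in the other.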
Benoit Steiner
|
14bc4b9704
|
Made sure that the version of gemm_pack_rhs specialized for row-major matrices is vectorized when nr == 2*PacketSize (which is the case for SSE when compiling in 64-bit mode).
|
2014-03-26 17:35:18 -07:00 |
|
Benoit Steiner
|
e45a6bed45
|
Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible.
|
2014-03-26 15:58:13 -07:00 |
|
Benoit Steiner
|
cc73164aa8
|
Merged latest updates from the parent branch
|
2014-03-26 15:23:59 -07:00 |
|
Gael Guennebaud
|
f0a4c9d5ab
|
Update gebp kernel to process a panel of 4 columns at once for the remaining ones.
|
2014-03-26 23:22:36 +01:00 |
|
Gael Guennebaud
|
8be011e776
|
Remove remaining bits of the dead working buffer
|
2014-03-26 23:14:44 +01:00 |
|
Benoit Steiner
|
a078f442a3
|
Vectorized the multiplication and division of complex numbers using AVX instructions.
|
2014-03-26 15:11:18 -07:00 |
|
Benoit Steiner
|
cf1a7bfbe1
|
Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives.
Silenced a few compilation warnings.
|
2014-03-26 12:03:31 -07:00 |
|
Gael Guennebaud
|
bc401eb6fa
|
Implement new 1 packet x 8 gebp kernel
|
2014-03-26 18:53:00 +01:00 |
|
Gael Guennebaud
|
b286a1e75c
|
add pbroadcast2/4 generic intrinsics
|
2014-03-26 16:46:36 +01:00 |
|
Benoit Steiner
|
6bf3cc2732
|
Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf>
|
2014-03-25 09:00:43 -07:00 |
|
Benoit Steiner
|
7ae9b0805d
|
Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives.
|
2014-03-24 13:33:40 -07:00 |
|
Benoit Steiner
|
72707a8664
|
Made sure that EIGEN_ALIGN is defined when EIGEN_DONT_VECTORIZE is set to true to prevent build failures when vectorization is disabled.
|
2014-03-21 11:40:29 -07:00 |
|
Benoit Steiner
|
8a0845ebd7
|
Merged latest changes from the parent
|
2014-03-18 12:58:08 -07:00 |
|
Christoph Hertzberg
|
35a2c9cde7
|
clang does not accept this without template keyword
|
2014-03-14 16:48:29 +01:00 |
|
Gael Guennebaud
|
bb4b67cf39
|
Relax Ref such that Ref<MatrixXf> accepts a RowVectorXf which can be seen as a degenerate MatrixXf(1,N)
|
2014-03-13 18:04:19 +01:00 |
|
Gael Guennebaud
|
0a6c472335
|
A bit of cleaning
|
2014-03-13 15:44:20 +01:00 |
|
Christoph Hertzberg
|
2db792852f
|
Silence stupid parenthesis warnings for old GCC versions (<= 4.6.x)
|
2014-03-13 12:58:57 +01:00 |
|
Gael Guennebaud
|
aceae8314b
|
Resurrect EvalBeforeNestingBit to control nested_eval
|
2014-03-12 20:25:36 +01:00 |
|
Gael Guennebaud
|
f74ed34539
|
Fix regressions in redux_evaluator flags and evaluator<Block> flags
|
2014-03-12 18:14:08 +01:00 |
|
Gael Guennebaud
|
5e26b7cf9d
|
Extend evaluation traits debugging info
|
2014-03-12 18:13:18 +01:00 |
|
Gael Guennebaud
|
74b1d79d77
|
merge default and evaluator branches
|
2014-03-12 16:24:25 +01:00 |
|
Gael Guennebaud
|
0b362e0c9a
|
This file is not needed anymore
|
2014-03-12 16:18:54 +01:00 |
|
Gael Guennebaud
|
a6be1952f4
|
Fix a few regressions when moving the flags
|
2014-03-12 16:18:34 +01:00 |
|
Christoph Hertzberg
|
2379ccffcb
|
bug #755: CommaInitializer produced wrong assertions in absence of ReturnValueOptimization.
|
2014-03-12 13:48:09 +01:00 |
|
Christoph Hertzberg
|
88aa18df64
|
bug #759: Removed hard-coded double-math from Quaternion::angularDistance.
Some documentation improvements
|
2014-03-12 13:43:19 +01:00 |
|
Gael Guennebaud
|
0bd5671b9e
|
Fix Eigenvalues module
|
2014-03-12 13:35:44 +01:00 |
|
Gael Guennebaud
|
8dd3b716e3
|
Move evaluation related flags from traits to evaluator and fix evaluators of MapBase and Replicate
|
2014-03-12 13:34:11 +01:00 |
|
Gael Guennebaud
|
7eefdb948c
|
Migrate JacobiSVD to Solver
|
2014-03-11 13:43:46 +01:00 |
|
Gael Guennebaud
|
082f7ddc37
|
Port Cholesky module to evaluators
|
2014-03-11 13:33:44 +01:00 |
|
Christoph Hertzberg
|
bbc0ada12a
|
Avoid stupid "enumeral mismatch in conditional expression" warnings in GCC
|
2014-03-11 12:18:32 +01:00 |
|
Gael Guennebaud
|
9be72cda2a
|
Port QR module to Solve/Inverse
|
2014-03-11 11:47:32 +01:00 |
|
Gael Guennebaud
|
ae40583965
|
Fix CoeffReadCost issues
|
2014-03-11 11:47:14 +01:00 |
|
Gael Guennebaud
|
5806e73800
|
It is not clear what XprType::Nested should be, so let's use nested<Xpr>::type as much as possible
|
2014-03-11 11:44:11 +01:00 |
|