This is substantially faster on ARM, where it's important to minimize the number of loads.
This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome.
Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).
* add lots of static assertions making it very explicit when all these ops
are supposed to work:
** all ops require the rhs vector to go in the right direction
** all ops already require that the lhs and rhs are of the same kind
(matrix vs vector) otherwise we'd have to do complex work
** multiplicative ops (introduced Kibeom's patch) are restricted to arrays, if only because for matrices they could be ambiguous.
* add a new test, vectorwiseop.cpp.
* these compound-assign operators used to be implemented with for loops:
for(Index j=0; j<subVectors(); ++j)
subVector(j).array() += other.derived().array();
This didn't seem to be needed; replaced by using expressions like operator+ and operator- did.
it was calling the .value() method on an inner product, and that was blocked in bad zero-sized case.
fixed by adding the .value() method to DenseBase for all 1x1 expressions, and allowing coeff accessors in ProductBase for 1x1 expressions.
Removed default parameter from Transform.
Removed the TransformXX typedefs.
Removed references to TransformXX from unit tests and docs.
Assigning Transforms to a sub-group is now forbidden at compile time.
Products should now properly support the Isometry flag.
Fixed alignment checks in MapBase.
* Now completely generic so all standard integer types (like char...) are supported.
** add unit test for that (integer_types).
* NumTraits does no longer inherit numeric_limits
* All math functions are now templated
* Better guard (static asserts) against using certain math functions on integer types.
of ei_matrix_array for size 0
* adapt many xprs to have the right storage order, now that it matters
* add static assert on expressions to check that vector xprs
have the righ storage order
* adapt ei_plain_matrix_type_(column|row)_major
* implement assignment of selfadjointview to matrix
(was before failing to compile) and add nestedExpression() methods
* expand product_symm test
* in ei_gemv_selector, use the PlainObject type instead of a custom Matrix<...> type
* fix VectorBlock and Block mistakes
* introduce a lazy product version of the coefficient based implementation
=> flagged is not used anymore
=> small outer product are now lazy by default (aliasing is really unlikely for outer products)
* support resize() to same size (nop). The case of FFT was another case where that make one's life far easier.
hope that's ok with you Gael. but indeed, i don't use it in the ReturnByValue stuff.
FFT:
* Support MatrixBase (well, in the case with direct memory access such as Map)
* adapt unit test
* previous DiagonalMatrix expression is now DiagonalMatrixWrapper
* DiagonalMatrix class is now for storage
* add the DiagonalMatrixBase class to factorize code of the
two previous classes
* remove Scaling class (it is now a global function)
* add UniformScaling helper class
(don't use it directly, use the Scaling function)
* add the Scaling global function to simplify the creation
of scaling objects
There is still a lot to do, in particular about DiagonalProduct for which
the goal is to get rid of the "if()" in the coeff() function. At least
it is not worse than before ! Also need to uptade the tutorial and add more doc.
* try to be clever in matrix ctors and operator=: be lazy when we can, always allow
to copy rowvector into columnvector, check the template parameters,
try to factor the code better
* add missing copy ctor in UnalignedType
* fix bug in the traits of DiagonalProduct
* renaming: EIGEN_TUNE_FOR_CPU_CACHE_SIZE
* update the dox a little