* previous DiagonalMatrix expression is now DiagonalMatrixWrapper
* DiagonalMatrix class is now for storage
* add the DiagonalMatrixBase class to factorize code of the
two previous classes
* remove Scaling class (it is now a global function)
* add UniformScaling helper class
(don't use it directly, use the Scaling function)
* add the Scaling global function to simplify the creation
of scaling objects
There is still a lot to do, in particular about DiagonalProduct for which
the goal is to get rid of the "if()" in the coeff() function. At least
it is not worse than before ! Also need to uptade the tutorial and add more doc.
to "public:method()" i.e. reimplementing the generic method()
from MatrixBase.
improves compilation speed by 7%, reduces almost by half the call depth
of trivial functions, making gcc errors and application backtraces
nicer...
* introduce packet(int), make use of it in linear vectorized paths
--> completely fixes the slowdown noticed in benchVecAdd.
* generalize coeff(int) to linear-access xprs
* clarify the access flag bits
* rework api dox in Coeffs.h and util/Constants.h
* improve certain expressions's flags, allowing more vectorization
* fix bug in Block: start(int) and end(int) returned dyn*dyn size
* fix bug in Block: just because the Eval type has packet access
doesn't imply the block xpr should have it too.
* added ConjugateReturnType and AdjointReturnType that are type-defined to Derived&
and Transpose<Derived> if the scalar type is not complex: this avoids abusive copies in
the cache friendly Product
was a sign that we were doing something wrong. In fact, having
NestByValue as a special case of Flagged was wrong, and the previous
commit, while not buggy, was inefficient because then when the resulting
NestByValue xpr was nested -- hence copied -- the original xpr which was
already nested by value was copied again; hence instead of 1 copy we got
3 copies.
The solution was to ressuscitate the old Temporary.h (renamed
NestByValue.h) as it was the right approach.