** Much better organization
** Fix a few bugs
** Add the ability to unroll only the inner loop
** Add an unrolled path to the Like1D vectorization. Not well tested.
** Add placeholder for sliced vectorization. Unimplemented.
* Rework of corrected_flags:
** improve rules determining vectorizability
** for vectors, the storage-order is indifferent, so we tweak it
to allow vectorization of row-vectors.
* fix compilation in benchmark, and a warning in Transpose.
to optimize matrix-diag and diag-matrix products without
making Product over complicated.
* compilation fixes in Tridiagonalization and HessenbergDecomposition
in the case of 2x2 matrices.
* added an Orientation2D small class with similar interface than Quaternion
(used by Transform to handle 2D and 3D orientations seamlessly)
* added a couple of features in Transform.
* added ConjugateReturnType and AdjointReturnType that are type-defined to Derived&
and Transpose<Derived> if the scalar type is not complex: this avoids abusive copies in
the cache friendly Product
was a sign that we were doing something wrong. In fact, having
NestByValue as a special case of Flagged was wrong, and the previous
commit, while not buggy, was inefficient because then when the resulting
NestByValue xpr was nested -- hence copied -- the original xpr which was
already nested by value was copied again; hence instead of 1 copy we got
3 copies.
The solution was to ressuscitate the old Temporary.h (renamed
NestByValue.h) as it was the right approach.
finally that's more subtle than just using ei_nested, because when
flagging with NestByValueBit we want to store the expression by value
already, regardless of whether it already had the NestByValueBit set.
* rename temporary() ----> nestByValue()
* move the old Product.h to disabled/, replace by what was ProductWIP.h
* tweak -O and -g flags for tests and examples
* reorder the tests -- basic things go first
* simplifications, e.g. in many methoeds return derived() and count on
implicit casting to the actual return type.
* strip some not-really-useful stuff from the heaviest tests
Triangular class
- full meta-unrolling in Part
- move inverseProduct() to MatrixBase
- compilation fix in ProductWIP: introduce a meta-selector to only do
direct access on types that support it.
- phase out the old Product, remove the WIP_DIRTY stuff.
- misc renaming and fixes
(does not support complex and does not re-use the QR decomposition)
* Rewrite the cache friendly product to have only one instance per scalar type !
This significantly speeds up compilation time and reduces executable size.
The current drawback is that some trivial expressions might be
evaluated like conjugate or negate.
* Renamed "cache optimal" to "cache friendly"
* Added the ability to directly access matrix data of some expressions via:
- the stride()/_stride() methods
- DirectAccessBit flag (replace ReferencableBit)
* Introduce a new highly optimized matrix-matrix product for large
matrices. The code is still highly experimental and it is activated
only if you define EIGEN_WIP_PRODUCT at compile time.
Currently the third dimension of the product must be a factor of
the packet size (x4 for floats) and the right handed side matrix
must be column major.
Moreover, currently c = a*b; actually computes c += a*b !!
Therefore, the code is provided for experimentation purpose only !
These limitations will be fixed soon or later to become the default
product implementation.
Currently only the following platform/operations are supported:
- SSE2 compatible architecture
- compiler compatible with intel's SSE2 intrinsics
- float, double and int data types
- fixed size matrices with a storage major dimension multiple of 4 (or 2 for double)
- scalar-matrix product, component wise: +,-,*,min,max
- matrix-matrix product only if the left matrix is vectorizable and column major
or the right matrix is vectorizable and row major, e.g.:
a.transpose() * b is not vectorized with the default column major storage.
To use it you must define EIGEN_VECTORIZE and EIGEN_INTEL_PLATFORM.
in ei_xpr_copy and operator=, respectively.
* added Matrix::lazyAssign() when EvalBeforeAssigningBit must be skipped
(mainly internal use only)
* all expressions are now stored by const reference
* added Temporary xpr: .temporary() must be called on any temporary expression
not directly returned by a function (mainly internal use only)
* moved all functors in the Functors.h header
* added some preliminaries stuff for the explicit vectorization
when to evaluate arguments and when to meta-unroll.
-use it in Product to determine when to eval args. not yet used
to determine when to unroll. for now, not used anywhere else but
that'll follow.
-fix badness of my last commit
internal classes: AaBb -> ei_aa_bb
IntAtRunTimeIfDynamic -> ei_int_if_dynamic
unify UNROLLING_LIMIT (there was no reason to have operator= use
a higher limit)
etc...
Finally the importing macro is named EIGEN_BASIC_PUBLIC_INTERFACE
because it does not only import the ei_traits, it also makes the base class
a friend, etc.
template parameter "Scalar" is removed. This is achieved by introducting a
template <typename Derived> struct Scalar to achieve a forward-declaration of
the Scalar typedefs.
* functor templates are not template template parameter anymore
(this allows to make templated functors !)
* Main page: extented compiler discussion
* A small hack to support gcc 3.4 and 4.0 (see the main page)
* Fix a cast type issue in Cast
* Various doxygen updates (mainly Cwise stuff and added doxygen groups
in MatrixBase to split the huge memeber list, still not perfect though)
* Updated Gael's email address
- finally get the Eval stuff right. get back to having Eval as
a subclass of Matrix with limited functionality, and then,
add a typedef MatrixType to get the actual matrix type.
- add swap(), findBiggestCoeff()
- bugfix by Ramon in Transpose
- new demo: doc/echelon.cpp
dimension. The advantage is that evaluating a dynamic-sized block in a fixed-size
matrix no longer causes a dynamic memory allocation. Other new thing:
IntAtRunTimeIfDynamic allows storing an integer at zero cost if it is known at
compile time.
column-major order, even if storage is row-major. Benchmark showed that adapting
the traversal order to the storage order brought no benefit.
Also do some cleanup after Gael's big patch.