Currently, only the following platforms/operations are supported:
- SSE2-compatible architecture
- compiler compatible with Intel's SSE2 intrinsics
- float, double and int data types
- fixed-size matrices whose size along the storage-major dimension is a multiple of 4 (or 2 for double)
- scalar-matrix product; component-wise +, -, *, min, max
- matrix-matrix product, only if the left matrix is vectorizable and column-major
or the right matrix is vectorizable and row-major; e.g.
a.transpose() * b is not vectorized with the default column-major storage.
To use it, you must define EIGEN_VECTORIZE and EIGEN_INTEL_PLATFORM.
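For illustration, the reason for the multiple-of-4 (or 2 for double) requirement: a 128-bit SSE2 register holds four floats or two doubles, so a component-wise sum can process a whole packet per instruction with no leftover coefficients. The sketch below calls Intel's intrinsics directly; add_floats() and add_doubles() are hypothetical helpers, not Eigen's internal packet API, and they use unaligned loads for simplicity. When __SSE2__ is not defined, it falls back to a plain loop.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics (also brings in the SSE float intrinsics)
#endif

// Hypothetical helper: dst = a + b component-wise, for n floats.
// n must be a multiple of 4 so that no scalar tail loop is needed.
inline void add_floats(const float* a, const float* b, float* dst, std::size_t n) {
  assert(n % 4 == 0);
#if defined(__SSE2__)
  for (std::size_t i = 0; i < n; i += 4) {
    __m128 pa = _mm_loadu_ps(a + i);             // load 4 floats (unaligned)
    __m128 pb = _mm_loadu_ps(b + i);
    _mm_storeu_ps(dst + i, _mm_add_ps(pa, pb));  // 4 additions in one instruction
  }
#else
  for (std::size_t i = 0; i < n; ++i) dst[i] = a[i] + b[i];
#endif
}

// Same idea for double: a 128-bit register holds only 2 doubles.
inline void add_doubles(const double* a, const double* b, double* dst, std::size_t n) {
  assert(n % 2 == 0);
#if defined(__SSE2__)
  for (std::size_t i = 0; i < n; i += 2) {
    __m128d pa = _mm_loadu_pd(a + i);
    __m128d pb = _mm_loadu_pd(b + i);
    _mm_storeu_pd(dst + i, _mm_add_pd(pa, pb));
  }
#else
  for (std::size_t i = 0; i < n; ++i) dst[i] = a[i] + b[i];
#endif
}
```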
- use CoeffReadCost to decide when to unroll loops,
for now only in Product.h and OperatorEquals.h.
Performance remains the same: generally still not as good as before the
big changes.
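A minimal sketch of such an unrolling decision, assuming the rule is "unroll when Size * CoeffReadCost stays at or below a limit": the names Sum, UnrolledSum and kUnrollingLimit, and the limit value 400, are hypothetical and not taken from Product.h or OperatorEquals.h. An expensive coefficient (high CoeffReadCost) pushes even a small loop over the limit, so a plain loop is kept rather than duplicating expensive code Size times.

```cpp
// Hypothetical cost threshold; the value is an arbitrary assumption for this sketch.
constexpr int kUnrollingLimit = 400;

// Compile-time loop: expands to x[0] + x[1] + ... + x[N-1] with no run-time counter.
template <int Index, int N>
struct UnrolledSum {
  static double run(const double* x) { return x[Index] + UnrolledSum<Index + 1, N>::run(x); }
};
template <int N>
struct UnrolledSum<N, N> {  // end of the recursion
  static double run(const double*) { return 0.0; }
};

// Chooses between unrolled and looped code from the estimated cost Size * CoeffReadCost.
template <int Size, int CoeffReadCost,
          bool Unroll = (Size * CoeffReadCost <= kUnrollingLimit)>
struct Sum;

template <int Size, int CoeffReadCost>
struct Sum<Size, CoeffReadCost, true> {
  static constexpr bool unrolled = true;
  static double run(const double* x) { return UnrolledSum<0, Size>::run(x); }
};

template <int Size, int CoeffReadCost>
struct Sum<Size, CoeffReadCost, false> {
  static constexpr bool unrolled = false;
  static double run(const double* x) {
    double s = 0.0;
    for (int i = 0; i < Size; ++i) s += x[i];  // ordinary run-time loop
    return s;
  }
};
```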
ei_xpr_copy to evaluate args when needed. Had to introduce an ugly
trick with ei_unref: when the XprCopy type is a reference, one can't
directly access member typedefs such as Scalar.
in ei_xpr_copy and operator=, respectively.
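The ei_unref trick can be sketched as follows, assuming it simply strips a top-level reference (its actual definition is not shown here): if XprCopy is "const SomeXpr&", writing typename XprCopy::Scalar is ill-formed, whereas going through the stripped type works. SomeXpr and ScalarTypeOf are hypothetical names; in modern C++, std::remove_reference does the same job.

```cpp
#include <type_traits>

// Hypothetical expression type exposing a member typedef.
struct SomeXpr { typedef float Scalar; };

// Strip a top-level reference so that member typedefs become reachable.
template <typename T> struct ei_unref { typedef T type; };
template <typename T> struct ei_unref<T&> { typedef T type; };

// "typename XprCopy::Scalar" would fail for XprCopy = const SomeXpr&;
// going through ei_unref works for both value and reference types.
template <typename XprCopy>
struct ScalarTypeOf {
  typedef typename ei_unref<XprCopy>::type::Scalar type;
};

static_assert(std::is_same<ei_unref<const SomeXpr&>::type, const SomeXpr>::value,
              "the reference is stripped, constness is kept");
```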
* added Matrix::lazyAssign() for when EvalBeforeAssigningBit must be skipped
(mainly for internal use)
* all expressions are now stored by const reference
* added Temporary xpr: .temporary() must be called on any temporary expression
not directly returned by a function (mainly for internal use)
* moved all functors to the Functors.h header
* added some preliminary stuff for explicit vectorization
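Grouping functors in one header lets a single component-wise expression serve every operator. A minimal sketch, assuming functors are small structs with a const operator(); ScalarSumOp, ScalarMinOp and cwiseBinary() are hypothetical names, not the contents of Functors.h:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical functors: each one wraps a single scalar operation.
template <typename Scalar>
struct ScalarSumOp {
  Scalar operator()(const Scalar& a, const Scalar& b) const { return a + b; }
};
template <typename Scalar>
struct ScalarMinOp {
  Scalar operator()(const Scalar& a, const Scalar& b) const { return std::min(a, b); }
};

// Hypothetical generic component-wise kernel: one loop, any binary functor.
template <typename Scalar, typename Functor>
void cwiseBinary(const Scalar* a, const Scalar* b, Scalar* dst, std::size_t n, Functor func) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = func(a[i], b[i]);
}
```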
to disable Eigen's asserts without disabling one's own program's
asserts. Note that Eigen code should now use ei_assert()
instead of assert().
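A sketch of a library-specific assert that can be switched off on its own; mylib_assert and MYLIB_NO_DEBUG are hypothetical names (the actual Eigen opt-out macro is not named in this text). Because it forwards to assert(), defining NDEBUG still disables both, while defining only MYLIB_NO_DEBUG leaves the program's own assert() calls active.

```cpp
#include <cassert>

// Hypothetical switch: define MYLIB_NO_DEBUG to silence only the library's checks.
#ifdef MYLIB_NO_DEBUG
#define mylib_assert(x) ((void)0)
#else
#define mylib_assert(x) assert(x)
#endif

// Example library function: its precondition uses the library assert, not plain assert().
inline int checkedIndex(int i, int size) {
  mylib_assert(i >= 0 && i < size);
  return i;
}
```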
* Remove findBiggestCoeff() as it's now almost redundant.
* Improve echelon.cpp: inner for loop replaced by xprs.
* Remove useless "(*this)." here and there. I think they were
first introduced by an automatic search & replace.
* Fix compilation in Visitor.h (issue triggered by echelon.cpp)
* Improve comment on swap().
* added cache-efficient matrix-matrix product.
- provides a huge speed-up for large matrices.
- currently it is enabled when explicit unrolling is not possible.
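The idea behind a cache-efficient product can be sketched with loop tiling: the operands are processed in Block x Block tiles so that each tile of A and B is reused while it is still in cache. This is a generic blocked product under stated assumptions (column-major std::vector storage, a hypothetical blockedProduct() function, an untuned default block size of 64), not Eigen's actual kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major storage: element (i, j) of an R x C matrix lives at data[i + j * R].
// Computes C = A * B with A (m x k), B (k x n), C (m x n), tile by tile.
void blockedProduct(const std::vector<double>& A, const std::vector<double>& B,
                    std::vector<double>& C, std::size_t m, std::size_t k, std::size_t n,
                    std::size_t block = 64) {
  assert(block > 0 && A.size() == m * k && B.size() == k * n);
  C.assign(m * n, 0.0);
  for (std::size_t jj = 0; jj < n; jj += block)
    for (std::size_t pp = 0; pp < k; pp += block)
      for (std::size_t ii = 0; ii < m; ii += block) {
        const std::size_t jEnd = std::min(jj + block, n);
        const std::size_t pEnd = std::min(pp + block, k);
        const std::size_t iEnd = std::min(ii + block, m);
        for (std::size_t j = jj; j < jEnd; ++j)
          for (std::size_t p = pp; p < pEnd; ++p) {
            const double b = B[p + j * k];  // B(p, j), reused across the inner loop
            for (std::size_t i = ii; i < iEnd; ++i)
              C[i + j * m] += A[i + p * m] * b;  // walks one column: contiguous memory
          }
      }
}
```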
internal classes: AaBb -> ei_aa_bb
IntAtRunTimeIfDynamic -> ei_int_if_dynamic
unify UNROLLING_LIMIT (there was no reason for operator= to use
a higher limit)
etc.
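Judging only from its name, ei_int_if_dynamic stores an int when a value is known only at run time and otherwise keeps it as a compile-time constant. A minimal sketch of that idea, with IntIfDynamic and Dynamic as hypothetical names:

```cpp
#include <cassert>

// Hypothetical marker meaning "only known at run time".
constexpr int Dynamic = -1;

// Fixed case: the value is a compile-time constant, so the class has no data member.
template <int Value>
class IntIfDynamic {
 public:
  explicit IntIfDynamic(int v) { assert(v == Value); (void)v; }
  static constexpr int value() { return Value; }
};

// Dynamic case: the value has to be stored.
template <>
class IntIfDynamic<Dynamic> {
 public:
  explicit IntIfDynamic(int v) : m_value(v) {}
  int value() const { return m_value; }
 private:
  int m_value;
};
```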
Finally, the importing macro is named EIGEN_BASIC_PUBLIC_INTERFACE
because it does not only import the ei_traits; it also makes the base class
a friend, etc.
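A sketch of what such an importing macro can bundle, assuming a CRTP base class and a traits class specialized per expression type. SKETCH_BASIC_PUBLIC_INTERFACE, MatrixBaseSketch, traits and Vec3 are hypothetical; a templated Derived would additionally need typename in the Scalar typedef.

```cpp
template <typename T> struct traits;  // specialized for each concrete type

// CRTP base: forwards coefficient access to the derived class.
template <typename Derived>
class MatrixBaseSketch {
 public:
  typedef typename traits<Derived>::Scalar Scalar;
  const Derived& derived() const { return *static_cast<const Derived*>(this); }
  Scalar coeff(int i) const { return derived().coeffImpl(i); }  // calls a private member
};

// Hypothetical macro: imports the traits typedefs AND befriends the base class.
#define SKETCH_BASIC_PUBLIC_INTERFACE(Derived) \
  typedef MatrixBaseSketch<Derived> Base;      \
  friend class MatrixBaseSketch<Derived>;      \
  typedef traits<Derived>::Scalar Scalar;

class Vec3;
template <> struct traits<Vec3> { typedef float Scalar; };

class Vec3 : public MatrixBaseSketch<Vec3> {
 public:
  SKETCH_BASIC_PUBLIC_INTERFACE(Vec3)
  Vec3(Scalar x, Scalar y, Scalar z) : m_data{x, y, z} {}
 private:
  Scalar coeffImpl(int i) const { return m_data[i]; }  // reachable thanks to the friend
  Scalar m_data[3];
};
```

The friend declaration is what lets the base class call coeffImpl() while it stays private to users of Vec3.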
template parameter "Scalar" is removed. This is achieved by introducing a
template <typename Derived> struct Scalar, which provides a forward declaration of
the Scalar typedefs.
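Why a separate struct is needed: when a CRTP base class is instantiated, the derived class is still incomplete, so typename Derived::Scalar cannot be used there. A minimal sketch, with ScalarOf, BaseSketch and MyMatrix as hypothetical stand-ins for the struct template described above:

```cpp
// Declared up front; specialized for each concrete type before that type is defined.
template <typename Derived> struct ScalarOf;

template <typename Derived>
struct BaseSketch {
  // "typedef typename Derived::Scalar Scalar;" would not compile here: while
  // BaseSketch<Derived> is being instantiated as a base class, Derived is incomplete.
  typedef typename ScalarOf<Derived>::type Scalar;
  Scalar twice(Scalar x) const { return x + x; }
};

struct MyMatrix;  // forward declaration
template <> struct ScalarOf<MyMatrix> { typedef double type; };
struct MyMatrix : BaseSketch<MyMatrix> {};  // no Scalar template parameter needed
```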
column-major order, even if storage is row-major. Benchmarks showed that adapting
the traversal order to the storage order brought no benefit.
Also do some cleanup after Gael's big patch.
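Traversal order and storage order are independent: a loop can always visit coefficients column by column and compute each one's address from its storage order. The sketch below only illustrates that separation and makes no performance claim; StorageOrder, storageOffset() and copyColumnMajorTraversal() are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical storage-order flag.
enum class StorageOrder { ColMajor, RowMajor };

// Linear offset of coefficient (i, j) in a rows x cols matrix for the given storage order.
inline std::size_t storageOffset(std::size_t i, std::size_t j, std::size_t rows,
                                 std::size_t cols, StorageOrder o) {
  return o == StorageOrder::ColMajor ? i + j * rows : j + i * cols;
}

// Copies src into dst, always traversing in column-major order (j outer, i inner),
// whatever the storage order of either operand.
inline void copyColumnMajorTraversal(const std::vector<float>& src, StorageOrder srcOrder,
                                     std::vector<float>& dst, StorageOrder dstOrder,
                                     std::size_t rows, std::size_t cols) {
  dst.resize(rows * cols);
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t i = 0; i < rows; ++i)
      dst[storageOffset(i, j, rows, cols, dstOrder)] =
          src[storageOffset(i, j, rows, cols, srcOrder)];
}
```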