88 Commits

Author SHA1 Message Date
Benoit Jacob
8b4945a5a2 add some static asserts, use them, fix gcc 4.3 warning in Product.h. 2008-07-19 00:25:41 +00:00
Gael Guennebaud
22a816ade8 * Fix a couple of issues related to the recent cache friendly products
* Improve the efficiency of matrix*vector in unaligned cases
* Trivial fixes in the destructors of MatrixStorage
* Removed matrixNorm from test/product.cpp (the test is now twice as fast, and
  matrixNorm assumed the matrix product was correct while checking exactly that!)
2008-07-19 00:09:01 +00:00
Gael Guennebaud
99a625243f Optimization: added super efficient rowmajor * vector product (and vector * colmajor).
It basically performs 4 dot products at once, reducing loads of the vector and improving
instruction scheduling. With 3 cache friendly algorithms, we now handle all product
configurations with outstanding perf for large matrices.
2008-07-13 01:22:54 +00:00
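The "4 dot products at once" idea from the commit above can be sketched roughly as follows. This is a minimal scalar illustration, not the code from the commit: the function name `rowMajorTimesVector` and the flat `std::vector<float>` storage are assumptions made for the example, and the real implementation works on SIMD packets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Eigen's API): computes y = A * x for a row-major
// matrix A stored contiguously as rows * cols floats.
std::vector<float> rowMajorTimesVector(const std::vector<float>& a,
                                       std::size_t rows, std::size_t cols,
                                       const std::vector<float>& x) {
  assert(a.size() == rows * cols);
  assert(x.size() == cols);
  std::vector<float> y(rows, 0.0f);

  std::size_t i = 0;
  // Main loop: process four rows together, so that each x[j] is loaded once
  // and reused by four independent accumulators.
  for (; i + 4 <= rows; i += 4) {
    const float* r0 = a.data() + (i + 0) * cols;
    const float* r1 = a.data() + (i + 1) * cols;
    const float* r2 = a.data() + (i + 2) * cols;
    const float* r3 = a.data() + (i + 3) * cols;
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    for (std::size_t j = 0; j < cols; ++j) {
      const float xj = x[j];  // single load shared by the four dot products
      s0 += r0[j] * xj;
      s1 += r1[j] * xj;
      s2 += r2[j] * xj;
      s3 += r3[j] * xj;
    }
    y[i + 0] = s0;
    y[i + 1] = s1;
    y[i + 2] = s2;
    y[i + 3] = s3;
  }

  // Remainder loop: the last rows (fewer than four) use plain dot products.
  for (; i < rows; ++i) {
    const float* r = a.data() + i * cols;
    float s = 0.0f;
    for (std::size_t j = 0; j < cols; ++j) s += r[j] * x[j];
    y[i] = s;
  }
  return y;
}
```

The four accumulators are independent of each other, which is what gives the compiler and the CPU room to schedule the multiply-adds in parallel.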
Gael Guennebaud
861d18d553 * Optimization: added a specialization of Block for xpr with DirectAccessBit
* some simplifications and fixes in cache friendly products
2008-07-12 22:59:34 +00:00
Gael Guennebaud
b7bd1b3446 Add a *very efficient* evaluation path for both col-major matrix * vector
and vector * row-major products. Currently, it is enabled only if the matrix
has the DirectAccessBit flag and the product is "large enough".
Added the respective unit tests in test/product.cpp.
2008-07-12 12:12:02 +00:00
Gael Guennebaud
c9b046d5d5 * added optimized paths for matrix-vector and vector-matrix products
(using either a cache friendly strategy or re-using dot-product
  vectorized implementation)
* add LinearAccessBit to Transpose
2008-07-09 22:30:18 +00:00
Benoit Jacob
a9d319d44f * do the ActualPacketAccessBit change as discussed on list
* add comment in Product.h about CanVectorizeInner
* fix typo in test/product.cpp
2008-07-04 12:43:55 +00:00
Gael Guennebaud
8463b7d3f4 * fix compilation issue in Product
* added some tests for product and swap
* overload .swap() for dynamic-sized matrix of same size
2008-07-02 16:05:33 +00:00
Gael Guennebaud
027818d739 * added innerSize / outerSize functions to MatrixBase
* added complete implementation of sparse matrix product
  (with a little glue in Eigen/Core)
* added an exhaustive bench of sparse products including GMM++ and MTL4
  => Eigen outperforms in all transposed/density configurations !
2008-06-28 23:07:14 +00:00
Gael Guennebaud
e5d301dc96 various work on the Sparse module:
* added some glue to Eigen/Core (SparseBit, ei_eval, Matrix)
* add two new sparse matrix types:
   HashMatrix: based on std::map (for random writes)
   LinkedVectorMatrix: array of linked vectors
   (for outer coherent writes, e.g. to transpose a matrix)
* add a SparseSetter class to easily set/update any kind of matrices, e.g.:
   { SparseSetter<MatrixType,RandomAccessPattern> wrapper(mymatrix);
     for (...) wrapper->coeffRef(rand(),rand()) = rand(); }
* automatic shallow copy for RValue
* and a lot of mess !
plus:
* remove the remaining ArrayBit related stuff
* don't use alloca in product for very large memory allocation
2008-06-26 23:22:26 +00:00
Benoit Jacob
c5bd1703cb change derived classes methods from "private:_method()"
to "public:method()" i.e. reimplementing the generic method()
from MatrixBase.
improves compilation speed by 7%, reduces almost by half the call depth
of trivial functions, making gcc errors and application backtraces
nicer...
2008-06-26 20:08:16 +00:00
Benoit Jacob
25ba9f377c * add bench/benchVecAdd.cpp by Gael, fix crash (ei_pload on non-aligned)
* introduce packet(int), make use of it in linear vectorized paths
  --> completely fixes the slowdown noticed in benchVecAdd.
* generalize coeff(int) to linear-access xprs
* clarify the access flag bits
* rework api dox in Coeffs.h and util/Constants.h
* improve certain expressions' flags, allowing more vectorization
* fix bug in Block: start(int) and end(int) returned dyn*dyn size
* fix bug in Block: just because the Eval type has packet access
  doesn't imply the block xpr should have it too.
2008-06-26 16:06:41 +00:00
Benoit Jacob
8a967fb17c * implement slice vectorization. Because it uses unaligned
packet access, it is not certain that it will bring a performance
  improvement: benchmarking needed.
* improve logic choosing slice vectorization.
* fix typo in SSE packet math, causing crash in unaligned case.
* fix bug in Product, causing crash in unaligned case.
* add TEST_SSE3 CMake option.
2008-06-22 15:02:05 +00:00
Gael Guennebaud
fb4a151982 * more cleaning in Product
* make Matrix2f (and similar) vectorized using linear path
* fix a couple of warnings and compilation issues with ICC and gcc 3.3/3.4
  (cannot get Transform to compile with gcc 3.3/3.4, see the FIXME)
2008-06-19 23:00:51 +00:00
Gael Guennebaud
82c3cea1d5 * refactoring of Product:
  * use ProductReturnType<>::Type to get the correct Product xpr type
  * Product is no longer instantiated for xpr types which are evaluated
  * vectorization of "a.transpose() * b" for the normal product (small and fixed-size matrix)
  * some cleaning
* removed ArrayBase
2008-06-19 17:33:57 +00:00
Gael Guennebaud
5dbfed1902 fix two bugs discovered by the previous commit. 2008-06-16 16:39:58 +00:00
Benoit Jacob
bb1f4e44f1 * Block: row and column expressions in the inner direction
now have the Like1D flag.

* Big renaming:
  packetCoeff ---> packet
  VectorizableBit ---> PacketAccessBit
  Like1DArrayBit ---> LinearAccessBit
2008-06-16 14:54:31 +00:00
Benoit Jacob
c905b31b42 * Big rework of Assign.h:
** Much better organization
** Fix a few bugs
** Add the ability to unroll only the inner loop
** Add an unrolled path to the Like1D vectorization. Not well tested.
** Add placeholder for sliced vectorization. Unimplemented.

* Rework of corrected_flags:
** improve rules determining vectorizability
** for vectors, the storage-order is indifferent, so we tweak it
   to allow vectorization of row-vectors.

* fix compilation in benchmark, and a warning in Transpose.
2008-06-16 10:49:44 +00:00
Gael Guennebaud
0ee6b08128 * split Product to a DiagonalProduct template specialization
to optimize matrix-diag and diag-matrix products without
  making Product over complicated.
* compilation fixes in Tridiagonalization and HessenbergDecomposition
  in the case of 2x2 matrices.
* added a small Orientation2D class with an interface similar to Quaternion's
  (used by Transform to handle 2D and 3D orientations seamlessly)
* added a couple of features in Transform.
2008-06-15 11:54:18 +00:00
Gael Guennebaud
fbbd8afe30 Started a Transform class in the Geometry module to represent
homography.
Fix indentation in Quaternion.h
2008-06-15 08:33:44 +00:00
Benoit Jacob
c90c77051f * make the _Flags template parameter of Matrix default to the corrected
flags. This ensures that unless explicitly messed up otherwise,
  a Matrix type is equal to its own Eval type. This seriously reduces
  the number of types instantiated. Measured +13% compile speed, -7%
  binary size.

* Improve doc of Matrix template parameters.
2008-06-13 07:53:45 +00:00
Gael Guennebaud
6998037930 * move some compile time "if" to their respective unroller (assign and dot)
* fix a couple of compilation issues when unrolling is disabled
* reduce default unrolling limit to a more reasonable value
2008-06-07 01:07:48 +00:00
Gael Guennebaud
a0cff1a295 fix eigenvectors computations :) 2008-06-03 18:03:55 +00:00
Benoit Jacob
dc5fd8dfff meagre outcome for so much time spent!
* fix inverse() bug discovered by Gael's test
* fix warnings introduced by the new Diagonal stuff
* update Doxyfile to v1.5.6
2008-06-01 03:36:49 +00:00
Gael Guennebaud
fcf4457b78 added optimized matrix times diagonal matrix product via Diagonal flag shortcut. 2008-05-31 21:35:11 +00:00
Gael Guennebaud
c9fb248c36 simplify the basic product a bit by moving dynamic loops
to the corresponding special case of the unrollers.
the latter are therefore renamed *product_impl.
2008-05-31 15:06:26 +00:00
Gael Guennebaud
f5e599e489 * replace compile-time-if by meta-selector in Assign.h
as it speeds up compilation.
* fix minor typo introduced in the previous commit
2008-05-31 14:42:07 +00:00
Gael Guennebaud
e2ac5d244e Added ArrayBit to get the ability to manipulate a Matrix like a simple scalar.
In particular this flag changes the behavior of operator* to a coeff wise product.
2008-05-29 22:33:07 +00:00
Gael Guennebaud
c1559d3079 * updated the assignment operator macro so that overloads
in MatrixBase work
* removed product_selector and cleaned Product.h a bit
* cleaned Assign.h a bit
2008-05-28 22:56:19 +00:00
Gael Guennebaud
8711e26c8a * change Flagged to take into account NestByValue only
* bugfix in Assign and cache friendly product (weird that it worked before)
* improved argument evaluation in Product
2008-05-28 22:11:47 +00:00
Gael Guennebaud
73084dc754 * added _*coeffRef members in NestedByValue
* added ConjugateReturnType and AdjointReturnType that are type-defined to Derived&
  and Transpose<Derived> if the scalar type is not complex: this avoids needless copies in
  the cache friendly Product
2008-05-28 09:09:18 +00:00
Benoit Jacob
f54760c889 hehe, the complicated nesting scheme in Flagged in the previous commit
was a sign that we were doing something wrong. In fact, having
NestByValue as a special case of Flagged was wrong, and the previous
commit, while not buggy, was inefficient because then when the resulting
NestByValue xpr was nested -- hence copied -- the original xpr which was
already nested by value was copied again; hence instead of 1 copy we got
3 copies.
The solution was to resuscitate the old Temporary.h (renamed
NestByValue.h) as it was the right approach.
2008-05-28 05:14:16 +00:00
Benoit Jacob
aebecae510 * find the proper way of nesting the expression in Flagged:
finally that's more subtle than just using ei_nested, because when
  flagging with NestByValueBit we want to store the expression by value
  already, regardless of whether it already had the NestByValueBit set.
* rename temporary() ----> nestByValue()
* move the old Product.h to disabled/, replace by what was ProductWIP.h
* tweak -O and -g flags for tests and examples
* reorder the tests -- basic things go first
* simplifications, e.g. in many methods return derived() and count on
  implicit casting to the actual return type.
* strip some not-really-useful stuff from the heaviest tests
2008-05-28 04:38:16 +00:00
Benoit Jacob
5da60897ab Introduce generic Flagged xpr, and already remove Lazy.h and Temporary.h
Rename DefaultLostFlagMask --> HereditaryBits
2008-05-14 08:20:15 +00:00
Benoit Jacob
678f18fce4 put inline keywords everywhere appropriate, so we no longer need to pass
-finline-limit=1000 to gcc to get good performance. Also some cleanup along the way.
2008-05-12 17:34:46 +00:00
Gael Guennebaud
45cda6704a * Draft of an eigenvalue solver
(does not support complex and does not re-use the QR decomposition)

* Rewrite the cache friendly product to have only one instance per scalar type !
  This significantly speeds up compilation time and reduces executable size.
  The current drawback is that some trivial expressions, such as
  conjugate or negate, might be evaluated.

* Renamed "cache optimal" to "cache friendly"

* Added the ability to directly access matrix data of some expressions via:
  - the stride()/_stride() methods
  - DirectAccessBit flag (replaces ReferencableBit)
2008-05-12 10:23:09 +00:00
Gael Guennebaud
bf5326c3ca * Added ReferencableBit flag to know whether coeffRef is available.
(needed by the new product implementation)
* Make the packet* members template to support aligned and unaligned
  access. This makes Block vectorizable. Combined with ReferencableBit,
  we should be able to determine at runtime (in some specific cases) if
  an aligned vectorization is possible or not.
* Improved the new product implementation to robustly handle all cases,
  it now passes all the tests.
* Renamed the packet version ei_predux to ei_preduxp to avoid name collision.
2008-05-08 08:12:52 +00:00
Gael Guennebaud
46fa4c713f * Started support for unaligned vectorization.
* Introduce a new highly optimized matrix-matrix product for large
  matrices. The code is still highly experimental and it is activated
  only if you define EIGEN_WIP_PRODUCT at compile time.
  Currently the third dimension of the product must be a multiple of
  the packet size (x4 for floats) and the right-hand side matrix
  must be column major.
  Moreover, currently c = a*b; actually computes c += a*b !!
  Therefore, the code is provided for experimentation purposes only !
  These limitations will be fixed sooner or later so that it can become the default
  product implementation.
2008-05-05 10:23:29 +00:00
Benoit Jacob
8c6007f80e * Patch by Konstantinos Margaritis: AltiVec vectorization.
* Fix several warnings, temporarily disable determinant test.
2008-05-03 12:21:23 +00:00
Gael Guennebaud
0545df2149 slightly improved the cache friendly product to use mul-add only 2008-05-03 10:01:30 +00:00
Gael Guennebaud
a6655dd91a added packet mul-add function (ei_pmad) and updated Product to use it.
this changes nothing for the current SSE architecture but might be helpful
for altivec/cell and upcoming AMD processors.
2008-05-03 00:45:08 +00:00
Gael Guennebaud
102e029dad Removed ei_pload1, use posix_memalign to allocate aligned memory,
and make Product ok when only one side is vectorizable (and the product
is still vectorized)
2008-05-02 13:30:12 +00:00
Benoit Jacob
890a8de962 Make products always eval into expressions. Improves performance
in benchmark. Still not as fast as explicit eval(), strangely.
2008-05-02 08:53:23 +00:00
Gael Guennebaud
02f1615d2a Enable vectorization of product with dynamic matrices,
extended cache optimal product to work in any row/column
major situations, and a few bugfixes (forgot to add the
Cholesky header, vectorization of CwiseBinary)
2008-05-01 13:53:05 +00:00
Gael Guennebaud
4c92150676 Added Triangular expression to extract upper or lower (strictly or not)
part of a matrix. Triangular also provides an optimised method for forward
and backward substitution. Further optimizations regarding assignments and
products might come later.

Updated determinant() to take into account triangular matrices.

Started the QR module with a QR decomposition algorithm.
Help needed to build a QR algorithm (eigen solver) based on it.
2008-04-26 18:26:05 +00:00
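For readers unfamiliar with the forward substitution mentioned in the commit above, here is a minimal sketch of the textbook algorithm for a lower triangular system L x = b. The function name `forwardSubstitute` and the flat row-major storage are assumptions made for this illustration; this is not the Triangular expression's actual interface or its optimised code path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Eigen's API): solves L * x = b where L is an
// n x n lower triangular matrix stored row-major in a flat array.
// Entries above the diagonal are never read.
std::vector<double> forwardSubstitute(const std::vector<double>& l,
                                      std::size_t n,
                                      const std::vector<double>& b) {
  assert(l.size() == n * n);
  assert(b.size() == n);
  std::vector<double> x(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    // Subtract the contribution of the unknowns that are already solved.
    double s = b[i];
    for (std::size_t j = 0; j < i; ++j) s -= l[i * n + j] * x[j];
    // A zero diagonal entry means the system is singular.
    assert(l[i * n + i] != 0.0);
    x[i] = s / l[i * n + i];
  }
  return x;
}
```

Backward substitution for an upper triangular matrix follows the same pattern, iterating from the last row up to the first.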
Gael Guennebaud
6f2c72fb53 Various fixes in:
 - vector to vector assign
 - PartialRedux
 - Vectorization criteria of Product
 - returned type of normalized
 - SSE integer mul
2008-04-25 23:10:37 +00:00
Gael Guennebaud
a451835bce Make the explicit vectorization much more flexible:
 - support dynamic sizes
 - support arbitrary matrix size when the matrix can be seen as a 1D array
   (except for fixed size matrices where the size in bytes must be a multiple of 16,
    this is to allow compact storage of a vector of matrices)
Note that the explicit vectorization is still experimental and far from completely tested.
2008-04-25 15:46:18 +00:00
Gael Guennebaud
9385793f71 Fix a couple of issues with the vectorization. In particular, default ei_p* functions
are provided to handle unsupported types seamlessly.

Added a generic null-ary expression with null-ary functors. They replace
Zero, Ones, Identity and Random.
2008-04-24 18:35:39 +00:00
Benoit Jacob
6ae037dfb5 give up on OpenMP... for now 2008-04-18 07:57:46 +00:00
Benoit Jacob
ea3ccb1e8c * Start of the LU module, with matrix inversion already there and
fully optimized.
* Even if LargeBit is set, only parallelize for large enough objects
  (controlled by EIGEN_PARALLELIZATION_TRESHOLD).
2008-04-14 08:20:24 +00:00