87 Commits

Author SHA1 Message Date
Gael Guennebaud
842c4f8bfa Several compilation fixes for MSVC and NVCC, basically:
- added explicit enum to int conversion where needed
- if a function is not defined as declared and the return type is "tricky"
  then the type must be typedefined somewhere. A "tricky return type" can be:
  * a template class with a default parameter which depends on another template parameter
  * a nested template class, or type of a nested template class
2008-07-29 16:33:07 +00:00
Gael Guennebaud
b7bd1b3446 Add a *very efficient* evaluation path for both col-major matrix * vector
and vector * row-major products. Currently, it is enabled only is the matrix
has DirectAccessBit flag and the product is "large enough".
Added the respective unit tests in test/product/cpp.
2008-07-12 12:12:02 +00:00
Benoit Jacob
2b53fd4d53 some performance fixes in Assign.h reported by Gael. Some doc update in
Cwise.
2008-07-10 16:15:55 +00:00
Benoit Jacob
a9d319d44f * do the ActualPacketAccesBit change as discussed on list
* add comment in Product.h about CanVectorizeInner
* fix typo in test/product.cpp
2008-07-04 12:43:55 +00:00
Gael Guennebaud
027818d739 * added innerSize / outerSize functions to MatrixBase
* added complete implementation of sparse matrix product
  (with a little glue in Eigen/Core)
* added an exhaustive bench of sparse products including GMM++ and MTL4
  => Eigen outperforms in all transposed/density configurations !
2008-06-28 23:07:14 +00:00
Benoit Jacob
e27b2b95cf * rework Map, allow vectorization
* rework PacketMath and DummyPacketMath, make these actual template
specializations instead of just overriding by non-template inline
functions
* introduce ei_ploadt and ei_pstoret, make use of them in Map and Matrix
* remove Matrix::map() methods, use Map constructors instead.
2008-06-27 01:22:35 +00:00
Benoit Jacob
25ba9f377c * add bench/benchVecAdd.cpp by Gael, fix crash (ei_pload on non-aligned)
* introduce packet(int), make use of it in linear vectorized paths
  --> completely fixes the slowdown noticed in benchVecAdd.
* generalize coeff(int) to linear-access xprs
* clarify the access flag bits
* rework api dox in Coeffs.h and util/Constants.h
* improve certain expressions's flags, allowing more vectorization
* fix bug in Block: start(int) and end(int) returned dyn*dyn size
* fix bug in Block: just because the Eval type has packet access
  doesn't imply the block xpr should have it too.
2008-06-26 16:06:41 +00:00
Gael Guennebaud
ac9aa47bbc optimize linear vectorization both in Assign and Sum (optimal amortized perf) 2008-06-23 15:50:28 +00:00
Benoit Jacob
dc9206cec5 split sum away from redux and vectorize it.
(could come back to redux after it has been vectorized,
and could serve as a starting point for that)
also make the abs2 functor vectorizable (for real types).
2008-06-23 10:32:48 +00:00
Benoit Jacob
8a967fb17c * implement slice vectorization. Because it uses unaligned
packet access, it is not certain that it will bring a performance
  improvement: benchmarking needed.
* improve logic choosing slice vectorization.
* fix typo in SSE packet math, causing crash in unaligned case.
* fix bug in Product, causing crash in unaligned case.
* add TEST_SSE3 CMake option.
2008-06-22 15:02:05 +00:00
Gael Guennebaud
e735692e37 move "enum" back to "const int" int ei_assign_impl: in fact, casting
enums to int is enough to get compile time constants with ICC.
2008-06-20 07:10:50 +00:00
Gael Guennebaud
fb4a151982 * more cleaning in Product
* make Matrix2f (and similar) vectorized using linear path
* fix a couple of warnings and compilation issues with ICC and gcc 3.3/3.4
  (cannot get Transform compiles with gcc 3.3/3.4, see the FIXME)
2008-06-19 23:00:51 +00:00
Benoit Jacob
bb1f4e44f1 * Block: row and column expressions in the inner direction
now have the Like1D flag.

* Big renaming:
  packetCoeff ---> packet
  VectorizableBit ---> PacketAccessBit
  Like1DArrayBit ---> LinearAccessBit
2008-06-16 14:54:31 +00:00
Benoit Jacob
9857764ae7 aaargh. 2008-06-16 11:20:29 +00:00
Benoit Jacob
478bfaf228 fix bug in computation of unrolling limit: div instead of mul 2008-06-16 11:18:59 +00:00
Benoit Jacob
c905b31b42 * Big rework of Assign.h:
** Much better organization
** Fix a few bugs
** Add the ability to unroll only the inner loop
** Add an unrolled path to the Like1D vectorization. Not well tested.
** Add placeholder for sliced vectorization. Unimplemented.

* Rework of corrected_flags:
** improve rules determining vectorizability
** for vectors, the storage-order is indifferent, so we tweak it
   to allow vectorization of row-vectors.

* fix compilation in benchmark, and a warning in Transpose.
2008-06-16 10:49:44 +00:00
Gael Guennebaud
6998037930 * move some compile time "if" to their respective unroller (assign and dot)
* fix a couple of compilation issues when unrolling is disabled
* reduce default unrolling limit to a more reasonable value
2008-06-07 01:07:48 +00:00
Gael Guennebaud
48262b9734 added a static assertion mechanism
(see notes in Core/util/StaticAssert.h for details)
2008-06-04 11:16:11 +00:00
Gael Guennebaud
f5e599e489 * replace compile-time-if by meta-selector in Assign.h
as it speed up compilation.
* fix minor typo introduced in the previous commit
2008-05-31 14:42:07 +00:00
Gael Guennebaud
c1559d3079 * updated the assignement operator macro so that overloads
in MatrixBase work
* removed product_selector and cleaned Product.h a bit
* cleaned Assign.h a bit
2008-05-28 22:56:19 +00:00
Gael Guennebaud
8711e26c8a * change Flagged to take into account NestByValue only
* bugfix in Assign and cache friendly product (weird that worked before)
* improved argument evaluation in Product
2008-05-28 22:11:47 +00:00
Benoit Jacob
953efdbfe7 - introduce Part and Extract classes, splitting and extending the former
Triangular class
- full meta-unrolling in Part
- move inverseProduct() to MatrixBase
- compilation fix in ProductWIP: introduce a meta-selector to only do
  direct access on types that support it.
- phase out the old Product, remove the WIP_DIRTY stuff.
- misc renaming and fixes
2008-05-27 05:47:30 +00:00
Gael Guennebaud
4317fad869 * Added several cast to int of the enums (needed for some compilers)
* Fix a mistake in CwiseNullary.
* Added a CoreDeclarions header that declares only the forward declarations
  and related basic stuffs.
2008-05-12 18:09:30 +00:00
Benoit Jacob
678f18fce4 put inline keywords everywhere appropriate. So we don't need anymore to pass
-finline-limit=1000 to gcc to get good performance. By the way some cleanup.
2008-05-12 17:34:46 +00:00
Benoit Jacob
3562b01105 * Give Konstantinos a copyright line
* Fix compilation of Inverse.h with vectorisation
* Introduce EIGEN_GNUC_AT_LEAST(x,y) macro doing future-proof (e.g. gcc v5.0) check
* Only use ProductWIP if vectorisation is enabled
* rename EIGEN_ALWAYS_INLINE -> EIGEN_INLINE with fall-back to inline keyword
* some cleanup/indentation
2008-05-12 08:12:40 +00:00
Gael Guennebaud
46fa4c713f * Started support for unaligned vectorization.
* Introduce a new highly optimized matrix-matrix product for large
  matrices. The code is still highly experimental and it is activated
  only if you define EIGEN_WIP_PRODUCT at compile time.
  Currently the third dimension of the product must be a factor of
  the packet size (x4 for floats) and the right handed side matrix
  must be column major.
  Moreover, currently c = a*b; actually computes c += a*b !!
  Therefore, the code is provided for experimentation purpose only !
  These limitations will be fixed soon or later to become the default
  product implementation.
2008-05-05 10:23:29 +00:00
Benoit Jacob
890a8de962 Make products always eval into expressions. Improves performance
in benchmark. Still not as fasts as explicit eval(), strangely.
2008-05-02 08:53:23 +00:00
Gael Guennebaud
02f1615d2a Enable vectorization of product with dynamic matrices,
extended cache optimal product to work in any row/column
major situations, and a few bugfixes (forgot to add the
Cholesky header, vectorization of CwiseBinary)
2008-05-01 13:53:05 +00:00
Gael Guennebaud
1ec2d21ca5 Fixed a couple of issues introduced in previous commits.
Added a test for Triangular.
2008-04-26 20:28:27 +00:00
Gael Guennebaud
b4c974d059 Added triangular assignement, e.g.:
m.upper() = a+b;
only updates the upper triangular part of m.
Note that:
 m = (a+b).upper();
updates all coefficients of m (but half of the additions
will be skiped)

Updated back/forward substitution to better use Eigen's capability.
2008-04-26 19:20:26 +00:00
Gael Guennebaud
6f2c72fb53 Various fixes in:
- vector to vector assign
 - PartialRedux
 - Vectorization criteria of Product
 - returned type of normalized
 - SSE integer mul
2008-04-25 23:10:37 +00:00
Gael Guennebaud
a451835bce Make the explicit vectorization much more flexible:
- support dynamic sizes
 - support arbitrary matrix size when the matrix can be seen as a 1D array
   (except for fixed size matrices where the size in Bytes must be a factor of 16,
    this is to allow compact storage of a vector of matrices)
Note that the explict vectorization is still experimental and far to be completely tested.
2008-04-25 15:46:18 +00:00
Benoit Jacob
6ae037dfb5 give up on OpenMP... for now 2008-04-18 07:57:46 +00:00
Benoit Jacob
ea3ccb1e8c * Start of the LU module, with matrix inversion already there and
fully optimized.
* Even if LargeBit is set, only parallelize for large enough objects
  (controlled by EIGEN_PARALLELIZATION_TRESHOLD).
2008-04-14 08:20:24 +00:00
Benoit Jacob
dcebc46cdc - cleaner use of OpenMP (no code duplication anymore)
using a macro and _Pragma.
- use OpenMP also in cacheOptimalProduct and in the
  vectorized paths as well
- kill the vector assignment unroller. implement in
  operator= the logic for assigning a row-vector in
  a col-vector.
- CMakeLists support for building tests/examples
  with -fopenmp and/or -msse2
- updates in bench/, especially replace identity()
  by ones() which prevents underflows from perturbing
  bench results.
2008-04-11 14:28:42 +00:00
Benoit Jacob
7bee90a62a Merge Gael's experimental OpenMP parallelization support into Assign.h. 2008-04-11 08:18:47 +00:00
Benoit Jacob
9d8876ce82 * rename XprCopy -> Nested
* rename OperatorEquals -> Assign
* move Util.h and FwDecl.h to a util/ subdir
2008-04-10 09:01:28 +00:00