89 Commits

Author SHA1 Message Date
Gael Guennebaud
4fa40367e9 * Big change in Block and Map:
- added a MapBase base xpr on top of which Map and the specialization
    of Block are implemented
  - MapBase forces both aligned loads (and aligned stores, see below) in expressions
    such as "x.block(...) += other_expr"
* Significant vectorization improvement:
 - added a AlignedBit flag meaning the first coeff/packet is aligned,
   this allows to not generate extra code to deal with the first unaligned part
 - removed all unaligned stores when no unrolling
 - removed unaligned loads in Sum when the input as the DirectAccessBit flag
* Some code simplification in CacheFriendly product
* Some minor documentation improvements
2008-08-09 18:41:24 +00:00
Benoit Jacob
c94be35bc8 introduce copyCoeff and copyPacket methods in MatrixBase, used by
Assign, in preparation for new Swap impl reusing Assign code.
remove last remnant of old Inverse class in Transform.
2008-08-05 18:00:23 +00:00
Gael Guennebaud
842c4f8bfa Several compilation fixes for MSVC and NVCC, basically:
- added explicit enum to int conversion where needed
- if a function is not defined as declared and the return type is "tricky"
  then the type must be typedefined somewhere. A "tricky return type" can be:
  * a template class with a default parameter which depends on another template parameter
  * a nested template class, or type of a nested template class
2008-07-29 16:33:07 +00:00
Gael Guennebaud
b7bd1b3446 Add a *very efficient* evaluation path for both col-major matrix * vector
and vector * row-major products. Currently, it is enabled only is the matrix
has DirectAccessBit flag and the product is "large enough".
Added the respective unit tests in test/product/cpp.
2008-07-12 12:12:02 +00:00
Benoit Jacob
2b53fd4d53 some performance fixes in Assign.h reported by Gael. Some doc update in
Cwise.
2008-07-10 16:15:55 +00:00
Benoit Jacob
a9d319d44f * do the ActualPacketAccesBit change as discussed on list
* add comment in Product.h about CanVectorizeInner
* fix typo in test/product.cpp
2008-07-04 12:43:55 +00:00
Gael Guennebaud
027818d739 * added innerSize / outerSize functions to MatrixBase
* added complete implementation of sparse matrix product
  (with a little glue in Eigen/Core)
* added an exhaustive bench of sparse products including GMM++ and MTL4
  => Eigen outperforms in all transposed/density configurations !
2008-06-28 23:07:14 +00:00
Benoit Jacob
e27b2b95cf * rework Map, allow vectorization
* rework PacketMath and DummyPacketMath, make these actual template
specializations instead of just overriding by non-template inline
functions
* introduce ei_ploadt and ei_pstoret, make use of them in Map and Matrix
* remove Matrix::map() methods, use Map constructors instead.
2008-06-27 01:22:35 +00:00
Benoit Jacob
25ba9f377c * add bench/benchVecAdd.cpp by Gael, fix crash (ei_pload on non-aligned)
* introduce packet(int), make use of it in linear vectorized paths
  --> completely fixes the slowdown noticed in benchVecAdd.
* generalize coeff(int) to linear-access xprs
* clarify the access flag bits
* rework api dox in Coeffs.h and util/Constants.h
* improve certain expressions's flags, allowing more vectorization
* fix bug in Block: start(int) and end(int) returned dyn*dyn size
* fix bug in Block: just because the Eval type has packet access
  doesn't imply the block xpr should have it too.
2008-06-26 16:06:41 +00:00
Gael Guennebaud
ac9aa47bbc optimize linear vectorization both in Assign and Sum (optimal amortized perf) 2008-06-23 15:50:28 +00:00
Benoit Jacob
dc9206cec5 split sum away from redux and vectorize it.
(could come back to redux after it has been vectorized,
and could serve as a starting point for that)
also make the abs2 functor vectorizable (for real types).
2008-06-23 10:32:48 +00:00
Benoit Jacob
8a967fb17c * implement slice vectorization. Because it uses unaligned
packet access, it is not certain that it will bring a performance
  improvement: benchmarking needed.
* improve logic choosing slice vectorization.
* fix typo in SSE packet math, causing crash in unaligned case.
* fix bug in Product, causing crash in unaligned case.
* add TEST_SSE3 CMake option.
2008-06-22 15:02:05 +00:00
Gael Guennebaud
e735692e37 move "enum" back to "const int" int ei_assign_impl: in fact, casting
enums to int is enough to get compile time constants with ICC.
2008-06-20 07:10:50 +00:00
Gael Guennebaud
fb4a151982 * more cleaning in Product
* make Matrix2f (and similar) vectorized using linear path
* fix a couple of warnings and compilation issues with ICC and gcc 3.3/3.4
  (cannot get Transform compiles with gcc 3.3/3.4, see the FIXME)
2008-06-19 23:00:51 +00:00
Benoit Jacob
bb1f4e44f1 * Block: row and column expressions in the inner direction
now have the Like1D flag.

* Big renaming:
  packetCoeff ---> packet
  VectorizableBit ---> PacketAccessBit
  Like1DArrayBit ---> LinearAccessBit
2008-06-16 14:54:31 +00:00
Benoit Jacob
9857764ae7 aaargh. 2008-06-16 11:20:29 +00:00
Benoit Jacob
478bfaf228 fix bug in computation of unrolling limit: div instead of mul 2008-06-16 11:18:59 +00:00
Benoit Jacob
c905b31b42 * Big rework of Assign.h:
** Much better organization
** Fix a few bugs
** Add the ability to unroll only the inner loop
** Add an unrolled path to the Like1D vectorization. Not well tested.
** Add placeholder for sliced vectorization. Unimplemented.

* Rework of corrected_flags:
** improve rules determining vectorizability
** for vectors, the storage-order is indifferent, so we tweak it
   to allow vectorization of row-vectors.

* fix compilation in benchmark, and a warning in Transpose.
2008-06-16 10:49:44 +00:00
Gael Guennebaud
6998037930 * move some compile time "if" to their respective unroller (assign and dot)
* fix a couple of compilation issues when unrolling is disabled
* reduce default unrolling limit to a more reasonable value
2008-06-07 01:07:48 +00:00
Gael Guennebaud
48262b9734 added a static assertion mechanism
(see notes in Core/util/StaticAssert.h for details)
2008-06-04 11:16:11 +00:00
Gael Guennebaud
f5e599e489 * replace compile-time-if by meta-selector in Assign.h
as it speed up compilation.
* fix minor typo introduced in the previous commit
2008-05-31 14:42:07 +00:00
Gael Guennebaud
c1559d3079 * updated the assignement operator macro so that overloads
in MatrixBase work
* removed product_selector and cleaned Product.h a bit
* cleaned Assign.h a bit
2008-05-28 22:56:19 +00:00
Gael Guennebaud
8711e26c8a * change Flagged to take into account NestByValue only
* bugfix in Assign and cache friendly product (weird that worked before)
* improved argument evaluation in Product
2008-05-28 22:11:47 +00:00
Benoit Jacob
953efdbfe7 - introduce Part and Extract classes, splitting and extending the former
Triangular class
- full meta-unrolling in Part
- move inverseProduct() to MatrixBase
- compilation fix in ProductWIP: introduce a meta-selector to only do
  direct access on types that support it.
- phase out the old Product, remove the WIP_DIRTY stuff.
- misc renaming and fixes
2008-05-27 05:47:30 +00:00
Gael Guennebaud
4317fad869 * Added several cast to int of the enums (needed for some compilers)
* Fix a mistake in CwiseNullary.
* Added a CoreDeclarions header that declares only the forward declarations
  and related basic stuffs.
2008-05-12 18:09:30 +00:00
Benoit Jacob
678f18fce4 put inline keywords everywhere appropriate. So we don't need anymore to pass
-finline-limit=1000 to gcc to get good performance. By the way some cleanup.
2008-05-12 17:34:46 +00:00
Benoit Jacob
3562b01105 * Give Konstantinos a copyright line
* Fix compilation of Inverse.h with vectorisation
* Introduce EIGEN_GNUC_AT_LEAST(x,y) macro doing future-proof (e.g. gcc v5.0) check
* Only use ProductWIP if vectorisation is enabled
* rename EIGEN_ALWAYS_INLINE -> EIGEN_INLINE with fall-back to inline keyword
* some cleanup/indentation
2008-05-12 08:12:40 +00:00
Gael Guennebaud
46fa4c713f * Started support for unaligned vectorization.
* Introduce a new highly optimized matrix-matrix product for large
  matrices. The code is still highly experimental and it is activated
  only if you define EIGEN_WIP_PRODUCT at compile time.
  Currently the third dimension of the product must be a factor of
  the packet size (x4 for floats) and the right handed side matrix
  must be column major.
  Moreover, currently c = a*b; actually computes c += a*b !!
  Therefore, the code is provided for experimentation purpose only !
  These limitations will be fixed soon or later to become the default
  product implementation.
2008-05-05 10:23:29 +00:00
Benoit Jacob
890a8de962 Make products always eval into expressions. Improves performance
in benchmark. Still not as fasts as explicit eval(), strangely.
2008-05-02 08:53:23 +00:00
Gael Guennebaud
02f1615d2a Enable vectorization of product with dynamic matrices,
extended cache optimal product to work in any row/column
major situations, and a few bugfixes (forgot to add the
Cholesky header, vectorization of CwiseBinary)
2008-05-01 13:53:05 +00:00
Gael Guennebaud
1ec2d21ca5 Fixed a couple of issues introduced in previous commits.
Added a test for Triangular.
2008-04-26 20:28:27 +00:00
Gael Guennebaud
b4c974d059 Added triangular assignement, e.g.:
m.upper() = a+b;
only updates the upper triangular part of m.
Note that:
 m = (a+b).upper();
updates all coefficients of m (but half of the additions
will be skiped)

Updated back/forward substitution to better use Eigen's capability.
2008-04-26 19:20:26 +00:00
Gael Guennebaud
6f2c72fb53 Various fixes in:
- vector to vector assign
 - PartialRedux
 - Vectorization criteria of Product
 - returned type of normalized
 - SSE integer mul
2008-04-25 23:10:37 +00:00
Gael Guennebaud
a451835bce Make the explicit vectorization much more flexible:
- support dynamic sizes
 - support arbitrary matrix size when the matrix can be seen as a 1D array
   (except for fixed size matrices where the size in Bytes must be a factor of 16,
    this is to allow compact storage of a vector of matrices)
Note that the explict vectorization is still experimental and far to be completely tested.
2008-04-25 15:46:18 +00:00
Benoit Jacob
6ae037dfb5 give up on OpenMP... for now 2008-04-18 07:57:46 +00:00
Benoit Jacob
ea3ccb1e8c * Start of the LU module, with matrix inversion already there and
fully optimized.
* Even if LargeBit is set, only parallelize for large enough objects
  (controlled by EIGEN_PARALLELIZATION_TRESHOLD).
2008-04-14 08:20:24 +00:00
Benoit Jacob
dcebc46cdc - cleaner use of OpenMP (no code duplication anymore)
using a macro and _Pragma.
- use OpenMP also in cacheOptimalProduct and in the
  vectorized paths as well
- kill the vector assignment unroller. implement in
  operator= the logic for assigning a row-vector in
  a col-vector.
- CMakeLists support for building tests/examples
  with -fopenmp and/or -msse2
- updates in bench/, especially replace identity()
  by ones() which prevents underflows from perturbing
  bench results.
2008-04-11 14:28:42 +00:00
Benoit Jacob
7bee90a62a Merge Gael's experimental OpenMP parallelization support into Assign.h. 2008-04-11 08:18:47 +00:00
Benoit Jacob
9d8876ce82 * rename XprCopy -> Nested
* rename OperatorEquals -> Assign
* move Util.h and FwDecl.h to a util/ subdir
2008-04-10 09:01:28 +00:00