34 Commits

Author SHA1 Message Date
Gael Guennebaud
440664cd5d temporary fix of the pèrevious commit 2008-08-24 15:27:05 +00:00
Gael Guennebaud
ba100998bf * split Meta.h to Meta.h (generic meta programming) and XprHelper.h (relates to eigen mechanism)
* added a meta.cpp unit test
* EIGEN_TUNE_FOR_L2_CACHE_SIZE now represents L2 block size in Bytes (whence the ei_meta_sqrt...)
* added a CustomizeEigen.dox page
* added a TOC to QuickStartGuide.dox
2008-08-24 15:15:32 +00:00
Gael Guennebaud
f2f48b6560 * remove LargeBit and related stuff
* replaced the Flags template parameter of Matrix by StorageOrder
  and move it back to the 4th position such that we don't have to
  worry about the two Max* template parameters
* extended EIGEN_USING_MATRIX_TYPEDEFS with the ei_* math functions
2008-08-23 17:11:44 +00:00
Gael Guennebaud
a6d387a359 Various compilation fixes for MSVC 9. All tests compile but some
still fail at runtime in ei_aligned_free() (even without vectorization).
2008-08-19 11:06:40 +00:00
Gael Guennebaud
4fa40367e9 * Big change in Block and Map:
- added a MapBase base xpr on top of which Map and the specialization
    of Block are implemented
  - MapBase forces both aligned loads (and aligned stores, see below) in expressions
    such as "x.block(...) += other_expr"
* Significant vectorization improvement:
 - added a AlignedBit flag meaning the first coeff/packet is aligned,
   this allows to not generate extra code to deal with the first unaligned part
 - removed all unaligned stores when no unrolling
 - removed unaligned loads in Sum when the input as the DirectAccessBit flag
* Some code simplification in CacheFriendly product
* Some minor documentation improvements
2008-08-09 18:41:24 +00:00
Gael Guennebaud
842c4f8bfa Several compilation fixes for MSVC and NVCC, basically:
- added explicit enum to int conversion where needed
- if a function is not defined as declared and the return type is "tricky"
  then the type must be typedefined somewhere. A "tricky return type" can be:
  * a template class with a default parameter which depends on another template parameter
  * a nested template class, or type of a nested template class
2008-07-29 16:33:07 +00:00
Benoit Jacob
f5791eeb70 the big Array/Cwise rework as discussed on the mailing list. The new API
can be seen in Eigen/src/Core/Cwise.h.
2008-07-08 00:49:10 +00:00
Gael Guennebaud
027818d739 * added innerSize / outerSize functions to MatrixBase
* added complete implementation of sparse matrix product
  (with a little glue in Eigen/Core)
* added an exhaustive bench of sparse products including GMM++ and MTL4
  => Eigen outperforms in all transposed/density configurations !
2008-06-28 23:07:14 +00:00
Benoit Jacob
e27b2b95cf * rework Map, allow vectorization
* rework PacketMath and DummyPacketMath, make these actual template
specializations instead of just overriding by non-template inline
functions
* introduce ei_ploadt and ei_pstoret, make use of them in Map and Matrix
* remove Matrix::map() methods, use Map constructors instead.
2008-06-27 01:22:35 +00:00
Gael Guennebaud
e5d301dc96 various work on the Sparse module:
* added some glue to Eigen/Core (SparseBit, ei_eval, Matrix)
* add two new sparse matrix types:
   HashMatrix: based on std::map (for random writes)
   LinkedVectorMatrix: array of linked vectors
   (for outer coherent writes, e.g. to transpose a matrix)
* add a SparseSetter class to easily set/update any kind of matrices, e.g.:
   { SparseSetter<MatrixType,RandomAccessPattern> wrapper(mymatrix);
     for (...) wrapper->coeffRef(rand(),rand()) = rand(); }
* automatic shallow copy for RValue
* and a lot of mess !
plus:
* remove the remaining ArrayBit related stuff
* don't use alloca in product for very large memory allocation
2008-06-26 23:22:26 +00:00
Benoit Jacob
25ba9f377c * add bench/benchVecAdd.cpp by Gael, fix crash (ei_pload on non-aligned)
* introduce packet(int), make use of it in linear vectorized paths
  --> completely fixes the slowdown noticed in benchVecAdd.
* generalize coeff(int) to linear-access xprs
* clarify the access flag bits
* rework api dox in Coeffs.h and util/Constants.h
* improve certain expressions's flags, allowing more vectorization
* fix bug in Block: start(int) and end(int) returned dyn*dyn size
* fix bug in Block: just because the Eval type has packet access
  doesn't imply the block xpr should have it too.
2008-06-26 16:06:41 +00:00
Gael Guennebaud
fb4a151982 * more cleaning in Product
* make Matrix2f (and similar) vectorized using linear path
* fix a couple of warnings and compilation issues with ICC and gcc 3.3/3.4
  (cannot get Transform compiles with gcc 3.3/3.4, see the FIXME)
2008-06-19 23:00:51 +00:00
Gael Guennebaud
82c3cea1d5 * refactoring of Product:
* use ProductReturnType<>::Type to get the correct Product xpr type
  * Product is no longer instanciated for xpr types which are evaluated
  * vectorization of "a.transpose() * b" for the normal product (small and fixed-size matrix)
  * some cleanning
* removed ArrayBase
2008-06-19 17:33:57 +00:00
Gael Guennebaud
5dbfed1902 fix two bugs dicovered by the previous commit. 2008-06-16 16:39:58 +00:00
Benoit Jacob
bb1f4e44f1 * Block: row and column expressions in the inner direction
now have the Like1D flag.

* Big renaming:
  packetCoeff ---> packet
  VectorizableBit ---> PacketAccessBit
  Like1DArrayBit ---> LinearAccessBit
2008-06-16 14:54:31 +00:00
Benoit Jacob
c905b31b42 * Big rework of Assign.h:
** Much better organization
** Fix a few bugs
** Add the ability to unroll only the inner loop
** Add an unrolled path to the Like1D vectorization. Not well tested.
** Add placeholder for sliced vectorization. Unimplemented.

* Rework of corrected_flags:
** improve rules determining vectorizability
** for vectors, the storage-order is indifferent, so we tweak it
   to allow vectorization of row-vectors.

* fix compilation in benchmark, and a warning in Transpose.
2008-06-16 10:49:44 +00:00
Benoit Jacob
c90c77051f * make the _Flags template parameter of Matrix default to the corrected
flags. This ensures that unless explicitly messed up otherwise,
  a Matrix type is equal to its own Eval type. This seriously reduces
  the number of types instantiated. Measured +13% compile speed, -7%
  binary size.

* Improve doc of Matrix template parameters.
2008-06-13 07:53:45 +00:00
Gael Guennebaud
48262b9734 added a static assertion mechanism
(see notes in Core/util/StaticAssert.h for details)
2008-06-04 11:16:11 +00:00
Gael Guennebaud
fcf4457b78 added optimized matrix times diagonal matrix product via Diagonal flag shortcut. 2008-05-31 21:35:11 +00:00
Gael Guennebaud
e2ac5d244e Added ArrayBit to get the ability to manipulate a Matrix like a simple scalar.
In particular this flag changes the behavior of operator* to a coeff wise product.
2008-05-29 22:33:07 +00:00
Benoit Jacob
486fdb26a1 many small fixes and documentation improvements,
this should be alpha5.
2008-05-29 03:12:30 +00:00
Benoit Jacob
f54760c889 hehe, the complicated nesting scheme in Flagged in the previous commit
was a sign that we were doing something wrong. In fact, having
NestByValue as a special case of Flagged was wrong, and the previous
commit, while not buggy, was inefficient because then when the resulting
NestByValue xpr was nested -- hence copied -- the original xpr which was
already nested by value was copied again; hence instead of 1 copy we got
3 copies.
The solution was to ressuscitate the old Temporary.h (renamed
NestByValue.h) as it was the right approach.
2008-05-28 05:14:16 +00:00
Benoit Jacob
953efdbfe7 - introduce Part and Extract classes, splitting and extending the former
Triangular class
- full meta-unrolling in Part
- move inverseProduct() to MatrixBase
- compilation fix in ProductWIP: introduce a meta-selector to only do
  direct access on types that support it.
- phase out the old Product, remove the WIP_DIRTY stuff.
- misc renaming and fixes
2008-05-27 05:47:30 +00:00
Gael Guennebaud
94e1629a1b * improved product performance:
- fallback to normal product for small dynamic matrices
 - overloaded "c += (a * b).lazy()" to avoid the expensive and useless temporary and setZero()
   in such very common cases.
* fix a couple of issues with the flags
2008-05-22 14:51:25 +00:00
Gael Guennebaud
c6789a279c Fix compilation issues with MSVC and NVCC.
Added a few typedef of complex return types in MatrixBase (Needed by MSVC)
2008-05-15 09:40:11 +00:00
Benoit Jacob
5da60897ab Introduce generic Flagged xpr, remove already Lazy.h and Temporary.h
Rename DefaultLostFlagMask --> HerediraryBits
2008-05-14 08:20:15 +00:00
Gael Guennebaud
4317fad869 * Added several cast to int of the enums (needed for some compilers)
* Fix a mistake in CwiseNullary.
* Added a CoreDeclarions header that declares only the forward declarations
  and related basic stuffs.
2008-05-12 18:09:30 +00:00
Gael Guennebaud
64c49de7ba * split PacketMath.h to SSE and Altivec specific files
* improved the flexibility of the new product implementation,
  now all sizes seems to be properly handled.
2008-05-05 17:19:47 +00:00
Gael Guennebaud
a451835bce Make the explicit vectorization much more flexible:
- support dynamic sizes
 - support arbitrary matrix size when the matrix can be seen as a 1D array
   (except for fixed size matrices where the size in Bytes must be a factor of 16,
    this is to allow compact storage of a vector of matrices)
Note that the explict vectorization is still experimental and far to be completely tested.
2008-04-25 15:46:18 +00:00
Gael Guennebaud
9385793f71 Fix a couple of issue with the vectorization. In particular, default ei_p* functions
are provided to handle not suported types seemlessly.

Added a generic null-ary expression with null-ary functors. They replace
Zero, Ones, Identity and Random.
2008-04-24 18:35:39 +00:00
Benoit Jacob
acfd6f3bda - add _packetCoeff() to Inverse, allowing vectorization.
- let Inverse take template parameter MatrixType instead
  of ExpressionType, in order to reduce executable code size
  when taking inverses of xpr's.
- introduce ei_corrected_matrix_flags : the flags template
  parameter to the Matrix class is only a suggestion. This
  is also useful in ei_eval.
2008-04-16 07:18:27 +00:00
Benoit Jacob
9789c04467 when evaluating an xpr, the result can now be vectorizable
even if the xpr itself wasn't vectorizable.
2008-04-14 08:55:12 +00:00
Benoit Jacob
ea3ccb1e8c * Start of the LU module, with matrix inversion already there and
fully optimized.
* Even if LargeBit is set, only parallelize for large enough objects
  (controlled by EIGEN_PARALLELIZATION_TRESHOLD).
2008-04-14 08:20:24 +00:00
Benoit Jacob
ca448d2537 split those files in util/
some more renaming
2008-04-10 09:41:13 +00:00