Benoit Steiner
513e357b48
Added support for prefetching on cuda devices
2015-07-17 15:35:16 -07:00
Gael Guennebaud
25a98be948
bug #80 : merge with d_hood branch on adding more coefficient-wise unary array functors
2015-06-10 15:52:05 +02:00
Deanna Hood
f52b78491c
Remove packet isNaN, isInf, isFinite
2015-03-17 09:26:24 +10:00
Deanna Hood
fef4e071d7
Rename isinf to isInf
2015-03-17 05:58:47 +10:00
Deanna Hood
46cf9cda32
Add isfinite array support as isFinite
2015-03-17 04:33:12 +10:00
Deanna Hood
fb68b149cb
Rename isnan to isNaN
2015-03-17 02:04:42 +10:00
Deanna Hood
f89fcefa79
Add hyperbolic trigonometric functions from std array support
2015-03-11 13:13:30 +10:00
Deanna Hood
a5e49976f5
Add log10 array support
2015-03-11 08:56:42 +10:00
Deanna Hood
31fdd67756
Additional unary coeff-wise functors (isnan, round, arg, e.g.)
2015-03-11 06:39:23 +10:00
Benoit Steiner
fb53384b0f
Improved the default implementation of prsqrt
2015-02-28 01:51:26 -08:00
Benoit Steiner
306fceccbe
Pulled latest updates from trunk
2015-02-27 13:05:26 -08:00
Benoit Jacob
6466fa63be
Reimplement the selection between rotating and non-rotating kernels
...
using templates instead of macros and if()'s.
That was needed to fix the build of unit tests on ARM, which I had
broken. My bad for not testing earlier.
2015-02-27 15:30:10 -05:00
Benoit Steiner
573b377110
Added support for vectorized type casting of tensors
2015-02-27 08:46:04 -08:00
Benoit Jacob
b7fc8746e0
Replace a static assert by a runtime one, fixes the build of unit tests on ARM
...
Also safely assert in the non-implemented path that should never be taken in practice,
and would return wrong results.
2015-02-27 10:01:59 -05:00
Benoit Steiner
f41b1f1666
Added support for fast reciprocal square root computation.
2015-02-26 09:42:41 -08:00
Benoit Jacob
9bd8a4bab5
bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path
...
This is substantially faster on ARM, where it's important to minimize the number of loads.
This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome.
Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).
2015-02-18 15:03:35 -05:00
Gael Guennebaud
45cbb0bbb1
The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index
2015-02-16 15:05:41 +01:00
Benoit Steiner
c739102ef9
Pulled the latest changes from the trunk
2015-02-06 05:25:03 -08:00
Gael Guennebaud
ee06f78679
Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.
2014-11-04 21:58:52 +01:00
Gael Guennebaud
fe57b2f963
bug #701 : workaround (min) and (max) blocking ADL by introducing numext::mini and numext::maxi internal functions and a EIGEN_NOT_A_MACRO macro.
2014-10-20 15:55:32 +02:00
Benoit Steiner
99d75235a9
Misc improvements and cleanups
2014-10-13 17:02:09 -07:00
Benoit Steiner
5cc23199be
More tests to validate the const-correctness of the tensor code.
2014-10-02 10:30:44 -07:00
Benoit Steiner
16047c8d4a
Pulled in the latest changes from the Eigen trunk
2014-08-13 22:25:29 -07:00
Gael Guennebaud
b47ef1431f
Fix many long to int implicit conversions
2014-07-08 16:47:11 +02:00
Chen-Pang He
b9ee880f07
chmod -x Eigen/src/Core/GenericPacketMath.h
2014-07-07 21:28:00 +08:00
Roger Martin
eb49100de9
Add component-wise atan() function (see bug #80 ).
2014-06-19 14:55:14 +01:00
Benoit Steiner
29aebf96e6
Created the pblend packet primitive and implemented it using SSE and AVX instructions.
2014-06-06 20:18:44 -07:00
Gael Guennebaud
3d8d0f6269
Enable vectorization of pack_rhs with a column-major RHS.
...
Rename and generalize Kernel<*> to PacketBlock<*,N>.
2014-04-25 10:56:18 +02:00
Gael Guennebaud
d5a795f673
New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speeup on Haswell.
...
This changeset also introduce new vector functions: ploadquad and predux4.
2014-04-16 17:05:11 +02:00
Gael Guennebaud
10aa14592a
Add a mechanism to recursively access to half-size packet types
2014-03-28 10:18:04 +01:00
Benoit Steiner
8a94cb3edd
Implemented the SSE version of the gather and scatter packet primitives.
2014-03-27 18:29:01 -07:00
Benoit Steiner
ee86679096
Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel.
2014-03-27 16:03:03 -07:00
Benoit Steiner
a419cea4a0
Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions.
...
Implemented the primitive using SSE instructions.
2014-03-26 19:03:07 -07:00
Gael Guennebaud
b286a1e75c
add pbroadcast2/4 generic intrinsics
2014-03-26 16:46:36 +01:00
Gael Guennebaud
01fd880424
Revert previous change and introduce a new workaround regarding gcc generating a shufps instruction instead of the more efficient pshufd instruction.
...
The trick consists in introducing a new pload1 function to be used in low level product kernels for which bug #203 does not apply.
Indeed, it turned out that using inline assembly prevents gcc of doing a good job at instructtion reordering.
2014-03-20 16:03:46 +01:00
Gael Guennebaud
2f593ee67c
merge with main branch
2013-07-17 13:21:35 +02:00
Gael Guennebaud
155fa0ca83
Add missing namespace prefix in pconj
2013-07-03 11:36:12 +02:00
Gael Guennebaud
64054ee396
Add nvcc support for normalize, initializers, and fuzzy comparisons
2013-06-05 15:38:33 +02:00
Gael Guennebaud
9cd2d14005
merge with default branch
2013-04-19 11:21:39 +02:00
Gael Guennebaud
9a4caf2b0f
Extend internal doc of ploaddup and palign
2013-04-17 09:17:34 +02:00
Gael Guennebaud
12439e1249
Port SelfCwiseBinaryOp and Dot.h to nvcc, fix portability issue with std::min/max
2013-04-05 16:35:49 +02:00
Gael Guennebaud
a76fbbf397
Fix bug #314 :
...
- remove most of the metaprogramming kung fu in MathFunctions.h (only keep functions that differs from the std)
- remove the overloads for array expression that were in the std namespace
2012-11-06 15:25:50 +01:00
Benoit Jacob
69124cfca2
Automatic relicensing to MPL2 using Keirs script. Manual fixup follows.
2012-07-13 14:42:47 -04:00
Jitse Niesen
3c412183b2
Get rid of include directives inside namespace blocks (bug #339 ).
2012-04-15 11:06:28 +01:00
Gael Guennebaud
9c86ee2695
fix static inline versus inline static issues (the former is the correct order)
2012-01-31 12:58:52 +01:00
Gael Guennebaud
22bff949c8
protect calls to min and max with parentheses to make Eigen compatible with default windows.h
...
(transplanted from 49b6e9143e1d74441924c0c313536e263e12a55c
)
2011-07-21 11:19:36 +02:00
Gael Guennebaud
87ac09daa8
Simplify the use of custom scalar types, the rule is to never directly call a standard math function using std:: but rather put a using std::foo before and simply call foo:
...
using std::max;
max(a,b);
2011-05-25 08:41:45 +02:00
Jitse Niesen
a96c849c20
Document enums in Constants.h (bug #248 ).
...
To get the links to work, I also had to document the Eigen namespace.
Unfortunately, this means that the word Eigen is linked whenever it appears
in the docs.
2011-05-03 17:08:14 +01:00
Gael Guennebaud
c8e1b679fa
re-enable fast pset1-pstore by introducing a new higher level pstore1 function
2011-03-02 10:55:44 +01:00
Gael Guennebaud
6e01780541
fix a couple of issues with pcplxflip
2011-02-23 17:51:40 +03:00