4237 Commits

Author SHA1 Message Date
Gael Guennebaud
fd78874888 Fix compilation of iterative solvers with dense matrices 2015-03-09 21:31:03 +01:00
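
For context, a minimal sketch of the use case this fixes: an iterative solver driven by a plain dense matrix (illustrative only; the SPD setup below is an assumption, not from the patch).

#include <Eigen/Dense>
#include <Eigen/IterativeLinearSolvers>

int main() {
  const int n = 100;
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(n, n);
  A = A.transpose() * A + 100.0 * Eigen::MatrixXd::Identity(n, n);  // make A SPD
  Eigen::VectorXd b = Eigen::VectorXd::Random(n);
  // ConjugateGradient templated on a dense matrix type: the case this fixes.
  Eigen::ConjugateGradient<Eigen::MatrixXd, Eigen::Lower | Eigen::Upper> cg;
  Eigen::VectorXd x = cg.compute(A).solve(b);
  return x.size() == n ? 0 : 1;
}
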
Gael Guennebaud
d4317a85e8 Add typedefs for return types of SparseMatrixBase::selfadjointView 2015-03-09 21:29:46 +01:00
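
A hedged sketch of what such typedefs enable, assuming the name SelfAdjointViewReturnType as in later Eigen versions: the view's type can be spelled out instead of deduced.

#include <Eigen/Sparse>

typedef Eigen::SparseMatrix<double> SpMat;
// Naming the view's type explicitly (assumed typedef name):
typedef SpMat::SelfAdjointViewReturnType<Eigen::Lower>::Type LowerSelfAdjView;

void useView(SpMat& A) {
  LowerSelfAdjView view = A.selfadjointView<Eigen::Lower>();
  // view can now be stored or passed around with an explicit type.
}
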
Gael Guennebaud
9e885fb766 Add unit tests for CG and sparse-LLT for long int as storage-index 2015-03-09 14:33:15 +01:00
Gael Guennebaud
224a1fe4c6 bug #963: make IncompleteLUT compatible with non-default storage index types. 2015-03-09 13:55:20 +01:00
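
A sketch of the combination this makes compile: an IncompleteLUT whose StorageIndex matches a sparse matrix indexed with long (the identity matrix is just an illustrative stand-in for a real problem).

#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

typedef Eigen::SparseMatrix<double, Eigen::ColMajor, long> SpMatL;

int main() {
  SpMatL A(10, 10);
  A.setIdentity();  // trivially invertible placeholder
  Eigen::BiCGSTAB<SpMatL, Eigen::IncompleteLUT<double, long> > solver;
  solver.compute(A);
  Eigen::VectorXd x = solver.solve(Eigen::VectorXd::Ones(10));
  return 0;
}
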
Gael Guennebaud
0ee391863e Avoid underflow when blocking sizes are tuned manually. 2015-03-06 21:51:09 +01:00
Gael Guennebaud
14a5f135a3 bug #969: work around ambiguous calls to Ref using enable_if. 2015-03-06 17:51:31 +01:00
Gael Guennebaud
87681e508f bug #978: early return for vanishing products 2015-03-06 16:11:22 +01:00
Gael Guennebaud
cd3bbffa73 Improve blocking heuristic: if the lhs fits within L1, then block on the rhs in L1 (allows keeping the packed rhs in L1) 2015-03-06 14:31:39 +01:00
Gael Guennebaud
58740ce4c6 Improve product kernel: replace the previous dynamic loop-swapping strategy with a more general one:
It consists of increasing the actual number of rows of the lhs's micro horizontal panel for small depths, such that the L1 cache is fully exploited.
2015-03-06 10:30:35 +01:00
Gael Guennebaud
4c8b95d5c5 Rename LSCG to LeastSquaresConjugateGradient 2015-03-05 10:16:32 +01:00
Gael Guennebaud
7550107028 Product optimization: implement a dynamic loop-swapping strategy to improve memory accesses to the destination matrix in the case of rank-k-update-like products, i.e., for products of the kind: "large x small" * "small x large" 2015-03-05 10:03:46 +01:00
Gael Guennebaud
2dc968e453 bug #824: improve accuracy of Quaternion::angularDistance using atan2 instead of acos. 2015-03-04 17:03:13 +01:00
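
The idea behind the accuracy fix, as a standalone sketch (angularDistanceAtan2 is a hypothetical helper, not the patched member): acos(x) is ill-conditioned as x approaches 1, i.e. precisely for small rotations, whereas atan2 of the vector and scalar parts of the relative rotation stays accurate there.

#include <Eigen/Geometry>
#include <cmath>

template <typename Scalar>
Scalar angularDistanceAtan2(const Eigen::Quaternion<Scalar>& a,
                            const Eigen::Quaternion<Scalar>& b) {
  Eigen::Quaternion<Scalar> d = a * b.conjugate();  // relative rotation
  // 2*atan2(|vec|, |w|) stays well-conditioned even for tiny angles, where
  // 2*acos(|<a,b>|) amplifies rounding error.
  return Scalar(2) * std::atan2(d.vec().norm(), std::abs(d.w()));
}
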
Benoit Steiner
0196141938 Fixed the optimized AVX implementation of the fast rsqrt function 2015-03-02 13:49:39 -08:00
Benoit Steiner
4fd7f47692 Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined. 2015-03-02 09:38:47 -08:00
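
For reference, the standard fast-math pattern such an implementation typically follows on SSE (a sketch, not the committed code): the hardware estimate is roughly 12-bit accurate, and one Newton-Raphson step restores close-to-full float precision.

#include <immintrin.h>

static inline __m128 fast_rsqrt_ps(__m128 a) {
  __m128 x = _mm_rsqrt_ps(a);  // ~12-bit hardware estimate of 1/sqrt(a)
  // One Newton-Raphson step: x' = x * (1.5 - 0.5 * a * x * x)
  __m128 axx = _mm_mul_ps(a, _mm_mul_ps(x, x));
  __m128 t = _mm_sub_ps(_mm_set1_ps(1.5f), _mm_mul_ps(_mm_set1_ps(0.5f), axx));
  return _mm_mul_ps(x, t);
}
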
Benoit Steiner
fb53384b0f Improved the default implementation of prsqrt 2015-02-28 01:51:26 -08:00
Benoit Steiner
306fceccbe Pulled latest updates from trunk 2015-02-27 13:05:26 -08:00
Benoit Steiner
2386fc8528 Added support for 32-bit indices on a per-tensor/tensor-expression basis. This enables us to use 32-bit indices to evaluate expressions on the GPU faster while keeping the ability to use 64-bit indices to manipulate large tensors on the CPU in the same binary. 2015-02-27 12:57:13 -08:00
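
A sketch of the per-tensor choice in the unsupported Tensor module, where the fourth template parameter selects the index type (dimensions here are arbitrary):

#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  // 32-bit indices: cheaper index arithmetic, suited to GPU-sized tensors.
  Eigen::Tensor<float, 3, 0, int> t32(32, 32, 32);
  // 64-bit indices for tensors too large to address with 32 bits.
  Eigen::Tensor<float, 3, 0, long> t64(64, 64, 64);
  t32.setZero();
  t64.setZero();
  return 0;
}
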
Benoit Jacob
6466fa63be Reimplement the selection between rotating and non-rotating kernels
using templates instead of macros and if()'s.
That was needed to fix the build of unit tests on ARM, which I had
broken. My bad for not testing earlier.
2015-02-27 15:30:10 -05:00
Benoit Steiner
05089aba75 Switch to truncated casting when converting floating-point types to integers. This ensures that vectorized casts are consistent with scalar casts 2015-02-27 09:27:30 -08:00
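
To illustrate the convention being matched: scalar casts in C++ truncate toward zero, so the vectorized path now does the same.

#include <cstdio>

int main() {
  const float v[] = {1.9f, -1.9f, 2.5f, -2.5f};
  for (int i = 0; i < 4; ++i)
    std::printf("%5.1f -> %d\n", v[i], static_cast<int>(v[i]));  // 1 -1 2 -2
  return 0;
}
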
Benoit Steiner
573b377110 Added support for vectorized type casting of tensors 2015-02-27 08:46:04 -08:00
Benoit Jacob
2fc3b484d7 remove trailing comma 2015-02-27 11:37:45 -05:00
Benoit Jacob
33669348c4 Disable Packet2f/2i halfpacket support in NEON.
I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented,
and code trying to use halfpackets just fails to compile on NEON, as it tries to use the
default implementation of pload/pstore and the types don't match.
2015-02-27 11:35:37 -05:00
Benoit Jacob
b7fc8746e0 Replace a static assert by a runtime one, fixes the build of unit tests on ARM
Also safely assert in the non-implemented path that should never be taken in practice,
and would return wrong results.
2015-02-27 10:01:59 -05:00
Benoit Steiner
f41b1f1666 Added support for fast reciprocal square root computation. 2015-02-26 09:42:41 -08:00
Gael Guennebaud
bcf9bb5c1f Avoid packing the rhs multiple times when blocking on the lhs only. 2015-02-26 17:01:33 +01:00
Gael Guennebaud
4ec3f04b3a Make sure that the block size computation is tested by our unit test. 2015-02-26 17:00:36 +01:00
Gael Guennebaud
a8ad8887bf Implement a more generic blocking-size selection algorithm. See explanations inline.
It performs extremely well on Haswell. The main issue is to reliably and quickly find the
actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)
2015-02-26 16:04:35 +01:00
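
A worked sketch of that cache budget (the function name is illustrative): with Haswell-like figures, a 256KB private L2 and a 6MB L3 shared by 4 cores, the second-level blocking budget is max(256KB, 6MB/4) = 1.5MB.

#include <algorithm>
#include <cstddef>

// Effective per-core cache for the 2nd level of blocking: the larger of the
// private L2 and the per-core share of the L3.
std::size_t secondLevelCacheBudget(std::size_t l2, std::size_t l3,
                                   std::size_t coresSharingL3) {
  return std::max(l2, l3 / coresSharingL3);
}
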
Gael Guennebaud
400becc591 Fix typos in block-size testing code, and set peeling on k to 8. 2015-02-26 15:57:06 +01:00
Benoit Jacob
692136350b So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes).
On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400.

I could not see any significant impact of this offset.

On Nexus 5, the offset has a slight effect: values around 32 (times sizeof(float)) are the worst. Anything else is the same: the current 64 (8*pk), or... 0.

So let's just go with 0!

Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!
2015-02-25 12:37:14 -05:00
Christoph Hertzberg
531fa9de77 bug #970: Add EIGEN_DEVICE_FUNC to rvalue functions, in case CUDA supports rvalue references. 2015-02-24 21:03:28 +01:00
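
The pattern this applies, in reduced form (Buffer and DEVICE_FUNC are hypothetical stand-ins; Eigen's macro expands to __host__ __device__ under CUDA and to nothing otherwise):

#if defined(__CUDACC__)
#define DEVICE_FUNC __host__ __device__  // stand-in for EIGEN_DEVICE_FUNC
#else
#define DEVICE_FUNC
#endif

struct Buffer {
  float* data;
  DEVICE_FUNC Buffer() : data(0) {}
  // The "rvalue functions": move operations annotated so device code can call them.
  DEVICE_FUNC Buffer(Buffer&& other) : data(other.data) { other.data = 0; }
};
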
Benoit Jacob
26275b250a Fix my recent prefetch changes:
- the first prefetch is actually harmful on Haswell with FMA,
   but it is the most beneficial on ARM.
 - the second prefetch... I was very stupid and multiplied an offset of a
   scalar* pointer by sizeof(scalar). The old offset was 64; pk = 8, so 64 = pk*8.
   So this effectively restores the older offset. Actually, there were
   two prefetches here, one with offset 48 and one with offset 64. I could not
   confirm any benefit from this strange 48 offset on either the Haswell or
   my ARM device.
2015-02-23 16:55:17 -05:00
Christoph Hertzberg
052b6b40f1 Fix two trivial warnings 2015-02-22 12:40:51 +01:00
Christoph Hertzberg
ecbf2a6656 log1p is defined only for real Scalars in C++11 2015-02-21 19:58:24 +01:00
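
Why log1p is worth guarding for in the first place: for tiny real x, computing 1+x rounds away exactly the information that log(1+x) ~ x needs, while std::log1p (C++11, real arguments only) keeps it.

#include <cmath>
#include <cstdio>

int main() {
  double x = 1e-16;
  std::printf("log(1+x) = %g\n", std::log(1.0 + x));  // 0: 1+x rounds to 1
  std::printf("log1p(x) = %g\n", std::log1p(x));      // ~1e-16: accurate
  return 0;
}
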
Gael Guennebaud
3cf642baa3 Fix compilation of unit tests that disable assertion checking 2015-02-21 14:13:48 +01:00
Gael Guennebaud
2da1594750 Fix doc of Ref<> 2015-02-20 11:52:22 +01:00
Gael Guennebaud
b192e29eae In C++11, destructors do not throw by default (fixes the CommaInitializer unit test) 2015-02-20 09:28:34 +01:00
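
The C++11 rule behind the fix, in miniature: destructors are implicitly noexcept(true), so a destructor that may throw (as CommaInitializer's assertion path does in the test) must opt out explicitly. Checked is a hypothetical stand-in:

#include <stdexcept>

struct Checked {
  bool finished;
  Checked() : finished(false) {}
  // Without noexcept(false), throwing here calls std::terminate in C++11.
  ~Checked() noexcept(false) {
    if (!finished) throw std::logic_error("not fully initialized");
  }
};
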
Benoit Steiner
ab41652d81 Pulled latest changes from trunk 2015-02-19 21:23:37 -08:00
Benoit Steiner
7765039f1c Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device. 2015-02-19 21:22:51 -08:00
Gael Guennebaud
a66f5fc2fd Fix regression with C++11 support of lambdas: internal::result_of now falls back to std::result_of in C++11. 2015-02-19 23:32:12 +01:00
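
What the fallback buys, illustrated with the standard facility it defers to: std::result_of can see through a lambda's call operator, which the pre-C++11 deduction could not.

#include <type_traits>

int main() {
  auto sq = [](double v) { return v * v; };
  typedef std::result_of<decltype(sq)(double)>::type R;
  static_assert(std::is_same<R, double>::value, "lambda return type deduced");
  return 0;
}
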
Gael Guennebaud
1b7e12847d Fix some calls to result_of that treated binary functors as unary ones. 2015-02-19 23:30:41 +01:00
Gael Guennebaud
0f4dd15dfc Declare some variables const 2015-02-19 23:28:57 +01:00
Gael Guennebaud
829dddd0fd Add support for C++11 result_of/lambdas 2015-02-19 15:18:37 +01:00
Benoit Jacob
db05f2d01e rotating kernel: avoid compiling anything outside of ARM 2015-02-18 15:43:52 -05:00
Benoit Jacob
0ed00d5438 remove a newly introduced redundant typedef - sorry. 2015-02-18 15:05:01 -05:00
Benoit Jacob
9bd8a4bab5 bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path
This is substantially faster on ARM, where it's important to minimize the number of loads.

This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome.

Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (it's even about 1% slower).
2015-02-18 15:03:35 -05:00
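
A sketch of the rotating idea (pseudo-packet code under stated assumptions: 4-lane packets, and pmadd/protate declared below as hypothetical stand-ins for Eigen's packet primitives): instead of four broadcast loads of rhs coefficients per step, load one rhs packet and rotate its lanes between the four multiply-accumulates, un-permuting the accumulators once at the end.

// Hypothetical declarations mirroring Eigen's packet-math primitives.
template <typename Packet>
Packet pmadd(const Packet& a, const Packet& b, const Packet& c);
template <int Offset, typename Packet>
Packet protate(const Packet& a);

template <typename Packet>
void rotating_step(const Packet& lhs, Packet rhs,
                   Packet& c0, Packet& c1, Packet& c2, Packet& c3) {
  c0 = pmadd(lhs, rhs, c0);
  rhs = protate<1>(rhs);  // one lane rotation replaces a broadcast load
  c1 = pmadd(lhs, rhs, c1);
  rhs = protate<1>(rhs);
  c2 = pmadd(lhs, rhs, c2);
  rhs = protate<1>(rhs);
  c3 = pmadd(lhs, rhs, c3);
  // c0..c3 now hold rotated columns; they are un-permuted once at the end.
}
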
Hauke Heibel
ee27d50633 Fixed template parameter. 2015-02-18 18:51:08 +01:00
Gael Guennebaud
73a24de424 merge 2015-02-18 15:51:00 +01:00
Gael Guennebaud
63eb0f6fe6 Clean a bit computeProductBlockingSizes (use Index type, remove CEIL macro) 2015-02-18 15:49:05 +01:00
Benoit Jacob
4a3e6c8be1 bug #958 - Allow testing specific blocking sizes
This is only a debugging/testing patch. It allows testing specific
product blocking sizes, typically to study the impact on performance.

Example usage:

// Assign testk/testm/testn at runtime, before benchmarking products.
int testk, testm, testn;
// The macros must be defined before including any Eigen header.
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn
#include <Eigen/Core>
2015-02-18 09:43:55 -05:00
Gael Guennebaud
c7bb1e8ea8 Fix a regression when using OpenMP, and fix bug #714: the number of threads might be lower than the number requested 2015-02-18 15:19:23 +01:00
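
The pitfall behind bug #714, in its smallest form: the runtime may grant fewer threads than requested, so the effective count must be queried inside the parallel region.

#include <omp.h>
#include <cstdio>

int main() {
  const int requested = 8;
  int actual = 1;
#pragma omp parallel num_threads(requested)
  {
#pragma omp single
    actual = omp_get_num_threads();  // may be lower than 'requested'
  }
  std::printf("requested %d threads, got %d\n", requested, actual);
  return 0;
}
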