eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-17 10:31:28 +08:00

Author	SHA1	Message	Date
Benoit Steiner	e96c77668d	Merged in rmlarsen/eigen2 (pull request PR-292) Adds a fast memcpy function to Eigen.	2017-01-25 00:14:04 +00:00
Rasmus Munk Larsen	3be5ee2352	Update copy helper to use fast_memcpy.	2017-01-24 14:22:49 -08:00
Rasmus Munk Larsen	e6b1020221	Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml
This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                        Base (ns)   New (ns)    Improvement
------------------------------------------------------------------
BM_memcpy_1T/2                   3.48        2.39        +31.3%
BM_memcpy_1T/8                   12.3        6.51        +47.0%
BM_memcpy_1T/64                  371         383         -3.2%
BM_memcpy_1T/512                 66922       66720       +0.3%
BM_memcpy_1T/4k                  9892867     6849682     +30.8%
BM_memcpy_1T/5k                  14951099    10332856    +30.9%
BM_memcpy_2T/2                   3.50        2.46        +29.7%
BM_memcpy_2T/8                   12.3        7.66        +37.7%
BM_memcpy_2T/64                  371         376         -1.3%
BM_memcpy_2T/512                 66652       66788       -0.2%
BM_memcpy_2T/4k                  6145012     6117776     +0.4%
BM_memcpy_2T/5k                  9181478     9010942     +1.9%
BM_memcpy_4T/2                   3.47        2.47        +31.0%
BM_memcpy_4T/8                   12.3        6.67        +45.8%
BM_memcpy_4T/64                  374         376         -0.5%
BM_memcpy_4T/512                 67833       68019       -0.3%
BM_memcpy_4T/4k                  5057425     5188253     -2.6%
BM_memcpy_4T/5k                  7555638     7779468     -3.0%
BM_memcpy_6T/2                   3.51        2.50        +28.8%
BM_memcpy_6T/8                   12.3        7.61        +38.1%
BM_memcpy_6T/64                  373         378         -1.3%
BM_memcpy_6T/512                 66871       66774       +0.1%
BM_memcpy_6T/4k                  5112975     5233502     -2.4%
BM_memcpy_6T/5k                  7614180     7772246     -2.1%
BM_memcpy_8T/2                   3.47        2.41        +30.5%
BM_memcpy_8T/8                   12.4        10.5        +15.3%
BM_memcpy_8T/64                  372         388         -4.3%
BM_memcpy_8T/512                 67373       66588       +1.2%
BM_memcpy_8T/4k                  5148462     5254897     -2.1%
BM_memcpy_8T/5k                  7660989     7799058     -1.8%
BM_memcpy_12T/2                  3.50        2.40        +31.4%
BM_memcpy_12T/8                  12.4        7.55        +39.1%
BM_memcpy_12T/64                 374         378         -1.1%
BM_memcpy_12T/512                67132       66683       +0.7%
BM_memcpy_12T/4k                 5185125     5292920     -2.1%
BM_memcpy_12T/5k                 7717284     7942684     -2.9%
BM_slicingSmallPieces_1T/2       47.3        47.5        +0.4%
BM_slicingSmallPieces_1T/8       53.6        52.3        +2.4%
BM_slicingSmallPieces_1T/64      491         476         +3.1%
BM_slicingSmallPieces_1T/512     21734       18814       +13.4%
BM_slicingSmallPieces_1T/4k      394660      396760      -0.5%
BM_slicingSmallPieces_1T/5k      218722      209244      +4.3%
BM_slicingSmallPieces_2T/2       80.7        79.9        +1.0%
BM_slicingSmallPieces_2T/8       54.2        53.1        +2.0%
BM_slicingSmallPieces_2T/64      497         477         +4.0%
BM_slicingSmallPieces_2T/512     21732       18822       +13.4%
BM_slicingSmallPieces_2T/4k      392885      390490      +0.6%
BM_slicingSmallPieces_2T/5k      221988      208678      +6.0%
BM_slicingSmallPieces_4T/2       80.8        80.1        +0.9%
BM_slicingSmallPieces_4T/8       54.1        53.2        +1.7%
BM_slicingSmallPieces_4T/64      493         476         +3.4%
BM_slicingSmallPieces_4T/512     21702       18758       +13.6%
BM_slicingSmallPieces_4T/4k      393962      404023      -2.6%
BM_slicingSmallPieces_4T/5k      249667      211732      +15.2%
BM_slicingSmallPieces_6T/2       80.5        80.1        +0.5%
BM_slicingSmallPieces_6T/8       54.4        53.4        +1.8%
BM_slicingSmallPieces_6T/64      488         478         +2.0%
BM_slicingSmallPieces_6T/512     21719       18841       +13.3%
BM_slicingSmallPieces_6T/4k      394950      397583      -0.7%
BM_slicingSmallPieces_6T/5k      223080      210148      +5.8%
BM_slicingSmallPieces_8T/2       81.2        80.4        +1.0%
BM_slicingSmallPieces_8T/8       58.1        53.5        +7.9%
BM_slicingSmallPieces_8T/64      489         480         +1.8%
BM_slicingSmallPieces_8T/512     21586       18798       +12.9%
BM_slicingSmallPieces_8T/4k      394592      400165      -1.4%
BM_slicingSmallPieces_8T/5k      219688      208301      +5.2%
BM_slicingSmallPieces_12T/2      80.2        79.8        +0.7%
BM_slicingSmallPieces_12T/8      54.4        53.4        +1.8%
BM_slicingSmallPieces_12T/64     488         476         +2.5%
BM_slicingSmallPieces_12T/512    21931       18831       +14.1%
BM_slicingSmallPieces_12T/4k     393962      396541      -0.7%
BM_slicingSmallPieces_12T/5k     218803      207965      +5.0%	2017-01-24 13:55:18 -08:00
Rasmus Munk Larsen	7b6aaa3440	Fix NaN propagation for AVX512.	2017-01-24 13:37:08 -08:00
Rasmus Munk Larsen	5e144bbaa4	Make NaN propagation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op. See #1373 for details.	2017-01-24 13:32:50 -08:00
Gael Guennebaud	ba3f977946	bug #1376 : add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j))	2017-01-23 22:06:08 +01:00
Gael Guennebaud	b0db4eff36	bug #1382 : move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!)	2017-01-23 22:03:57 +01:00
Gael Guennebaud	ca79c1545a	Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t	2017-01-23 22:02:53 +01:00
Gael Guennebaud	4b607b5692	Use Index instead of size_t	2017-01-23 22:00:33 +01:00
Gael Guennebaud	0fe278f7be	bug #1379 : fix compilation in sparsediagonaldense with openmp	2017-01-21 23:27:01 +01:00
Gael Guennebaud	22a172751e	bug #1378 : fix doc (DiagonalIndex vs Diagonal)	2017-01-21 22:09:59 +01:00
Benoit Steiner	924600a0e8	Made sure that enabling avx2 instructions enables avx and sse instructions as well.	2017-01-19 09:54:48 -08:00
Gael Guennebaud	655ba783f8	Defer set-to-zero in triangular = product so that no aliasing issue occurs in the common A.triangularView() = B*A.selfadjointView()*B.adjoint() case that used to work in 3.2.	2017-01-17 18:03:35 +01:00
Gael Guennebaud	ad3eef7608	Add link to SO	2017-01-09 13:01:39 +01:00
Gael Guennebaud	831fffe874	Add missing doc of SparseView	2017-01-06 18:01:29 +01:00
Gael Guennebaud	e383d6159a	MSVC 2015 has all we want about c++11 and MSVC 2017 fails on binder1st/binder2nd	2017-01-06 15:44:13 +01:00
Gael Guennebaud	2299717fd5	Fix and workaround several doxygen issues/warnings	2017-01-04 23:27:33 +01:00
Gael Guennebaud	ee6f7f6c0c	Add doc for sparse triangular solve functions	2017-01-04 23:10:36 +01:00
Gael Guennebaud	a0a36ad0ef	bug #1336 : workaround doxygen failing to include numerous members of MatrixBase in Matrix	2017-01-04 22:02:39 +01:00
Gael Guennebaud	29a1a58113	Document selfadjointView	2017-01-04 22:01:50 +01:00
Gael Guennebaud	8702562177	bug #1370 : add doc for StorageIndex	2017-01-03 11:25:41 +01:00
Gael Guennebaud	575c078759	bug #1370 : rename _Index to _StorageIndex in SparseMatrix, and add a warning in the doc regarding the 3.2 to 3.3 change of SparseMatrix::Index	2017-01-03 11:19:14 +01:00
Valentin Roussellet	d3c5525c23	Added += and + operators to inner iterators. Fix #1340	2016-12-28 18:29:30 +01:00
Gael Guennebaud	5c27962453	Move common cwise-unary method from MatrixBase/ArrayBase to the common DenseBase class.	2017-01-02 22:27:07 +01:00
Gael Guennebaud	8d7810a476	bug #1365 : fix another type mismatch warning (sync is set from and compared to an Index)	2016-12-28 23:35:43 +01:00
Gael Guennebaud	97812ff0d3	bug #1369 : fix type mismatch warning. Returned values of omp thread id and numbers are int, so let's use int instead of Index here.	2016-12-28 23:29:35 +01:00
Gael Guennebaud	7713e20fd2	Fix compilation	2016-12-27 22:04:58 +01:00
Gael Guennebaud	ab69a7f6d1	Cleanup because traits<CwiseBinaryOp>::Flags now exposes the correct storage order	2016-12-27 16:55:47 +01:00
Gael Guennebaud	d32a43e33a	Make sure that traits<CwiseBinaryOp>::Flags reports the correct storage order so that methods like .outerSize()/.innerSize() work properly.	2016-12-27 16:35:45 +01:00
Gael Guennebaud	7136267461	Add missing .outer() member to iterators of evaluators of cwise sparse binary expression	2016-12-27 16:34:30 +01:00
Gael Guennebaud	fe0ee72390	Fix check of storage order mismatch for "sparse cwiseop sparse".	2016-12-27 16:33:19 +01:00
Gael Guennebaud	6b8f637ab1	Harmless typo	2016-12-27 16:31:17 +01:00
Benoit Steiner	354baa0fb1	Avoid using horizontal adds since they're not very efficient.	2016-12-21 20:55:07 -08:00
Benoit Steiner	d7825b6707	Use native AVX512 types instead of Eigen Packets whenever possible.	2016-12-21 20:06:18 -08:00
Gael Guennebaud	c6882a72ed	Merged in joaoruileal/eigen (pull request PR-276) Minor improvements to Umfpack support	2016-12-21 21:39:48 +01:00
Joao Rui Leal	c8c89b5e19	renamed methods umfpackReportControl(), umfpackReportInfo(), and umfpackReportStatus() from UmfPackLU to printUmfpackControl(), printUmfpackInfo(), and printUmfpackStatus()	2016-12-21 09:16:28 +00:00
Joao Rui Leal	95b804c0fe	it is now possible to change Umfpack control settings before factorizations; added access to the report functions of Umfpack	2016-12-19 10:45:59 +00:00
Gael Guennebaud	8c0e701504	bug #1360 : fix sign issue with pmull on altivec	2016-12-18 22:13:19 +00:00
Gael Guennebaud	fc94258e77	Fix unused warning	2016-12-18 22:11:48 +00:00
ermak	d60cca32e5	Transformation methods added to ParametrizedLine class.	2016-12-17 00:45:13 +07:00
Benoit Steiner	9e03dfb452	Made sure EIGEN_HAS_C99_MATH is defined when compiling OpenCL code	2016-12-17 09:23:37 -08:00
Rafael Guglielmetti	8f11df2667	NumTraits.h: added information about the value Eigen::HugeCost for ReadCost, AddCost and MulCost	2016-12-16 09:07:12 +00:00
Benoit Steiner	1324ffef2f	Reenabled the use of constexpr on OpenCL devices	2016-12-15 06:49:38 -08:00
Gael Guennebaud	5d00fdf0e8	bug #1363 : fix mingw's ABI issue	2016-12-15 11:58:31 +01:00
Gael Guennebaud	11b492e993	bug #1358 : fix compilation for sparse += sparse.selfadjointView();	2016-12-14 17:53:47 +01:00
Gael Guennebaud	e67397bfa7	bug #1359 : fix compilation of col_major_sparse.row() *= scalar (used to work in 3.2.9 though the expression is not really writable)	2016-12-14 17:05:26 +01:00
Gael Guennebaud	98d7458275	bug #1359 : fix sparse /=scalar and *=scalar implementation. InnerIterators must be obtained from an evaluator.	2016-12-14 17:03:13 +01:00
Gael Guennebaud	c817ce3ba3	bug #1361 : fix compilation issue in mat=perm.inverse()	2016-12-13 23:10:27 +01:00
Benoit Steiner	6811e6cf49	Merged in srvasude/eigen/fix_cuda_exp (pull request PR-268) Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).	2016-12-08 05:14:11 -08:00
Angelos Mantzaflaris	7694684992	Remove superfluous const's (can cause warnings on some Intel compilers) (grafted from e236d3443c79f38aa721d95e64c275abbb5df10f )	2016-12-07 00:37:48 +01:00
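
The fast memcpy commit listed above (e6b1020221) relies on two observations: a copy whose size is a compile-time constant can be expanded inline by the compiler, and memmove was measured to be faster than memcpy for large blocks on Linux with GCC or Clang. A minimal C++ sketch of that dispatch idea follows. It is not Eigen's actual implementation: the names sketch_fast_copy and copy_round_trips are hypothetical, and the set of special-cased sizes (1, 2, 4, 8, 16) is an assumption chosen for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not Eigen's function): copies n bytes between two
// buffers that do not overlap.
inline void sketch_fast_copy(void* dst, const void* src, std::size_t n) {
  if (n == 0) return;  // nothing to copy; also avoids passing null pointers
  switch (n) {
    // With a constant size argument the compiler can expand memcpy inline
    // instead of emitting a library call. The sizes listed are an assumption.
    case 1:  std::memcpy(dst, src, 1);  break;
    case 2:  std::memcpy(dst, src, 2);  break;
    case 4:  std::memcpy(dst, src, 4);  break;
    case 8:  std::memcpy(dst, src, 8);  break;
    case 16: std::memcpy(dst, src, 16); break;
    // All other sizes go through memmove, which the commit message reports
    // as faster than memcpy for large blocks on the measured setup.
    default: std::memmove(dst, src, n); break;
  }
}

// Returns true if an n-byte pattern is identical after being copied.
inline bool copy_round_trips(std::size_t n) {
  std::vector<unsigned char> src(n), dst(n, 0);
  for (std::size_t i = 0; i < n; ++i) {
    src[i] = static_cast<unsigned char>(i * 31 + 7);
  }
  sketch_fast_copy(dst.data(), src.data(), n);
  return src == dst;
}
```

Calling copy_round_trips(8) exercises the inlined fixed-size path and copy_round_trips(4096) the memmove path. Whether either path is actually faster depends on the compiler, C library and hardware, so any such helper should be benchmarked on the target platform as the commit did.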


5206 Commits