Benoit Steiner
e96c77668d
Merged in rmlarsen/eigen2 (pull request PR-292)
...
Adds a fast memcpy function to Eigen.
2017-01-25 00:14:04 +00:00
Rasmus Munk Larsen
3be5ee2352
Update copy helper to use fast_memcpy.
2017-01-24 14:22:49 -08:00
Rasmus Munk Larsen
e6b1020221
Adds a fast memcpy function to Eigen. This takes advantage of the following:
...
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2 3.48 2.39 +31.3%
BM_memcpy_1T/8 12.3 6.51 +47.0%
BM_memcpy_1T/64 371 383 -3.2%
BM_memcpy_1T/512 66922 66720 +0.3%
BM_memcpy_1T/4k 9892867 6849682 +30.8%
BM_memcpy_1T/5k 14951099 10332856 +30.9%
BM_memcpy_2T/2 3.50 2.46 +29.7%
BM_memcpy_2T/8 12.3 7.66 +37.7%
BM_memcpy_2T/64 371 376 -1.3%
BM_memcpy_2T/512 66652 66788 -0.2%
BM_memcpy_2T/4k 6145012 6117776 +0.4%
BM_memcpy_2T/5k 9181478 9010942 +1.9%
BM_memcpy_4T/2 3.47 2.47 +31.0%
BM_memcpy_4T/8 12.3 6.67 +45.8
BM_memcpy_4T/64 374 376 -0.5%
BM_memcpy_4T/512 67833 68019 -0.3%
BM_memcpy_4T/4k 5057425 5188253 -2.6%
BM_memcpy_4T/5k 7555638 7779468 -3.0%
BM_memcpy_6T/2 3.51 2.50 +28.8%
BM_memcpy_6T/8 12.3 7.61 +38.1%
BM_memcpy_6T/64 373 378 -1.3%
BM_memcpy_6T/512 66871 66774 +0.1%
BM_memcpy_6T/4k 5112975 5233502 -2.4%
BM_memcpy_6T/5k 7614180 7772246 -2.1%
BM_memcpy_8T/2 3.47 2.41 +30.5%
BM_memcpy_8T/8 12.4 10.5 +15.3%
BM_memcpy_8T/64 372 388 -4.3%
BM_memcpy_8T/512 67373 66588 +1.2%
BM_memcpy_8T/4k 5148462 5254897 -2.1%
BM_memcpy_8T/5k 7660989 7799058 -1.8%
BM_memcpy_12T/2 3.50 2.40 +31.4%
BM_memcpy_12T/8 12.4 7.55 +39.1
BM_memcpy_12T/64 374 378 -1.1%
BM_memcpy_12T/512 67132 66683 +0.7%
BM_memcpy_12T/4k 5185125 5292920 -2.1%
BM_memcpy_12T/5k 7717284 7942684 -2.9%
BM_slicingSmallPieces_1T/2 47.3 47.5 +0.4%
BM_slicingSmallPieces_1T/8 53.6 52.3 +2.4%
BM_slicingSmallPieces_1T/64 491 476 +3.1%
BM_slicingSmallPieces_1T/512 21734 18814 +13.4%
BM_slicingSmallPieces_1T/4k 394660 396760 -0.5%
BM_slicingSmallPieces_1T/5k 218722 209244 +4.3%
BM_slicingSmallPieces_2T/2 80.7 79.9 +1.0%
BM_slicingSmallPieces_2T/8 54.2 53.1 +2.0
BM_slicingSmallPieces_2T/64 497 477 +4.0%
BM_slicingSmallPieces_2T/512 21732 18822 +13.4%
BM_slicingSmallPieces_2T/4k 392885 390490 +0.6%
BM_slicingSmallPieces_2T/5k 221988 208678 +6.0%
BM_slicingSmallPieces_4T/2 80.8 80.1 +0.9%
BM_slicingSmallPieces_4T/8 54.1 53.2 +1.7%
BM_slicingSmallPieces_4T/64 493 476 +3.4%
BM_slicingSmallPieces_4T/512 21702 18758 +13.6%
BM_slicingSmallPieces_4T/4k 393962 404023 -2.6%
BM_slicingSmallPieces_4T/5k 249667 211732 +15.2%
BM_slicingSmallPieces_6T/2 80.5 80.1 +0.5%
BM_slicingSmallPieces_6T/8 54.4 53.4 +1.8%
BM_slicingSmallPieces_6T/64 488 478 +2.0%
BM_slicingSmallPieces_6T/512 21719 18841 +13.3%
BM_slicingSmallPieces_6T/4k 394950 397583 -0.7%
BM_slicingSmallPieces_6T/5k 223080 210148 +5.8%
BM_slicingSmallPieces_8T/2 81.2 80.4 +1.0%
BM_slicingSmallPieces_8T/8 58.1 53.5 +7.9%
BM_slicingSmallPieces_8T/64 489 480 +1.8%
BM_slicingSmallPieces_8T/512 21586 18798 +12.9%
BM_slicingSmallPieces_8T/4k 394592 400165 -1.4%
BM_slicingSmallPieces_8T/5k 219688 208301 +5.2%
BM_slicingSmallPieces_12T/2 80.2 79.8 +0.7%
BM_slicingSmallPieces_12T/8 54.4 53.4 +1.8
BM_slicingSmallPieces_12T/64 488 476 +2.5%
BM_slicingSmallPieces_12T/512 21931 18831 +14.1%
BM_slicingSmallPieces_12T/4k 393962 396541 -0.7%
BM_slicingSmallPieces_12T/5k 218803 207965 +5.0%
2017-01-24 13:55:18 -08:00
Rasmus Munk Larsen
7b6aaa3440
Fix NaN propagation for AVX512.
2017-01-24 13:37:08 -08:00
Rasmus Munk Larsen
5e144bbaa4
Make NaN propagatation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
...
See #1373 for details.
2017-01-24 13:32:50 -08:00
Gael Guennebaud
d83db761a2
Add support for std::integral_constant
2017-01-24 16:28:12 +01:00
Gael Guennebaud
bc10201854
Add test for multiple symbols
2017-01-24 16:27:51 +01:00
Gael Guennebaud
c43d254d13
Fix seq().reverse() in c++98
2017-01-24 11:36:43 +01:00
Gael Guennebaud
ddd83f82d8
Add support for "SymbolicExpr op fix<N>" in C++98/11 mode.
2017-01-24 10:54:42 +01:00
Gael Guennebaud
228fef1b3a
Extended the set of arithmetic operators supported by FixedInt (-,+,*,/,%,&,|)
2017-01-24 10:53:51 +01:00
Gael Guennebaud
bb52f74e62
Add internal doc
2017-01-24 10:13:35 +01:00
Gael Guennebaud
41c523a0ab
Rename fix_t to FixedInt
2017-01-24 09:39:49 +01:00
Gael Guennebaud
ba3f977946
bug #1376 : add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j))
2017-01-23 22:06:08 +01:00
Gael Guennebaud
b0db4eff36
bug #1382 : move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!)
2017-01-23 22:03:57 +01:00
Gael Guennebaud
ca79c1545a
Add std:: namespace prefix to all (hopefully) instances if size_t/ptrdfiff_t
2017-01-23 22:02:53 +01:00
Gael Guennebaud
4b607b5692
Use Index instead of size_t
2017-01-23 22:00:33 +01:00
Gael Guennebaud
0fe278f7be
bug #1379 : fix compilation in sparse*diagonal*dense with openmp
2017-01-21 23:27:01 +01:00
Gael Guennebaud
22a172751e
bug #1378 : fix doc (DiagonalIndex vs Diagonal)
2017-01-21 22:09:59 +01:00
Gael Guennebaud
4d302a080c
Recover compile-time size from seq(A,B) when A and B are fixed values. (c++11 only)
2017-01-19 20:34:18 +01:00
Gael Guennebaud
54f3fbee24
Exploit fixed values in seq and reverse with C++98 compatibility
2017-01-19 19:57:32 +01:00
Gael Guennebaud
7691723e34
Add support for fixed-value in symbolic expression, c++11 only for now.
2017-01-19 19:25:29 +01:00
Benoit Steiner
924600a0e8
Made sure that enabling avx2 instructions enables avx and sse instructions as well.
2017-01-19 09:54:48 -08:00
Gael Guennebaud
e84ed7b6ef
Remove dead code
2017-01-18 23:18:28 +01:00
Gael Guennebaud
f3ccbe0419
Add a Symbolic::FixedExpr helper expression to make sure the compiler fully optimize the usage of last and end.
2017-01-18 23:16:32 +01:00
Gael Guennebaud
15471432fe
Add a .reverse() member to ArithmeticSequence.
2017-01-18 11:35:27 +01:00
Gael Guennebaud
e4f8dd860a
Add missing operator*
2017-01-18 10:49:01 +01:00
Gael Guennebaud
198507141b
Update all block expressions to accept compile-time sizes passed by fix<N> or fix<N>(n)
2017-01-18 09:43:58 +01:00
Gael Guennebaud
5484ddd353
Merge the generic and dynamic overloads of block()
2017-01-17 22:11:46 +01:00
Gael Guennebaud
655ba783f8
Defer set-to-zero in triangular = product so that no aliasing issue occur in the common:
...
A.triangularView() = B*A.sefladjointView()*B.adjoint()
case that used to work in 3.2.
2017-01-17 18:03:35 +01:00
Gael Guennebaud
5e36ec3b6f
Fix regression when passing enums to operator()
2017-01-17 17:10:16 +01:00
Gael Guennebaud
4f36dcfda8
Add a generic block() method compatible with Eigen::fix
2017-01-17 11:34:28 +01:00
Gael Guennebaud
71e5b71356
Add a get_runtime_value helper to deal with pointer-to-function hack,
...
plus some refactoring to make the internals more consistent.
2017-01-17 11:33:57 +01:00
Gael Guennebaud
23bfcfc15f
Add missing overload of get_compile_time for c++98/11
2017-01-17 10:30:21 +01:00
Gael Guennebaud
edff32c2c2
Disambiguate the two versions of fix for doxygen
2017-01-17 10:29:33 +01:00
Gael Guennebaud
4989922be2
Add support for symbolic expressions as arguments of operator()
2017-01-16 22:21:23 +01:00
Gael Guennebaud
12e22a2844
typos in doc
2017-01-16 16:31:19 +01:00
Gael Guennebaud
e70c4c97fa
Typo
2017-01-16 16:20:16 +01:00
Gael Guennebaud
a9232af845
Introduce a variable_or_fixed<N> proxy returned by fix<N>(val) to pass both a compile-time and runtime fallback value in case N means "runtime".
...
This mechanism is used by the seq/seqN functions. The proxy object is immediately converted to pure compile-time (as fix<N>) or pure runtime (i.e., an Index) to avoid redundant template instantiations.
2017-01-16 16:17:01 +01:00
Gael Guennebaud
6e97698161
Introduce a EIGEN_HAS_CXX14 macro
2017-01-16 16:13:37 +01:00
Mehdi Goli
e46e722381
Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree.
2017-01-16 13:58:49 +00:00
Luke Iwanski
23778a15d8
Reverting unintentional change to Eigen/Geometry
2017-01-16 11:05:56 +00:00
Fraser Cormack
8245d3c7ad
Fix case-sensitivity of file include
2017-01-12 12:13:18 +00:00
Gael Guennebaud
752bd92ba5
Large code refactoring:
...
- generalize some utilities and move them to Meta (size(), array_size())
- move handling of all and single indices to IndexedViewHelper.h
- several cleanup changes
2017-01-11 17:24:02 +01:00
Gael Guennebaud
f93d1c58e0
Make get_compile_time compatible with variable_if_dynamic
2017-01-11 17:08:59 +01:00
Gael Guennebaud
c020d307a6
Make variable_if_dynamic<T> implicitely convertible to T
2017-01-11 17:08:05 +01:00
Gael Guennebaud
43c617e2ee
merge
2017-01-11 14:33:37 +01:00
Gael Guennebaud
b1dc0fa813
Move fix and symbolic to their own file, and improve doxygen compatibility
2017-01-11 14:28:28 +01:00
Gael Guennebaud
04397f17e2
Add 1D overloads of operator()
2017-01-11 13:17:09 +01:00
Gael Guennebaud
1b5570988b
Add doc to seq, seqN, ArithmeticSequence, operator(), etc.
2017-01-10 22:58:58 +01:00
Gael Guennebaud
17eac60446
Factorize const and non-const version of the generic operator() method.
2017-01-10 21:45:55 +01:00