5336 Commits

Author SHA1 Message Date
Gael Guennebaud
6a5fe86098 Complete rewrite of column-major-matrix * vector product to deliver higher performance on modern CPUs.
The previous code was optimized for Intel Core 2, for which unaligned loads/stores were prohibitively expensive.
This new version exhibits much higher instruction independence (better pipelining) and explicitly leverages FMA.
According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast.
Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching.
We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panels of 4 columns is probably not optimal anymore).
2016-12-03 21:14:14 +01:00
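Note on the entry above: the general idea of a column-major matrix * vector product built on fused multiply-adds can be sketched as follows. This is only an illustrative sketch, not the kernel from the commit: the function name `colmajor_gemv_sketch`, the block of four columns, and the use of scalar `std::fma` in place of packet (SIMD) operations are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen): computes y += A * x, where A is
// rows x cols and stored column-major, i.e. element (i, j) is a[i + j * rows].
void colmajor_gemv_sketch(std::size_t rows, std::size_t cols,
                          const std::vector<double>& a,
                          const std::vector<double>& x,
                          std::vector<double>& y) {
  assert(a.size() == rows * cols && x.size() == cols && y.size() == rows);
  const double* A = a.data();
  std::size_t j = 0;

  // Main loop: consume four columns per pass (block size is an assumption),
  // so each y[i] is loaded and stored once per four multiply-adds.
  for (; j + 4 <= cols; j += 4) {
    const double* c0 = A + (j + 0) * rows;
    const double* c1 = A + (j + 1) * rows;
    const double* c2 = A + (j + 2) * rows;
    const double* c3 = A + (j + 3) * rows;
    const double x0 = x[j], x1 = x[j + 1], x2 = x[j + 2], x3 = x[j + 3];
    // Different rows do not depend on each other, which leaves the CPU
    // (or a vectorizing compiler) free to overlap their computations.
    for (std::size_t i = 0; i < rows; ++i) {
      double acc = y[i];
      acc = std::fma(c0[i], x0, acc);
      acc = std::fma(c1[i], x1, acc);
      acc = std::fma(c2[i], x2, acc);
      acc = std::fma(c3[i], x3, acc);
      y[i] = acc;
    }
  }

  // Remainder loop: leftover columns, one at a time.
  for (; j < cols; ++j) {
    const double* c = A + j * rows;
    const double xj = x[j];
    for (std::size_t i = 0; i < rows; ++i) {
      y[i] = std::fma(c[i], xj, y[i]);
    }
  }
}
```

A production kernel would additionally work on SIMD packets and pick the block size from the cache sizes, which is the kind of blocking heuristic the commit message mentions as future work.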
Christoph Hertzberg
22f7d398e2 bug #1355: Fixed wrong line-endings on two files 2016-12-02 11:22:05 +01:00
Gael Guennebaud
27873008d4 Clean up SparseCore module regarding ReverseInnerIterator 2016-12-01 21:55:10 +01:00
Angelos Mantzaflaris
8c24723a09 typo UIntPtr
(grafted from b6f04a2dd4d68fe1858524709813a5df5b9a085b)
2016-12-01 21:25:58 +01:00
Angelos Mantzaflaris
aeba0d8655 fix two warnings (unused typedef, unused variable) and a typo
(grafted from a9aa3bcf50d55b63c8adb493a06c903ec34251c6)
2016-12-01 21:23:43 +01:00
Gael Guennebaud
181138a1cb fix member order 2016-12-01 17:06:20 +01:00
Gael Guennebaud
9f297d57ae Merged in rmlarsen/eigen (pull request PR-256)
Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.
2016-12-01 15:27:33 +00:00
Benoit Steiner
7ff26ddcbb Merged eigen/eigen into default 2016-12-01 07:13:17 -08:00
Gael Guennebaud
037b46762d Fix misleading-indentation warnings. 2016-12-01 16:05:42 +01:00
Mehdi Goli
79aa2b784e Adding sycl backend for TensorPadding.h; disabling __uint128 for sycl in TensorIntDiv.h; disabling cache size for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code. 2016-12-01 13:02:27 +00:00
Benoit Steiner
fd1dc3363e Merged eigen/eigen into default 2016-11-30 20:16:17 -08:00
Gael Guennebaud
8df272af88 Fix selection of product implementation for dynamic size matrices with fixed max size. 2016-11-30 22:21:33 +01:00
Gael Guennebaud
c927af60ed Fix a performance regression in (mat*mat)*vec for which mat*mat was evaluated multiple times. 2016-11-30 17:59:13 +01:00
Gael Guennebaud
ab4ef5e66e bug #1351: fix compilation of random with old compilers 2016-11-30 17:37:53 +01:00
Rasmus Munk Larsen
a0329f64fb Add a default constructor for the "fake" __half class when not using the
__half class provided by CUDA.
2016-11-29 13:18:09 -08:00
Benoit Steiner
9f8fbd9434 Merged eigen/eigen into default 2016-11-26 11:28:25 -08:00
Mehdi Goli
7318daf887 Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h. 2016-11-25 16:19:07 +00:00
Benoit Steiner
3be1afca11 Disabled the "remove the call to 'std::abs' since unsigned values cannot be negative" warning introduced in clang 3.5 2016-11-23 18:49:51 -08:00
Mehdi Goli
b8cc5635d5 Removing unsupported device from test case; cleaning the tensor device sycl. 2016-11-23 16:30:41 +00:00
Gael Guennebaud
e340866c81 Fix compilation with gcc and old ABI version 2016-11-23 14:04:57 +01:00
Gael Guennebaud
a91de27e98 Fix compilation issue with MSVC:
MSVC always messes up shadowed template arguments, for instance in:
  struct B { typedef float T; };
  template<typename T> struct A : B {
    T g;
  };
The type of A<double>::g will be float and not double.
2016-11-23 12:24:48 +01:00
Gael Guennebaud
74637fa4e3 Optimize predux<Packet8f> (AVX) 2016-11-22 21:57:52 +01:00
Gael Guennebaud
178c084856 Disable usage of SSE3 _mm_hadd_ps that is extremely slow. 2016-11-22 21:53:14 +01:00
Gael Guennebaud
7dd894e40e Optimize predux<Packet4d> (AVX) 2016-11-22 21:41:30 +01:00
Gael Guennebaud
f3fb0a1940 Disable usage of SSE3 haddpd that is extremely slow. 2016-11-22 16:58:31 +01:00
Gael Guennebaud
6a84246a6a Fix regression in assignment of sparse block to sparse block. 2016-11-21 21:46:42 +01:00
Benoit Steiner
ed839c5851 Enable the use of constant expressions with clang >= 3.6 2016-11-20 10:34:49 -08:00
Gael Guennebaud
465ede0f20 Fix compilation issue in mat = permutation (regression introduced in 8193ffb3d38b56c9295f204dc57dc6bac74f58aa)
2016-11-20 09:41:37 +01:00
Benoit Steiner
81151bd474 Fixed merge conflicts 2016-11-19 19:12:59 -08:00
Benoit Steiner
1bdf1b9ce0 Merged in benoitsteiner/opencl (pull request PR-253)
OpenCL improvements
2016-11-19 04:44:43 +00:00
Benoit Steiner
8649e16c2a Enable EIGEN_HAS_C99_MATH when building with the latest version of Visual Studio 2016-11-18 14:18:34 -08:00
Gael Guennebaud
164414c563 Merged in ChunW/eigen (pull request PR-252)
Workaround for error in VS2012 with /clr
2016-11-18 21:07:29 +00:00
Luke Iwanski
5159675c33 Added isnan, isfinite and isinf for SYCL device. Plus test for that. 2016-11-18 16:01:48 +00:00
Gael Guennebaud
8193ffb3d3 bug #1343: fix compilation regression in mat+=selfadjoint_view.
Generic EigenBase2EigenBase assignment was incomplete.
2016-11-18 10:17:34 +01:00
Gael Guennebaud
cebff7e3a2 bug #1343: fix compilation regression in array = matrix_product 2016-11-18 10:09:33 +01:00
Benoit Steiner
7c30078b9f Merged eigen/eigen into default 2016-11-17 22:53:37 -08:00
Chun Wang
0d0948c3b9 Workaround for error in VS2012 with /clr 2016-11-17 17:54:27 -05:00
Konstantinos Margaritis
672aa97d4d implement float/std::complex<float> for ZVector as well, minor fixes to ZVector 2016-11-17 13:27:33 -05:00
Luke Iwanski
c5130dedbe Specialised basic math functions for SYCL device. 2016-11-17 11:47:13 +00:00
Benoit Steiner
f2e8b73256 Enable the use of AVX512 instructions by default 2016-11-16 21:28:04 -08:00
Gael Guennebaud
7b09e4dd8c bump default branch to 3.3.90 2016-11-16 22:20:58 +01:00
Benoit Steiner
dff9a049c4 Optimized the computation of exp, sqrt, ceil and floor for fp16 on Pascal GPUs 2016-11-16 09:01:51 -08:00
Gael Guennebaud
0ee92aa38e Optimize sparse<bool> && sparse<bool> to use the same path as for coeff-wise products. 2016-11-14 18:47:41 +01:00
Gael Guennebaud
2e334f5da0 bug #426: move operator && and || to MatrixBase and SparseMatrixBase. 2016-11-14 18:47:02 +01:00
Gael Guennebaud
a048aba14c Merged in olesalscheider/eigen (pull request PR-248)
Make sure not to call numext::maxi on expression templates
2016-11-14 13:25:53 +00:00
Gael Guennebaud
eedb87f4ba Fix regression in SparseMatrix::ReverseInnerIterator 2016-11-14 14:05:53 +01:00
Niels Ole Salscheider
51fef87408 Make sure not to call numext::maxi on expression templates 2016-11-12 12:20:57 +01:00
Gael Guennebaud
eeac81b8c0 bump to 3.3.0 2016-11-10 13:55:14 +01:00
Gael Guennebaud
e80bc2ddb0 Fix printing of sparse expressions 2016-11-10 10:35:32 +01:00
Benoit Steiner
db3903498d Merged in benoitsteiner/opencl (pull request PR-246)
Improved support for OpenCL
2016-11-08 22:28:44 +00:00