526 Commits

Author SHA1 Message Date
Jeremy Barnes
91678f489a Cleaned up double-defined macro from last commit 2016-01-10 22:44:45 -05:00
Jeremy Barnes
403a7cb6c3 Alternative way of forcing instantiation of device kernels without
causing warnings or requiring device to device kernel invocations.

This allows Tensorflow to work on SM 3.0 (ie, Amazon EC2) machines.
2016-01-10 22:39:13 -05:00
Benoit Steiner
e76904af1b Simplified the dispatch code. 2016-01-08 16:50:57 -08:00
Benoit Steiner
d726e864ac Made it possible to use array of size 0 on CUDA devices 2016-01-08 16:38:14 -08:00
Benoit Steiner
3358dfd5dd Reworked the dispatch of optimized cuda reduction kernels to workaround a nvcc bug that prevented the code from compiling in optimized mode in some cases 2016-01-08 16:28:53 -08:00
Benoit Steiner
53749ff415 Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compulation warnings but it's much better than having to deal with random assertion failures. 2016-01-08 13:53:40 -08:00
Benoit Steiner
6639b7d6e8 Removed a couple of partial specialization that confuse nvcc and result in errors such as this:
error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>"
            "Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>"
            "Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"
2016-01-07 18:45:19 -08:00
Benoit Steiner
0cb2ca5de2 Fixed a typo. 2016-01-06 18:50:28 -08:00
Benoit Steiner
213459d818 Optimized the performance of broadcasting of scalars. 2016-01-06 18:47:45 -08:00
Benoit Steiner
cfff40b1d4 Improved the performance of reductions on CUDA devices 2016-01-04 17:25:00 -08:00
Benoit Steiner
515dee0baf Added a 'divup' util to compute the floor of the quotient of two integers 2016-01-04 16:29:26 -08:00
Gael Guennebaud
978c379ed7 Add missing ctor from uint 2015-12-30 12:52:38 +01:00
Eugene Brevdo
f7362772e3 Add digamma for CPU + CUDA. Includes tests. 2015-12-24 21:15:38 -08:00
Benoit Steiner
bdcbc66a5c Don't attempt to vectorize mean reductions of integers since we can't use
SSE or AVX instructions to divide 2 integers.
2015-12-22 17:51:55 -08:00
Benoit Steiner
a1e08fb2a5 Optimized the configuration of the outer reduction cuda kernel 2015-12-22 16:30:10 -08:00
Benoit Steiner
9c7d96697b Added missing define 2015-12-22 16:11:07 -08:00
Benoit Steiner
e7e6d01810 Made sure the optimized gpu reduction code is actually compiled. 2015-12-22 15:07:33 -08:00
Benoit Steiner
b5d2078c4a Optimized outer reduction on GPUs. 2015-12-22 15:06:17 -08:00
Benoit Steiner
1c3e78319d Added missing const 2015-12-21 15:05:01 -08:00
Benoit Steiner
1b82969559 Add alignment requirement for local buffer used by the slicing op. 2015-12-18 14:36:35 -08:00
Benoit Steiner
75a7fa1919 Doubled the speed of full reductions on GPUs. 2015-12-18 14:07:31 -08:00
Benoit Steiner
8dd17cbe80 Fixed a clang compilation warning triggered by the use of arrays of size 0. 2015-12-17 14:00:33 -08:00
Benoit Steiner
4aac55f684 Silenced some compilation warnings triggered by nvcc 2015-12-17 13:39:01 -08:00
Benoit Steiner
40e6250fc3 Made it possible to run tensor chipping operations on CUDA devices 2015-12-17 13:29:08 -08:00
Benoit Steiner
17352e2792 Made the entire TensorFixedSize api callable from a CUDA kernel. 2015-12-14 15:20:31 -08:00
Benoit Steiner
75e19fc7ca Marked the tensor constructors as EIGEN_DEVICE_FUNC: This makes it possible to call them from a CUDA kernel. 2015-12-14 15:12:55 -08:00
Gael Guennebaud
ca39b1546e Merged in ebrevdo/eigen (pull request PR-148)
Add special functions to eigen: lgamma, erf, erfc.
2015-12-11 11:52:09 +01:00
Benoit Steiner
6af52a1227 Fixed a typo in the constructor of tensors of rank 5. 2015-12-10 23:31:12 -08:00
Benoit Steiner
8e00ea9a92 Fixed the coefficient accessors use for the 2d and 3d case when compiling without cxx11 support. 2015-12-10 22:45:10 -08:00
Eugene Brevdo
fa4f933c0f Add special functions to Eigen: lgamma, erf, erfc.
Includes CUDA support and unit tests.
2015-12-07 15:24:49 -08:00
Benoit Steiner
7dfe75f445 Fixed compilation warnings 2015-12-07 08:12:30 -08:00
Benoit Steiner
f4ca8ad917 Use signed integers instead of unsigned ones more consistently in the codebase. 2015-12-04 18:14:16 -08:00
Benoit Steiner
490d26e4c1 Use integers instead of std::size_t to encode the number of dimensions in the Tensor class since most of the code currently already use integers. 2015-12-04 10:15:11 -08:00
Benoit Steiner
d20efc974d Made it possible to use the sigmoid functor within a CUDA kernel. 2015-12-04 09:38:15 -08:00
Benoit Steiner
029052d276 Deleted redundant code 2015-12-03 17:08:47 -08:00
Mark Borgerding
7ddcf97da7 added scalar_sign_op (both real,complex) 2015-11-24 17:15:07 -05:00
Benoit Steiner
44848ac39b Fixed a bug in TensorArgMax.h 2015-11-23 15:58:47 -08:00
Benoit Steiner
547a8608e5 Fixed the implementation of Eigen::internal::count_leading_zeros for MSVC.
Also updated the code to silence bogux warnings generated by nvcc when compilining this function.
2015-11-23 12:17:45 -08:00
Benoit Steiner
562078780a Don't create more cuda blocks than necessary 2015-11-23 11:00:10 -08:00
Benoit Steiner
df31ca3b9e Made it possible to refer t oa GPUDevice from code compile with a regular C++ compiler 2015-11-23 10:03:53 -08:00
Benoit Steiner
1e04059012 Deleted unused variable. 2015-11-23 08:36:54 -08:00
Benoit Steiner
9fa65d3838 Split TensorDeviceType.h in 3 files to make it more manageable 2015-11-20 17:42:50 -08:00
Benoit Steiner
a367804856 Added option to force the usage of the Eigen array class instead of the std::array class. 2015-11-20 12:41:40 -08:00
Benoit Steiner
383d1cc2ed Added proper support for fast 64bit integer division on CUDA 2015-11-20 11:09:46 -08:00
Benoit Steiner
f37a5f1c53 Fixed compilation error triggered by nvcc 2015-11-19 14:34:26 -08:00
Benoit Steiner
f8df393165 Added support for 128bit integers on CUDA devices. 2015-11-19 13:57:27 -08:00
Benoit Steiner
1dd444ea71 Avoid using the version of TensorIntDiv optimized for 32-bit integers when the divisor can be equal to one since it isn't supported. 2015-11-18 11:37:58 -08:00
Benoit Steiner
f1fbd74db9 Added sanity check 2015-11-13 09:07:27 -08:00
Benoit Steiner
7815b84be4 Fixed a compilation warning 2015-11-12 20:16:59 -08:00
Benoit Steiner
10a91930cc Fixed a compilation warning triggered by nvcc 2015-11-12 20:10:52 -08:00