Jeremy Barnes
91678f489a
Cleaned up double-defined macro from last commit
2016-01-10 22:44:45 -05:00
Jeremy Barnes
403a7cb6c3
Alternative way of forcing instantiation of device kernels without
...
causing warnings or requiring device to device kernel invocations.
This allows Tensorflow to work on SM 3.0 (ie, Amazon EC2) machines.
2016-01-10 22:39:13 -05:00
Benoit Steiner
e76904af1b
Simplified the dispatch code.
2016-01-08 16:50:57 -08:00
Benoit Steiner
d726e864ac
Made it possible to use array of size 0 on CUDA devices
2016-01-08 16:38:14 -08:00
Benoit Steiner
3358dfd5dd
Reworked the dispatch of optimized cuda reduction kernels to workaround a nvcc bug that prevented the code from compiling in optimized mode in some cases
2016-01-08 16:28:53 -08:00
Benoit Steiner
53749ff415
Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compulation warnings but it's much better than having to deal with random assertion failures.
2016-01-08 13:53:40 -08:00
Benoit Steiner
6639b7d6e8
Removed a couple of partial specialization that confuse nvcc and result in errors such as this:
...
error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"
2016-01-07 18:45:19 -08:00
Benoit Steiner
0cb2ca5de2
Fixed a typo.
2016-01-06 18:50:28 -08:00
Benoit Steiner
213459d818
Optimized the performance of broadcasting of scalars.
2016-01-06 18:47:45 -08:00
Benoit Steiner
cfff40b1d4
Improved the performance of reductions on CUDA devices
2016-01-04 17:25:00 -08:00
Benoit Steiner
515dee0baf
Added a 'divup' util to compute the floor of the quotient of two integers
2016-01-04 16:29:26 -08:00
Gael Guennebaud
978c379ed7
Add missing ctor from uint
2015-12-30 12:52:38 +01:00
Eugene Brevdo
f7362772e3
Add digamma for CPU + CUDA. Includes tests.
2015-12-24 21:15:38 -08:00
Benoit Steiner
bdcbc66a5c
Don't attempt to vectorize mean reductions of integers since we can't use
...
SSE or AVX instructions to divide 2 integers.
2015-12-22 17:51:55 -08:00
Benoit Steiner
a1e08fb2a5
Optimized the configuration of the outer reduction cuda kernel
2015-12-22 16:30:10 -08:00
Benoit Steiner
9c7d96697b
Added missing define
2015-12-22 16:11:07 -08:00
Benoit Steiner
e7e6d01810
Made sure the optimized gpu reduction code is actually compiled.
2015-12-22 15:07:33 -08:00
Benoit Steiner
b5d2078c4a
Optimized outer reduction on GPUs.
2015-12-22 15:06:17 -08:00
Benoit Steiner
1c3e78319d
Added missing const
2015-12-21 15:05:01 -08:00
Benoit Steiner
1b82969559
Add alignment requirement for local buffer used by the slicing op.
2015-12-18 14:36:35 -08:00
Benoit Steiner
75a7fa1919
Doubled the speed of full reductions on GPUs.
2015-12-18 14:07:31 -08:00
Benoit Steiner
8dd17cbe80
Fixed a clang compilation warning triggered by the use of arrays of size 0.
2015-12-17 14:00:33 -08:00
Benoit Steiner
4aac55f684
Silenced some compilation warnings triggered by nvcc
2015-12-17 13:39:01 -08:00
Benoit Steiner
40e6250fc3
Made it possible to run tensor chipping operations on CUDA devices
2015-12-17 13:29:08 -08:00
Benoit Steiner
17352e2792
Made the entire TensorFixedSize api callable from a CUDA kernel.
2015-12-14 15:20:31 -08:00
Benoit Steiner
75e19fc7ca
Marked the tensor constructors as EIGEN_DEVICE_FUNC: This makes it possible to call them from a CUDA kernel.
2015-12-14 15:12:55 -08:00
Gael Guennebaud
ca39b1546e
Merged in ebrevdo/eigen (pull request PR-148)
...
Add special functions to eigen: lgamma, erf, erfc.
2015-12-11 11:52:09 +01:00
Benoit Steiner
6af52a1227
Fixed a typo in the constructor of tensors of rank 5.
2015-12-10 23:31:12 -08:00
Benoit Steiner
8e00ea9a92
Fixed the coefficient accessors use for the 2d and 3d case when compiling without cxx11 support.
2015-12-10 22:45:10 -08:00
Eugene Brevdo
fa4f933c0f
Add special functions to Eigen: lgamma, erf, erfc.
...
Includes CUDA support and unit tests.
2015-12-07 15:24:49 -08:00
Benoit Steiner
7dfe75f445
Fixed compilation warnings
2015-12-07 08:12:30 -08:00
Benoit Steiner
f4ca8ad917
Use signed integers instead of unsigned ones more consistently in the codebase.
2015-12-04 18:14:16 -08:00
Benoit Steiner
490d26e4c1
Use integers instead of std::size_t to encode the number of dimensions in the Tensor class since most of the code currently already use integers.
2015-12-04 10:15:11 -08:00
Benoit Steiner
d20efc974d
Made it possible to use the sigmoid functor within a CUDA kernel.
2015-12-04 09:38:15 -08:00
Benoit Steiner
029052d276
Deleted redundant code
2015-12-03 17:08:47 -08:00
Mark Borgerding
7ddcf97da7
added scalar_sign_op (both real,complex)
2015-11-24 17:15:07 -05:00
Benoit Steiner
44848ac39b
Fixed a bug in TensorArgMax.h
2015-11-23 15:58:47 -08:00
Benoit Steiner
547a8608e5
Fixed the implementation of Eigen::internal::count_leading_zeros for MSVC.
...
Also updated the code to silence bogux warnings generated by nvcc when compilining this function.
2015-11-23 12:17:45 -08:00
Benoit Steiner
562078780a
Don't create more cuda blocks than necessary
2015-11-23 11:00:10 -08:00
Benoit Steiner
df31ca3b9e
Made it possible to refer t oa GPUDevice from code compile with a regular C++ compiler
2015-11-23 10:03:53 -08:00
Benoit Steiner
1e04059012
Deleted unused variable.
2015-11-23 08:36:54 -08:00
Benoit Steiner
9fa65d3838
Split TensorDeviceType.h in 3 files to make it more manageable
2015-11-20 17:42:50 -08:00
Benoit Steiner
a367804856
Added option to force the usage of the Eigen array class instead of the std::array class.
2015-11-20 12:41:40 -08:00
Benoit Steiner
383d1cc2ed
Added proper support for fast 64bit integer division on CUDA
2015-11-20 11:09:46 -08:00
Benoit Steiner
f37a5f1c53
Fixed compilation error triggered by nvcc
2015-11-19 14:34:26 -08:00
Benoit Steiner
f8df393165
Added support for 128bit integers on CUDA devices.
2015-11-19 13:57:27 -08:00
Benoit Steiner
1dd444ea71
Avoid using the version of TensorIntDiv optimized for 32-bit integers when the divisor can be equal to one since it isn't supported.
2015-11-18 11:37:58 -08:00
Benoit Steiner
f1fbd74db9
Added sanity check
2015-11-13 09:07:27 -08:00
Benoit Steiner
7815b84be4
Fixed a compilation warning
2015-11-12 20:16:59 -08:00
Benoit Steiner
10a91930cc
Fixed a compilation warning triggered by nvcc
2015-11-12 20:10:52 -08:00