Rasmus Munk Larsen
|
f519fca72b
|
Reduce overhead for small tensors and cheap ops by short-circuiting the cost computation and block size calculation in parallelFor.
|
2016-05-17 16:06:00 -07:00 |
|
Benoit Steiner
|
09653e1f82
|
Improved the portability of the tensor code
|
2016-05-11 23:29:09 -07:00 |
|
Benoit Steiner
|
4ede059de1
|
Properly gate the use of half2.
|
2016-05-10 17:04:01 -07:00 |
|
Benoit Steiner
|
4670d7d5ce
|
Improved the performance of full reductions on GPU:
Before:
BM_fullReduction/10 200000 11751 8.51 MFlops/s
BM_fullReduction/80 5000 523385 12.23 MFlops/s
BM_fullReduction/640 50 36179326 11.32 MFlops/s
BM_fullReduction/4K 1 2173517195 11.50 MFlops/s
After:
BM_fullReduction/10 500000 5987 16.70 MFlops/s
BM_fullReduction/80 200000 10636 601.73 MFlops/s
BM_fullReduction/640 50000 58428 7010.31 MFlops/s
BM_fullReduction/4K 1000 2006106 12461.95 MFlops/s
|
2016-05-09 17:09:54 -07:00 |
|
Rasmus Munk Larsen
|
07ac4f7e02
|
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default.
|
2016-04-14 18:28:23 -07:00 |
|
Rasmus Munk Larsen
|
235e83aba6
|
Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions.
|
2016-04-14 13:57:35 -07:00 |
|
Benoit Steiner
|
1bc81f7889
|
Fixed compilation warnings on arm
|
2016-03-28 09:21:04 -07:00 |
|
Benoit Steiner
|
41434a8a85
|
Avoid unnecessary conversions
|
2016-03-23 16:52:38 -07:00 |
|
Benoit Steiner
|
92693b50eb
|
Fixed compilation warning
|
2016-03-23 16:40:36 -07:00 |
|
Benoit Steiner
|
002cf0d1c9
|
Use a single Barrier instead of a collection of Notifications to reduce the thread synchronization overhead
|
2016-03-22 15:24:23 -07:00 |
|
Benoit Steiner
|
3149b5b148
|
Avoid implicit cast
|
2016-03-09 17:35:17 -08:00 |
|
Benoit Steiner
|
f05fb449b8
|
Avoid unnecessary conversion from 32bit int to 64bit unsigned int
|
2016-03-09 15:27:45 -08:00 |
|
Benoit Steiner
|
46177c8d64
|
Replace std::vector with our own implementation, as using the STL when compiling with nvcc and AVX enabled leads to many issues.
|
2016-03-08 16:37:27 -08:00 |
|
Benoit Steiner
|
6d6413f768
|
Simplified the full reduction code
|
2016-03-08 16:02:00 -08:00 |
|
Benoit Steiner
|
e09eb835db
|
Decoupled the packet type definition from the definition of the tensor ops. All the vectorization is now defined in the tensor evaluators. This will make it possible to reliably support devices with different packet types in the same compilation unit.
|
2016-03-08 12:07:33 -08:00 |
|
Benoit Steiner
|
b2075cb7a2
|
Made the signature of the inner and outer reducers consistent
|
2016-02-29 10:53:38 -08:00 |
|
Benoit Steiner
|
3284842045
|
Optimized the performance of narrow reductions on CUDA devices
|
2016-02-29 10:48:16 -08:00 |
|
Benoit Steiner
|
c36c09169e
|
Fixed a typo in the reduction code that could prevent large full reductions from running properly on old CUDA devices.
|
2016-02-24 17:07:25 -08:00 |
|
Benoit Steiner
|
e80ed948e1
|
Fixed a number of compilation warnings generated by the cuda tests
|
2016-01-31 20:09:41 -08:00 |
|
Benoit Steiner
|
c5d25bf1d0
|
Fixed a couple of compilation warnings.
|
2016-01-28 23:15:45 -08:00 |
|
Benoit Steiner
|
291069e885
|
Fixed some compilation problems with nvcc + clang
|
2016-01-27 15:37:03 -08:00 |
|
Benoit Steiner
|
5b7713dd33
|
Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression.
|
2016-01-19 17:05:10 -08:00 |
|
Benoit Steiner
|
9f013a9d86
|
Properly record the rank of reduced tensors in the tensor traits.
|
2016-01-13 14:24:37 -08:00 |
|
Benoit Steiner
|
2c3b13eded
|
Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
|
2016-01-11 11:43:37 -08:00 |
|
Benoit Steiner
|
2ccb1c8634
|
Fixed a bug in the dispatch of optimized reduction kernels.
|
2016-01-11 10:36:37 -08:00 |
|
Benoit Steiner
|
780623261e
|
Re-enabled the optimized reduction CUDA code.
|
2016-01-11 09:07:14 -08:00 |
|
Jeremy Barnes
|
403a7cb6c3
|
Alternative way of forcing instantiation of device kernels without
causing warnings or requiring device to device kernel invocations.
This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines.
|
2016-01-10 22:39:13 -05:00 |
|
Benoit Steiner
|
e76904af1b
|
Simplified the dispatch code.
|
2016-01-08 16:50:57 -08:00 |
|
Benoit Steiner
|
3358dfd5dd
|
Reworked the dispatch of optimized CUDA reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases
|
2016-01-08 16:28:53 -08:00 |
|
Benoit Steiner
|
cfff40b1d4
|
Improved the performance of reductions on CUDA devices
|
2016-01-04 17:25:00 -08:00 |
|
Benoit Steiner
|
b5d2078c4a
|
Optimized outer reduction on GPUs.
|
2015-12-22 15:06:17 -08:00 |
|
Benoit Steiner
|
4aac55f684
|
Silenced some compilation warnings triggered by nvcc
|
2015-12-17 13:39:01 -08:00 |
|
Benoit Steiner
|
8037826367
|
Simplified more of the IndexList code.
|
2015-11-12 17:19:45 -08:00 |
|
Benoit Steiner
|
e9ecfad796
|
Started to make the IndexList code compile with more compilers
|
2015-11-12 16:41:14 -08:00 |
|
Benoit Steiner
|
5cb18e5b5e
|
Fixed CUDA compilation errors
|
2015-11-11 14:36:33 -08:00 |
|
Benoit Steiner
|
d573efe303
|
Code cleanup
|
2015-11-06 14:54:28 -08:00 |
|
Benoit Steiner
|
c75a19f815
|
Misc fixes to full reductions
|
2015-11-05 14:21:20 -08:00 |
|
Benoit Steiner
|
beedd9630d
|
Updated the reduction code so that full reductions now return a tensor of rank 0.
|
2015-11-04 13:57:36 -08:00 |
|
Gael Guennebaud
|
aec4814370
|
Many files were missing in previous changeset.
|
2015-07-29 11:11:23 +02:00 |
|
Benoit Steiner
|
7d41e97fa9
|
Silenced a number of compilation warnings
|
2015-06-29 14:47:40 -07:00 |
|
Benoit Steiner
|
db9dbbda32
|
Improved performance of full reductions by 2 orders of magnitude on CPU and 3 orders of magnitude on GPU
|
2015-06-29 14:06:32 -07:00 |
|
Benoit Steiner
|
0e5fed74e7
|
Worked around some constexpr related bugs in nvcc 7
|
2015-05-28 10:14:38 -07:00 |
|
Benoit Steiner
|
6620aaa4b3
|
Silenced a few compilation warnings generated by nvcc
|
2015-02-10 14:34:42 -08:00 |
|
Benoit Steiner
|
114e863f08
|
Silenced a few compilation warnings
|
2015-02-10 12:20:24 -08:00 |
|
Benoit Steiner
|
dcb2a8b184
|
Added the EIGEN_HAS_CONSTEXPR define
Gate the tensor index list code based on the value of EIGEN_HAS_CONSTEXPR
|
2015-02-06 02:51:59 -08:00 |
|
Benoit Steiner
|
590f4b0aa3
|
Silenced some compilation warnings
|
2015-01-30 19:46:30 -08:00 |
|
Benoit Steiner
|
9dfdbd7e56
|
Improved the performance of tensor reductions that preserve the innermost dimension(s).
|
2015-01-27 14:15:31 -08:00 |
|
Benoit Steiner
|
8f4b8d204b
|
Improved the performance of tensor reductions
Added the ability to generate random numbers following a normal distribution
Created a test to validate the ability to generate random numbers.
|
2015-01-14 10:19:33 -08:00 |
|
Benoit Steiner
|
ae697b471c
|
Silenced a few compilation warnings
Generalized a TensorMap constructor
|
2014-10-16 14:52:50 -07:00 |
|
Benoit Steiner
|
7caaf6453b
|
Added support for tensor reductions and concatenations
|
2014-10-01 20:38:22 -07:00 |
|