Benoit Steiner
|
01c55d37e6
|
Deleted unused variable.
|
2016-01-11 15:53:19 -08:00 |
|
Benoit Steiner
|
0504c56ea7
|
Silenced an nvcc compilation warning
|
2016-01-11 15:49:21 -08:00 |
|
Benoit Steiner
|
b523771a24
|
Silenced several compilation warnings triggered by nvcc.
|
2016-01-11 14:25:43 -08:00 |
|
Benoit Steiner
|
2c3b13eded
|
Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
|
2016-01-11 11:43:37 -08:00 |
|
Benoit Steiner
|
780623261e
|
Re-enabled the optimized reduction CUDA code.
|
2016-01-11 09:07:14 -08:00 |
|
Jeremy Barnes
|
403a7cb6c3
|
Alternative way of forcing instantiation of device kernels without
causing warnings or requiring device to device kernel invocations.
This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines.
|
2016-01-10 22:39:13 -05:00 |
|
Benoit Steiner
|
53749ff415
|
Prevent nvcc from miscompiling the CUDA metakernel. Unfortunately, this reintroduces some compilation warnings, but it's much better than having to deal with random assertion failures.
|
2016-01-08 13:53:40 -08:00 |
|
Benoit Steiner
|
cfff40b1d4
|
Improved the performance of reductions on CUDA devices
|
2016-01-04 17:25:00 -08:00 |
|
Benoit Steiner
|
a1e08fb2a5
|
Optimized the configuration of the outer reduction CUDA kernel
|
2015-12-22 16:30:10 -08:00 |
|
Benoit Steiner
|
9c7d96697b
|
Added missing define
|
2015-12-22 16:11:07 -08:00 |
|
Benoit Steiner
|
e7e6d01810
|
Made sure the optimized gpu reduction code is actually compiled.
|
2015-12-22 15:07:33 -08:00 |
|
Benoit Steiner
|
b5d2078c4a
|
Optimized outer reduction on GPUs.
|
2015-12-22 15:06:17 -08:00 |
|
Benoit Steiner
|
75a7fa1919
|
Doubled the speed of full reductions on GPUs.
|
2015-12-18 14:07:31 -08:00 |
|
Benoit Steiner
|
d573efe303
|
Code cleanup
|
2015-11-06 14:54:28 -08:00 |
|