Benoit Steiner
|
7ce932edd3
|
Small cleanup and small fix to the contraction of row major tensors
|
2016-01-20 18:12:08 -08:00 |
|
Gael Guennebaud
|
62f7e77711
|
add upper|lower case in incomplete_cholesky unit test
|
2016-01-21 00:02:59 +01:00 |
|
Benoit Steiner
|
47076bf00e
|
Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%.
|
2016-01-20 14:51:48 -08:00 |
|
Benoit Steiner
|
ebd3388ee6
|
Pulled latest updates from trunk
|
2016-01-20 13:56:43 -08:00 |
|
Gael Guennebaud
|
ed8ade9c65
|
bug #1149: fix Pastix*::*parm()
|
2016-01-20 19:01:24 +01:00 |
|
Gael Guennebaud
|
4c5e96aab6
|
bug #1148: silence Pastix by default
|
2016-01-20 18:56:17 +01:00 |
|
Gael Guennebaud
|
db237d0c75
|
bug #1145: fix PastixSupport LLT/LDLT wrappers (missing resize prior to calls to selfAdjointView)
|
2016-01-20 18:49:01 +01:00 |
|
Gael Guennebaud
|
0b7169d1f7
|
bug #1147: fix compilation of PastixSupport
|
2016-01-20 18:15:59 +01:00 |
|
Gael Guennebaud
|
234a1094b7
|
Add static assertion to y(), z(), w() accessors
|
2016-01-20 09:18:44 +01:00 |
|
Ville Kallioniemi
|
915e7667cd
|
Remove executable bit from header files
|
2016-01-19 21:17:29 -07:00 |
|
Ville Kallioniemi
|
2832175a68
|
Use explicitly 32 bit integer types in constructors.
|
2016-01-19 20:12:17 -07:00 |
|
Benoit Steiner
|
df79c00901
|
Improved the formatting of the code
|
2016-01-19 17:24:08 -08:00 |
|
Benoit Steiner
|
6d472d8375
|
Moved the contraction mapping code to its own file to make the code more manageable.
|
2016-01-19 17:22:05 -08:00 |
|
Benoit Steiner
|
b3b722905f
|
Improved code indentation
|
2016-01-19 17:09:47 -08:00 |
|
Benoit Steiner
|
5b7713dd33
|
Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression.
|
2016-01-19 17:05:10 -08:00 |
|
Ville Kallioniemi
|
63fb66f53a
|
Add ctor for long
|
2016-01-17 21:25:36 -07:00 |
|
Eugene Brevdo
|
6a75e7e0d5
|
Digamma cleanup
* Added permission from cephes author to use his code
* Cleanup in ArrayCwiseUnaryOps
|
2016-01-15 16:32:21 -08:00 |
|
Benoit Steiner
|
34057cff23
|
Fixed a race condition that could affect some reductions on CUDA devices.
|
2016-01-15 15:11:56 -08:00 |
|
Benoit Steiner
|
0461f0153e
|
Made it possible to compare tensor dimensions inside a CUDA kernel.
|
2016-01-15 11:22:16 -08:00 |
|
Benoit Steiner
|
aed4cb1269
|
Use warp shuffles instead of shared memory access to speed up the inner reduction kernel.
|
2016-01-14 21:45:14 -08:00 |
|
Benoit Steiner
|
c1a42c2d0d
|
Don't disable the AVX implementations of plset when compiling with AVX512 enabled
|
2016-01-14 17:21:39 -08:00 |
|
Benoit Steiner
|
0366478df8
|
Added alignment requirement to the AVX512 packet traits.
|
2016-01-14 17:02:39 -08:00 |
|
Benoit Steiner
|
3cfd16f3af
|
Fixed the signature of the plset primitives for AVX512
|
2016-01-14 16:58:01 -08:00 |
|
Benoit Steiner
|
67f44365ea
|
Fixed the AVX512 signature of the ptranspose primitives
|
2016-01-14 16:51:11 -08:00 |
|
Benoit Steiner
|
a282eb1363
|
pscatter/pgather use Index instead of int to specify the stride
|
2016-01-14 16:39:39 -08:00 |
|
Benoit Steiner
|
7832485575
|
Deleted unnecessary commas and semicolons
|
2016-01-14 16:36:29 -08:00 |
|
Benoit Steiner
|
8fe2532e70
|
Fixed a boundary condition bug in the outer reduction kernel
|
2016-01-14 09:29:48 -08:00 |
|
Benoit Steiner
|
9f013a9d86
|
Properly record the rank of reduced tensors in the tensor traits.
|
2016-01-13 14:24:37 -08:00 |
|
Benoit Steiner
|
79b69b7444
|
Trigger the optimized matrix vector path more conservatively.
|
2016-01-12 15:21:09 -08:00 |
|
Benoit Steiner
|
d920d57f38
|
Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speed up LSTM neural networks.
|
2016-01-12 11:32:27 -08:00 |
|
Benoit Steiner
|
bd7d901da9
|
Reverted a previous change that tripped nvcc when compiling in debug mode.
|
2016-01-11 17:49:44 -08:00 |
|
Benoit Steiner
|
bbdabbb379
|
Made the blas utils usable from within a cuda kernel
|
2016-01-11 17:26:56 -08:00 |
|
Benoit Steiner
|
c5e6900400
|
Silenced a few compilation warnings.
|
2016-01-11 17:06:39 -08:00 |
|
Benoit Steiner
|
f894736d61
|
Updated the tensor traits: the alignment is not part of the Flags enum anymore
|
2016-01-11 16:42:18 -08:00 |
|
Benoit Steiner
|
4f7714d72c
|
Enabled the use of fixed dimensions from within a cuda kernel.
|
2016-01-11 16:01:00 -08:00 |
|
Benoit Steiner
|
01c55d37e6
|
Deleted unused variable.
|
2016-01-11 15:53:19 -08:00 |
|
Benoit Steiner
|
0504c56ea7
|
Silenced an nvcc compilation warning
|
2016-01-11 15:49:21 -08:00 |
|
Benoit Steiner
|
b523771a24
|
Silenced several compilation warnings triggered by nvcc.
|
2016-01-11 14:25:43 -08:00 |
|
Benoit Steiner
|
2c3b13eded
|
Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
|
2016-01-11 11:43:37 -08:00 |
|
Benoit Steiner
|
2ccb1c8634
|
Fixed a bug in the dispatch of optimized reduction kernels.
|
2016-01-11 10:36:37 -08:00 |
|
Benoit Steiner
|
780623261e
|
Re-enabled the optimized reduction CUDA code.
|
2016-01-11 09:07:14 -08:00 |
|
Jeremy Barnes
|
91678f489a
|
Cleaned up double-defined macro from last commit
|
2016-01-10 22:44:45 -05:00 |
|
Jeremy Barnes
|
403a7cb6c3
|
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines.
|
2016-01-10 22:39:13 -05:00 |
|
Gael Guennebaud
|
b557662e58
|
merge
|
2016-01-09 08:37:01 +01:00 |
|
Gael Guennebaud
|
8b9dc9f0df
|
bug #1144: fix regression in x=y+A*x (aliasing), and move evaluator_traits::AssumeAliasing to evaluator_assume_aliasing.
|
2016-01-09 08:30:38 +01:00 |
|
Benoit Steiner
|
e76904af1b
|
Simplified the dispatch code.
|
2016-01-08 16:50:57 -08:00 |
|
Benoit Steiner
|
d726e864ac
|
Made it possible to use arrays of size 0 on CUDA devices
|
2016-01-08 16:38:14 -08:00 |
|
Benoit Steiner
|
3358dfd5dd
|
Reworked the dispatch of optimized cuda reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases
|
2016-01-08 16:28:53 -08:00 |
|
Benoit Steiner
|
53749ff415
|
Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures.
|
2016-01-08 13:53:40 -08:00 |
|
Gael Guennebaud
|
f9d71a1729
|
extend matlab conversion table
|
2016-01-08 22:24:45 +01:00 |
|