Benoit Steiner
3b614a2358
Use NumTraits::highest() and NumTraits::lowest() instead of the std::numeric_limits to make the tensor min and max functors more CUDA friendly.
2016-03-07 17:53:28 -08:00
Eugene Brevdo
0bb5de05a1
Finishing touches on igamma/igammac for GPU. Tests now pass.
2016-03-07 15:35:09 -08:00
Benoit Steiner
769685e74e
Added the ability to pad a tensor using a non-zero value
2016-03-07 14:45:37 -08:00
Benoit Steiner
7f87cc3a3b
Fix a couple of typos in the code.
2016-03-07 14:31:27 -08:00
Eugene Brevdo
5707004d6b
Fix Eigen's building of sharded tests that use CUDA & more igamma/igammac bugfixes.
...
0. Prior to this PR, not a single sharded CUDA test was actually being *run*.
Fixed that.
GPU tests are still failing for igamma/igammac.
1. Add calls for igamma/igammac to TensorBase
2. Fix up CUDA-specific calls of igamma/igammac
3. Add unit tests for digamma, igamma, igammac in CUDA.
2016-03-07 14:08:56 -08:00
Benoit Steiner
e5f25622e2
Added a test to validate the behavior of some of the tensor syntactic sugar.
2016-03-07 09:04:27 -08:00
Benoit Steiner
9f5740cbc1
Added missing include
2016-03-06 22:03:18 -08:00
Benoit Steiner
5238e03fe1
Don't try to compile the uint128 test with compilers that don't support uint127
2016-03-06 21:59:40 -08:00
Benoit Steiner
9a54c3e32b
Don't warn that msvc 2015 isn't c++11 compliant just because it doesn't claim to be.
2016-03-06 09:38:56 -08:00
Benoit Steiner
05bbca079a
Turn on some of the cxx11 features when compiling with visual studio 2015
2016-03-05 10:52:08 -08:00
Benoit Steiner
6093eb9ff5
Don't test our 128bit emulation code when compiling with msvc
2016-03-05 10:37:11 -08:00
Benoit Steiner
57b263c5b9
Avoid using initializer lists in test since not all version of msvc support them
2016-03-05 08:35:26 -08:00
Benoit Steiner
23aed8f2e4
Use EIGEN_PI instead of redefining our own constant PI
2016-03-05 08:04:45 -08:00
Benoit Steiner
c23e0be18f
Use the CMAKE_CXX_STANDARD variable to turn on cxx11
2016-03-04 20:18:01 -08:00
Benoit Steiner
ec35068edc
Don't rely on the M_PI constant since not all compilers provide it.
2016-03-04 16:42:38 -08:00
Benoit Steiner
60d9df11c1
Fixed the computation of leading zeros when compiling with msvc.
2016-03-04 16:27:02 -08:00
Benoit Steiner
4e49fd5eb9
MSVC uses __uint128 while other compilers use __uint128_t to encode 128bit unsigned integers. Make the cxx11_tensor_uint128.cpp test work in both cases.
2016-03-04 14:49:18 -08:00
Benoit Steiner
667fcc2b53
Fixed syntax error
2016-03-04 14:37:51 -08:00
Benoit Steiner
4416a5dcff
Added missing include
2016-03-04 14:35:43 -08:00
Benoit Steiner
c561eeb7bf
Don't use implicit type conversions in initializer lists since not all compilers support them.
2016-03-04 14:12:45 -08:00
Benoit Steiner
174edf976b
Made the contraction test more portable
2016-03-04 14:11:13 -08:00
Benoit Steiner
2c50fc878e
Fixed a typo
2016-03-04 14:09:38 -08:00
Benoit Steiner
deea866bbd
Added tests to cover the new rounding, flooring and ceiling tensor operations.
2016-03-03 12:38:02 -08:00
Benoit Steiner
5cf4558c0a
Added support for rounding, flooring, and ceiling to the tensor api
2016-03-03 12:36:55 -08:00
Benoit Steiner
dac58d7c35
Added a test to validate the conversion of half floats into floats on Kepler GPUs.
...
Restricted the testing of the random number generation code to GPU architecture greater than or equal to 3.5.
2016-03-03 10:37:25 -08:00
Benoit Steiner
68ac5c1738
Improved the performance of large outer reductions on cuda
2016-02-29 18:11:58 -08:00
Benoit Steiner
b2075cb7a2
Made the signature of the inner and outer reducers consistent
2016-02-29 10:53:38 -08:00
Benoit Steiner
3284842045
Optimized the performance of narrow reductions on CUDA devices
2016-02-29 10:48:16 -08:00
Benoit Steiner
609b3337a7
Print some information to stderr when a CUDA kernel fails
2016-02-27 20:42:57 +00:00
Benoit Steiner
ac2e6e0d03
Properly vectorized the random number generators
2016-02-26 13:52:24 -08:00
Benoit Steiner
caa54d888f
Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag
2016-02-26 12:38:18 -08:00
Benoit Steiner
2cd32cad27
Reverted previous commit since it caused more problems than it solved
2016-02-26 13:21:44 +00:00
Benoit Steiner
d9d05dd96e
Fixed handling of long doubles on aarch64
2016-02-26 04:13:58 -08:00
Benoit Steiner
af199b4658
Made the CUDA architecture level a build setting.
2016-02-25 09:06:18 -08:00
Benoit Steiner
c36c09169e
Fixed a typo in the reduction code that could prevent large full reductionsx from running properly on old cuda devices.
2016-02-24 17:07:25 -08:00
Benoit Steiner
7a01cb8e4b
Marked the And and Or reducers as stateless.
2016-02-24 16:43:01 -08:00
Benoit Steiner
1d9256f7db
Updated the padding code to work with half floats
2016-02-23 05:51:22 +00:00
Benoit Steiner
72d2cf642e
Deleted the coordinate based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of xcode.
2016-02-22 15:29:41 -08:00
Benoit Steiner
5cd00068c0
include <iostream> in the tensor header since we now use it to better report cuda initialization errors
2016-02-22 13:59:03 -08:00
Benoit Steiner
257b640463
Fixed compilation warning generated by clang
2016-02-21 22:43:37 -08:00
Benoit Steiner
e644f60907
Pulled latest updates from trunk
2016-02-21 20:24:59 +00:00
Benoit Steiner
95fceb6452
Added the ability to compute the absolute value of a half float
2016-02-21 20:24:11 +00:00
Benoit Steiner
ed69cbeef0
Added some debugging information to the test to figure out why it fails sometimes
2016-02-21 11:20:20 -08:00
Benoit Steiner
96a24b05cc
Optimized casting of tensors in the case where the casting happens to be a no-op
2016-02-21 11:16:15 -08:00
Benoit Steiner
203490017f
Prevent unecessary Index to int conversions
2016-02-21 08:49:36 -08:00
Benoit Steiner
1e6fe6f046
Fixed the float16 tensor test.
2016-02-20 07:44:17 +00:00
Rasmus Munk Larsen
8eb127022b
Get rid of duplicate code.
2016-02-19 16:33:30 -08:00
Rasmus Munk Larsen
d5e2ec7447
Speed up tensor FFT by up ~25-50%.
...
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_tensor_fft_single_1D_cpu/8 132 134 -1.5%
BM_tensor_fft_single_1D_cpu/9 1162 1229 -5.8%
BM_tensor_fft_single_1D_cpu/16 199 195 +2.0%
BM_tensor_fft_single_1D_cpu/17 2587 2267 +12.4%
BM_tensor_fft_single_1D_cpu/32 373 341 +8.6%
BM_tensor_fft_single_1D_cpu/33 5922 4879 +17.6%
BM_tensor_fft_single_1D_cpu/64 797 675 +15.3%
BM_tensor_fft_single_1D_cpu/65 13580 10481 +22.8%
BM_tensor_fft_single_1D_cpu/128 1753 1375 +21.6%
BM_tensor_fft_single_1D_cpu/129 31426 22789 +27.5%
BM_tensor_fft_single_1D_cpu/256 4005 3008 +24.9%
BM_tensor_fft_single_1D_cpu/257 70910 49549 +30.1%
BM_tensor_fft_single_1D_cpu/512 8989 6524 +27.4%
BM_tensor_fft_single_1D_cpu/513 165402 107751 +34.9%
BM_tensor_fft_single_1D_cpu/999 198293 115909 +41.5%
BM_tensor_fft_single_1D_cpu/1ki 21289 14143 +33.6%
BM_tensor_fft_single_1D_cpu/1k 361980 233355 +35.5%
BM_tensor_fft_double_1D_cpu/8 138 131 +5.1%
BM_tensor_fft_double_1D_cpu/9 1253 1133 +9.6%
BM_tensor_fft_double_1D_cpu/16 218 200 +8.3%
BM_tensor_fft_double_1D_cpu/17 2770 2392 +13.6%
BM_tensor_fft_double_1D_cpu/32 406 368 +9.4%
BM_tensor_fft_double_1D_cpu/33 6418 5153 +19.7%
BM_tensor_fft_double_1D_cpu/64 856 728 +15.0%
BM_tensor_fft_double_1D_cpu/65 14666 11148 +24.0%
BM_tensor_fft_double_1D_cpu/128 1913 1502 +21.5%
BM_tensor_fft_double_1D_cpu/129 36414 24072 +33.9%
BM_tensor_fft_double_1D_cpu/256 4226 3216 +23.9%
BM_tensor_fft_double_1D_cpu/257 86638 52059 +39.9%
BM_tensor_fft_double_1D_cpu/512 9397 6939 +26.2%
BM_tensor_fft_double_1D_cpu/513 203208 114090 +43.9%
BM_tensor_fft_double_1D_cpu/999 237841 125583 +47.2%
BM_tensor_fft_double_1D_cpu/1ki 20921 15392 +26.4%
BM_tensor_fft_double_1D_cpu/1k 455183 250763 +44.9%
BM_tensor_fft_single_2D_cpu/8 1051 1005 +4.4%
BM_tensor_fft_single_2D_cpu/9 16784 14837 +11.6%
BM_tensor_fft_single_2D_cpu/16 4074 3772 +7.4%
BM_tensor_fft_single_2D_cpu/17 75802 63884 +15.7%
BM_tensor_fft_single_2D_cpu/32 20580 16931 +17.7%
BM_tensor_fft_single_2D_cpu/33 345798 278579 +19.4%
BM_tensor_fft_single_2D_cpu/64 97548 81237 +16.7%
BM_tensor_fft_single_2D_cpu/65 1592701 1227048 +23.0%
BM_tensor_fft_single_2D_cpu/128 472318 384303 +18.6%
BM_tensor_fft_single_2D_cpu/129 7038351 5445308 +22.6%
BM_tensor_fft_single_2D_cpu/256 2309474 1850969 +19.9%
BM_tensor_fft_single_2D_cpu/257 31849182 23797538 +25.3%
BM_tensor_fft_single_2D_cpu/512 10395194 8077499 +22.3%
BM_tensor_fft_single_2D_cpu/513 144053843 104242541 +27.6%
BM_tensor_fft_single_2D_cpu/999 279885833 208389718 +25.5%
BM_tensor_fft_single_2D_cpu/1ki 45967677 36070985 +21.5%
BM_tensor_fft_single_2D_cpu/1k 619727095 456489500 +26.3%
BM_tensor_fft_double_2D_cpu/8 1110 1016 +8.5%
BM_tensor_fft_double_2D_cpu/9 17957 15768 +12.2%
BM_tensor_fft_double_2D_cpu/16 4558 4000 +12.2%
BM_tensor_fft_double_2D_cpu/17 79237 66901 +15.6%
BM_tensor_fft_double_2D_cpu/32 21494 17699 +17.7%
BM_tensor_fft_double_2D_cpu/33 357962 290357 +18.9%
BM_tensor_fft_double_2D_cpu/64 105179 87435 +16.9%
BM_tensor_fft_double_2D_cpu/65 1617143 1288006 +20.4%
BM_tensor_fft_double_2D_cpu/128 512848 419397 +18.2%
BM_tensor_fft_double_2D_cpu/129 7271322 5636884 +22.5%
BM_tensor_fft_double_2D_cpu/256 2415529 1922032 +20.4%
BM_tensor_fft_double_2D_cpu/257 32517952 24462177 +24.8%
BM_tensor_fft_double_2D_cpu/512 10724898 8287617 +22.7%
BM_tensor_fft_double_2D_cpu/513 146007419 108603266 +25.6%
BM_tensor_fft_double_2D_cpu/999 296351330 221885776 +25.1%
BM_tensor_fft_double_2D_cpu/1ki 59334166 48357539 +18.5%
BM_tensor_fft_double_2D_cpu/1k 666660132 483840349 +27.4%
2016-02-19 16:29:23 -08:00
Benoit Steiner
46fc23f91c
Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues.
2016-02-19 13:44:22 -08:00
Benoit Steiner
670db7988d
Updated the contraction code to make it compatible with half floats.
2016-02-19 13:03:26 -08:00