3122 Commits

Deven Desai
1bb6fa99a3 merging the CUDA and HIP implementation for the Tensor directory and the unit tests 2018-06-20 16:44:58 -04:00
Deven Desai
cfdabbcc8f removing the *Hip files from the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories 2018-06-20 12:57:02 -04:00
Deven Desai
7e41c8f1a9 renaming *Cuda files to *Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories 2018-06-20 12:52:30 -04:00
Deven Desai
b6cc0961b1 updates based on PR feedback
There are two major changes (and a few minor ones which are not listed here...see PR discussion for details)

1. Eigen::half implementations for HIP and CUDA have been merged.
This means that:
- `CUDA/Half.h` and `HIP/hcc/Half.h` were merged into a new file `GPU/Half.h`
- `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` were merged into a new file `GPU/PacketMathHalf.h`
- `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` were merged into a new file `GPU/TypeCasting.h`

After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.

2. New macros `EIGEN_GPUCC`, `EIGEN_GPU_COMPILE_PHASE` and `EIGEN_HAS_GPU_FP16` have been added, and the code has been updated to use them where appropriate (see the sketch below).
- `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
- `EIGEN_GPU_COMPILE_PHASE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
- `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 || EIGEN_HAS_HIP_FP16)`
2018-06-14 10:21:54 -04:00
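A minimal sketch of how these unified guards might look in a common header (the structure is illustrative; only the macro names and equivalences come from the commit message above):

```cpp
// Unified "compiling with a GPU compiler" check: nvcc or hipcc.
#if defined(EIGEN_CUDACC) || defined(EIGEN_HIPCC)
  #define EIGEN_GPUCC
#endif

// Unified "currently in the device compilation phase" check.
#if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE)
  #define EIGEN_GPU_COMPILE_PHASE
#endif

// Unified half-precision availability check.
#if defined(EIGEN_HAS_CUDA_FP16) || defined(EIGEN_HAS_HIP_FP16)
  #define EIGEN_HAS_GPU_FP16
#endif
```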
Deven Desai
d1d22ef0f4 syncing this fork with upstream 2018-06-13 12:09:52 -04:00
Benoit Steiner
d3a380af4d Merged in mfigurnov/eigen/gamma-der-a (pull request PR-403)
Derivative of the incomplete Gamma function and the sample of a Gamma random variable

Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
2018-06-11 17:57:47 +00:00
Gael Guennebaud
67ec37f7b0 Activate dgmres unit test 2018-07-02 12:54:14 +02:00
Jonathan Liu
b7689bded9 Use std::complex constructor instead of assignment from scalar
Fixes GCC's "conversion to non-scalar type requested" compile error when using boost::multiprecision::cpp_dec_float_50 as the scalar type.
2018-06-28 00:32:37 +10:00
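A minimal sketch of the pattern change, assuming Boost.Multiprecision is available (the exact call site in Eigen differs):

```cpp
#include <complex>
#include <boost/multiprecision/cpp_dec_float.hpp>

using Scalar = boost::multiprecision::cpp_dec_float_50;

void example(const Scalar& s) {
  // Direct construction, the pattern the commit switches to. Assigning a
  // bare scalar where a std::complex was expected is what triggered GCC's
  // "conversion to non-scalar type requested" error with this type.
  std::complex<Scalar> z(s, Scalar(0));
  (void)z;
}
```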
Rasmus Munk Larsen
5418154a45 Fix oversharding bug in parallelFor. 2018-06-20 17:51:48 -07:00
Gael Guennebaud
7933267c67 fix prototype 2018-06-08 09:56:01 +02:00
Michael Figurnov
30fa3d0454 Merge from eigen/eigen 2018-06-07 17:57:56 +01:00
Michael Figurnov
6c71c7d360 Merge from eigen/eigen. 2018-06-07 15:54:18 +01:00
Gael Guennebaud
37348d03ae Fix int versus Index 2018-06-07 15:56:43 +02:00
Michael Figurnov
aa813d417b Fix compilation of special functions without C99 math.
The commit that added the Bessel functions i0e and i1e placed the #ifdef/#endif incorrectly, causing i0e/i1e to be undefined when EIGEN_HAS_C99_MATH=0. These functions do not actually require C99 math, so they are now always available.
2018-06-07 14:35:07 +01:00
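A schematic of the fix (illustrative layout, not the exact Eigen source):

```cpp
// i0e/i1e only use basic arithmetic, so they now live outside the guard
// and remain available even when EIGEN_HAS_C99_MATH == 0.
template <typename Scalar> Scalar i0e_impl(const Scalar& x);
template <typename Scalar> Scalar i1e_impl(const Scalar& x);

#if EIGEN_HAS_C99_MATH
// only the functions that genuinely need C99 math (lgamma, erf, erfc, ...)
#endif
```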
Gael Guennebaud
b3fd93207b Fix typos found using codespell 2018-06-07 14:43:02 +02:00
Michael Figurnov
5172a32849 Updated the stopping criteria in igammac_cf_impl.
Previously, when computing the derivative, it used a relative error threshold; now it uses an absolute error threshold. The behavior when computing the value is unchanged. This makes more sense, since we do not expect the derivative to often be close to zero. This change makes the derivatives about 30% faster across the board. The error for igamma_der_a is almost unchanged, while for gamma_sample_der_alpha it is slightly worse for float32 and unchanged for float64.
2018-06-07 12:03:58 +01:00
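Schematically, with hypothetical names, the two stopping rules compare as follows:

```cpp
#include <cmath>

// Old rule for the derivative: relative error.
bool converged_relative(double ans, double prev, double eps) {
  return std::abs((ans - prev) / ans) <= eps;
}

// New rule for the derivative: absolute error. The value itself keeps
// the relative test.
bool converged_absolute(double ans, double prev, double eps) {
  return std::abs(ans - prev) <= eps;
}
```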
Michael Figurnov
4bd158fa37 Derivative of the incomplete Gamma function and the sample of a Gamma random variable.
In addition to igamma(a, x), this code implements:
* igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma with respect to the parameter
* gamma_sample_der_alpha(alpha, sample) -- reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the alpha parameter

The derivatives are computed by forward mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha can be implemented via igamma_der_a, a separate function is more accurate and efficient due to analytical cancellation of some terms. All three functions are implemented by a method parameterized with "mode" that always computes the derivatives, but does not return them unless required by the mode. The compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations depending on the mode.
2018-06-06 18:49:26 +01:00
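A sketch of the "mode" parameterization described above (names are hypothetical; the real implementation sits in Eigen's special-functions code):

```cpp
enum class IgammaMode { Value, DerivativeA };

// The derivative is always computed alongside the value, but only returned
// when the mode asks for it; the compiler drops the dead computation.
template <IgammaMode Mode>
double igamma_like(double a, double x) {
  double ans = 0.0, d_ans = 0.0;
  // ... forward-mode differentiated series / continued-fraction loop
  //     updating both ans and d_ans would go here ...
  (void)a; (void)x;
  return Mode == IgammaMode::Value ? ans : d_ans;
}
```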
Deven Desai
8fbd47052b Adding support for using Eigen in HIP kernels.
This commit enables the use of Eigen in HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVIDIA GPUs.

Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers are picked up by default during an Eigen compile, irrespective of whether the underlying compiler is CUDACC/NVCC (e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor).

Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP-specific unit tests.
2018-06-06 10:12:58 -04:00
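In application code, per the commit message, opting in looks like this (sketch):

```cpp
// Define before including any Eigen header so the HIP variants of the
// GPU-specific headers are selected instead of the CUDA ones.
#define EIGEN_USE_HIP
#include <unsupported/Eigen/CXX11/Tensor>
```

and the HIP unit tests are enabled at configure time with, e.g., `cmake -DEIGEN_TEST_HIP=ON`.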
Benoit Steiner
e206f8d4a4 Merged in mfigurnov/eigen (pull request PR-400)
Exponentially scaled modified Bessel functions of order zero and one.

Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
2018-06-05 17:05:21 +00:00
Penporn Koanantakool
e2ed0cf8ab Add a ThreadPoolInterface* getter for ThreadPoolDevice. 2018-06-02 12:07:49 -07:00
Michael Figurnov
f216854453 Exponentially scaled modified Bessel functions of order zero and one.
The functions are conventionally called i0e and i1e. The exponentially scaled versions are more numerically stable; the standard Bessel functions can be recovered as i0(x) = exp(|x|) i0e(x).

The code is ported from Cephes and tested against SciPy.
2018-05-31 15:34:53 +01:00
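The scaling relation quoted above is easy to check against a reference; a small sketch using C++17's std::cyl_bessel_i (available in libstdc++) for the unscaled i0:

```cpp
#include <cmath>
#include <cstdio>

int main() {
  // i0(x) = exp(|x|) * i0e(x), so i0e(x) = exp(-|x|) * i0(x).
  for (double x : {0.5, 2.0, 10.0}) {
    double i0  = std::cyl_bessel_i(0.0, x);   // unscaled, overflows for large x
    double i0e = std::exp(-std::abs(x)) * i0; // scaled, stays moderate
    std::printf("x=%5.1f  i0=%.6e  i0e=%.6e\n", x, i0, i0e);
  }
}
```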
Katrin Leinweber
ea94543190 Hyperlink DOIs against preferred resolver 2018-05-24 18:55:40 +02:00
Vamsi Sripathi
6293ad3f39 Performance improvements to tensor broadcast operation
1. Added new packet functions using SIMD for the NByOne and OneByN cases (see the usage sketch below)
2. Modified existing packet functions to reduce index calculations when the input stride is non-SIMD
3. Added 4 test cases to cover the new packet functions
2018-05-23 14:02:05 -07:00
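For context, a minimal broadcast of this shape through the public Tensor API (dimensions illustrative):

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  // A 1-by-N tensor replicated along its unit dimension, one of the
  // special-cased shapes targeted by the new packet functions.
  Eigen::Tensor<float, 2> in(1, 4);
  in.setRandom();
  Eigen::array<Eigen::Index, 2> bcast{{8, 1}};        // replicate 8x along dim 0
  Eigen::Tensor<float, 2> out = in.broadcast(bcast);  // out is 8-by-4
}
```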
Benoit Steiner
0371380d5b Merged in rmlarsen/eigen2 (pull request PR-393)
Rename scalar_clip_op to scalar_clamp_op to prevent collision with existing functor in TensorFlow.
2018-05-16 21:45:42 +00:00
Rasmus Munk Larsen
b8d36774fa Rename clip2 to clamp. 2018-05-16 14:04:48 -07:00
Rasmus Munk Larsen
812480baa3 Rename scalar_clip_op to scalar_clip2_op to prevent collision with existing functor in TensorFlow. 2018-05-16 09:49:24 -07:00
Benoit Steiner
1403c2c15b Merged in didierjansen/eigen (pull request PR-360)
Fix bugs and typos in the contraction example of the tensor README
2018-05-16 01:16:36 +00:00
Rasmus Munk Larsen
afec3021f7 Use numext::maxi & numext::mini. 2018-05-14 16:35:39 -07:00
Rasmus Munk Larsen
b8c8e5f436 Add vectorized clip functor for Eigen Tensors. 2018-05-14 16:07:13 -07:00
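A sketch of such a functor built on numext::mini/numext::maxi (simplified; a vectorized version would also provide a packetOp using pmin/pmax):

```cpp
#include <Eigen/Core>

// Simplified scalar clamp functor in the style of Eigen's cwise functors.
template <typename Scalar>
struct scalar_clamp_op {
  scalar_clamp_op(const Scalar& lo, const Scalar& hi) : m_lo(lo), m_hi(hi) {}
  Scalar operator()(const Scalar& x) const {
    return Eigen::numext::mini(Eigen::numext::maxi(x, m_lo), m_hi);
  }
  const Scalar m_lo, m_hi;
};
```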
Benoit Steiner
6118c6ff4f Enable RawAccess to tensor slices whenever possible.
Avoid 32-bit integer overflow in TensorSlicingOp
2018-04-30 11:28:12 -07:00
Gael Guennebaud
2f3287da7d Fix "used uninitialized" warnings 2018-04-24 17:17:25 +02:00
Gael Guennebaud
3ffd449ef5 Workaround warning 2018-04-24 17:11:51 +02:00
Christoph Hertzberg
84dcd998a9 Recent Adolc versions require C++11 2018-04-13 19:10:23 +02:00
Weiming Zhao
b0eda3cb9f Avoid using memcpy for non-POD elements 2018-04-11 11:37:06 +02:00
Gael Guennebaud
67bac6368c protect calls to isnan 2018-04-03 14:19:04 +02:00
Gael Guennebaud
524119d32a Fix uninitialized output argument. 2018-04-03 10:56:10 +02:00
Viktor Csomor
000840cae0 Added a move constructor and move assignment operator to Tensor and wrote some tests. 2018-02-07 19:10:54 +01:00
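Usage sketch of the new move operations:

```cpp
#include <unsupported/Eigen/CXX11/Tensor>
#include <utility>

int main() {
  Eigen::Tensor<float, 2> a(512, 512);
  a.setRandom();
  // Move construction/assignment transfers a's heap buffer instead of
  // copying it; the moved-from tensor is left valid but unspecified.
  Eigen::Tensor<float, 2> b(std::move(a));
  Eigen::Tensor<float, 2> c;
  c = std::move(b);
}
```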
Eugene Zhulenev
c95aacab90 Fix TensorContractionOp evaluators for GPU and SYCL 2018-07-17 14:09:37 -07:00
Deven Desai
f124f07965 applying EIGEN_DECLARE_TEST to *gpu* tests
Also, a few minor fixes for GPU tests running in HIP mode.

1. Adding an include for hip/hip_runtime.h in the Macros.h file
   For HIP __host__ and __device__ are macros which are defined in hip headers.
   Their definitions need to be included before their use in the file.

2. Fixing the compile failure in TensorContractionGpu introduced by the commit to
   "Fuse computations into the Tensor contractions using output kernel"

3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit
2018-07-17 14:16:48 -04:00
Gael Guennebaud
82f0ce2726 Get rid of EIGEN_TEST_FUNC, unit tests must now be declared with EIGEN_DECLARE_TEST(mytest) { /* code */ }.
This provides several advantages:
- more flexibility in designing unit tests
- unit tests can be glued together to speed up compilation
- unit tests are compiled with the same predefined macros, which is a requirement for zapcc
2018-07-17 14:46:15 +02:00
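Within Eigen's test suite (which provides main.h, CALL_SUBTEST_*, and g_repeat), a test written against the new interface looks roughly like:

```cpp
#include "main.h"

void check_trace() {
  VERIFY_IS_APPROX(Eigen::MatrixXd::Identity(3, 3).trace(), 3.0);
}

EIGEN_DECLARE_TEST(mytest) {
  for (int i = 0; i < g_repeat; i++) {
    CALL_SUBTEST_1(check_trace());
  }
}
```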
Eugene Zhulenev
43206ac4de Call OutputKernel in evalGemv 2018-07-12 14:52:23 -07:00
Eugene Zhulenev
e204ecdaaf Remove SimpleThreadPool and always use {NonBlocking}ThreadPool 2018-07-16 15:06:57 -07:00
Eugene Zhulenev
01fd4096d3 Fuse computations into the Tensor contractions using output kernel 2018-07-10 13:16:38 -07:00
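The fusion is exposed as an extra argument to contract; a hedged sketch of a custom output kernel (interface details are illustrative):

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

// Hypothetical kernel: scales each block of the contraction result in
// place as it is produced, avoiding a second pass over the output.
struct ScaleOutputKernel {
  template <typename OutputMapper, typename Index>
  void operator()(const OutputMapper& output,
                  const Eigen::TensorContractionParams& /*params*/,
                  Index /*i*/, Index /*j*/,
                  Index num_rows, Index num_cols) const {
    for (Index c = 0; c < num_cols; ++c)
      for (Index r = 0; r < num_rows; ++r)
        output(r, c) *= 2.0f;
  }
};

// Usage sketch:
//   Eigen::Tensor<float, 2> c = a.contract(b, dims, ScaleOutputKernel());
```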
Gael Guennebaud
5539587b1f Some warning fixes 2018-07-17 10:29:12 +02:00
Benoit Steiner
8f55956a57 Update the padding computation for PADDING_SAME to be consistent with TensorFlow. 2018-01-30 20:22:12 +00:00
Lee.Deokjae
5b3c367926 Fix typos in the contraction example of tensor README 2018-01-06 14:36:19 +09:00
RJ Ryan
59985cfd26 Disable use of recurrence for computing twiddle factors. Fixes FFT precision issues for large FFTs. https://github.com/tensorflow/tensorflow/issues/10749#issuecomment-354557689 2017-12-31 10:44:56 -05:00
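The fix trades one complex multiply per twiddle for a direct trig evaluation whose rounding error does not grow with the transform size; a sketch of the direct form:

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Direct per-index evaluation: one sin/cos pair per factor, with error
// bounded independently of n, unlike the recurrence w[k] = w[k-1] * w1,
// which accumulates rounding error across large FFTs.
std::vector<std::complex<double>> twiddles(int n) {
  const double pi = std::acos(-1.0);
  std::vector<std::complex<double>> w(n);
  for (int k = 0; k < n; ++k)
    w[k] = std::polar(1.0, -2.0 * pi * k / n);
  return w;
}
```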
Gael Guennebaud
73214c4bd0 Workaround nvcc 9.0 issue. See PR 351.
https://bitbucket.org/eigen/eigen/pull-requests/351
2017-12-15 14:10:59 +01:00
Yangzihao Wang
3122477c86 Update the padding computation for PADDING_SAME to be consistent with TensorFlow. 2017-12-12 11:15:24 -08:00
Rasmus Munk Larsen
e900b010c8 Improve robustness of igamma and igammac to bad inputs.
Check for nan inputs and propagate them immediately. Limit the number of internal iterations to 2000 (same number as used by scipy.special.gammainc). This prevents an infinite loop when the function is called with nan or very large arguments.

Original change by mfigurnov@google.com
2018-03-19 09:04:54 -07:00
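Schematically, with hypothetical names, the guards look like:

```cpp
#include <cmath>
#include <limits>

// Propagate NaN inputs immediately and cap the iteration count at 2000,
// the same limit scipy.special.gammainc uses, so bad inputs cannot spin
// the continued-fraction loop forever.
double igammac_like(double a, double x) {
  if (std::isnan(a) || std::isnan(x))
    return std::numeric_limits<double>::quiet_NaN();
  double ans = 0.0;
  for (int it = 0; it < 2000; ++it) {
    // ... continued-fraction update of ans; break once converged ...
  }
  return ans;
}
```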