Deven Desai
471cfe5ff7
renaming CUDA* to GPU* for some header files
2018-07-11 09:22:04 -04:00
Deven Desai
ba972fb6b4
moving Half headers from CUDA dir to GPU dir, removing the HIP versions
2018-06-13 12:26:18 -04:00
Michael Figurnov
4bd158fa37
Derivative of the incomplete Gamma function and the sample of a Gamma random variable.
...
In addition to igamma(a, x), this code implements:
* igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma with respect to the parameter
* gamma_sample_der_alpha(alpha, sample) -- reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the alpha parameter
The derivatives are computed by forward mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha can be implemented via igamma_der_a, a separate function is more accurate and efficient due to analytical cancellation of some terms. All three functions are implemented by a method parameterized with "mode" that always computes the derivatives, but does not return them unless required by the mode. The compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations depending on the mode.
2018-06-06 18:49:26 +01:00
Michael Figurnov
f216854453
Exponentially scaled modified Bessel functions of order zero and one.
...
The functions are conventionally called i0e and i1e. The exponentially scaled version is more numerically stable. The standard Bessel functions can be obtained as i0(x) = exp(|x|) i0e(x)
The code is ported from Cephes and tested against SciPy.
2018-05-31 15:34:53 +01:00
Gael Guennebaud
e43ca0320d
bug #1520 : workaround some -Wfloat-equal warnings by calling std::equal_to
2018-04-11 15:24:13 +02:00
Daniel Trebbien
0c57be407d
Move up the specialization of std::numeric_limits
...
This fixes a compilation error seen when building TensorFlow on macOS:
https://github.com/tensorflow/tensorflow/issues/17067
2018-02-18 15:35:45 -08:00
nluehr
aefd5fd5c4
Replace __float2half_rn with __float2half
...
The latter provides a consistent definition for CUDA 8.0 and 9.0.
2017-11-28 10:15:46 -08:00
nluehr
dd6de618c3
Fix incorrect integer cast in predux<half2>().
...
Bug corrupts results on Maxwell and earlier GPU architectures.
2017-11-21 10:47:00 -08:00
Henry Schreiner
9bb26eb8f1
Restore __device__
2017-10-21 00:50:38 +00:00
Henry Schreiner
4245475d22
Fixing missing inlines on device functions for newer CUDA cards
2017-10-20 03:20:13 +00:00
Gael Guennebaud
9c353dd145
Add C++11 max_digits10 for half.
2017-09-06 10:22:47 +02:00
Benoit Steiner
a4089991eb
Added support for CUDA 9.0.
2017-08-31 02:49:39 +00:00
Gael Guennebaud
21633e585b
bug #1462 : remove all occurences of the deprecated __CUDACC_VER__ macro by introducing EIGEN_CUDACC_VER
2017-08-24 11:06:47 +02:00
Gael Guennebaud
bbd97b4095
Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases
2017-07-17 01:02:51 +02:00
Benoit Steiner
3baef62b9a
Added missing __device__ qualifier
2017-06-13 12:56:55 -07:00
Benoit Steiner
449936828c
Added missing __device__ qualifier
2017-06-13 12:54:57 -07:00
Gael Guennebaud
26f552c18d
fix compilation of Half in C++98 (issue introduced in previous commit)
2017-06-09 13:36:58 +02:00
Gael Guennebaud
d588822779
Add missing std::numeric_limits specialization for half, and complete NumTraits<half>
2017-06-09 11:51:53 +02:00
Abhijit Kundu
9bc0a35731
Fixed nested angle barckets >> issue when compiling with cuda 8
2017-04-27 03:09:03 -04:00
Gael Guennebaud
e958c2baac
remove UTF8 symbols
2017-03-07 10:47:40 +01:00
Benoit Steiner
7b61944669
Made most of the packet math primitives usable within CUDA kernel when compiling with clang
2017-02-28 17:05:28 -08:00
Benoit Steiner
34d9fce93b
Avoid unecessary float to double conversions.
2017-02-27 16:33:33 -08:00
Srinivas Vasudevan
f7d7c33a28
Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).
2016-12-05 12:19:01 -08:00
Srinivas Vasudevan
09ee7f0c80
Fix small nit where I changed name of plog1p to pexpm1.
2016-12-02 15:30:12 -08:00
Srinivas Vasudevan
218764ee1f
Added support for expm1 in Eigen.
2016-12-02 14:13:01 -08:00
Rasmus Munk Larsen
a0329f64fb
Add a default constructor for the "fake" __half class when not using the
...
__half class provided by CUDA.
2016-11-29 13:18:09 -08:00
Benoit Steiner
dff9a049c4
Optimized the computation of exp, sqrt, ceil anf floor for fp16 on Pascal GPUs
2016-11-16 09:01:51 -08:00
Benoit Steiner
c80587c92b
Merged eigen/eigen into default
2016-11-03 03:55:11 -07:00
Benoit Steiner
7a0e96b80d
Gate the code that refers to cuda fp16 primitives more thoroughly
2016-11-01 12:08:09 -07:00
Benoit Steiner
38b6048e14
Deleted redundant implementation of predux
2016-10-12 14:37:56 -07:00
Benoit Steiner
78d2926508
Merged eigen/eigen into default
2016-10-12 13:46:29 -07:00
Benoit Steiner
2e2f48e30e
Take advantage of AVX512 instructions whenever possible to speedup the processing of 16 bit floats.
2016-10-12 13:45:39 -07:00
Benoit Steiner
d485d12c51
Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code.
2016-10-06 10:41:03 -07:00
Benoit Steiner
698ff69450
Properly characterize the CUDA packet primitives for fp16 as device only
2016-10-04 16:53:30 -07:00
Benoit Steiner
409e887d78
Added support for constand std::complex numbers on GPU
2016-10-03 11:06:24 -07:00
Benoit Steiner
26f9907542
Added missing typedefs
2016-09-20 12:58:03 -07:00
RJ Ryan
b2c6dc48d9
Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op.
2016-09-20 07:18:20 -07:00
Gael Guennebaud
8f4b4ad5fb
use ::hlog if available.
2016-08-29 11:05:32 +02:00
Gael Guennebaud
35a8e94577
bug #1167 : simplify installation of header files using cmake's install(DIRECTORY ...) command.
2016-08-29 10:59:37 +02:00
Gael Guennebaud
d937a420a2
Fix compilation with MSVC by using our portable numext::log1p implementation.
2016-08-22 15:44:21 +02:00
Igor Babuschkin
59bacfe520
Fix compilation on CUDA 8 by removing call to h2log1p
2016-08-15 23:38:05 +01:00
Igor Babuschkin
aee693ac52
Add log1p support for CUDA and half floats
2016-08-08 20:24:59 +01:00
Benoit Steiner
fe778427f2
Fixed the constructors of the new half_base class.
2016-08-04 18:32:26 -07:00
Benoit Steiner
9506343349
Fixed the isnan, isfinite and isinf operations on GPU
2016-08-04 17:25:53 -07:00
Gael Guennebaud
17b9a55d98
Move Eigen::half_impl::half to Eigen::half while preserving the free functions to the Eigen::half_impl namespace together with ADL
2016-08-04 00:00:43 +02:00
Benoit Steiner
02fe89f5ef
half implementation has been moved to half_impl namespace
2016-07-29 15:09:34 -07:00
Christoph Hertzberg
c5b893f434
bug #1266 : half implementation has been moved to half_impl namespace
2016-07-29 18:36:08 +02:00
Gael Guennebaud
395c835f4b
Fix CUDA compilation
2016-07-22 15:30:24 +02:00
Gael Guennebaud
47afc9a365
More cleaning in half:
...
- put its definition and functions in its own half_impl namespace such that the free function does not polute the Eigen namespace while still making them visible for half through ADL.
- expose Eigen::half throguh a using statement
- move operator<< from std to half_float namespace
2016-07-22 14:33:28 +02:00
Gael Guennebaud
0f350a8b7e
Fix CUDA compilation
2016-07-21 18:47:07 +02:00