156 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
f9fac1d5b0 Add log2() to Eigen. 2020-12-04 21:45:09 +00:00
Rasmus Munk Larsen
f23dc5b971 Revert "Add log2() operator to Eigen"
This reverts commit 4d91519a9be061da5d300079fca17dd0b9328050.
2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen
4d91519a9b Add log2() operator to Eigen 2020-12-03 22:31:44 +00:00
Antonio Sanchez
22f67b5958 Fix boolean float conversion and product warnings.
This fixes some gcc warnings such as:
```
Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
    Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
```

Details:

- Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).

- Added `scalar_square_op<bool>` and `scalar_cube_op<bool>`
specializations (`-Wint-in-bool-context`)

- Deprecated above specialized ops for bool.

- Modified `cxx11_tensor_block_eval` to specialize generator for
booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to
avoid deprecated bool ops.
2020-11-24 20:20:36 +00:00
Rasmus Munk Larsen
c6953f799b Add packet generic ops predux_fmin, predux_fmin_nan, predux_fmax, and predux_fmax_nan that implement reductions with PropagateNaN, and PropagateNumbers semantics. Add (slow) generic implementations for most reductions. 2020-10-13 21:48:31 +00:00
David Tellenbach
4091f6b25c Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD 2020-10-09 02:05:05 +02:00
Rasmus Munk Larsen
b431024404 Don't make assumptions about NaN-propagation for pmin/pmax - it various across platforms.
Change test to only test for NaN-propagation for pfmin/pfmax.
2020-10-07 19:05:18 +00:00
Georg Jäger
1b1082334b adding attributes to constructors to support hip-clang on ROCm 3.5 2020-08-20 16:48:11 +02:00
Rasmus Munk Larsen
6964ae8d52 Change the sign operator in Eigen to return NaN for NaN arguments, not zero. 2020-07-07 01:54:04 +00:00
Sheng Yang
116c5235ac BF16 for scalar_cmp_with_cast_op 2020-07-01 18:33:42 +00:00
ShengYang1
b5d66b5e73 Implement scalar_cmp_with_cast_op 2020-06-09 08:12:07 +08:00
Rasmus Munk Larsen
daf9bbeca2 Fix compilation error in logistic packet op. 2020-06-03 00:57:41 +00:00
Gael Guennebaud
029a76e115 Bug #1777: make the scalar and packet path consistent for the logistic function + respective unit test 2020-05-31 00:53:37 +02:00
Rasmus Munk Larsen
c1d944dd91 Remove packet ops pinsertfirst and pinsertlast that are only used in a single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
I cannot measure any performance changes for SSE, AVX, or AVX512.

name                                 old time/op             new time/op             delta
BM_LinSpace<float>/1                 1.63ns ± 0%             1.63ns ± 0%   ~             (p=0.762 n=5+5)
BM_LinSpace<float>/8                 4.92ns ± 3%             4.89ns ± 3%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/64                34.6ns ± 0%             34.6ns ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/512                217ns ± 0%              217ns ± 0%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/4k                1.68µs ± 0%             1.68µs ± 0%   ~             (p=1.000 n=5+5)
BM_LinSpace<float>/32k               13.3µs ± 0%             13.3µs ± 0%   ~             (p=0.905 n=5+4)
BM_LinSpace<float>/256k               107µs ± 0%              107µs ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/1M                 427µs ± 0%              427µs ± 0%   ~             (p=0.690 n=5+5)
2020-05-08 15:41:50 -07:00
Rasmus Munk Larsen
ab773c7e91 Extend support for Packet16b:
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* work around a bug in slicing of Tensor<bool>.
* Add tensor tests

This speeds up matmul for boolean matrices by about 10x

name                            old time/op             new time/op             delta
BM_MatMul<bool>/8                267ns ± 0%              479ns ± 0%  +79.25%          (p=0.008 n=5+5)
BM_MatMul<bool>/32              6.42µs ± 0%             0.87µs ± 0%  -86.50%          (p=0.008 n=5+5)
BM_MatMul<bool>/64              43.3µs ± 0%              5.9µs ± 0%  -86.42%          (p=0.008 n=5+5)
BM_MatMul<bool>/128              315µs ± 0%               44µs ± 0%  -85.98%          (p=0.008 n=5+5)
BM_MatMul<bool>/256             2.41ms ± 0%             0.34ms ± 0%  -85.68%          (p=0.008 n=5+5)
BM_MatMul<bool>/512             18.8ms ± 0%              2.7ms ± 0%  -85.53%          (p=0.008 n=5+5)
BM_MatMul<bool>/1k               149ms ± 0%               22ms ± 0%  -85.40%          (p=0.008 n=5+5)
2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen
2f6ddaa25c Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x.
Benchmark numbers for the logical and of two NxN tensors:

name                                               old time/op             new time/op             delta
BM_booleanAnd_1T/3   [using 1 threads]             14.6ns ± 0%             14.4ns ± 0%   -0.96%
BM_booleanAnd_1T/4   [using 1 threads]             20.5ns ±12%              9.0ns ± 0%  -56.07%
BM_booleanAnd_1T/7   [using 1 threads]             41.7ns ± 0%             10.5ns ± 0%  -74.87%
BM_booleanAnd_1T/8   [using 1 threads]             52.1ns ± 0%             10.1ns ± 0%  -80.59%
BM_booleanAnd_1T/10  [using 1 threads]             76.3ns ± 0%             13.8ns ± 0%  -81.87%
BM_booleanAnd_1T/15  [using 1 threads]              167ns ± 0%               16ns ± 0%  -90.45%
BM_booleanAnd_1T/16  [using 1 threads]              188ns ± 0%               16ns ± 0%  -91.57%
BM_booleanAnd_1T/31  [using 1 threads]              667ns ± 0%               34ns ± 0%  -94.83%
BM_booleanAnd_1T/32  [using 1 threads]              710ns ± 0%               35ns ± 0%  -95.01%
BM_booleanAnd_1T/64  [using 1 threads]             2.80µs ± 0%             0.11µs ± 0%  -95.93%
BM_booleanAnd_1T/128 [using 1 threads]             11.2µs ± 0%              0.4µs ± 0%  -96.11%
BM_booleanAnd_1T/256 [using 1 threads]             44.6µs ± 0%              2.5µs ± 0%  -94.31%
BM_booleanAnd_1T/512 [using 1 threads]              178µs ± 0%               10µs ± 0%  -94.35%
BM_booleanAnd_1T/1k  [using 1 threads]              717µs ± 0%               78µs ± 1%  -89.07%
BM_booleanAnd_1T/2k  [using 1 threads]             2.87ms ± 0%             0.31ms ± 1%  -89.08%
BM_booleanAnd_1T/4k  [using 1 threads]             11.7ms ± 0%              1.9ms ± 4%  -83.55%
BM_booleanAnd_1T/10k [using 1 threads]             70.3ms ± 0%             17.2ms ± 4%  -75.48%
2020-04-20 20:16:28 +00:00
Rasmus Munk Larsen
4da2c6b197 Remove reference to non-existent unary_op_base class. 2020-03-19 18:23:06 +00:00
Rasmus Munk Larsen
eda90baf35 Add missing arguments to numext::absdiff(). 2020-03-19 18:16:55 +00:00
Joel Holdsworth
d5c665742b Add absolute_difference coefficient-wise binary Array function 2020-03-19 17:45:20 +00:00
Joel Holdsworth
232f904082 Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions 2020-03-19 17:24:06 +00:00
Allan Leal
37ccb86916 Update NullaryFunctors.h 2020-03-16 11:59:02 +00:00
Rasmus Munk Larsen
e6fcee995b Don't use the rational approximation to the logistic function on GPUs as it appears to be slightly slower. 2020-01-09 00:04:26 +00:00
Rasmus Munk Larsen
4217a9f090 The upper limits for where to use the rational approximation to the logistic function were not set carefully enough in the original commit, and some arguments would cause the function to return values greater than 1. This change set the versions found by scanning all floating point numbers (using std::nextafterf()). 2020-01-08 22:21:37 +00:00
Ilya Tokar
19876ced76 Bug #1785: Introduce numext::rint.
This provides a new op that matches std::rint and previous behavior of
pround. Also adds corresponding unsupported/../Tensor op.
Performance is the same as e. g. floor (tested SSE/AVX).
2020-01-07 21:22:44 +00:00
Rasmus Munk Larsen
a566074480 Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).
This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in 66f07efeae), but uses the more accurate approximation 1/(1+exp(-1)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.

This change also contains a few improvements to speed up the original float specialization of logistic:
  - Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_predict and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
  - Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).

The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.

The benchmarks below repeated calls

   u = v.logistic()  (u = v.tanh(), respectively)

where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].

Benchmark numbers for logistic:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        4467           4468         155835  model_time: 4827
AVX
BM_eigen_logistic_float        2347           2347         299135  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1467           1467         476143  model_time: 2926
AVX512
BM_eigen_logistic_float         805            805         858696  model_time: 1463

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        2589           2590         270264  model_time: 4827
AVX
BM_eigen_logistic_float        1428           1428         489265  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1059           1059         662255  model_time: 2926
AVX512
BM_eigen_logistic_float         673            673        1000000  model_time: 1463

Benchmark numbers for tanh:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2391           2391         292624  model_time: 4242
AVX
BM_eigen_tanh_float        1256           1256         554662  model_time: 2633
AVX+FMA
BM_eigen_tanh_float         823            823         866267  model_time: 1609
AVX512
BM_eigen_tanh_float         443            443        1578999  model_time: 805

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2588           2588         273531  model_time: 4242
AVX
BM_eigen_tanh_float        1536           1536         452321  model_time: 2633
AVX+FMA
BM_eigen_tanh_float        1007           1007         694681  model_time: 1609
AVX512
BM_eigen_tanh_float         471            471        1472178  model_time: 805
2019-12-16 21:33:42 +00:00
Rasmus Munk Larsen
66f07efeae Revert the specialization for scalar_logistic_op<float> introduced in:
77b447c24e


While providing a 50% speedup on Haswell+ processors, the large relative error outside [-18, 18] in this approximation causes problems, e.g., when computing gradients of activation functions like softplus in neural networks.
2019-12-02 17:00:58 -08:00
Gael Guennebaud
87427d2eaa PR 719: fix real/imag namespace conflict 2019-10-08 09:15:17 +02:00
Mehdi Goli
16a56b2ddd [SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL.
* Adding SYCL memory model
* Enabling/Disabling SYCL  backend in Core
*  Supporting Vectorization
2019-06-27 12:25:09 +01:00
Rasmus Munk Larsen
d3ef7cf03e Fix build with clang on Windows. 2019-05-09 11:07:04 -07:00
Christoph Hertzberg
cca76c272c Restore C++03 compatibility 2019-05-06 16:18:22 +02:00
Rasmus Munk Larsen
8e33844fc7 Fix traits for scalar_logistic_op. 2019-05-03 15:49:09 -07:00
Rasmus Munk Larsen
e42f9aa68a Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_<float>. 2019-03-15 17:15:14 -07:00
Gael Guennebaud
d7d2f0680e bug #1684: partially workaround clang's 6/7 bug #40815 2019-03-13 10:40:01 +01:00
Gael Guennebaud
7580112c31 Fix harmless Scalar vs RealScalar cast. 2019-02-18 22:12:28 +01:00
Gael Guennebaud
c69d0d08d0 Set cost of conjugate to 0 (in practice it boils down to a no-op).
This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate
its arguments into temporaries (e.g., if A and B are fixed and small, or * fall back to lazyProduct)
2019-02-18 14:43:07 +01:00
Rasmus Munk Larsen
28ba1b2c32 Add support for inverse hyperbolic functions.
Fix cost of division.
2019-01-11 17:45:37 -08:00
Gael Guennebaud
48fe78c375 bug #1630: fix linspaced when requesting smaller packet size than default one. 2018-11-28 13:15:06 +01:00
Rasmus Munk Larsen
77b447c24e Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX. 2018-11-12 13:42:24 -08:00
Rasmus Munk Larsen
d6e283ba96 sigmoid -> logistic 2018-08-13 11:14:50 -07:00
Rasmus Munk Larsen
fa68342ef8 Move sigmoid functor to core. 2018-08-03 17:31:23 -07:00
Deven Desai
b6cc0961b1 updates based on PR feedback
There are two major changes (and a few minor ones which are not listed here...see PR discussion for details)

1. Eigen::half implementations for HIP and CUDA have been merged.
This means that
- `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
- `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
- `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`

After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.

2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
- `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
- `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
- `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
2018-06-14 10:21:54 -04:00
Deven Desai
8fbd47052b Adding support for using Eigen in HIP kernels.
This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs.

Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, for e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor)


Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
2018-06-06 10:12:58 -04:00
Gael Guennebaud
4213b63f5c Factories code between numext::hypot and scalar_hyot_op functor. 2018-04-04 15:12:43 +02:00
Gael Guennebaud
407e3e2621 bug #1532: disable stl::*_negate in C++17 (they are deprecated) 2018-04-03 15:59:30 +02:00
Gael Guennebaud
bbd97b4095 Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases 2017-07-17 01:02:51 +02:00
Gael Guennebaud
8508db52ab bug #1417: make LinSpace compatible with std::complex 2017-06-06 17:25:56 +02:00
Gael Guennebaud
850ca961d2 bug #1383: fix regression in LinSpaced for integers and high<low 2017-01-25 18:13:53 +01:00
Gael Guennebaud
d06a48959a bug #1383: Fix regression from 3.2 with LinSpaced(n,0,n-1) with n==0. 2017-01-25 15:27:13 +01:00
Gael Guennebaud
ba3f977946 bug #1376: add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j)) 2017-01-23 22:06:08 +01:00
Gael Guennebaud
e383d6159a MSVC 2015 has all we want about c++11 and MSVC 2017 fails on binder1st/binder2nd 2017-01-06 15:44:13 +01:00