Rasmus Munk Larsen
e8a2aa24a2
Fix a couple of issues with unary pow():
2022-09-09 17:21:11 +00:00
Tobias Schlüter
133498c329
Add constexpr, test for C++14 constexpr.
2022-09-07 03:42:34 +00:00
Antonio Sánchez
30c42222a6
Fix some test build errors in new unary pow.
2022-08-30 17:24:14 +00:00
Charles Schlosser
e5af9f87f2
Vectorize pow for integer base / exponent types
2022-08-29 19:23:54 +00:00
chuckyschluz
8acbf5c11c
re-enable pow for complex types
2022-08-26 17:29:02 -04:00
Charles Schlosser
76a669fb45
add fixed power unary operation
2022-08-16 21:32:36 +00:00
Rasmus Munk Larsen
97e0784dc6
Vectorize the sign operator in Eigen.
2022-08-09 19:54:57 +00:00
Tobias Schlüter
f3ba220c5d
Remove EIGEN_EMPTY_STRUCT_CTOR
2022-04-08 18:27:26 +00:00
Rasmus Munk Larsen
ea2c02060c
Add reciprocal packet op and fast specializations for float with SSE, AVX, and AVX512.
2022-01-21 23:49:18 +00:00
Rasmus Munk Larsen
a30ecb7221
Don't use the fast implementation if EIGEN_GPU_CC, since integer_packet is not defined for float4 used by the GPU compiler (even on host).
2022-01-12 20:16:16 +00:00
Rasmus Munk Larsen
0b58738938
Fix two corner cases in the new implementation of logistic sigmoid.
2022-01-12 00:41:29 +00:00
Rasmus Munk Larsen
80ccacc717
Fix accuracy of logistic sigmoid
2022-01-08 00:15:14 +00:00
Rasmus Munk Larsen
8b8125c574
Make sure the scalar and vectorized path for array.exp() return consistent values.
2022-01-07 23:31:35 +00:00
Erik Schultheis
ec2fd0f7ed
Require recent GCC and MSVC and remove EIGEN_HAS_CXX14 and some other feature test macros
2021-12-01 00:48:34 +00:00
Erik Schultheis
f33a31b823
removed EIGEN_HAS_CXX11_* and redundant EIGEN_COMP_CXXVER checks
2021-11-29 19:18:57 +00:00
sciencewhiz
4b6036e276
fix various typos
2021-09-22 16:15:06 +00:00
Rasmus Munk Larsen
d7d0bf832d
Issue an error in case of direct inclusion of internal headers.
2021-09-10 19:12:26 +00:00
Antonio Sanchez
7880f10526
Enable equality comparisons on GPU.
...
Since `std::equal_to::operator()` is not a device function, it
fails on GPU. On my device, I seem to get a silent crash in the
kernel (no reported error, but the kernel does not complete).
Replacing this with a portable version enables comparisons on device.
Addresses #2292; the fix would need to be cherry-picked. The 3.3 branch
also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to work
fully.
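A minimal sketch of such a portable comparison. It assumes a hypothetical `MY_DEVICE_FUNC` macro standing in for `EIGEN_DEVICE_FUNC`, which expands to `__host__ __device__` under CUDA/HIP and to nothing otherwise. This is not the functor from the commit. It only illustrates the idea: an annotated call operator that uses the built-in `==` can be compiled for the device, unlike `std::equal_to::operator()`.

```cpp
// Hypothetical stand-in for EIGEN_DEVICE_FUNC: mark functions as callable
// from both host and device when compiling with CUDA or HIP.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define MY_DEVICE_FUNC __host__ __device__
#else
#define MY_DEVICE_FUNC
#endif

// Portable replacement for std::equal_to<T> (hypothetical name): the call
// operator is device-annotated and relies only on the built-in operator==.
template <typename T>
struct portable_equal_to {
  MY_DEVICE_FUNC constexpr bool operator()(const T& lhs, const T& rhs) const {
    return lhs == rhs;
  }
};
```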
2021-08-03 01:53:31 +00:00
Antonio Sanchez
de2e62c62d
Disable vectorization of comparisons except for bool.
...
Packet input/output types must currently be the same, and since these
have a return type of `bool`, vectorization will only work if
input is bool.
2021-07-25 13:39:50 -07:00
derekjchow
66ca41bd47
Add support for vectorizing logical comparisons.
2021-07-23 20:07:48 +00:00
Rasmus Munk Larsen
f64b2954c7
Fix c++20 warnings about using enums in arithmetic expressions.
2021-06-10 17:17:39 -07:00
Nathan Luehr
6753f0f197
Fix ambiguity due to argument dependent lookup.
2021-05-11 15:41:11 -05:00
Antonio Sanchez
2468253c9a
Define EIGEN_CPLUSPLUS and replace most __cplusplus checks.
...
The macro `__cplusplus` is not defined correctly in MSVC unless building
with the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the
specified c++ standard version number.
Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version
number both for MSVC and otherwise. This simplifies checks for supported
features.
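A minimal sketch of how such a macro can be derived. It uses the hypothetical names `MY_CPLUSPLUS` and `MY_HAS_CXX14` and is not Eigen's actual definition:

```cpp
// Portable "C++ standard version" macro (hypothetical name). MSVC reports
// 199711L in __cplusplus unless built with /Zc:__cplusplus, but always
// sets _MSVC_LANG to the selected standard version.
#if defined(_MSVC_LANG)
#define MY_CPLUSPLUS _MSVC_LANG
#else
#define MY_CPLUSPLUS __cplusplus
#endif

// Feature checks can then compare against a single macro on every compiler.
#if MY_CPLUSPLUS >= 201402L
#define MY_HAS_CXX14 1
#else
#define MY_HAS_CXX14 0
#endif
```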
Also replaced most instances of standard version checking via `__cplusplus`
with the existing `EIGEN_COMP_CXXVER` macro for better clarity.
Fixes: #2170
2021-03-05 18:33:18 +00:00
Rasmus Munk Larsen
113e61f364
Remove unused function scalar_cmp_with_cast.
2021-02-24 23:59:35 +00:00
Antonio Sanchez
abcde69a79
Disable vectorized pow for half/bfloat16.
...
We are potentially seeing some accuracy issues with these. Ideally we
would hand off to `float`, but that's not trivial with the current
setup.
We may want to consider adding `ppow<Packet>` and `HasPow`, so
implementations can more easily specialize this.
2021-02-05 12:17:34 -08:00
Antonio Sanchez
f0e46ed5d4
Fix pow and other cwise ops for half/bfloat16.
...
The new `generic_pow` implementation was failing for half/bfloat16 since
their construction from int/float is not `constexpr`. Modified
`GenericPacketMathFunctions` to remove `constexpr`.
While adding tests for half/bfloat16, found other issues related to
implicit conversions.
Also needed to implement `numext::arg` for non-integer, non-complex,
non-float/double/long double types. These seem to be implicitly
converted to `std::complex<T>`, which then fails for half/bfloat16.
2021-01-22 11:10:54 -08:00
Rasmus Munk Larsen
cdd8fdc32e
Vectorize pow(x, y). This closes https://gitlab.com/libeigen/eigen/-/issues/2085, which also contains a description of the algorithm.
...
I ran some tests comparing against `std::pow(double(x), double(y))` for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]` and `y` in `{2, sqrt(2), -sqrt(2)}`, and got the following error statistics:
```
max_rel_error = 8.34405e-07
rms_rel_error = 2.76654e-07
```
If I widen the range to all normal floats, I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`:
```
max_rel_error = 0.666667
rms = 6.8727e-05
count = 1335165689
argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45
```
which seems reasonable, since these results are subnormals with only a couple of significant bits left.
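A measurement in the spirit of the statistics above can be written as a small harness like the following sketch. It is an illustration under stated assumptions, not the test used for the commit. It samples `x` geometrically (`step` is a hypothetical parameter) instead of enumerating every float, and `measure_pow_error` and `PowErrorStats` are hypothetical names. Any `float(float, float)` callable can be passed as the candidate.

```cpp
#include <cfloat>
#include <cmath>
#include <cstddef>

// Relative-error statistics of a float pow() candidate (hypothetical type).
struct PowErrorStats {
  double max_rel_error = 0.0;
  double rms_rel_error = 0.0;
  std::size_t count = 0;
};

// Compares candidate(x, y) against std::pow in double precision for x
// sampled geometrically over [sqrt(FLT_MIN), sqrt(FLT_MAX)]. This is a
// sampled sketch, not an exhaustive sweep over all floats.
template <typename PowFn>
PowErrorStats measure_pow_error(PowFn candidate, float y, float step = 1.001f) {
  PowErrorStats stats;
  double sum_sq = 0.0;
  const float lo = std::sqrt(FLT_MIN);
  const float hi = std::sqrt(FLT_MAX);
  for (float x = lo; x <= hi; x *= step) {
    const double ref = std::pow(static_cast<double>(x), static_cast<double>(y));
    const double got = static_cast<double>(candidate(x, y));
    const double rel = std::fabs(got - ref) / std::fabs(ref);
    if (rel > stats.max_rel_error) stats.max_rel_error = rel;
    sum_sq += rel * rel;
    ++stats.count;
  }
  if (stats.count > 0) stats.rms_rel_error = std::sqrt(sum_sq / stats.count);
  return stats;
}
```

For example, passing `[](float x, float y) { return std::pow(x, y); }` with `y = std::sqrt(2.0f)` measures the platform's own float `pow` against the double-precision reference.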
2021-01-18 13:25:16 +00:00
Antonio Sanchez
bde6741641
Improved std::complex sqrt and rsqrt.
...
Replaces `std::sqrt` with `complex_sqrt` for all platforms (previously
`complex_sqrt` was only used for CUDA and MSVC), and implements
custom `complex_rsqrt`.
Also introduces `numext::rsqrt` to simplify implementation, and modified
`numext::hypot` to adhere to IEC 60559 for special cases.
The `complex_sqrt` and `complex_rsqrt` implementations were found to be
significantly faster than `std::sqrt<std::complex<T>>` and
`1/numext::sqrt<std::complex<T>>`.
Benchmark file attached.
```
GCC 10, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                              Time         CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>        9.21 ns     9.21 ns     73225448
BM_StdSqrt<std::complex<float>>     17.1 ns     17.1 ns     40966545
BM_Sqrt<std::complex<double>>       8.53 ns     8.53 ns     81111062
BM_StdSqrt<std::complex<double>>    21.5 ns     21.5 ns     32757248
BM_Rsqrt<std::complex<float>>       10.3 ns     10.3 ns     68047474
BM_DivSqrt<std::complex<float>>     16.3 ns     16.3 ns     42770127
BM_Rsqrt<std::complex<double>>      11.3 ns     11.3 ns     61322028
BM_DivSqrt<std::complex<double>>    16.5 ns     16.5 ns     42200711

Clang 11, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                              Time         CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>        7.46 ns     7.45 ns     90742042
BM_StdSqrt<std::complex<float>>     16.6 ns     16.6 ns     42369878
BM_Sqrt<std::complex<double>>       8.49 ns     8.49 ns     81629030
BM_StdSqrt<std::complex<double>>    21.8 ns     21.7 ns     31809588
BM_Rsqrt<std::complex<float>>       8.39 ns     8.39 ns     82933666
BM_DivSqrt<std::complex<float>>     14.4 ns     14.4 ns     48638676
BM_Rsqrt<std::complex<double>>      9.83 ns     9.82 ns     70068956
BM_DivSqrt<std::complex<double>>    15.7 ns     15.7 ns     44487798

Clang 9, Pixel 2, aarch64:
---------------------------------------------------------------------------
Benchmark                              Time         CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>        24.2 ns     24.1 ns     28616031
BM_StdSqrt<std::complex<float>>      104 ns      103 ns      6826926
BM_Sqrt<std::complex<double>>       31.8 ns     31.8 ns     22157591
BM_StdSqrt<std::complex<double>>     128 ns      128 ns      5437375
BM_Rsqrt<std::complex<float>>       31.9 ns     31.8 ns     22384383
BM_DivSqrt<std::complex<float>>     99.2 ns     98.9 ns      7250438
BM_Rsqrt<std::complex<double>>      46.0 ns     45.8 ns     15338689
BM_DivSqrt<std::complex<double>>     119 ns      119 ns      5898944
```
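For reference, one standard way to compute the principal square root of a complex number using only real arithmetic is sketched below. It is an illustrative assumption, not the `complex_sqrt` from this commit, and `sketch_complex_sqrt` is a hypothetical name. It deliberately omits the inf/NaN special cases and the overflow guarding that a production version needs.

```cpp
#include <cmath>
#include <complex>

// Principal square root of z = x + iy (hypothetical helper):
//   w = sqrt((|x| + |z|) / 2)
//   x >= 0: sqrt(z) = w + i * y / (2w)
//   x <  0: sqrt(z) = |y| / (2w) + i * copysign(w, y)
// Special values (inf, NaN) and overflow of |x| + |z| are not handled.
template <typename T>
std::complex<T> sketch_complex_sqrt(const std::complex<T>& z) {
  const T x = z.real();
  const T y = z.imag();
  // sqrt(0) = 0; keep the sign of the imaginary zero.
  if (x == T(0) && y == T(0)) return std::complex<T>(T(0), y);
  const T w = std::sqrt(T(0.5) * (std::fabs(x) + std::hypot(x, y)));
  if (x >= T(0)) return std::complex<T>(w, y / (T(2) * w));
  return std::complex<T>(std::fabs(y) / (T(2) * w), std::copysign(w, y));
}
```

Splitting on the sign of `x` avoids subtracting nearly equal quantities, which is the main accuracy pitfall of the naive polar formula.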
2021-01-17 08:50:57 -08:00
Antonio Sanchez
c6efc4e0ba
Replace M_LOG2E and M_LN2 with custom macros.
...
For these macros to be defined, we would need to define `_USE_MATH_DEFINES` before
`cmath` or `math.h` is first included. However, we don't
control the include order for projects outside Eigen, so even defining
the macro in `Eigen/Core` does not fix the issue for projects that
end up including `<cmath>` before Eigen does (explicitly or transitively).
To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.
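A minimal sketch of the approach, using hypothetical macro names `MY_LOG2E` and `MY_LN2` rather than Eigen's `EIGEN_LOG2E` and `EIGEN_LN2`. The values are log2(e) and ln(2) truncated to 36 decimal places. `log2_via_ln` is a hypothetical helper that shows a typical use.

```cpp
#include <cmath>

// Constants defined locally, so nothing depends on _USE_MATH_DEFINES having
// been set before the first inclusion of <cmath> (hypothetical names).
#define MY_LOG2E 1.442695040888963407359924681001892137L  // log2(e)
#define MY_LN2 0.693147180559945309417232121458176568L    // ln(2)

// Example use: log2(x) = ln(x) * log2(e).
inline double log2_via_ln(double x) {
  return std::log(x) * static_cast<double>(MY_LOG2E);
}
```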
2020-12-11 14:34:31 -08:00
Rasmus Munk Larsen
f9fac1d5b0
Add log2() to Eigen.
2020-12-04 21:45:09 +00:00
Rasmus Munk Larsen
f23dc5b971
Revert "Add log2() operator to Eigen"
...
This reverts commit 4d91519a9be061da5d300079fca17dd0b9328050.
2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen
4d91519a9b
Add log2() operator to Eigen
2020-12-03 22:31:44 +00:00
Antonio Sanchez
22f67b5958
Fix boolean float conversion and product warnings.
...
This fixes some gcc warnings such as:
```
Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
```
Details:
- Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).
- Added `scalar_square_op<bool>` and `scalar_cube_op<bool>`
specializations (`-Wint-in-bool-context`)
- Deprecated above specialized ops for bool.
- Modified `cxx11_tensor_block_eval` to specialize generator for
booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to
avoid deprecated bool ops.
2020-11-24 20:20:36 +00:00
Rasmus Munk Larsen
c6953f799b
Add packet generic ops predux_fmin, predux_fmin_nan, predux_fmax, and predux_fmax_nan that implement reductions with PropagateNaN and PropagateNumbers semantics. Add (slow) generic implementations for most reductions.
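The following scalar sketch illustrates the two NaN policies named above. The function names are hypothetical and do not reflect Eigen's `predux_*` signatures. The sketch assumes a non-empty input; on an empty range, the first function returns +infinity and the second returns NaN.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// PropagateNaN-style min: any NaN in the input makes the result NaN.
template <typename T>
T reduce_min_propagate_nan(const T* data, std::size_t n) {
  T result = std::numeric_limits<T>::infinity();
  for (std::size_t i = 0; i < n; ++i) {
    if (std::isnan(data[i])) return data[i];
    if (data[i] < result) result = data[i];
  }
  return result;
}

// PropagateNumbers-style min: NaNs are skipped; the result is NaN only if
// every element is NaN. std::fmin returns the non-NaN argument when exactly
// one of its arguments is NaN.
template <typename T>
T reduce_min_propagate_numbers(const T* data, std::size_t n) {
  T result = std::numeric_limits<T>::quiet_NaN();
  for (std::size_t i = 0; i < n; ++i) result = std::fmin(result, data[i]);
  return result;
}
```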
2020-10-13 21:48:31 +00:00
David Tellenbach
4091f6b25c
Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD
2020-10-09 02:05:05 +02:00
Rasmus Munk Larsen
b431024404
Don't make assumptions about NaN-propagation for pmin/pmax - it varies across platforms.
...
Change test to only test for NaN-propagation for pfmin/pfmax.
2020-10-07 19:05:18 +00:00
Georg Jäger
1b1082334b
adding attributes to constructors to support hip-clang on ROCm 3.5
2020-08-20 16:48:11 +02:00
Rasmus Munk Larsen
6964ae8d52
Change the sign operator in Eigen to return NaN for NaN arguments, not zero.
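Here is a hedged scalar sketch of the new behavior. `sign_nan_aware` is a hypothetical name, not Eigen's implementation. A purely comparison-based sign such as `(0 < x) - (x < 0)` yields 0 for NaN because both comparisons are false, so the sketch checks for NaN explicitly first.

```cpp
#include <cmath>

// Returns -1, 0, or +1 for ordinary values and propagates NaN
// (hypothetical helper).
template <typename T>
T sign_nan_aware(T x) {
  if (std::isnan(x)) return x;  // NaN in, NaN out (instead of 0)
  return static_cast<T>((T(0) < x) - (x < T(0)));
}
```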
2020-07-07 01:54:04 +00:00
Sheng Yang
116c5235ac
BF16 for scalar_cmp_with_cast_op
2020-07-01 18:33:42 +00:00
ShengYang1
b5d66b5e73
Implement scalar_cmp_with_cast_op
2020-06-09 08:12:07 +08:00
Rasmus Munk Larsen
daf9bbeca2
Fix compilation error in logistic packet op.
2020-06-03 00:57:41 +00:00
Gael Guennebaud
029a76e115
Bug #1777: make the scalar and packet path consistent for the logistic function + respective unit test
2020-05-31 00:53:37 +02:00
Rasmus Munk Larsen
c1d944dd91
Remove packet ops pinsertfirst and pinsertlast that are only used in a single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
...
I cannot measure any performance changes for SSE, AVX, or AVX512.
```
name                     old time/op    new time/op    delta
BM_LinSpace<float>/1     1.63ns ± 0%    1.63ns ± 0%    ~  (p=0.762 n=5+5)
BM_LinSpace<float>/8     4.92ns ± 3%    4.89ns ± 3%    ~  (p=0.421 n=5+5)
BM_LinSpace<float>/64    34.6ns ± 0%    34.6ns ± 0%    ~  (p=0.841 n=5+5)
BM_LinSpace<float>/512   217ns ± 0%     217ns ± 0%     ~  (p=0.421 n=5+5)
BM_LinSpace<float>/4k    1.68µs ± 0%    1.68µs ± 0%    ~  (p=1.000 n=5+5)
BM_LinSpace<float>/32k   13.3µs ± 0%    13.3µs ± 0%    ~  (p=0.905 n=5+4)
BM_LinSpace<float>/256k  107µs ± 0%     107µs ± 0%     ~  (p=0.841 n=5+5)
BM_LinSpace<float>/1M    427µs ± 0%     427µs ± 0%     ~  (p=0.690 n=5+5)
```
2020-05-08 15:41:50 -07:00
Rasmus Munk Larsen
ab773c7e91
Extend support for Packet16b:
...
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* work around a bug in slicing of Tensor<bool>.
* Add tensor tests
This speeds up matmul for boolean matrices by about 7x for sizes of 32 and above (the 8x8 case gets slower):
```
name                 old time/op    new time/op    delta
BM_MatMul<bool>/8    267ns ± 0%     479ns ± 0%     +79.25%  (p=0.008 n=5+5)
BM_MatMul<bool>/32   6.42µs ± 0%    0.87µs ± 0%    -86.50%  (p=0.008 n=5+5)
BM_MatMul<bool>/64   43.3µs ± 0%    5.9µs ± 0%     -86.42%  (p=0.008 n=5+5)
BM_MatMul<bool>/128  315µs ± 0%     44µs ± 0%      -85.98%  (p=0.008 n=5+5)
BM_MatMul<bool>/256  2.41ms ± 0%    0.34ms ± 0%    -85.68%  (p=0.008 n=5+5)
BM_MatMul<bool>/512  18.8ms ± 0%    2.7ms ± 0%     -85.53%  (p=0.008 n=5+5)
BM_MatMul<bool>/1k   149ms ± 0%     22ms ± 0%      -85.40%  (p=0.008 n=5+5)
```
2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen
2f6ddaa25c
Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x.
...
Benchmark numbers for the logical and of two NxN tensors:
```
name                                    old time/op    new time/op    delta
BM_booleanAnd_1T/3 [using 1 threads]    14.6ns ± 0%    14.4ns ± 0%    -0.96%
BM_booleanAnd_1T/4 [using 1 threads]    20.5ns ±12%    9.0ns ± 0%     -56.07%
BM_booleanAnd_1T/7 [using 1 threads]    41.7ns ± 0%    10.5ns ± 0%    -74.87%
BM_booleanAnd_1T/8 [using 1 threads]    52.1ns ± 0%    10.1ns ± 0%    -80.59%
BM_booleanAnd_1T/10 [using 1 threads]   76.3ns ± 0%    13.8ns ± 0%    -81.87%
BM_booleanAnd_1T/15 [using 1 threads]   167ns ± 0%     16ns ± 0%      -90.45%
BM_booleanAnd_1T/16 [using 1 threads]   188ns ± 0%     16ns ± 0%      -91.57%
BM_booleanAnd_1T/31 [using 1 threads]   667ns ± 0%     34ns ± 0%      -94.83%
BM_booleanAnd_1T/32 [using 1 threads]   710ns ± 0%     35ns ± 0%      -95.01%
BM_booleanAnd_1T/64 [using 1 threads]   2.80µs ± 0%    0.11µs ± 0%    -95.93%
BM_booleanAnd_1T/128 [using 1 threads]  11.2µs ± 0%    0.4µs ± 0%     -96.11%
BM_booleanAnd_1T/256 [using 1 threads]  44.6µs ± 0%    2.5µs ± 0%     -94.31%
BM_booleanAnd_1T/512 [using 1 threads]  178µs ± 0%     10µs ± 0%      -94.35%
BM_booleanAnd_1T/1k [using 1 threads]   717µs ± 0%     78µs ± 1%      -89.07%
BM_booleanAnd_1T/2k [using 1 threads]   2.87ms ± 0%    0.31ms ± 1%    -89.08%
BM_booleanAnd_1T/4k [using 1 threads]   11.7ms ± 0%    1.9ms ± 4%     -83.55%
BM_booleanAnd_1T/10k [using 1 threads]  70.3ms ± 0%    17.2ms ± 4%    -75.48%
```
2020-04-20 20:16:28 +00:00
Rasmus Munk Larsen
4da2c6b197
Remove reference to non-existent unary_op_base class.
2020-03-19 18:23:06 +00:00
Rasmus Munk Larsen
eda90baf35
Add missing arguments to numext::absdiff().
2020-03-19 18:16:55 +00:00
Joel Holdsworth
d5c665742b
Add absolute_difference coefficient-wise binary Array function
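A minimal scalar sketch of what such a function has to get right. `absolute_difference` here is a hypothetical helper, not the `numext::absdiff()` referenced in the follow-up commit. Computing the difference in the non-negative direction avoids the wraparound that `std::abs(a - b)` cannot repair for unsigned types (for example, `3u - 5u` wraps to a large value).

```cpp
// |a - b| computed without wraparound for unsigned types (hypothetical
// helper): subtract the smaller value from the larger one.
template <typename T>
T absolute_difference(T a, T b) {
  return a > b ? T(a - b) : T(b - a);
}
```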
2020-03-19 17:45:20 +00:00
Joel Holdsworth
232f904082
Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions
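A minimal sketch of coefficient-wise shift functors that take the shift amount as a template parameter. The functor names are hypothetical, and this is not Eigen's definition. Because `N` is a template parameter, the shift amount is a compile-time constant. Right-shifting negative signed values is implementation-defined before C++20, so the checks below use non-negative operands.

```cpp
// Shift each coefficient left by a compile-time amount N (hypothetical).
template <int N>
struct shift_left_op {
  template <typename T>
  T operator()(const T& a) const { return static_cast<T>(a << N); }
};

// Shift each coefficient right by a compile-time amount N (hypothetical).
template <int N>
struct shift_right_op {
  template <typename T>
  T operator()(const T& a) const { return static_cast<T>(a >> N); }
};
```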
2020-03-19 17:24:06 +00:00
Allan Leal
37ccb86916
Update NullaryFunctors.h
2020-03-16 11:59:02 +00:00