Antonio Sánchez
8719b9c5bc
Disable test for 32-bit systems (e.g. ARM, i386)
Neither i386 nor 32-bit ARM defines __uint128_t. On most systems, if
__uint128_t is defined, then so is the macro __SIZEOF_INT128__.
https://stackoverflow.com/questions/18531782/how-to-know-if-uint128-t-is-defined
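The guard described above can be sketched as follows. This assumes GCC/Clang-style predefined macros. Code that wants a 128-bit integer tests `__SIZEOF_INT128__` and falls back to portable arithmetic otherwise. The helper `mul_hi_u64` is hypothetical and is not an Eigen function.

```cpp
#include <cstdint>

// Hypothetical helper: returns the high 64 bits of the 128-bit product a * b.
// Uses __uint128_t when __SIZEOF_INT128__ advertises it (typical on 64-bit
// GCC/Clang targets); otherwise, e.g. on i386 or 32-bit ARM, it falls back
// to multiplying 32-bit halves.
inline std::uint64_t mul_hi_u64(std::uint64_t a, std::uint64_t b) {
#if defined(__SIZEOF_INT128__)
  const __uint128_t p = static_cast<__uint128_t>(a) * b;
  return static_cast<std::uint64_t>(p >> 64);
#else
  const std::uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
  const std::uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
  const std::uint64_t lo_lo = a_lo * b_lo;
  const std::uint64_t hi_lo = a_hi * b_lo;
  const std::uint64_t lo_hi = a_lo * b_hi;
  const std::uint64_t hi_hi = a_hi * b_hi;
  // Middle partial products plus the carry out of the low word; cannot overflow.
  const std::uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;
  return hi_hi + (hi_lo >> 32) + (cross >> 32);
#endif
}
```

Both branches compute the same value. Only the preprocessor decides which one a given target compiles.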
2020-05-28 17:40:15 +00:00
Rasmus Munk Larsen
ab773c7e91
Extend support for Packet16b:
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* Work around a bug in slicing of Tensor<bool>.
* Add tensor tests.
This speeds up matmul for boolean matrices by roughly 7x (about 85% less time) for sizes 32 and above; the 8x8 case becomes slower:
name                 old time/op   new time/op   delta
BM_MatMul<bool>/8    267ns ± 0%    479ns ± 0%    +79.25%  (p=0.008 n=5+5)
BM_MatMul<bool>/32   6.42µs ± 0%   0.87µs ± 0%   -86.50%  (p=0.008 n=5+5)
BM_MatMul<bool>/64   43.3µs ± 0%   5.9µs ± 0%    -86.42%  (p=0.008 n=5+5)
BM_MatMul<bool>/128  315µs ± 0%    44µs ± 0%     -85.98%  (p=0.008 n=5+5)
BM_MatMul<bool>/256  2.41ms ± 0%   0.34ms ± 0%   -85.68%  (p=0.008 n=5+5)
BM_MatMul<bool>/512  18.8ms ± 0%   2.7ms ± 0%    -85.53%  (p=0.008 n=5+5)
BM_MatMul<bool>/1k   149ms ± 0%    22ms ± 0%     -85.40%  (p=0.008 n=5+5)
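For reference, here is a plain scalar sketch of what a product of boolean matrices computes. It assumes the Boolean semiring: AND plays the role of multiplication and OR the role of accumulation. It illustrates the semantics only and is not Eigen's vectorized kernel. `bool_matmul` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference kernel for row-major n x n boolean matrices:
// C(i, j) = OR over k of (A(i, k) AND B(k, j)).
std::vector<bool> bool_matmul(const std::vector<bool>& a,
                              const std::vector<bool>& b, std::size_t n) {
  std::vector<bool> c(n * n, false);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < n; ++k) {
      if (!a[i * n + k]) continue;  // false AND anything contributes nothing
      for (std::size_t j = 0; j < n; ++j) {
        if (b[k * n + j]) c[i * n + j] = true;  // OR-accumulate row k of B
      }
    }
  }
  return c;
}
```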
2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen
2f6ddaa25c
Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x.
Benchmark numbers for the logical and of two NxN tensors:
name                                    old time/op   new time/op   delta
BM_booleanAnd_1T/3 [using 1 threads]    14.6ns ± 0%   14.4ns ± 0%    -0.96%
BM_booleanAnd_1T/4 [using 1 threads]    20.5ns ±12%   9.0ns ± 0%    -56.07%
BM_booleanAnd_1T/7 [using 1 threads]    41.7ns ± 0%   10.5ns ± 0%   -74.87%
BM_booleanAnd_1T/8 [using 1 threads]    52.1ns ± 0%   10.1ns ± 0%   -80.59%
BM_booleanAnd_1T/10 [using 1 threads]   76.3ns ± 0%   13.8ns ± 0%   -81.87%
BM_booleanAnd_1T/15 [using 1 threads]   167ns ± 0%    16ns ± 0%     -90.45%
BM_booleanAnd_1T/16 [using 1 threads]   188ns ± 0%    16ns ± 0%     -91.57%
BM_booleanAnd_1T/31 [using 1 threads]   667ns ± 0%    34ns ± 0%     -94.83%
BM_booleanAnd_1T/32 [using 1 threads]   710ns ± 0%    35ns ± 0%     -95.01%
BM_booleanAnd_1T/64 [using 1 threads]   2.80µs ± 0%   0.11µs ± 0%   -95.93%
BM_booleanAnd_1T/128 [using 1 threads]  11.2µs ± 0%   0.4µs ± 0%    -96.11%
BM_booleanAnd_1T/256 [using 1 threads]  44.6µs ± 0%   2.5µs ± 0%    -94.31%
BM_booleanAnd_1T/512 [using 1 threads]  178µs ± 0%    10µs ± 0%     -94.35%
BM_booleanAnd_1T/1k [using 1 threads]   717µs ± 0%    78µs ± 1%     -89.07%
BM_booleanAnd_1T/2k [using 1 threads]   2.87ms ± 0%   0.31ms ± 1%   -89.08%
BM_booleanAnd_1T/4k [using 1 threads]   11.7ms ± 0%   1.9ms ± 4%    -83.55%
BM_booleanAnd_1T/10k [using 1 threads]  70.3ms ± 0%   17.2ms ± 4%   -75.48%
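The idea behind partial vectorization can be sketched portably. Full chunks of bools are processed with one wide bitwise operation, and a scalar loop handles the remainder. This sketch uses 64-bit words via `std::memcpy` rather than the SSE-based Packet16b type. It also assumes that `bool` occupies one byte holding 0 or 1, which holds on mainstream ABIs but is not guaranteed by the standard. `logical_and` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical kernel: out[i] = a[i] && b[i] for i in [0, n).
void logical_and(const bool* a, const bool* b, bool* out, std::size_t n) {
  static_assert(sizeof(bool) == 1, "sketch assumes one-byte bool");
  constexpr std::size_t kChunk = sizeof(std::uint64_t);  // 8 bools per word
  std::size_t i = 0;
  // Vectorized part: AND 8 bytes at a time. For bytes holding 0 or 1,
  // bitwise AND equals logical AND, so every output byte is again 0 or 1.
  for (; i + kChunk <= n; i += kChunk) {
    std::uint64_t wa, wb;
    std::memcpy(&wa, a + i, kChunk);
    std::memcpy(&wb, b + i, kChunk);
    const std::uint64_t wc = wa & wb;
    std::memcpy(out + i, &wc, kChunk);
  }
  // Scalar tail for the remaining n % 8 elements.
  for (; i < n; ++i) out[i] = a[i] && b[i];
}
```

As in the table above, sizes that do not fill a whole chunk (such as N = 3) still go entirely through the scalar tail.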
2020-04-20 20:16:28 +00:00
Aaron Franke
5c22c7a7de
Make file formatting comply with POSIX and Unix standards
UTF-8, LF, no BOM, and newlines at the end of files
2020-03-23 18:09:02 +00:00
Srinivas Vasudevan
f6c6de5d63
Ensure Igamma does not return NaN or Inf for large values.
2020-01-14 21:32:48 +00:00
Srinivas Vasudevan
2e099e8d8f
Added special_packetmath test and tweaked bounds on tests.
Refactor shared packetmath code to header file.
(Squashed from PR !38)
2020-01-11 10:31:21 +00:00
Christoph Hertzberg
1e9664b147
Bug #1796: Make matrix square root usable for Map and Ref types
2019-12-20 18:10:22 +01:00
Christoph Hertzberg
c21771ac04
Use double-braces initialization (as everywhere else in the test-suite).
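Here is a minimal illustration of double-brace initialization for an array aggregate. It uses `std::array` as a stand-in, which is an assumption, since the commit does not name the types involved. The outer braces initialize the aggregate itself and the inner braces its built-in array member. The single-brace form relies on brace elision, which some older compilers warn about or reject.

```cpp
#include <array>
#include <cstddef>

// Outer braces: the std::array aggregate; inner braces: its underlying
// C array member. Valid without relying on brace elision.
const std::array<std::ptrdiff_t, 2> dims{{2, 3}};

// Number of coefficients described by a pair of dimensions.
std::ptrdiff_t total_size(const std::array<std::ptrdiff_t, 2>& d) {
  return d[0] * d[1];
}
```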
2019-12-19 19:20:48 +01:00
Eugene Zhulenev
ae07801dd8
Tensor block evaluation cost model
2019-12-18 20:07:00 +00:00
Eugene Zhulenev
1c879eb010
Remove V2 suffix from TensorBlock
2019-12-10 15:40:23 -08:00
Eugene Zhulenev
dbca11e880
Remove TensorBlock.h and old TensorBlock/BlockMapper
2019-12-10 14:31:44 -08:00
Janek Kozicki
11d6465326
Fix AlignedVector3's interface being inconsistent with other Vector classes: the default constructor and operator- were missing.
2019-12-06 21:07:39 +01:00
Rasmus Munk Larsen
366cf005b0
Add missing initialization in cxx11_tensor_trace.cpp.
2019-12-04 23:56:37 +00:00
Mehdi Goli
00f32752f7
[SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch.
* Unifying all loadLocalTile functions for lhs and rhs into an extract_block function.
* Adding the get_tensor operation, which was missing in TensorContractionMapper.
* Adding the missing CMake -D option for the Disable_Skinny contraction operation.
* Wrapping all the indices in TensorScanSycl into Scan parameter struct.
* Fixing typo in Device SYCL
* Unifying load to private register for tall/skinny no shared
* Unifying load to vector tile for tensor-vector/vector-tensor operation
* Removing all the LHS/RHS class for extracting data from global
* Removing Outputfunction from TensorContractionSkinnyNoshared.
* Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining General Tensor-Vector and VectorTensor contraction into one kernel.
* Making double buffering optional for Tensor contraction when the local memory version is used.
* Modifying benchmark to accept custom Reduction Sizes
* Disabling AVX optimization for the SYCL backend on the host to allow SSE optimization on the host
* Adding Test for SYCL
* Modifying SYCL CMake
2019-11-28 10:08:54 +00:00
Hans Johnson
6fb3e5f176
STYLE: Remove CMake-language block-end command arguments
Ancient versions of CMake required else(), endif(), and similar block
termination commands to have arguments matching the command starting the block.
This is no longer the preferred style.
2019-10-31 11:36:27 -05:00
Gael Guennebaud
b9837ca9ae
Bug #1281: fix AutoDiffScalar's make_coherent for nested expressions of constant ADs.
2019-11-14 14:58:08 +01:00
Eugene Zhulenev
13c3327f5c
Remove legacy block evaluation support
2019-11-12 10:12:28 -08:00
Rasmus Munk Larsen
ebf04fb3e8
Fix data race in cxx11_tensor_notification test.
2019-11-08 17:44:50 -08:00
Rasmus Munk Larsen
97c0c5d485
Add block evaluation V2 to TensorAsyncExecutor.
Add async evaluation to a number of ops.
2019-10-22 12:42:44 -07:00
Rasmus Munk Larsen
668ab3fc47
Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate c++11 functionality with older compilers.
2019-10-18 16:42:00 -07:00
Eugene Zhulenev
0d2a14ce11
Cleanup Tensor block destination and materialized block storage allocation
2019-10-16 17:14:37 -07:00
Eugene Zhulenev
02431cbe71
TensorBroadcasting support for random/uniform blocks
2019-10-16 13:26:28 -07:00
Eugene Zhulenev
d380c23b2c
Block evaluation for TensorGenerator/TensorReverse/TensorShuffling
2019-10-14 14:31:59 -07:00
Eugene Zhulenev
a411e9f344
Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor reverse op
2019-10-10 10:56:58 -07:00
Eugene Zhulenev
33e1746139
Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing
2019-10-09 12:45:31 -07:00
Gael Guennebaud
f0a4642bab
Implement c++03 compatible fix for changeset 7a43af1a335da2c0489b4119a33ee1cbff0c15d6
2019-10-09 16:00:57 +02:00
Gael Guennebaud
7a43af1a33
Fix compilation of FFTW unit test
2019-10-08 08:58:35 +02:00
Eugene Zhulenev
f74ab8cb8d
Add block evaluation to TensorEvalTo and fix a few small bugs
2019-10-07 15:34:26 -07:00
Eugene Zhulenev
98bdd7252e
Fix compilation warnings and errors with clang in TensorBlockV2 code and tests
2019-10-04 10:15:33 -07:00
Eugene Zhulenev
60ae24ee1a
Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect
2019-10-02 12:44:06 -07:00
Eugene Zhulenev
7c8bc0d928
Fix cxx11_tensor_block_io test
2019-09-25 11:48:11 -07:00
Eugene Zhulenev
71d5bedf72
Fix compilation warnings and errors with clang in TensorBlockV2
2019-09-25 11:25:22 -07:00
Eugene Zhulenev
c97b208468
Add new TensorBlock API implementation + tests
2019-09-24 15:17:35 -07:00
Eugene Zhulenev
ef9dfee7bd
Tensor block evaluation V2 support for unary/binary/broadcasting
2019-09-24 12:52:45 -07:00
Rasmus Munk Larsen
1d5af0693c
Add support for asynchronous evaluation of tensor casting expressions.
2019-09-19 13:54:49 -07:00
Srinivas Vasudevan
df0816b71f
Merging eigen/eigen.
2019-09-16 19:33:29 -04:00
Srinivas Vasudevan
6e215cf109
Add Bessel functions to SpecialFunctions.
- Split SpecialFunctions files into a separate BesselFunctions file.
In particular add:
- Modified Bessel functions of the second kind k0, k1, k0e, k1e
- Bessel functions of the first kind j0, j1
- Bessel functions of the second kind y0, y1
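As a reference point for the functions listed above, the Bessel function of the first kind of order zero has the power series J0(x) = Σ_k (−1)^k (x/2)^(2k) / (k!)^2. The sketch below sums that series directly. It is only accurate for small-to-moderate |x| and is not meant to reflect how Eigen implements `j0`. `bessel_j0_series` is a hypothetical name.

```cpp
#include <cmath>

// Hypothetical helper: J0(x) via its power series,
// sum over k of (-1)^k (x/2)^(2k) / (k!)^2.
// Each term is the previous one times -(x/2)^2 / k^2. Adequate for |x| up to
// roughly 10 in double precision; large arguments need an asymptotic expansion.
double bessel_j0_series(double x) {
  const double q = -(x * 0.5) * (x * 0.5);
  double term = 1.0;  // k = 0 term
  double sum = 1.0;
  for (int k = 1; k < 100; ++k) {
    term *= q / (static_cast<double>(k) * k);
    sum += term;
    if (std::fabs(term) < 1e-17 * std::fabs(sum)) break;
  }
  return sum;
}
```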
2019-09-14 12:16:47 -04:00
Deven Desai
cdb377d0cb
Fix for the HIP build+test errors introduced by the ndtri support.
The fixes needed are
* adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
* switching to using ::<math_func> instead of std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
* removing an errant "j" from a testcase (don't know how that made it in to begin with!)
2019-09-06 16:03:49 +00:00
Eugene Zhulenev
d918bd9a8b
Update ThreadLocal to use separate Initialize/Release callables
2019-09-10 16:13:32 -07:00
Eugene Zhulenev
e3dec4dcc1
ThreadLocal container that does not rely on thread local storage
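One way to provide per-thread values without the `thread_local` keyword is a mutex-guarded map keyed by `std::thread::id`. The sketch below pairs that map with user-supplied initialize and release callables, like the separate Initialize/Release callables named in the later commit listed above. This is a simplified, hypothetical sketch: `ThreadLocalSketch` is not Eigen's class, and a real implementation may avoid taking a lock on every access.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical per-thread storage built on a mutex-protected map.
// `init` runs the first time a thread calls local(); `release` runs for
// every stored value when the container is destroyed.
template <typename T>
class ThreadLocalSketch {
 public:
  ThreadLocalSketch(std::function<void(T&)> init, std::function<void(T&)> release)
      : init_(std::move(init)), release_(std::move(release)) {}

  ~ThreadLocalSketch() {
    for (auto& entry : values_) release_(entry.second);
  }

  // Returns the calling thread's value, creating it on first access.
  // unordered_map never relocates elements, so the reference stays valid
  // even when other threads insert their own entries later.
  T& local() {
    std::lock_guard<std::mutex> lock(mu_);
    const std::thread::id id = std::this_thread::get_id();
    auto it = values_.find(id);
    if (it == values_.end()) {
      it = values_.emplace(id, T{}).first;
      init_(it->second);
    }
    return it->second;
  }

 private:
  std::function<void(T&)> init_;
  std::function<void(T&)> release_;
  std::mutex mu_;
  std::unordered_map<std::thread::id, T> values_;
};
```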
2019-09-09 15:18:14 -07:00
Srinivas Vasudevan
e38dd48a27
PR 681: Add ndtri function, the inverse of the normal distribution function.
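The relationship can be illustrated numerically. The standard normal CDF is Φ(x) = erfc(−x/√2)/2, and ndtri(p) is the x with Φ(x) = p. The hypothetical `ndtri_bisect` below inverts Φ by bisection using `std::erfc`. It is far slower than a dedicated approximation and is not Eigen's implementation.

```cpp
#include <cmath>
#include <limits>

// Standard normal CDF: Phi(x) = 0.5 * erfc(-x / sqrt(2)).
double normal_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Hypothetical inverse of normal_cdf by bisection on [-40, 40]. Phi is
// monotonic, so the bracket shrinks toward the unique root. Returns -inf/+inf
// for p = 0 and p = 1, and NaN for p outside [0, 1].
double ndtri_bisect(double p) {
  if (!(p >= 0.0 && p <= 1.0)) return std::numeric_limits<double>::quiet_NaN();
  if (p == 0.0) return -std::numeric_limits<double>::infinity();
  if (p == 1.0) return std::numeric_limits<double>::infinity();
  double lo = -40.0, hi = 40.0;
  for (int i = 0; i < 200; ++i) {
    const double mid = 0.5 * (lo + hi);
    if (normal_cdf(mid) < p) lo = mid; else hi = mid;
  }
  return 0.5 * (lo + hi);
}
```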
2019-08-12 19:26:29 -04:00
Eugene Zhulenev
47fefa235f
Allow move-only done callback in TensorAsyncDevice
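A few lines show why a move-only callback needs special handling. `std::function` requires a copyable target, so a done callback that captures a `std::unique_ptr` cannot be stored in one. A template parameter that is moved along can hold it. `run_async` is a hypothetical helper, not the TensorAsyncDevice API, and the init-captures require C++14.

```cpp
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper: runs `work` on a new thread, then invokes `done`.
// DoneCallback is a template parameter moved into the thread's lambda, so it
// may be move-only (e.g. a lambda capturing a std::unique_ptr), which
// std::function<void()> rejects.
template <typename Work, typename DoneCallback>
std::thread run_async(Work work, DoneCallback done) {
  return std::thread([work = std::move(work), done = std::move(done)]() mutable {
    work();
    done();
  });
}
```

Since C++23, `std::move_only_function` offers a type-erased alternative for the same purpose.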
2019-09-03 17:20:56 -07:00
Eugene Zhulenev
a8d264fa9c
Add test for const TensorMap underlying data mutation
2019-09-03 11:38:39 -07:00
Eugene Zhulenev
f0b36fb9a4
evalSubExprsIfNeededAsync + async TensorContractionThreadPool
2019-08-30 15:13:38 -07:00
Eugene Zhulenev
66665e7e76
Asynchronous expression evaluation with TensorAsyncDevice
2019-08-30 14:49:40 -07:00
Eugene Zhulenev
bc40d4522c
Const correctness in TensorMap<const Tensor<T, ...>> expressions
2019-08-28 17:46:05 -07:00
Eugene Zhulenev
071311821e
Remove XSMM support from Tensor module
2019-08-19 11:44:25 -07:00
Rasmus Munk Larsen
facc4e4536
Disable tests for contraction with output kernels when using libxsmm, which does not support this.
2019-08-07 14:11:15 -07:00
Eugene Zhulenev
6e7c76481a
Merge with Eigen head
2019-06-28 11:22:46 -07:00
Eugene Zhulenev
878845cb25
Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred
2019-06-28 11:13:44 -07:00