Rasmus Munk Larsen
f4ec8edea8
Add macro EIGEN_AVOID_THREAD_LOCAL to make it possible to manually disable the use of thread_local.
2019-03-06 11:52:04 -08:00
Rasmus Munk Larsen
41cdc370d0
Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
...
Found with -Wundefined-func-template.
Author: tkoeppe@google.com
2019-03-06 11:42:22 -08:00
Rasmus Munk Larsen
cc407c9d4d
Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
...
Found with -Wundefined-func-template.
Author: tkoeppe@google.com
2019-03-06 11:40:06 -08:00
Eugene Zhulenev
1bc2a0a57c
Add missing return to NonBlockingThreadPool::LocalSteal
2019-03-06 10:49:49 -08:00
Eugene Zhulenev
4e4dcd9026
Remove redundant steal loop
2019-03-06 10:39:07 -08:00
Eugene Zhulenev
25abaa2e41
Check that inner block dimension is continuous
2019-03-05 17:34:35 -08:00
Eugene Zhulenev
5d9a6686ed
Block evaluation for TensorGeneratorOp
2019-03-05 16:35:21 -08:00
Eugene Zhulenev
a407e022e6
Tune tensor contraction threadpool heuristics
2019-03-05 14:19:59 -08:00
Eugene Zhulenev
56c6373f82
Add an extra check for the RunQueue size estimate
2019-03-05 11:51:26 -08:00
Eugene Zhulenev
efb5080d31
Do not initialize invalid fast_strides in TensorGeneratorOp
2019-03-04 16:58:49 -08:00
Eugene Zhulenev
b95941e5c2
Add tiled evaluation for TensorForcedEvalOp
2019-03-04 16:02:22 -08:00
Eugene Zhulenev
694084ecbd
Use fast divisors in TensorGeneratorOp
2019-03-04 11:10:21 -08:00
Rasmus Munk Larsen
cf4a1c81fa
Fix specialization for conjugate on non-complex types in TensorBase.h.
2019-03-01 14:21:09 -08:00
Rasmus Munk Larsen
6560692c67
Improve EventCount used by the non-blocking threadpool.
...
The current algorithm requires threads to commit/cancel waiting in order
they called Prewait. Spinning caused by that serialization can consume
lots of CPU time on some workloads. Restructure the algorithm to not
require that serialization and remove spin waits from Commit/CancelWait.
Note: this reduces max number of threads from 2^16 to 2^14 to leave
more space for ABA counter (which is now 22 bits).
Implementation details are explained in comments.
2019-02-22 13:56:26 -08:00
Rasmus Munk Larsen
071629a440
Fix incorrect value of NumDimensions in TensorContraction traits.
...
Reported here: #1671
2019-02-19 10:49:54 -08:00
Rasmus Larsen
efeabee445
Merged in ezhulenev/eigen-01 (pull request PR-590)
...
Do not generate no-op cast() and conjugate() expressions
2019-02-14 21:16:12 +00:00
Eugene Zhulenev
7b837559a7
Fix signed-unsigned return in RuqQueue
2019-02-14 10:40:21 -08:00
Eugene Zhulenev
f0d42d2265
Fix signed-unsigned comparison warning in RunQueue
2019-02-14 10:27:28 -08:00
Eugene Zhulenev
106ba7bb1a
Do not generate no-op cast() and conjugate() expressions
2019-02-14 09:51:51 -08:00
Eugene Zhulenev
8c2f30c790
Speedup Tensor ThreadPool RunQueu::Empty()
2019-02-13 10:20:53 -08:00
Eugene Zhulenev
21eb97d3e0
Add PacketConv implementation for non-vectorizable src expressions
2019-02-08 15:47:25 -08:00
Eugene Zhulenev
1e36166ed1
Optimize TensorConversion evaluator: do not convert same type
2019-02-08 15:13:24 -08:00
Eugene Zhulenev
59998117bb
Don't do parallel_pack if we can use thread_local memory in tensor contractions
2019-02-07 09:21:25 -08:00
Eugene Zhulenev
8491127082
Do not reduce parallelism too much in contractions with small number of threads
2019-02-04 12:59:33 -08:00
Eugene Zhulenev
eb21bab769
Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing
2019-02-04 10:43:16 -08:00
Gael Guennebaud
d586686924
Workaround lack of support for arbitrary packet-type in Tensor by manually loading half/quarter packets in tensor contraction mapper.
2019-01-30 16:48:01 +01:00
Christoph Hertzberg
a7779a9b42
Hide some annoying unused variable warnings in g++8.1
2019-01-29 16:48:21 +01:00
Christoph Hertzberg
c9825b967e
Renaming even more I
identifiers
2019-01-26 13:22:13 +01:00
Christoph Hertzberg
934b8a1304
Avoid I
as an identifier, since it may clash with the C-header complex.h
2019-01-25 14:54:39 +01:00
Eugene Zhulenev
1e6d15b55b
Fix shorten-64-to-32 warning in TensorContractionThreadPool
2019-01-11 11:41:53 -08:00
Eugene Zhulenev
0abe03764c
Fix shorten-64-to-32 warning in TensorContractionThreadPool
2019-01-10 10:27:55 -08:00
Gael Guennebaud
d812f411c3
bug #1654 : fix compilation with cuda and no c++11
2019-01-09 18:00:05 +01:00
Eugene Zhulenev
e70ffef967
Optimize evalShardedByInnerDim
2019-01-08 16:26:31 -08:00
Rasmus Munk Larsen
dd6d65898a
Fix shorten-64-to-32 warning. Use regular memcpy if num_threads==0.
2018-12-12 14:45:31 -08:00
Rasmus Munk Larsen
8a02883d58
Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554)
...
Fix tensor contraction on AVX512 builds
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-12-05 18:19:32 +00:00
Mark D Ryan
36f8f6d0be
Fix evalShardedByInnerDim for AVX512 builds
...
evalShardedByInnerDim ensures that the values it passes for start_k and
end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the kernel
does not work correctly when the values of k are not multiples of the
packet_size. While this precaution works for AVX builds, it is insufficient
for AVX512 builds where the maximum packet size is 16. The result is slightly
incorrect float32 contractions on AVX512 builds.
This commit fixes the problem by ensuring that k is always a multiple of
the packet_size if the packet_size is > 8.
2018-12-05 12:29:03 +01:00
Deven Desai
e7e6809e6b
ROCm/HIP specfic fixes + updates
...
1. Eigen/src/Core/arch/GPU/Half.h
Updating the HIPCC implementation half so that it can declared as a __shared__ variable
2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h
introducing a EIGEN_USE_STD(func) macro that calls
- std::func be default
- ::func when eigen is being compiled with HIPCC
This change was requested in the previous HIP PR
(https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff )
3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h
Removing EIGEN_DEVICE_FUNC attribute from pure virtual methods as it is not supported by HIPCC
4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
Disabling the template specializations of InnerMostDimReducer as they run into HIPCC link errors
2018-11-19 18:13:59 +00:00
Rasmus Munk Larsen
72928a2c8a
Merged in rmlarsen/eigen2 (pull request PR-543)
...
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2018-11-13 17:10:30 +00:00
Rasmus Munk Larsen
cda479d626
Remove accidental changes.
2018-11-12 18:34:04 -08:00
Rasmus Munk Larsen
719d9aee65
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
2018-11-12 17:46:02 -08:00
Rasmus Munk Larsen
93f9988a7e
A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) supporting matrix exponential on platforms with 113 bits of mantissa for long doubles.
2018-11-09 14:15:32 -08:00
Christoph Hertzberg
449ff74672
Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file).
...
Manually grafted from d107a371c61b764c73fd1570b1f3ed1c6400dd7e
2018-10-19 21:10:28 +02:00
Eugene Zhulenev
9e96e91936
Move from rvalue arguments in ThreadPool enqueue* methods
2018-10-16 16:48:32 -07:00
Eugene Zhulenev
217d839816
Reduce thread scheduling overhead in parallelFor
2018-10-16 14:53:06 -07:00
Eugene Zhulenev
900c7c61bb
Check if it's allowed to squueze inner dimensions in TensorBlockIO
2018-10-15 16:52:33 -07:00
Christoph Hertzberg
3f2c8b7ff0
Fix a lot of Doxygen warnings in Tensor module
2018-10-09 20:22:47 +02:00
Rasmus Munk Larsen
d16634c4d4
Fix out-of bounds access in TensorArgMax.h.
2018-10-08 16:41:36 -07:00
Gael Guennebaud
64b1a15318
Workaround stupid warning
2018-10-08 12:01:18 +02:00
Christoph Hertzberg
b92c71235d
Move struct outside of method for C++03 compatibility.
2018-10-02 18:59:10 +02:00
Christoph Hertzberg
051f9c1aff
Make code compile in C++03 mode again
2018-10-02 18:36:30 +02:00