eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-17 18:41:31 +08:00

Author	SHA1	Message	Date
Rasmus Munk Larsen	f4ec8edea8	Add macro EIGEN_AVOID_THREAD_LOCAL to make it possible to manually disable the use of thread_local.	2019-03-06 11:52:04 -08:00
Rasmus Munk Larsen	41cdc370d0	Fix placement of "#if defined(EIGEN_GPUCC)" guard region. Found with -Wundefined-func-template. Author: tkoeppe@google.com	2019-03-06 11:42:22 -08:00
Rasmus Munk Larsen	cc407c9d4d	Fix placement of "#if defined(EIGEN_GPUCC)" guard region. Found with -Wundefined-func-template. Author: tkoeppe@google.com	2019-03-06 11:40:06 -08:00
Eugene Zhulenev	1bc2a0a57c	Add missing return to NonBlockingThreadPool::LocalSteal	2019-03-06 10:49:49 -08:00
Eugene Zhulenev	4e4dcd9026	Remove redundant steal loop	2019-03-06 10:39:07 -08:00
Eugene Zhulenev	25abaa2e41	Check that inner block dimension is continuous	2019-03-05 17:34:35 -08:00
Eugene Zhulenev	5d9a6686ed	Block evaluation for TensorGeneratorOp	2019-03-05 16:35:21 -08:00
Eugene Zhulenev	a407e022e6	Tune tensor contraction threadpool heuristics	2019-03-05 14:19:59 -08:00
Eugene Zhulenev	56c6373f82	Add an extra check for the RunQueue size estimate	2019-03-05 11:51:26 -08:00
Eugene Zhulenev	efb5080d31	Do not initialize invalid fast_strides in TensorGeneratorOp	2019-03-04 16:58:49 -08:00
Eugene Zhulenev	b95941e5c2	Add tiled evaluation for TensorForcedEvalOp	2019-03-04 16:02:22 -08:00
Eugene Zhulenev	694084ecbd	Use fast divisors in TensorGeneratorOp	2019-03-04 11:10:21 -08:00
Rasmus Munk Larsen	cf4a1c81fa	Fix specialization for conjugate on non-complex types in TensorBase.h.	2019-03-01 14:21:09 -08:00
Rasmus Munk Larsen	6560692c67	Improve EventCount used by the non-blocking threadpool. The current algorithm requires threads to commit/cancel waiting in the order they called Prewait. Spinning caused by that serialization can consume lots of CPU time on some workloads. Restructure the algorithm to not require that serialization and remove spin waits from Commit/CancelWait. Note: this reduces the max number of threads from 2^16 to 2^14 to leave more space for the ABA counter (which is now 22 bits). Implementation details are explained in comments.	2019-02-22 13:56:26 -08:00
Rasmus Munk Larsen	071629a440	Fix incorrect value of NumDimensions in TensorContraction traits. Reported here: #1671	2019-02-19 10:49:54 -08:00
Rasmus Larsen	efeabee445	Merged in ezhulenev/eigen-01 (pull request PR-590): Do not generate no-op cast() and conjugate() expressions	2019-02-14 21:16:12 +00:00
Eugene Zhulenev	7b837559a7	Fix signed-unsigned return in RunQueue	2019-02-14 10:40:21 -08:00
Eugene Zhulenev	f0d42d2265	Fix signed-unsigned comparison warning in RunQueue	2019-02-14 10:27:28 -08:00
Eugene Zhulenev	106ba7bb1a	Do not generate no-op cast() and conjugate() expressions	2019-02-14 09:51:51 -08:00
Eugene Zhulenev	8c2f30c790	Speedup Tensor ThreadPool RunQueue::Empty()	2019-02-13 10:20:53 -08:00
Eugene Zhulenev	21eb97d3e0	Add PacketConv implementation for non-vectorizable src expressions	2019-02-08 15:47:25 -08:00
Eugene Zhulenev	1e36166ed1	Optimize TensorConversion evaluator: do not convert same type	2019-02-08 15:13:24 -08:00
Eugene Zhulenev	59998117bb	Don't do parallel_pack if we can use thread_local memory in tensor contractions	2019-02-07 09:21:25 -08:00
Eugene Zhulenev	8491127082	Do not reduce parallelism too much in contractions with small number of threads	2019-02-04 12:59:33 -08:00
Eugene Zhulenev	eb21bab769	Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing	2019-02-04 10:43:16 -08:00
Gael Guennebaud	d586686924	Workaround lack of support for arbitrary packet-type in Tensor by manually loading half/quarter packets in tensor contraction mapper.	2019-01-30 16:48:01 +01:00
Christoph Hertzberg	a7779a9b42	Hide some annoying unused variable warnings in g++8.1	2019-01-29 16:48:21 +01:00
Christoph Hertzberg	c9825b967e	Renaming even more `I` identifiers	2019-01-26 13:22:13 +01:00
Christoph Hertzberg	934b8a1304	Avoid `I` as an identifier, since it may clash with the C-header complex.h	2019-01-25 14:54:39 +01:00
Eugene Zhulenev	1e6d15b55b	Fix shorten-64-to-32 warning in TensorContractionThreadPool	2019-01-11 11:41:53 -08:00
Eugene Zhulenev	0abe03764c	Fix shorten-64-to-32 warning in TensorContractionThreadPool	2019-01-10 10:27:55 -08:00
Gael Guennebaud	d812f411c3	bug #1654 : fix compilation with cuda and no c++11	2019-01-09 18:00:05 +01:00
Eugene Zhulenev	e70ffef967	Optimize evalShardedByInnerDim	2019-01-08 16:26:31 -08:00
Rasmus Munk Larsen	dd6d65898a	Fix shorten-64-to-32 warning. Use regular memcpy if num_threads==0.	2018-12-12 14:45:31 -08:00
Rasmus Munk Larsen	8a02883d58	Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554): Fix tensor contraction on AVX512 builds. Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>	2018-12-05 18:19:32 +00:00
Mark D Ryan	36f8f6d0be	Fix evalShardedByInnerDim for AVX512 builds. evalShardedByInnerDim ensures that the values it passes for start_k and end_k to evalGemmPartialWithoutOutputKernel are multiples of 8, as the kernel does not work correctly when the values of k are not multiples of the packet_size. While this precaution works for AVX builds, it is insufficient for AVX512 builds, where the maximum packet size is 16. The result is slightly incorrect float32 contractions on AVX512 builds. This commit fixes the problem by ensuring that k is always a multiple of the packet_size if the packet_size is > 8.	2018-12-05 12:29:03 +01:00
Deven Desai	e7e6809e6b	ROCm/HIP specific fixes + updates. 1. Eigen/src/Core/arch/GPU/Half.h: Updating the HIPCC implementation of half so that it can be declared as a __shared__ variable. 2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h: introducing an EIGEN_USE_STD(func) macro that calls std::func by default and ::func when Eigen is being compiled with HIPCC. This change was requested in the previous HIP PR (https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff). 3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h: Removing EIGEN_DEVICE_FUNC attribute from pure virtual methods as it is not supported by HIPCC. 4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h: Disabling the template specializations of InnerMostDimReducer as they run into HIPCC link errors.	2018-11-19 18:13:59 +00:00
Rasmus Munk Larsen	72928a2c8a	Merged in rmlarsen/eigen2 (pull request PR-543): Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth. Approved-by: Eugene Zhulenev <ezhulenev@google.com>	2018-11-13 17:10:30 +00:00
Rasmus Munk Larsen	cda479d626	Remove accidental changes.	2018-11-12 18:34:04 -08:00
Rasmus Munk Larsen	719d9aee65	Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.	2018-11-12 17:46:02 -08:00
Rasmus Munk Larsen	93f9988a7e	A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) support matrix exponential on platforms with 113 bits of mantissa for long doubles.	2018-11-09 14:15:32 -08:00
Christoph Hertzberg	449ff74672	Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file). Manually grafted from d107a371c61b764c73fd1570b1f3ed1c6400dd7e	2018-10-19 21:10:28 +02:00
Eugene Zhulenev	9e96e91936	Move from rvalue arguments in ThreadPool enqueue* methods	2018-10-16 16:48:32 -07:00
Eugene Zhulenev	217d839816	Reduce thread scheduling overhead in parallelFor	2018-10-16 14:53:06 -07:00
Eugene Zhulenev	900c7c61bb	Check if it's allowed to squeeze inner dimensions in TensorBlockIO	2018-10-15 16:52:33 -07:00
Christoph Hertzberg	3f2c8b7ff0	Fix a lot of Doxygen warnings in Tensor module	2018-10-09 20:22:47 +02:00
Rasmus Munk Larsen	d16634c4d4	Fix out-of-bounds access in TensorArgMax.h.	2018-10-08 16:41:36 -07:00
Gael Guennebaud	64b1a15318	Workaround stupid warning	2018-10-08 12:01:18 +02:00
Christoph Hertzberg	b92c71235d	Move struct outside of method for C++03 compatibility.	2018-10-02 18:59:10 +02:00
Christoph Hertzberg	051f9c1aff	Make code compile in C++03 mode again	2018-10-02 18:36:30 +02:00
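
The 36f8f6d0be entry above describes keeping the k-range boundaries of each shard at multiples of the packet size. The following is a minimal sketch of that rounding idea only, not Eigen's actual code: `round_up_to_multiple` and `shard_boundaries` are hypothetical names introduced for illustration, and the sharding scheme is an assumption.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Eigen): round `value` up to the next
// multiple of `multiple`. Assumes value >= 0 and multiple > 0.
inline std::int64_t round_up_to_multiple(std::int64_t value,
                                         std::int64_t multiple) {
  return ((value + multiple - 1) / multiple) * multiple;
}

// Hypothetical helper (not part of Eigen): split the range [0, k) into at
// most `num_shards` pieces. Every interior boundary is a multiple of
// `packet_size`; only the final boundary, k itself, may be unaligned.
inline std::vector<std::int64_t> shard_boundaries(std::int64_t k,
                                                  std::int64_t num_shards,
                                                  std::int64_t packet_size) {
  // Ceiling division gives the unaligned shard length, which is then
  // rounded up so each shard starts on a packet boundary.
  const std::int64_t unaligned = (k + num_shards - 1) / num_shards;
  const std::int64_t block = round_up_to_multiple(unaligned, packet_size);

  std::vector<std::int64_t> bounds;
  for (std::int64_t start = 0; start < k; start += block) {
    bounds.push_back(start);
  }
  bounds.push_back(k);
  return bounds;
}
```

With k = 100 and 3 shards, rounding to a hard-coded 8 yields an interior boundary at 40, which is not a multiple of 16; passing the real packet size of 16 moves the boundaries to 48 and 96. That is the kind of mismatch the commit message attributes to AVX512 builds.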


1344 Commits