eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-19 19:41:07 +08:00

Author	SHA1	Message	Date
Jeremy Barnes	91678f489a	Cleaned up double-defined macro from last commit	2016-01-10 22:44:45 -05:00
Jeremy Barnes	403a7cb6c3	Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations. This allows Tensorflow to work on SM 3.0 (ie, Amazon EC2) machines.	2016-01-10 22:39:13 -05:00
Benoit Steiner	e76904af1b	Simplified the dispatch code.	2016-01-08 16:50:57 -08:00
Benoit Steiner	d726e864ac	Made it possible to use array of size 0 on CUDA devices	2016-01-08 16:38:14 -08:00
Benoit Steiner	3358dfd5dd	Reworked the dispatch of optimized cuda reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases	2016-01-08 16:28:53 -08:00
Benoit Steiner	53749ff415	Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures.	2016-01-08 13:53:40 -08:00
Benoit Steiner	6639b7d6e8	Removed a couple of partial specializations that confuse nvcc and result in errors such as this: error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"	2016-01-07 18:45:19 -08:00
Benoit Steiner	0cb2ca5de2	Fixed a typo.	2016-01-06 18:50:28 -08:00
Benoit Steiner	213459d818	Optimized the performance of broadcasting of scalars.	2016-01-06 18:47:45 -08:00
Benoit Steiner	cfff40b1d4	Improved the performance of reductions on CUDA devices	2016-01-04 17:25:00 -08:00
Benoit Steiner	515dee0baf	Added a 'divup' util to compute the ceiling of the quotient of two integers	2016-01-04 16:29:26 -08:00
Gael Guennebaud	978c379ed7	Add missing ctor from uint	2015-12-30 12:52:38 +01:00
Eugene Brevdo	f7362772e3	Add digamma for CPU + CUDA. Includes tests.	2015-12-24 21:15:38 -08:00
Benoit Steiner	bdcbc66a5c	Don't attempt to vectorize mean reductions of integers since we can't use SSE or AVX instructions to divide 2 integers.	2015-12-22 17:51:55 -08:00
Benoit Steiner	a1e08fb2a5	Optimized the configuration of the outer reduction cuda kernel	2015-12-22 16:30:10 -08:00
Benoit Steiner	9c7d96697b	Added missing define	2015-12-22 16:11:07 -08:00
Benoit Steiner	e7e6d01810	Made sure the optimized gpu reduction code is actually compiled.	2015-12-22 15:07:33 -08:00
Benoit Steiner	b5d2078c4a	Optimized outer reduction on GPUs.	2015-12-22 15:06:17 -08:00
Benoit Steiner	1c3e78319d	Added missing const	2015-12-21 15:05:01 -08:00
Benoit Steiner	1b82969559	Add alignment requirement for local buffer used by the slicing op.	2015-12-18 14:36:35 -08:00
Benoit Steiner	75a7fa1919	Doubled the speed of full reductions on GPUs.	2015-12-18 14:07:31 -08:00
Benoit Steiner	8dd17cbe80	Fixed a clang compilation warning triggered by the use of arrays of size 0.	2015-12-17 14:00:33 -08:00
Benoit Steiner	4aac55f684	Silenced some compilation warnings triggered by nvcc	2015-12-17 13:39:01 -08:00
Benoit Steiner	40e6250fc3	Made it possible to run tensor chipping operations on CUDA devices	2015-12-17 13:29:08 -08:00
Benoit Steiner	17352e2792	Made the entire TensorFixedSize api callable from a CUDA kernel.	2015-12-14 15:20:31 -08:00
Benoit Steiner	75e19fc7ca	Marked the tensor constructors as EIGEN_DEVICE_FUNC: This makes it possible to call them from a CUDA kernel.	2015-12-14 15:12:55 -08:00
Gael Guennebaud	ca39b1546e	Merged in ebrevdo/eigen (pull request PR-148) Add special functions to eigen: lgamma, erf, erfc.	2015-12-11 11:52:09 +01:00
Benoit Steiner	6af52a1227	Fixed a typo in the constructor of tensors of rank 5.	2015-12-10 23:31:12 -08:00
Benoit Steiner	8e00ea9a92	Fixed the coefficient accessors used for the 2d and 3d case when compiling without cxx11 support.	2015-12-10 22:45:10 -08:00
Eugene Brevdo	fa4f933c0f	Add special functions to Eigen: lgamma, erf, erfc. Includes CUDA support and unit tests.	2015-12-07 15:24:49 -08:00
Benoit Steiner	7dfe75f445	Fixed compilation warnings	2015-12-07 08:12:30 -08:00
Benoit Steiner	f4ca8ad917	Use signed integers instead of unsigned ones more consistently in the codebase.	2015-12-04 18:14:16 -08:00
Benoit Steiner	490d26e4c1	Use integers instead of std::size_t to encode the number of dimensions in the Tensor class since most of the code currently already uses integers.	2015-12-04 10:15:11 -08:00
Benoit Steiner	d20efc974d	Made it possible to use the sigmoid functor within a CUDA kernel.	2015-12-04 09:38:15 -08:00
Benoit Steiner	029052d276	Deleted redundant code	2015-12-03 17:08:47 -08:00
Mark Borgerding	7ddcf97da7	added scalar_sign_op (both real,complex)	2015-11-24 17:15:07 -05:00
Benoit Steiner	44848ac39b	Fixed a bug in TensorArgMax.h	2015-11-23 15:58:47 -08:00
Benoit Steiner	547a8608e5	Fixed the implementation of Eigen::internal::count_leading_zeros for MSVC. Also updated the code to silence bogus warnings generated by nvcc when compiling this function.	2015-11-23 12:17:45 -08:00
Benoit Steiner	562078780a	Don't create more cuda blocks than necessary	2015-11-23 11:00:10 -08:00
Benoit Steiner	df31ca3b9e	Made it possible to refer to a GPUDevice from code compiled with a regular C++ compiler	2015-11-23 10:03:53 -08:00
Benoit Steiner	1e04059012	Deleted unused variable.	2015-11-23 08:36:54 -08:00
Benoit Steiner	9fa65d3838	Split TensorDeviceType.h in 3 files to make it more manageable	2015-11-20 17:42:50 -08:00
Benoit Steiner	a367804856	Added option to force the usage of the Eigen array class instead of the std::array class.	2015-11-20 12:41:40 -08:00
Benoit Steiner	383d1cc2ed	Added proper support for fast 64bit integer division on CUDA	2015-11-20 11:09:46 -08:00
Benoit Steiner	f37a5f1c53	Fixed compilation error triggered by nvcc	2015-11-19 14:34:26 -08:00
Benoit Steiner	f8df393165	Added support for 128bit integers on CUDA devices.	2015-11-19 13:57:27 -08:00
Benoit Steiner	1dd444ea71	Avoid using the version of TensorIntDiv optimized for 32-bit integers when the divisor can be equal to one since it isn't supported.	2015-11-18 11:37:58 -08:00
Benoit Steiner	f1fbd74db9	Added sanity check	2015-11-13 09:07:27 -08:00
Benoit Steiner	7815b84be4	Fixed a compilation warning	2015-11-12 20:16:59 -08:00
Benoit Steiner	10a91930cc	Fixed a compilation warning triggered by nvcc	2015-11-12 20:10:52 -08:00
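
Commit 515dee0baf above adds a 'divup' utility. As its name suggests, such a helper is normally a round-up integer division, for example to count how many fixed-size CUDA blocks are needed to cover a number of elements. The following is a minimal sketch of that idea, not the code from the commit: the signature is an assumption, and it presumes a non-negative numerator and a positive denominator.

```cpp
#include <cassert>

// Hypothetical stand-in for a 'divup' helper; not copied from Eigen.
// Returns the smallest integer q such that q * y >= x,
// assuming x >= 0 and y > 0.
template <typename T>
constexpr T divup(T x, T y) {
  return (x + y - 1) / y;
}

// Typical use: number of fixed-size blocks needed to cover n items.
static_assert(divup(10, 4) == 3, "10 items need 3 blocks of 4");
static_assert(divup(8, 4) == 2, "exact multiples do not round up");
static_assert(divup(0, 4) == 0, "no items need no blocks");
```

Note that the intermediate sum x + y - 1 can overflow when x is close to the maximum value of T, so this form is only suitable when the operands are comfortably within range.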

526 Commits