eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-04-23 10:09:36 +08:00

Author	SHA1	Message	Date
Antonio Sanchez	414c42bfcf	Fix cuda clang builds	2025-03-12 21:47:26 -07:00
Antonio Sánchez	d45ac54008	Correct use of EIGEN_CUDACC to respect EIGEN_NO_CUDA.	2022-02-04 22:24:31 +00:00
nluehr	4707c3aa86	Fix incorrect integer cast in predux<half2>(). Bug corrupts results on Maxwell and earlier GPU architectures. (cherry picked from commit dd6de618c3fda4275aff3a57c590f82b6e628ac1)	2020-09-04 19:12:05 +02:00
Gael Guennebaud	6eb4ce5f8e	backport some nvcc 9 fixes	2018-07-30 14:45:08 +02:00
Gael Guennebaud	e7c065ec71	bug #1462 : remove all occurrences of the deprecated __CUDACC_VER__ macro by introducing EIGEN_CUDACC_VER	2017-08-24 11:06:47 +02:00
Benoit Steiner	c80587c92b	Merged eigen/eigen into default	2016-11-03 03:55:11 -07:00
Benoit Steiner	7a0e96b80d	Gate the code that refers to cuda fp16 primitives more thoroughly	2016-11-01 12:08:09 -07:00
Benoit Steiner	38b6048e14	Deleted redundant implementation of predux	2016-10-12 14:37:56 -07:00
Benoit Steiner	78d2926508	Merged eigen/eigen into default	2016-10-12 13:46:29 -07:00
Benoit Steiner	2e2f48e30e	Take advantage of AVX512 instructions whenever possible to speedup the processing of 16 bit floats.	2016-10-12 13:45:39 -07:00
Benoit Steiner	d485d12c51	Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code.	2016-10-06 10:41:03 -07:00
Benoit Steiner	698ff69450	Properly characterize the CUDA packet primitives for fp16 as device only	2016-10-04 16:53:30 -07:00
Igor Babuschkin	59bacfe520	Fix compilation on CUDA 8 by removing call to h2log1p	2016-08-15 23:38:05 +01:00
Igor Babuschkin	aee693ac52	Add log1p support for CUDA and half floats	2016-08-08 20:24:59 +01:00
Benoit Steiner	02fe89f5ef	half implementation has been moved to half_impl namespace	2016-07-29 15:09:34 -07:00
Gael Guennebaud	395c835f4b	Fix CUDA compilation	2016-07-22 15:30:24 +02:00
Gael Guennebaud	47afc9a365	More cleaning in half: - put its definition and functions in its own half_impl namespace such that the free functions do not pollute the Eigen namespace while still making them visible for half through ADL. - expose Eigen::half through a using statement - move operator<< from std to half_float namespace	2016-07-22 14:33:28 +02:00
Benoit Steiner	8fd57a97f2	Enable the vectorization of adds and mults of fp16	2016-06-07 18:22:18 -07:00
Benoit Steiner	b6e306f189	Improved support for CUDA 8.0	2016-05-31 09:47:59 -07:00
Benoit Steiner	3a5d6a3c38	Disable the use of MMX instructions since the code is broken on many platforms	2016-05-27 09:13:26 -07:00
Gael Guennebaud	7ff5fadcc0	Disable usage of MMX with msvc.	2016-05-26 17:58:46 +02:00
Gael Guennebaud	cc1ab64f29	Add missing inclusion of mmintrin.h	2016-05-26 09:51:50 +02:00
Benoit Steiner	3585ff585e	Silenced a compilation warning	2016-05-25 22:09:19 -07:00
Benoit Steiner	efeb89dcdb	Specify the rounding mode in the correct location	2016-05-25 17:53:24 -07:00
Benoit Steiner	0322c66a3f	Explicitly specify the rounding mode when converting floats to fp16	2016-05-25 15:56:15 -07:00
Benoit Steiner	ed783872ab	Disable the use of MMX instructions on x86_64 since too many compilers only support them in 32bit mode	2016-05-25 08:27:26 -07:00
Benoit Steiner	d041a528da	Cleaned up the fp16 code a little more	2016-05-24 22:43:26 -07:00
Benoit Steiner	ff4a289572	Cleaned up the fp16 code	2016-05-24 18:50:09 -07:00
Benoit Steiner	e617711306	Don't attempt to use MMX instructions with Visual Studio since they're only partially supported.	2016-05-24 06:43:58 -07:00
Benoit Steiner	b517ab349b	Use the generic ploadquad intrinsics since it does the job	2016-05-24 00:11:17 -07:00
Benoit Steiner	646872cb3b	Worked around missing clang intrinsics	2016-05-24 00:07:08 -07:00
Benoit Steiner	33a94f5dc7	Use the Index type instead of integers to specify the strides in pgather/pscatter	2016-05-23 20:37:30 -07:00
Benoit Steiner	6bc684ab6a	Added missing alignment in the fp16 packet traits	2016-05-23 20:32:30 -07:00
Benoit Steiner	283e33dea4	ptranspose is not a template.	2016-05-23 19:55:55 -07:00
Benoit Steiner	7d980d74e5	Started to vectorize the processing of 16bit floats on CPU.	2016-05-23 15:21:40 -07:00
Benoit Steiner	fae0493f98	Fixed a couple of bugs related to the Pascal family of GPUs	2016-05-11 23:02:26 -07:00
Benoit Steiner	b6a517c47d	Added the ability to load fp16 using the texture path. Improved the performance of some reductions on fp16	2016-05-11 21:26:48 -07:00
Benoit Steiner	56a1757d74	Made predux_min and predux_max on fp16 less noisy	2016-05-11 17:37:34 -07:00
Benoit Steiner	9091351dbe	__ldg is only available with cuda architectures >= 3.5	2016-05-11 15:22:13 -07:00
Benoit Steiner	02f76dae2d	Fixed a typo	2016-05-11 15:08:38 -07:00
Benoit Steiner	0b9e3dcd06	Added packet primitives to compute exp, log, sqrt and rsqrt on fp16. This improves the performance by 10 to 30%.	2016-05-10 11:05:33 -07:00
Benoit Steiner	8adf5cc70f	Added support for packet processing of fp16 on kepler and maxwell gpus	2016-05-06 19:16:43 -07:00
Benoit Steiner	995f202cea	Disabled the use of half2 on cuda devices of compute capability < 5.3	2016-04-08 14:43:36 -07:00
Benoit Steiner	3394379319	Fixed the packet_traits for half floats.	2016-04-08 13:33:59 -07:00
Benoit Steiner	14ea7c7ec7	Fixed packet_traits<half>	2016-04-06 19:30:21 -07:00
Benoit Steiner	048c4d6efd	Made half floats usable on hardware that doesn't support them natively.	2016-03-11 17:21:42 -08:00
Benoit Steiner	456e038a4e	Fixed the +=, -=, *= and /= operators to return a reference	2016-03-10 15:17:44 -08:00
Benoit Steiner	1032441c6f	Enable partial support for half floats on Kepler GPUs.	2016-03-03 10:34:20 -08:00
Benoit Steiner	6270d851e3	Declare the half float type as arithmetic.	2016-02-22 13:59:33 -08:00
Benoit Steiner	584832cb3c	Implemented the ptranspose function on half floats	2016-02-21 12:44:53 -08:00


55 Commits