eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-21 20:41:06 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	aa2b46aa91	allow vectorization of mat44.col() by adding a InnerPanel boolean template parameter to Block	2010-07-23 16:29:29 +02:00
Benoit Jacob	3a30a2bc3e	forgot to remove a #endif	2010-08-13 14:03:38 -04:00
Benoit Jacob	b80d9dd42e	fix determination of number of registers on sse: __i386__ was not defined by MSVC 2010. fixed as (2*sizeof(void*)). also move that to SSE/ and let the default for unknown arch's be just 8.	2010-08-13 13:55:28 -04:00
Benoit Jacob	97ced33b33	Backed out changeset 40f6e26a247976ba1868520a4747e49e0739a42a See thread on mailing list: "InnerPanel change mis-detects alignment?"	2010-08-11 00:04:06 -04:00
Gael Guennebaud	40f6e26a24	allow vectorization of mat44.col() by adding a InnerPanel boolean template parameter to Block	2010-07-23 16:29:29 +02:00
Gael Guennebaud	c7f40e522e	merge	2010-07-22 13:21:06 +02:00
Gael Guennebaud	0dfc5b296b	fix strict aliasing issue	2010-07-22 13:16:53 +02:00
Gael Guennebaud	35f0bc70d8	fix a strict aliasing issue with gcc 4.3	2010-07-20 22:43:55 +02:00
Gael Guennebaud	ced1a45f82	add NEON ploaddup and pcplxflip functions	2010-07-20 14:24:01 +02:00
Gael Guennebaud	c2ee454df4	* fix compilation of mixed scalar product * optimize mixed scalar products	2010-07-19 16:49:09 +02:00
Gael Guennebaud	6e157dd7c6	* fix a couple of remaining issues with previous commit, * merge ei_product_blocking_traits into ei_gepb_traits	2010-07-19 15:45:13 +02:00
Gael Guennebaud	f8aae7a908	* _mm_loaddup_pd is slow * optimize SSE ei_ploaddup<Packet4f>	2010-07-19 15:43:27 +02:00
Gael Guennebaud	cd0e5dca9b	wip: extend the gebp kernel to optimize complex and mixed products	2010-07-19 08:50:59 +02:00
Gael Guennebaud	36d9b51a44	optimize non fused MADD, and add a flatten attribute macro to enforce inlining within a function	2010-07-13 15:16:34 +02:00
Gael Guennebaud	f8678272a4	mixing types step 3: - improve support of colmajor by vector and matrix - matrix - now all configurations are well handled, but the perf are not always very good	2010-07-11 23:57:23 +02:00
Gael Guennebaud	ff96c94043	mixing types in product step 2: * pload* and pset1 are now templated on the packet type * gemv routines are now embedded into a structure with a consistent API with respect to gemm * some configurations of vector * matrix and matrix * matrix work fine, some need more work...	2010-07-11 15:48:30 +02:00
Gael Guennebaud	4161b8be67	sync	2010-07-10 22:58:51 +02:00
Benoit Jacob	6dcd373b9d	let ei_pset1 use _mm_loaddup_pd. Not a significant speed improvement, but also not a speed regression, and replaces 3 instructions by 1 single instruction.	2010-07-09 18:51:17 -04:00
Konstantinos Margaritis	6ad3f1ab1f	Added NEON/Complex.h, ~3.5x faster than scalar std::complex<float> minor fix in AltiVec Complex.h	2010-07-10 00:09:29 +03:00
Gael Guennebaud	96f9015807	disable MSVC optimization when the underlying compiler is ICC	2010-07-09 19:33:43 +02:00
Konstantinos Margaritis	642cc27eb1	forgot to commit ei_p4f_FORWARD;	2010-07-09 18:08:18 +03:00
Konstantinos Margaritis	d9e134c73c	Altivec port of Complex.h. Note: For some reason g++ 4.4 is >200% slower than g++ 4.3 on altivec code. The same benchmark (bench_gemm) was tested, on the same hardware/OS (G4/Debian testing), with same CFLAGS. With some code reorganizing I managed to get some minor gain on 4.4, but I just could not reach 4.3 speed. This is most likely a bug, but I'm waiting to see if it's fixed on 4.5. I'll look into this a bit more.	2010-07-09 17:54:41 +03:00
Gael Guennebaud	b1a17dbfe4	fix a few weird issues with gcc 4.3 32bits and complex<float>	2010-07-09 08:27:58 +02:00
Gael Guennebaud	300a226ffa	scalars fitting in a single packet require more work, step 1 * add an Alignable trait * update LinearVectorization assignment	2010-07-08 14:27:47 +02:00
Gael Guennebaud	2066ed91de	enabling aligned loads/store for complex<double> is much more tricky, so the temporary fix is to always perform unaligned load/store	2010-07-07 22:50:19 +02:00
Gael Guennebaud	d89925e6de	an attempt to fix wrong unaligned store	2010-07-07 22:35:06 +02:00
Gael Guennebaud	31a36aa9c4	support for real * complex matrix product - step 1 (works for some special cases)	2010-07-07 19:49:09 +02:00
Gael Guennebaud	a2415388ef	optimized conjugate products for SSE3	2010-07-07 16:37:20 +02:00
Gael Guennebaud	65257f6b29	optimize for SSE3 => significant speed up !!	2010-07-07 15:34:46 +02:00
Gael Guennebaud	dd18b22f0b	optimize pmul for complex<double>	2010-07-07 15:29:04 +02:00
Gael Guennebaud	e07c0f6bb5	cleaning	2010-07-07 11:41:29 +02:00
Gael Guennebaud	b0896382a3	s/IsVectorized/Vectorizable	2010-07-07 11:10:46 +02:00
Gael Guennebaud	f8d3b4c060	fix mixing types in DiagonalProduct	2010-07-07 09:43:29 +02:00
Gael Guennebaud	bfa606d16f	* add a IsVectorized mechanism (instead of packet-size>1...) * vectorize complex<double>	2010-07-06 23:36:00 +02:00
Gael Guennebaud	d6454788d9	add support for vectorized conjugated products	2010-07-06 19:10:24 +02:00
Gael Guennebaud	c69a226192	* extend the Has* packet traits and makes all functor use it * extend the packing routines to support conjugation	2010-07-05 23:27:54 +02:00
Gael Guennebaud	e1eccfad3f	add initial support for the vectorization of complex<float>	2010-07-05 16:18:09 +02:00
Konstantinos Margaritis	cf3616b2c0	AltiVec signed integer pmadd removed, proved to be 2x slower than the scalar trait(!).	2010-06-28 21:24:55 +03:00
Gael Guennebaud	75b6d2b2f8	fix very annoying warning (gcc 4.3): type qualifiers ignored on function return type	2010-06-25 13:20:34 +02:00
Gael Guennebaud	28e64b0da3	email change	2010-06-24 23:21:58 +02:00
Benoit Jacob	f0a6d56f07	fix linking errors with multiply defined functions	2010-06-18 09:01:34 -04:00
Benoit Jacob	134ca4acb3	packet math functions: - take const Packet& args like the other packet funcs - SSE specializations: make them be actual template specializations	2010-06-15 08:29:21 -04:00
Gael Guennebaud	88cd6885be	Add a proof-of-concept API to configure the blocking parameters at runtime. After validation of the final API I'll update the other products to use it.	2010-06-07 16:35:25 +02:00
Konstantinos Margaritis	9337f371d2	(proper commit this time) replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function. Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h. Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch(). NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.	2010-04-24 00:58:44 +03:00
Konstantinos Margaritis	5acf46bd12	Backed out changeset 6972c140f737874d88da0e225c7c27b4563a4518	2010-04-24 00:57:10 +03:00
oem	6972c140f7	replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function. Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h. Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch(). NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.	2010-04-24 00:44:14 +03:00
Hauke Heibel	9d6afdeb22	ei_psqrt fix for zero input	2010-04-01 15:10:52 +02:00
Gael Guennebaud	9a3b00c040	add missing cmake directives for arch/Default	2010-03-08 22:15:35 +01:00
Hauke Heibel	9fe040ad29	Reintroduced the if-clause for MSVC ei_ploadu via _loadu_.	2010-03-07 14:05:26 +01:00
Gael Guennebaud	afd7ee759b	fix copy pasted comment	2010-03-05 21:35:11 +01:00

125 Commits