140 Commits

Author SHA1 Message Date
Gael Guennebaud
fb1a29fed5 fix ICE and warning with gcc 4.2.4 2011-02-21 16:11:18 +01:00
Gael Guennebaud
8f8c67b8bd fix bug #186 (in 32 bits mode, gcc 4.3 messed up with pfirst for complex<float>) 2011-02-18 15:47:17 +01:00
Hauke Heibel
1a6597b8e4 MSVC does not like using uninitialized SSE variables, so we have to pass all zeros. 2011-02-12 21:29:16 +01:00
Gael Guennebaud
9d2bf35a05 implement optimized ploadu for MSVC10: this also fixes bad code generation in gebp_kernel :) 2011-02-12 16:40:09 +01:00
Benoit Jacob
6a5a13e394 The pfirst hack is also needed on MSVC 2010, as it goes completely nuts, even though it doesn't segfault as MSVC 2008 did 2011-02-09 15:13:23 -05:00
Gael Guennebaud
d6c4ca4845 fix redundancy 2011-02-09 13:44:05 +01:00
Gael Guennebaud
c0d5131435 workaround gcc 4.2.1 ICE (fix bug #145) 2011-02-09 13:04:35 +01:00
Gael Guennebaud
c5c8efa575 workaround gcc 4.2 and 4.3 compilation issue with NEON 2011-02-07 16:41:21 +01:00
Jitse Niesen
e2d46eac42 Remove all references to EIGEN_TUNE_CPU_CACHE_SIZE.
This macro is no longer used as of revision 0212eec23f4cb64e8426bf32568156df302f8fcf.
2011-02-04 22:33:53 +01:00
Gael Guennebaud
5887a086cf fix SSE3 issue (infinite loop after the ei_ => internal change) - this fixes bug #174 2011-02-03 17:55:24 +01:00
Konstantinos Margaritis
e05c79cbd8 Fixed NEON compilation errors, changed float-abi back to softfp (which is the most used right now).
Some complex tests appear to segfault, needs a more careful look.
2010-12-10 20:27:46 +02:00
Gael Guennebaud
da05b6af0e fix some remaining issues with ei_ -> internal change 2010-11-16 15:54:48 +01:00
Hauke Heibel
7bc8e3ac09 Initial fixes for bug #85.
Renamed meta_{true|false} to {true|false}_type, meta_if to conditional, is_same_type to is_same, un{ref|pointer|const} to remove_{reference|pointer|const} and makeconst to add_const.
Changed boolean type 'ret' member to 'value'.
Changed 'ret' members referring to types to 'type'.
Adapted all code occurrences.
2010-10-25 22:13:49 +02:00
Benoit Jacob
4716040703 bug #86 : use internal:: namespace instead of ei_ prefix 2010-10-25 10:15:22 -04:00
Gael Guennebaud
d4b664c4cd fix ugly conversion from double[2] to complex 2010-08-19 14:47:58 +02:00
Gael Guennebaud
aa2b46aa91 allow vectorization of mat44.col() by adding an InnerPanel boolean
template parameter to Block
2010-07-23 16:29:29 +02:00
Benoit Jacob
3a30a2bc3e forgot to remove a #endif 2010-08-13 14:03:38 -04:00
Benoit Jacob
b80d9dd42e fix determination of number of registers on sse:
__i386__ was not defined by MSVC 2010.
fixed as (2*sizeof(void*)).
also move that to SSE/ and let the default for unknown arch's be just 8.
2010-08-13 13:55:28 -04:00
Benoit Jacob
97ced33b33 Backed out changeset 40f6e26a247976ba1868520a4747e49e0739a42a
See thread on mailing list: "InnerPanel change mis-detects alignment?"
2010-08-11 00:04:06 -04:00
Gael Guennebaud
40f6e26a24 allow vectorization of mat44.col() by adding an InnerPanel boolean
template parameter to Block
2010-07-23 16:29:29 +02:00
Gael Guennebaud
c7f40e522e merge 2010-07-22 13:21:06 +02:00
Gael Guennebaud
0dfc5b296b fix strict aliasing issue 2010-07-22 13:16:53 +02:00
Gael Guennebaud
35f0bc70d8 fix a strict aliasing issue with gcc 4.3 2010-07-20 22:43:55 +02:00
Gael Guennebaud
ced1a45f82 add NEON ploaddup and pcplxflip functions 2010-07-20 14:24:01 +02:00
Gael Guennebaud
c2ee454df4 * fix compilation of mixed scalar product
* optimize mixed scalar products
2010-07-19 16:49:09 +02:00
Gael Guennebaud
6e157dd7c6 * fix a couple of remaining issues with previous commit,
* merge ei_product_blocking_traits into ei_gepb_traits
2010-07-19 15:45:13 +02:00
Gael Guennebaud
f8aae7a908 * _mm_loaddup_pd is slow
* optimize SSE ei_ploaddup<Packet4f>
2010-07-19 15:43:27 +02:00
Gael Guennebaud
cd0e5dca9b wip: extend the gebp kernel to optimize complex and mixed products 2010-07-19 08:50:59 +02:00
Gael Guennebaud
36d9b51a44 optimize non fused MADD, and add a flatten attribute macro to enforce
inlining within a function
2010-07-13 15:16:34 +02:00
Gael Guennebaud
f8678272a4 mixing types step 3:
- improve support of colmajor by vector and matrix - matrix
- now all configurations are well handled, but performance is not always very good
2010-07-11 23:57:23 +02:00
Gael Guennebaud
ff96c94043 mixing types in product step 2:
* pload* and pset1 are now templated on the packet type
* gemv routines are now embedded into a structure with
  a consistent API with respect to gemm
* some configurations of vector * matrix and matrix * matrix work fine,
  some need more work...
2010-07-11 15:48:30 +02:00
Gael Guennebaud
4161b8be67 sync 2010-07-10 22:58:51 +02:00
Benoit Jacob
6dcd373b9d let ei_pset1 use _mm_loaddup_pd. Not a significant speed improvement, but also not a speed regression, and replaces 3 instructions by 1 single instruction. 2010-07-09 18:51:17 -04:00
Konstantinos Margaritis
6ad3f1ab1f Added NEON/Complex.h, ~3.5x faster than scalar std::complex<float>
minor fix in AltiVec Complex.h
2010-07-10 00:09:29 +03:00
Gael Guennebaud
96f9015807 disable MSVC optimization when the underlying compiler is ICC 2010-07-09 19:33:43 +02:00
Konstantinos Margaritis
642cc27eb1 forgot to commit ei_p4f_FORWARD; 2010-07-09 18:08:18 +03:00
Konstantinos Margaritis
d9e134c73c Altivec port of Complex.h.
Note: For some reason g++ 4.4 is >200% slower than g++ 4.3 on altivec code.
The same benchmark (bench_gemm) was tested, on the same hardware/OS (G4/Debian testing),
with same CFLAGS. With some code reorganizing I managed to get some minor gain
on 4.4, but I just could not reach 4.3 speed. This is most likely a bug, but I'm waiting
to see if it's fixed on 4.5. I'll look into this a bit more.
2010-07-09 17:54:41 +03:00
Gael Guennebaud
b1a17dbfe4 fix a few weird issues with gcc 4.3 32bits and complex<float> 2010-07-09 08:27:58 +02:00
Gael Guennebaud
300a226ffa scalars fitting in a single packet require more work, step 1
* add an Alignable trait
* update LinearVectorization assignment
2010-07-08 14:27:47 +02:00
Gael Guennebaud
2066ed91de enabling aligned loads/store for complex<double> is much more tricky,
so the temporary fix is to always perform unaligned load/store
2010-07-07 22:50:19 +02:00
Gael Guennebaud
d89925e6de an attempt to fix wrong unaligned store 2010-07-07 22:35:06 +02:00
Gael Guennebaud
31a36aa9c4 support for real * complex matrix product - step 1 (works for some special cases) 2010-07-07 19:49:09 +02:00
Gael Guennebaud
a2415388ef optimized conjugate products for SSE3 2010-07-07 16:37:20 +02:00
Gael Guennebaud
65257f6b29 optimize for SSE3 => significant speed up !! 2010-07-07 15:34:46 +02:00
Gael Guennebaud
dd18b22f0b optimize pmul for complex<double> 2010-07-07 15:29:04 +02:00
Gael Guennebaud
e07c0f6bb5 cleaning 2010-07-07 11:41:29 +02:00
Gael Guennebaud
b0896382a3 s/IsVectorized/Vectorizable 2010-07-07 11:10:46 +02:00
Gael Guennebaud
f8d3b4c060 fix mixing types in DiagonalProduct 2010-07-07 09:43:29 +02:00
Gael Guennebaud
bfa606d16f * add an IsVectorized mechanism (instead of packet-size>1...)
* vectorize complex<double>
2010-07-06 23:36:00 +02:00
Gael Guennebaud
d6454788d9 add support for vectorized conjugated products 2010-07-06 19:10:24 +02:00