Benoit Jacob
b3544ce2ae
bug #195 - fix this once and for all: just never use _mm_load_sd on gcc/i386, it generates redundant x87 ops
2011-02-27 17:26:59 -05:00
Benoit Jacob
5dfae4524b
fix bug #195: fast unaligned load for integers using _mm_load_sd failed when the value was interpreted as a NaN
2011-02-24 10:31:57 -05:00
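The two bug #195 commits above stop using _mm_load_sd for integer loads. If the 64-bit pattern passes through x87 registers, which the first commit says gcc/i386 code generation did, a pattern that happens to be a signaling NaN gets quieted and its bits change. The log does not show which intrinsic replaced it, so this is only a sketch under that assumption: an unaligned load of four int32 values that stays in the integer domain through the SSE2 intrinsic _mm_loadl_epi64 (always available on x86-64; needs -msse2 on i386). The names load4i_via_halves and check_load4i_via_halves are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2: _mm_loadl_epi64, _mm_unpacklo_epi64, _mm_storeu_si128
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Eigen's code): unaligned load of four int32 values
// built from two 64-bit integer loads. Only integer intrinsics are used, so the
// bits are never reinterpreted as a double and never routed through x87.
inline __m128i load4i_via_halves(const std::int32_t* from) {
  // Low 64 bits <- from[0], from[1]; high 64 bits are zeroed.
  __m128i lo = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(from));
  // Low 64 bits <- from[2], from[3].
  __m128i hi = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(from + 2));
  // Concatenate the two low halves: {from[0], from[1], from[2], from[3]}.
  return _mm_unpacklo_epi64(lo, hi);
}

// Usage check: on little-endian x86, src[1..2] form 0x7FF0000000000001,
// which read as a double is a signaling NaN. The integer path keeps the bits.
inline void check_load4i_via_halves() {
  const std::int32_t src[5] = {0, 1, 0x7FF00000, 7, 8};
  std::int32_t out[4];
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), load4i_via_halves(src + 1));
  assert(out[0] == 1 && out[1] == 0x7FF00000 && out[2] == 7 && out[3] == 8);
}
```

The source pointer src + 1 is deliberately not 16-byte aligned; _mm_loadl_epi64 has no alignment requirement.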
Gael Guennebaud
bb9a465c5a
fix AltiVec ploaddup
2011-02-24 00:23:50 +03:00
Gael Guennebaud
23aae0d63e
fix pset1 for complex
2011-02-23 21:24:47 +03:00
Gael Guennebaud
c121e6f390
implement ploaddup for complex and SSE/NEON even though they are not used in practice
2011-02-23 16:31:42 +01:00
Gael Guennebaud
955c099eb5
implement ploaddup for altivec and add respective unit test
2011-02-23 18:20:55 +03:00
Gael Guennebaud
6e01780541
fix a couple of issues with pcplxflip
2011-02-23 17:51:40 +03:00
Gael Guennebaud
78e1a62c54
implement pcplxflip for altivec
2011-02-23 14:20:58 +01:00
Gael Guennebaud
7dc18b20bb
same for neon
2011-02-23 09:41:55 +01:00
Gael Guennebaud
32e7dae776
Altivec: fix infinite loop (ei_ -> internal:: change)
2011-02-23 09:41:02 +01:00
Gael Guennebaud
2fb5567e08
add missing AlignedOnScalar
2011-02-22 21:25:47 +01:00
Gael Guennebaud
39b27fb656
altivec compilation fix
2011-02-22 15:26:28 +01:00
Gael Guennebaud
659c97ee49
gcc 4.4 also defines float32_t as a special type
2011-02-22 10:04:09 +01:00
Gael Guennebaud
51da67f211
more compilation fixes for altivec
2011-02-21 20:36:20 +01:00
Gael Guennebaud
05545d0197
fix compilation
2011-02-21 17:47:31 +01:00
Gael Guennebaud
fb1a29fed5
fix ICE and warning with gcc 4.2.4
2011-02-21 16:11:18 +01:00
Gael Guennebaud
8f8c67b8bd
fix bug #186 (in 32-bit mode, gcc 4.3 mishandled pfirst for complex<float>)
2011-02-18 15:47:17 +01:00
Hauke Heibel
1a6597b8e4
MSVC does not like using uninitialized SSE variables, so we have to pass all zeros.
2011-02-12 21:29:16 +01:00
Gael Guennebaud
9d2bf35a05
implement optimized ploadu for MSVC10: this also fixes bad code generation in gebp_kernel :)
2011-02-12 16:40:09 +01:00
Benoit Jacob
6a5a13e394
The pfirst hack is also needed on msvc 2010, as it goes completely nuts there, even though it doesn't segfault as msvc 2008 did
2011-02-09 15:13:23 -05:00
Gael Guennebaud
d6c4ca4845
fix redundancy
2011-02-09 13:44:05 +01:00
Gael Guennebaud
c0d5131435
work around gcc 4.2.1 ICE (fixes bug #145)
2011-02-09 13:04:35 +01:00
Gael Guennebaud
c5c8efa575
work around gcc 4.2 and 4.3 compilation issue with NEON
2011-02-07 16:41:21 +01:00
Jitse Niesen
e2d46eac42
Remove all references to EIGEN_TUNE_CPU_CACHE_SIZE.
This macro is no longer used as of revision 0212eec23f4cb64e8426bf32568156df302f8fcf.
2011-02-04 22:33:53 +01:00
Gael Guennebaud
5887a086cf
fix SSE3 issue (infinite loop after the ei_ => internal change) - this fixes bug #174
2011-02-03 17:55:24 +01:00
Konstantinos Margaritis
e05c79cbd8
Fixed NEON compilation errors, changed float-abi back to softfp (which is the most used right now).
Some complex tests appear to segfault, needs a more careful look.
2010-12-10 20:27:46 +02:00
Gael Guennebaud
da05b6af0e
fix some remaining issues with the ei_ -> internal change
2010-11-16 15:54:48 +01:00
Hauke Heibel
7bc8e3ac09
Initial fixes for bug #85 .
Renamed meta_{true|false} to {true|false}_type, meta_if to conditional, is_same_type to is_same, un{ref|pointer|const} to remove_{reference|pointer|const} and makeconst to add_const.
Changed boolean type 'ret' member to 'value'.
Changed 'ret' members referring to types to 'type'.
Adapted all code occurrences.
2010-10-25 22:13:49 +02:00
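The renames in 7bc8e3ac09 align Eigen's internal meta-programming helpers with the naming used by C++11's <type_traits> (std::conditional, std::is_same, std::remove_reference, std::add_const, and so on), where a boolean result is exposed as `value` and a computed type as `type`. A minimal standalone sketch of a few of them follows; it lives in a hypothetical namespace `sketch` and is not Eigen's actual source.

```cpp
namespace sketch {

// Boolean results are exposed through a static member named 'value' (formerly 'ret').
struct true_type  { static const bool value = true; };
struct false_type { static const bool value = false; };

// conditional<C, Then, Else>::type is Then when C is true, Else otherwise (formerly 'meta_if').
template <bool C, typename Then, typename Else>
struct conditional { typedef Then type; };
template <typename Then, typename Else>
struct conditional<false, Then, Else> { typedef Else type; };

// is_same<T, U>::value is true only when T and U are exactly the same type.
template <typename T, typename U> struct is_same : false_type {};
template <typename T> struct is_same<T, T> : true_type {};

// remove_* strip one layer; the computed type is exposed as 'type'.
template <typename T> struct remove_reference     { typedef T type; };
template <typename T> struct remove_reference<T&> { typedef T type; };
template <typename T> struct remove_pointer       { typedef T type; };
template <typename T> struct remove_pointer<T*>   { typedef T type; };
template <typename T> struct remove_const          { typedef T type; };
template <typename T> struct remove_const<const T> { typedef T type; };

// add_const<T>::type is const T (formerly 'makeconst').
template <typename T> struct add_const { typedef const T type; };

}  // namespace sketch
```

Everything here is resolved at compile time, so static_assert is the natural way to exercise it.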
Benoit Jacob
4716040703
bug #86 : use internal:: namespace instead of ei_ prefix
2010-10-25 10:15:22 -04:00
Gael Guennebaud
d4b664c4cd
fix ugly conversion from double[2] to complex
2010-08-19 14:47:58 +02:00
Gael Guennebaud
aa2b46aa91
allow vectorization of mat44.col() by adding an InnerPanel boolean template parameter to Block
2010-07-23 16:29:29 +02:00
Benoit Jacob
3a30a2bc3e
forgot to remove a #endif
2010-08-13 14:03:38 -04:00
Benoit Jacob
b80d9dd42e
fix determination of number of registers on sse:
__i386__ was not defined by MSVC 2010.
fixed as (2*sizeof(void*)).
also move that to SSE/ and let the default for unknown archs be just 8.
2010-08-13 13:55:28 -04:00
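The heuristic in b80d9dd42e needs no compiler-specific macro, which is why it sidesteps the missing __i386__ on MSVC 2010: on x86, sizeof(void*) is 4 in 32-bit mode and 8 in 64-bit mode, while the number of XMM registers is 8 and 16 respectively, so 2*sizeof(void*) gives the right count for both. The function and constant names below are hypothetical, not Eigen's.

```cpp
#include <cstddef>

// Register-count heuristic described in the log (hypothetical names).
// x86 32-bit: 4-byte pointers, 8 XMM registers  -> 2 * 4 = 8
// x86 64-bit: 8-byte pointers, 16 XMM registers -> 2 * 8 = 16
constexpr std::size_t sse_register_count(std::size_t pointer_size) {
  return 2 * pointer_size;
}

// Value for the current build; only meaningful when targeting x86 with SSE.
constexpr std::size_t kSseRegisters = sse_register_count(sizeof(void*));

// Conservative default for architectures with no specific information.
constexpr std::size_t kDefaultRegisters = 8;
```

One caveat of this design: under the x32 ABI (64-bit mode with 32-bit pointers) the formula yields 8 although 16 XMM registers exist, so it errs on the conservative side.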
Benoit Jacob
97ced33b33
Backed out changeset 40f6e26a247976ba1868520a4747e49e0739a42a
See thread on mailing list: "InnerPanel change mis-detects alignment?"
2010-08-11 00:04:06 -04:00
Gael Guennebaud
40f6e26a24
allow vectorization of mat44.col() by adding an InnerPanel boolean template parameter to Block
2010-07-23 16:29:29 +02:00
Gael Guennebaud
c7f40e522e
merge
2010-07-22 13:21:06 +02:00
Gael Guennebaud
0dfc5b296b
fix strict aliasing issue
2010-07-22 13:16:53 +02:00
Gael Guennebaud
35f0bc70d8
fix a strict aliasing issue with gcc 4.3
2010-07-20 22:43:55 +02:00
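Neither strict-aliasing commit says which expression was at fault, so what follows is a general sketch of the usual remedy, an assumption about the kind of fix rather than the actual diff. Reading a float's bytes through a std::uint32_t* is undefined behavior under the strict aliasing rule, which gcc exploits from -O2 on; copying the object representation with std::memcpy is well defined and usually compiles to a plain register move. The helpers float_bits, float_from_bits and check_float_bits are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Undefined under strict aliasing:
//   return *reinterpret_cast<const std::uint32_t*>(&f);
// Well defined: copy the bytes with std::memcpy instead.
inline std::uint32_t float_bits(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Inverse direction: build a float from a raw 32-bit pattern.
inline float float_from_bits(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

inline void check_float_bits() {
  assert(float_bits(1.0f) == 0x3F800000u);            // IEEE-754 single: exponent 127, mantissa 0
  assert(float_from_bits(float_bits(0.1f)) == 0.1f);  // the round trip is exact
}
```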
Gael Guennebaud
ced1a45f82
add NEON ploaddup and pcplxflip functions
2010-07-20 14:24:01 +02:00
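ploaddup and pcplxflip come up repeatedly in this log (NEON, SSE, AltiVec). Below is a hedged scalar reference of what they compute, assuming the semantics of Eigen's generic fallbacks: ploaddup reads half a packet's worth of scalars and duplicates each one, and pcplxflip swaps the real and imaginary parts of every complex lane. A packet is modelled as a std::array; the *_ref and check_* names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

// ploaddup reference: read N/2 scalars and duplicate each one,
// giving {from[0], from[0], from[1], from[1], ...}.
template <typename T, std::size_t N>
std::array<T, N> ploaddup_ref(const T* from) {
  static_assert(N % 2 == 0, "packet size must be even");
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N / 2; ++i) {
    out[2 * i] = from[i];
    out[2 * i + 1] = from[i];
  }
  return out;
}

// pcplxflip reference: swap the real and imaginary parts of every lane.
template <typename T, std::size_t N>
std::array<std::complex<T>, N> pcplxflip_ref(const std::array<std::complex<T>, N>& a) {
  std::array<std::complex<T>, N> out{};
  for (std::size_t i = 0; i < N; ++i) out[i] = std::complex<T>(a[i].imag(), a[i].real());
  return out;
}

inline void check_ploaddup_pcplxflip() {
  const float buf[2] = {1.0f, 2.0f};
  const std::array<float, 4> p = ploaddup_ref<float, 4>(buf);
  assert(p[0] == 1.0f && p[1] == 1.0f && p[2] == 2.0f && p[3] == 2.0f);
  const std::array<std::complex<float>, 2> c = {{{1.0f, 2.0f}, {3.0f, 4.0f}}};
  const std::array<std::complex<float>, 2> f = pcplxflip_ref(c);
  assert(f[0] == std::complex<float>(2.0f, 1.0f) && f[1] == std::complex<float>(4.0f, 3.0f));
}
```

A SIMD implementation would produce the same lanes with a single shuffle; the loops only pin down the expected result.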
Gael Guennebaud
c2ee454df4
* fix compilation of mixed scalar product
* optimize mixed scalar products
2010-07-19 16:49:09 +02:00
Gael Guennebaud
6e157dd7c6
* fix a couple of remaining issues with previous commit,
* merge ei_product_blocking_traits into ei_gepb_traits
2010-07-19 15:45:13 +02:00
Gael Guennebaud
f8aae7a908
* _mm_loaddup_pd is slow
* optimize SSE ei_ploaddup<Packet4f>
2010-07-19 15:43:27 +02:00
Gael Guennebaud
cd0e5dca9b
wip: extend the gebp kernel to optimize complex and mixed products
2010-07-19 08:50:59 +02:00
Gael Guennebaud
36d9b51a44
optimize non-fused MADD, and add a flatten attribute macro to enforce inlining within a function
2010-07-13 15:16:34 +02:00
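In 36d9b51a44, a non-fused MADD computes a*b+c as a multiply followed by an add rather than as one fused instruction, and flatten is a GCC function attribute (also accepted by Clang) asking the compiler to inline every call inside the annotated function. The macro name SKETCH_FLATTEN and the madd/dot/check_dot functions are hypothetical illustrations; the log does not give Eigen's spelling.

```cpp
#include <cassert>

// Hypothetical macro name; GCC-compatible compilers (which define __GNUC__) get the attribute.
#if defined(__GNUC__)
#define SKETCH_FLATTEN __attribute__((flatten))
#else
#define SKETCH_FLATTEN  // expands to nothing on other compilers in this sketch
#endif

// Multiply-add written as a separate multiply and add (not std::fma).
inline float madd(float a, float b, float c) { return a * b + c; }

// flatten asks the compiler to inline every call made inside this body,
// here madd(), so the loop body becomes straight-line multiply/add code.
SKETCH_FLATTEN float dot(const float* a, const float* b, int n) {
  float acc = 0.0f;
  for (int i = 0; i < n; ++i) acc = madd(a[i], b[i], acc);
  return acc;
}

inline void check_dot() {
  const float a[3] = {1.0f, 2.0f, 3.0f};
  const float b[3] = {4.0f, 5.0f, 6.0f};
  assert(dot(a, b, 3) == 32.0f);  // 4 + 10 + 18, exact in float
}
```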
Gael Guennebaud
f8678272a4
mixing types step 3:
- improve support of colmajor by vector and matrix - matrix
- now all configurations are well handled, but the performance is not always very good
2010-07-11 23:57:23 +02:00
Gael Guennebaud
ff96c94043
mixing types in product step 2:
* pload* and pset1 are now templated on the packet type
* gemv routines are now embedded into a structure with
a consistent API with respect to gemm
* some configurations of vector * matrix and matrix * matrix work fine,
some need more work...
2010-07-11 15:48:30 +02:00
Gael Guennebaud
4161b8be67
sync
2010-07-10 22:58:51 +02:00
Benoit Jacob
6dcd373b9d
let ei_pset1 use _mm_loaddup_pd. Not a significant speed improvement, but also not a speed regression, and replaces 3 instructions by 1 single instruction.
2010-07-09 18:51:17 -04:00
Konstantinos Margaritis
6ad3f1ab1f
Added NEON/Complex.h, ~3.5x faster than scalar std::complex<float>
minor fix in AltiVec Complex.h
2010-07-10 00:09:29 +03:00
Gael Guennebaud
96f9015807
disable MSVC optimization when the underlying compiler is ICC
2010-07-09 19:33:43 +02:00