Gael Guennebaud
aa2b46aa91
allow vectorization of mat44.col() by adding a InnerPanel boolean
...
template parameter to Block
2010-07-23 16:29:29 +02:00
Benoit Jacob
3a30a2bc3e
forgot to remove a #endif
2010-08-13 14:03:38 -04:00
Benoit Jacob
b80d9dd42e
fix determination of number of registers on sse:
...
__i386__ was not defined by MSVC 2010.
fixed as (2*sizeof(void*)).
also move that to SSE/ and let the default for unknown arch's be just 8.
2010-08-13 13:55:28 -04:00
Benoit Jacob
97ced33b33
Backed out changeset 40f6e26a247976ba1868520a4747e49e0739a42a
...
See thread on mailing list: "InnerPanel change mis-detects alignment?"
2010-08-11 00:04:06 -04:00
Gael Guennebaud
40f6e26a24
allow vectorization of mat44.col() by adding a InnerPanel boolean
...
template parameter to Block
2010-07-23 16:29:29 +02:00
Gael Guennebaud
c7f40e522e
merge
2010-07-22 13:21:06 +02:00
Gael Guennebaud
0dfc5b296b
fix strict aliasing issue
2010-07-22 13:16:53 +02:00
Gael Guennebaud
35f0bc70d8
fix a strict aliasing issue with gcc 4.3
2010-07-20 22:43:55 +02:00
Gael Guennebaud
ced1a45f82
add NEON ploaddup and pcplxflip functions
2010-07-20 14:24:01 +02:00
Gael Guennebaud
c2ee454df4
* fix compilation of mixed scalar product
...
* optimize mixed scalar products
2010-07-19 16:49:09 +02:00
Gael Guennebaud
6e157dd7c6
* fix a couple of remaining issues with previous commit,
...
* merge ei_product_blocking_traits into ei_gepb_traits
2010-07-19 15:45:13 +02:00
Gael Guennebaud
f8aae7a908
* _mm_loaddup_pd is slow
...
* optimize SSE ei_ploaddup<Packet4f>
2010-07-19 15:43:27 +02:00
Gael Guennebaud
cd0e5dca9b
wip: extend the gebp kernel to optimize complex and mixed products
2010-07-19 08:50:59 +02:00
Gael Guennebaud
36d9b51a44
optimize non fused MADD, and add a flatten attribute macro to enforce
...
inlining within a function
2010-07-13 15:16:34 +02:00
Gael Guennebaud
f8678272a4
mixing types step 3:
...
- improve support of colmajor by vector and matrix - matrix
- now all configurations are well handled, but the perf are not always very good
2010-07-11 23:57:23 +02:00
Gael Guennebaud
ff96c94043
mixing types in product step 2:
...
* pload* and pset1 are now templated on the packet type
* gemv routines are now embeded into a structure with
a consistent API with respect to gemm
* some configurations of vector * matrix and matrix * matrix works fine,
some need more work...
2010-07-11 15:48:30 +02:00
Gael Guennebaud
4161b8be67
sync
2010-07-10 22:58:51 +02:00
Benoit Jacob
6dcd373b9d
let ei_pset1 use _mm_loaddup_pd. Not a significant speed improvement, but also not a speed regression, and replaces 3 instructions by 1 single instruction.
2010-07-09 18:51:17 -04:00
Konstantinos Margaritis
6ad3f1ab1f
Added NEON/Complex.h, ~3.5x faster than scalar std::complex<float>
...
minor fix in AltiVec Complex.h
2010-07-10 00:09:29 +03:00
Gael Guennebaud
96f9015807
disable MSVC optimization when the underlying compiler is ICC
2010-07-09 19:33:43 +02:00
Konstantinos Margaritis
642cc27eb1
forgot to commit ei_p4f_FORWARD;
2010-07-09 18:08:18 +03:00
Konstantinos Margaritis
d9e134c73c
Altivec port of Complex.h.
...
Note: For some reason g++ 4.4 is >200% slower than g++ 4.3 on altivec code.
The same benchmark (bench_gemm) was tested, on the same hardware/OS (G4/Debian testing),
with same CFLAGS. With some code reorganizing I managed to get some minor gain
on 4.4, but I just could not reach 4.3 speed. This is most likely a bug, but I'm waiting
to see if it's fixed on 4.5. I'll look into this a bit more.
2010-07-09 17:54:41 +03:00
Gael Guennebaud
b1a17dbfe4
fix a few weird issues with gcc 4.3 32bits and complex<float>
2010-07-09 08:27:58 +02:00
Gael Guennebaud
300a226ffa
scalars fitting in a single packet requires more work, step 1
...
* add a, Alignable trait
* update LinearVectorization assignment
2010-07-08 14:27:47 +02:00
Gael Guennebaud
2066ed91de
enabling aligned loads/store for complex<double> is much more tricky,
...
so the temporary fix is to always perform unaligned load/store
2010-07-07 22:50:19 +02:00
Gael Guennebaud
d89925e6de
an attempt to fix wrong unaligned store
2010-07-07 22:35:06 +02:00
Gael Guennebaud
31a36aa9c4
support for real * complex matrix product - step 1 (works for some special cases)
2010-07-07 19:49:09 +02:00
Gael Guennebaud
a2415388ef
optimized conjugate products for SSE3
2010-07-07 16:37:20 +02:00
Gael Guennebaud
65257f6b29
optimize for SSE3 => significant speed up !!
2010-07-07 15:34:46 +02:00
Gael Guennebaud
dd18b22f0b
optimize pmul for complex<double>
2010-07-07 15:29:04 +02:00
Gael Guennebaud
e07c0f6bb5
cleanning
2010-07-07 11:41:29 +02:00
Gael Guennebaud
b0896382a3
s/IsVectorized/Vectorizable
2010-07-07 11:10:46 +02:00
Gael Guennebaud
f8d3b4c060
fix mixing types in DiagonalProduct
2010-07-07 09:43:29 +02:00
Gael Guennebaud
bfa606d16f
* add a IsVectorized mechanism (instead of packet-size>1...)
...
* vectorize complex<double>
2010-07-06 23:36:00 +02:00
Gael Guennebaud
d6454788d9
add support for vectorized conjugated products
2010-07-06 19:10:24 +02:00
Gael Guennebaud
c69a226192
* extend the Has* packet traits and makes all functor use it
...
* extend the packing routines to support conjugation
2010-07-05 23:27:54 +02:00
Gael Guennebaud
e1eccfad3f
add intitial support for the vectorization of complex<float>
2010-07-05 16:18:09 +02:00
Konstantinos Margaritis
cf3616b2c0
AltiVec signed integer pmadd removed, proved to be 2x slower than the scalar trait(!).
2010-06-28 21:24:55 +03:00
Gael Guennebaud
75b6d2b2f8
fix very annoying warning (gcc 4.3): type qualifiers ignored on function return type
2010-06-25 13:20:34 +02:00
Gael Guennebaud
28e64b0da3
email change
2010-06-24 23:21:58 +02:00
Benoit Jacob
f0a6d56f07
fix linking errors with multiply defined functions
2010-06-18 09:01:34 -04:00
Benoit Jacob
134ca4acb3
packet math functions:
...
- take const Packet& args like the other packet funcs
- SSE specializations: make them be actual template specializations
2010-06-15 08:29:21 -04:00
Gael Guennebaud
88cd6885be
Add a proof concept API to configure the blocking parameters at runtime.
...
After validation of the final API I'll update the other products to use it.
2010-06-07 16:35:25 +02:00
Konstantinos Margaritis
9337f371d2
(proper commit this time)
...
replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function.
Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h.
Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch().
NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
2010-04-24 00:58:44 +03:00
Konstantinos Margaritis
5acf46bd12
Backed out changeset 6972c140f737874d88da0e225c7c27b4563a4518
2010-04-24 00:57:10 +03:00
oem
6972c140f7
replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function.
...
Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h.
Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch().
NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
2010-04-24 00:44:14 +03:00
Hauke Heibel
9d6afdeb22
ei_psqrt fix for zero input
2010-04-01 15:10:52 +02:00
Gael Guennebaud
9a3b00c040
add missing cmake directives for arch/Default
2010-03-08 22:15:35 +01:00
Hauke Heibel
9fe040ad29
Reintroduced the if-clause for MSVC ei_ploadu via _loadu_.
2010-03-07 14:05:26 +01:00
Gael Guennebaud
afd7ee759b
fix copy pasted comment
2010-03-05 21:35:11 +01:00