Gael Guennebaud
01fd880424
Revert previous change and introduce a new workaround regarding gcc generating a shufps instruction instead of the more efficient pshufd instruction.
The trick consists in introducing a new pload1 function to be used in low level product kernels for which bug #203 does not apply.
Indeed, it turned out that using inline assembly prevents gcc from doing a good job at instruction reordering.
2014-03-20 16:03:46 +01:00
Gael Guennebaud
c39a3fa7a1
Make gcc generate a pshufd instruction for pset1
2014-03-20 10:14:26 +01:00
Gael Guennebaud
a7621809fe
Remove useless register keyword, and optimize predux_min/max for SSE4
2014-01-25 16:54:13 +01:00
Gael Guennebaud
19521c83b8
bug #677 : fix usage of pld intrinsics for complexes
2013-11-02 12:10:48 +01:00
Gael Guennebaud
6dc0e59b1e
Fix bug #677 : compilation issue on arm64 which does not have the PLD instruction
2013-10-31 13:52:43 +01:00
Gael Guennebaud
9f3f42d66a
fix a few "dead stores" warnings
2013-10-26 13:59:02 +02:00
Gael Guennebaud
4612a1cd87
Fix ploaddup and lin-spaced with AltiVec.
2013-09-10 16:13:59 +02:00
Gael Guennebaud
c47010e3d2
typo
2013-08-19 16:10:00 +02:00
Gael Guennebaud
d4dd6aaed2
Fix bug #642 : add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled
2013-08-19 16:02:27 +02:00
Simon Pilgrim
fab0235369
Fix bug #590 : NEON Duplicate lane load
2013-06-23 14:13:21 +02:00
Gael Guennebaud
9f11f80db1
Make psqrt work with numeric_limits<float>::min
2013-06-14 10:55:05 +02:00
Jeff Dean
d5fa5001a7
Fix bug #613 : psqrt was incorrect for small numbers
2013-06-13 18:17:27 +02:00
Gael Guennebaud
62670c83a0
Fix bug #314 : move remaining math functions from internal to numext namespace
2013-06-10 23:40:56 +02:00
Simon Pilgrim
ca67c60150
Fix bug #591 : minor optimization in NEON vectorization support
2013-06-10 15:59:03 +02:00
Gael Guennebaud
b3adc4face
Add missing pconj specializations
2013-05-17 17:25:29 +02:00
Gael Guennebaud
d63712163c
Add SSE4 min/max for integers
2013-03-20 18:28:40 +01:00
Gael Guennebaud
8745da14d8
Fix SSE plog<float> to return -INF on 0
2013-02-14 23:34:05 +01:00
Gael Guennebaud
e4ec63aee7
Suppress annoying "may be used uninitialized in this function" warning with gcc >= 4.6
2013-01-24 11:59:17 +01:00
Gael Guennebaud
7d98c864ff
fix warning
2012-08-01 10:44:59 +02:00
Gael Guennebaud
22e0ebbc2c
fix lower acceptable bound of SSE pexp for double
2012-07-31 23:11:04 +02:00
Gael Guennebaud
e8aa1f00c5
add SSE pexp function for double, make use of _mm_floor_p* for pexp with SSE4.1
2012-07-27 23:40:04 +02:00
Benoit Jacob
69124cfca2
Automatic relicensing to MPL2 using Keir's script. Manual fixup follows.
2012-07-13 14:42:47 -04:00
Konstantinos Margaritis
d878cf2227
fix typo
2012-07-04 11:28:59 +03:00
Konstantinos Margaritis
f737536744
fix NEON port, use vget_lane_*() instead of temporary variables (saves extra load/store), following advice by Josh Bleecher Snyder <josharian@gmail.com>.
Also implement pmadd() using vmla instead of nested padd/pmul.
2012-07-04 11:12:02 +03:00
Gael Guennebaud
a3e700db72
fix bug #475 : .exp() now returns +inf when overflow occurs (SSE)
2012-06-14 10:38:39 +02:00
kmargar
97cdf6ce9e
ARM NEON supports multiply-accumulate instruction vmla, use that in pmadd().
2012-05-28 14:55:23 +03:00
Jitse Niesen
3c412183b2
Get rid of include directives inside namespace blocks (bug #339 ).
2012-04-15 11:06:28 +01:00
Gael Guennebaud
634fedaf68
proper C++ casting
2012-01-31 18:56:25 +01:00
Gael Guennebaud
9c86ee2695
fix static inline versus inline static issues (the former is the correct order)
2012-01-31 12:58:52 +01:00
Marton Danoczy
f422668d39
Patches to support ARM NEON with Clang 3.0 and LLVM-GCC
2011-11-04 16:37:10 +01:00
Gael Guennebaud
c331c092d5
no comment
2011-09-21 14:20:41 +02:00
Gael Guennebaud
7301f4345c
quick workaround for MSVC9's ICE in pset1
2011-09-21 14:18:41 +02:00
Gael Guennebaud
f2837aebc4
NEON: fix plset
2011-05-18 21:12:08 +02:00
Gael Guennebaud
85c137ccd4
NEON: fix ploaddup
2011-05-18 08:15:47 +02:00
Gael Guennebaud
97b6d26f5b
fix compilation on ARM NEON (missing AlignedOnScalar)
2011-05-06 09:03:48 +02:00
Thomas Capricelli
883219041f
better fix for gcc 4.6.0 / ptrdiff_t, as suggested by Benoit
2011-05-05 18:48:18 +02:00
Thomas Capricelli
a18a1be42d
Fix compilation with gcc-4.6.0, patch provided by Anton Gladky <gladky.anton@gmail.com>, working on Debian packaging.
2011-05-05 00:44:24 +02:00
Gael Guennebaud
c8e1b679fa
re-enable fast pset1-pstore by introducing a new higher level pstore1 function
2011-03-02 10:55:44 +01:00
Benoit Jacob
eef03525b8
fix bug #203 : revert to using _mm_set1_p[sd]
2011-02-28 00:04:05 -05:00
Benoit Jacob
9be2712bf7
remove now-useless comments
2011-02-27 22:35:17 -05:00
Benoit Jacob
0612768c1c
fix bug #201 : Clang too has intrinsics bugs preventing us from using custom unaligned loads
2011-02-27 21:59:07 -05:00
Benoit Jacob
b3544ce2ae
bug #195 - fix this once and for all: just never use _mm_load_sd on gcc/i386, it generates redundant x87 ops
2011-02-27 17:26:59 -05:00
Benoit Jacob
5dfae4524b
fix bug #195 : fast unaligned load for integer using _mm_load_sd failed when the value was interpreted as a NaN
2011-02-24 10:31:57 -05:00
Gael Guennebaud
bb9a465c5a
fix AltiVec ploaddup
2011-02-24 00:23:50 +03:00
Gael Guennebaud
23aae0d63e
fix pset1 for complex
2011-02-23 21:24:47 +03:00
Gael Guennebaud
c121e6f390
implement ploaddup for complex and SSE/NEON even though they are not used in practice
2011-02-23 16:31:42 +01:00
Gael Guennebaud
955c099eb5
implement ploaddup for altivec and add respective unit test
2011-02-23 18:20:55 +03:00
Gael Guennebaud
6e01780541
fix a couple of issues with pcplxflip
2011-02-23 17:51:40 +03:00
Gael Guennebaud
78e1a62c54
implement pcplxflip for altivec
2011-02-23 14:20:58 +01:00
Gael Guennebaud
7dc18b20bb
same for neon
2011-02-23 09:41:55 +01:00