Rasmus Munk Larsen 3ed67cb0bb Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.
Benchmark speed in Giga-sqrts/s
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
-----------------------------------------
                    SSE        AVX
Fast=1              2.529G     4.380G
Fast=0              1.944G     1.898G
Fast=1 fixed        2.214G     3.739G

This table illustrates the worst case in terms of speed impact: it was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions become almost negligible.
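
The failure mode described above can be reproduced with a scalar sketch. This is not Eigen's vectorized code: the function names are hypothetical, 1.0f / std::sqrt(x) stands in for the hardware reciprocal-sqrt estimate, and the masking logic is an assumption about how such a bug can arise (a zero-guard comparison that also zeroes the estimate for negative inputs).

```cpp
#include <cfloat>
#include <cmath>

// Stand-in for a hardware rsqrt estimate (assumption, for illustration only).
// Like the hardware instruction, it yields NaN for negative inputs.
inline float rsqrt_estimate(float x) { return 1.0f / std::sqrt(x); }

// Hypothetical buggy variant: the guard "x >= FLT_MIN" is meant to avoid
// 0 * inf = NaN at x == 0, but it is also false for every negative x, so
// the estimate is zeroed and the result becomes x * 0 = -0.0.
inline float fast_sqrt_buggy(float x) {
  const float half = 0.5f * x;
  float r = (x >= FLT_MIN) ? rsqrt_estimate(x) : 0.0f;
  r = r * (1.5f - half * r * r);  // one Newton-Raphson refinement step
  return x * r;
}

// Hypothetical fixed variant: only zero and denormal magnitudes are masked,
// so the NaN produced for negative inputs propagates to the result.
inline float fast_sqrt_fixed(float x) {
  const float half = 0.5f * x;
  float r = (std::fabs(x) < FLT_MIN) ? 0.0f : rsqrt_estimate(x);
  r = r * (1.5f - half * r * r);  // one Newton-Raphson refinement step
  return x * r;
}
```

Under these assumptions, fast_sqrt_buggy(-4.0f) evaluates to -0.0 while fast_sqrt_fixed(-4.0f) evaluates to NaN; both return 2 for an input of 4 and 0 for an input of 0.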
2016-10-04 14:22:56 -07:00