Srinivas Vasudevan
facdec5aa7
Add packetized versions of i0e and i1e special functions.
- In particular, refactor the i0e and i1e code so that the scalar and vectorized paths share code.
- Move chebevl to GenericPacketMathFunctions.
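
To illustrate the shared-code idea in the two points above, here is a minimal sketch (not the Eigen implementation) of a Chebyshev evaluator written once as a template and instantiated for both a scalar and a packet-like type. `Packet4` and `set1` are hypothetical stand-ins for a SIMD packet type and a broadcast primitive. The recurrence follows the Cephes chebevl convention: coefficients stored highest order first, series evaluated at x/2, zero-order term halved.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD packet: four doubles with lane-wise arithmetic.
struct Packet4 {
  std::array<double, 4> lane;
};

inline Packet4 operator+(Packet4 a, Packet4 b) {
  for (std::size_t i = 0; i < 4; ++i) a.lane[i] += b.lane[i];
  return a;
}
inline Packet4 operator-(Packet4 a, Packet4 b) {
  for (std::size_t i = 0; i < 4; ++i) a.lane[i] -= b.lane[i];
  return a;
}
inline Packet4 operator*(Packet4 a, Packet4 b) {
  for (std::size_t i = 0; i < 4; ++i) a.lane[i] *= b.lane[i];
  return a;
}

// Broadcast a scalar into T (hypothetical helper; the scalar case is a plain cast).
template <typename T>
inline T set1(double v) { return T(v); }
template <>
inline Packet4 set1<Packet4>(double v) {
  Packet4 p;
  p.lane.fill(v);
  return p;
}

// Clenshaw-style recurrence in the Cephes chebevl convention.
// coef holds n coefficients, highest order first; requires n >= 2.
template <typename T>
T chebevl(T x, const double* coef, std::size_t n) {
  assert(n >= 2);
  T b0 = set1<T>(coef[0]);
  T b1 = set1<T>(0.0);
  T b2 = b1;
  for (std::size_t i = 1; i < n; ++i) {
    b2 = b1;
    b1 = b0;
    b0 = x * b1 - b2 + set1<T>(coef[i]);
  }
  return set1<T>(0.5) * (b0 - b2);
}
```

With coefficients {1, 0, 0} the series reduces to T_2(x/2), so `chebevl<double>(2.0, coef, 3)` and lane 0 of `chebevl<Packet4>` on a packet starting with 2.0 both evaluate T_2(1). The same template body serves both paths, which is the point of the refactor.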
A brief benchmark, with Eigen built using the FMA, AVX and AVX2 flags:
Before:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                    Time(ns)    CPU(ns)  Iterations
------------------------------------------------------------
BM_eigen_i0e_double/1            57.3       57.3    10000000
BM_eigen_i0e_double/8             398        398     1748554
BM_eigen_i0e_double/64           3184       3184      218961
BM_eigen_i0e_double/512         25579      25579       27330
BM_eigen_i0e_double/4k         205043     205042        3418
BM_eigen_i0e_double/32k       1646038    1646176         422
BM_eigen_i0e_double/256k     13180959   13182613          53
BM_eigen_i0e_double/1M       52684617   52706132          10
BM_eigen_i0e_float/1             28.4       28.4    24636711
BM_eigen_i0e_float/8             75.7       75.7     9207634
BM_eigen_i0e_float/64             512        512     1000000
BM_eigen_i0e_float/512           4194       4194      166359
BM_eigen_i0e_float/4k           32756      32761       21373
BM_eigen_i0e_float/32k         261133     261153        2678
BM_eigen_i0e_float/256k       2087938    2088231         333
BM_eigen_i0e_float/1M         8380409    8381234          84
BM_eigen_i1e_double/1            56.3       56.3    10000000
BM_eigen_i1e_double/8             397        397     1772376
BM_eigen_i1e_double/64           3114       3115      223881
BM_eigen_i1e_double/512         25358      25361       27761
BM_eigen_i1e_double/4k         203543     203593        3462
BM_eigen_i1e_double/32k       1613649    1613803         428
BM_eigen_i1e_double/256k     12910625   12910374          54
BM_eigen_i1e_double/1M       51723824   51723991          10
BM_eigen_i1e_float/1             28.3       28.3    24683049
BM_eigen_i1e_float/8             74.8       74.9     9366216
BM_eigen_i1e_float/64             505        505     1000000
BM_eigen_i1e_float/512           4068       4068      171690
BM_eigen_i1e_float/4k           31803      31806       21948
BM_eigen_i1e_float/32k         253637     253692        2763
BM_eigen_i1e_float/256k       2019711    2019918         346
BM_eigen_i1e_float/1M         8238681    8238713          86
After:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                    Time(ns)    CPU(ns)  Iterations
------------------------------------------------------------
BM_eigen_i0e_double/1            15.8       15.8    44097476
BM_eigen_i0e_double/8            99.3       99.3     7014884
BM_eigen_i0e_double/64            777        777      886612
BM_eigen_i0e_double/512          6180       6181      100000
BM_eigen_i0e_double/4k          48136      48140       14678
BM_eigen_i0e_double/32k        385936     385943        1801
BM_eigen_i0e_double/256k      3293324    3293551         228
BM_eigen_i0e_double/1M       12423600   12424458          57
BM_eigen_i0e_float/1             16.3       16.3    43038042
BM_eigen_i0e_float/8             30.1       30.1    23456931
BM_eigen_i0e_float/64             169        169     4132875
BM_eigen_i0e_float/512           1338       1339      516860
BM_eigen_i0e_float/4k           10191      10191       68513
BM_eigen_i0e_float/32k          81338      81337        8531
BM_eigen_i0e_float/256k        651807     651984        1000
BM_eigen_i0e_float/1M         2633821    2634187         268
BM_eigen_i1e_double/1            16.2       16.2    42352499
BM_eigen_i1e_double/8             110        110     6316524
BM_eigen_i1e_double/64            822        822      851065
BM_eigen_i1e_double/512          6480       6481      100000
BM_eigen_i1e_double/4k          51843      51843       10000
BM_eigen_i1e_double/32k        414854     414852        1680
BM_eigen_i1e_double/256k      3320001    3320568         212
BM_eigen_i1e_double/1M       13442795   13442391          53
BM_eigen_i1e_float/1             17.6       17.6    41025735
BM_eigen_i1e_float/8             35.5       35.5    19597891
BM_eigen_i1e_float/64             240        240     2924237
BM_eigen_i1e_float/512           1424       1424      485953
BM_eigen_i1e_float/4k           10722      10723       65162
BM_eigen_i1e_float/32k          86286      86297        8048
BM_eigen_i1e_float/256k        691821     691868        1000
BM_eigen_i1e_float/1M         2777336    2777747         256
This shows a reduction in run time of roughly 40% to 75% on these operations
(about 1.6x to 4.3x faster), with the smallest gains on single-element float inputs.
I also benchmarked without any of these flags enabled and saw performance
similar to before, if not better.
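
The improvement quoted above can be checked row by row against the two tables. A minimal sketch of the arithmetic follows; the helper names `time_reduction` and `speedup` are illustrative and are not part of Eigen or the benchmark harness.

```cpp
// Fractional reduction in run time: 0.75 means the new code
// takes 25% of the time the old code took.
inline double time_reduction(double before_ns, double after_ns) {
  return 1.0 - after_ns / before_ns;
}

// Speedup factor: how many times faster the new code runs.
inline double speedup(double before_ns, double after_ns) {
  return before_ns / after_ns;
}
```

For example, BM_eigen_i0e_double/1 went from 57.3 ns to 15.8 ns, so `time_reduction(57.3, 15.8)` is about 0.72, and BM_eigen_i0e_float/8 went from 75.7 ns to 30.1 ns, so `speedup(75.7, 30.1)` is about 2.5.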
Also ran the packetmath.cpp and special_functions tests to check for regressions.
2019-09-11 18:34:02 -07:00