Rasmus Munk Larsen
b47c777993
Use a blocked algorithm for transposeInPlace() when the matrix is real and square. This yields a large speedup because each block is transposed in registers (or in L1 cache if the registers spill) rather than one packet at a time. In the worst case, the packet-at-a-time approach writes to the same cache line PacketSize times instead of once.
rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.*TransposeInPlace.*float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
(Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".*TransposeInPlace.*float.*" experimental/users/rmlarsen/bench:matmul_bench)
name                            old time/op  new time/op  delta
BM_TransposeInPlace<float>/4    9.84ns ± 0%  6.51ns ± 0%  -33.80%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/8    23.6ns ± 1%  17.6ns ± 0%  -25.26%  (p=0.016 n=5+4)
BM_TransposeInPlace<float>/16   78.8ns ± 0%  60.3ns ± 0%  -23.50%  (p=0.029 n=4+4)
BM_TransposeInPlace<float>/32    302ns ± 0%   229ns ± 0%  -24.40%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/59   1.03µs ± 0%  0.84µs ± 1%  -17.87%  (p=0.016 n=5+4)
BM_TransposeInPlace<float>/64   1.20µs ± 0%  0.89µs ± 1%  -25.81%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/128  8.96µs ± 0%  3.82µs ± 2%  -57.33%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/256   152µs ± 3%    17µs ± 2%  -89.06%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/512   837µs ± 1%   208µs ± 0%  -75.15%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/1k   4.28ms ± 2%  1.08ms ± 2%  -74.72%  (p=0.008 n=5+5)
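
The blocking idea described in the commit message can be illustrated with a minimal scalar sketch. This is not Eigen's implementation: the function name blocked_transpose_in_place, the tile edge kBlock (standing in for PacketSize), and the row-major std::vector storage are all assumptions made for illustration. It shows only the traversal order, in which each small tile is fully exchanged with its mirror tile before moving on, so that the touched cache lines are reused while they are still hot. The real code would presumably transpose each tile with packet operations in registers.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed tile edge for illustration; plays the role of PacketSize.
constexpr std::size_t kBlock = 4;

// Hypothetical helper: transposes an n x n row-major matrix in place,
// processing it tile by tile. n does not need to be a multiple of kBlock.
void blocked_transpose_in_place(std::vector<float>& a, std::size_t n) {
  for (std::size_t ib = 0; ib < n; ib += kBlock) {
    const std::size_t i_end = std::min(ib + kBlock, n);

    // Diagonal tile: swap each element above the diagonal with its mirror.
    for (std::size_t i = ib; i < i_end; ++i)
      for (std::size_t j = i + 1; j < i_end; ++j)
        std::swap(a[i * n + j], a[j * n + i]);

    // Off-diagonal tiles: exchange tile (ib, jb) with the transpose of
    // tile (jb, ib). Every pair (i, j) with i < j is visited exactly once.
    for (std::size_t jb = ib + kBlock; jb < n; jb += kBlock) {
      const std::size_t j_end = std::min(jb + kBlock, n);
      for (std::size_t i = ib; i < i_end; ++i)
        for (std::size_t j = jb; j < j_end; ++j)
          std::swap(a[i * n + j], a[j * n + i]);
    }
  }
}
```

Sizes that are not a multiple of the tile edge (such as the 59 case in the benchmark) are handled here by clamping the tile bounds with std::min; how the actual change treats the remainder is not stated in the commit message.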
2020-04-28 16:08:16 +00:00