Ilya Tokar e1cb6369b0 Add AVX vector path to float2half/half2float
Makes e. g. matrix multiplication 2x faster:
name         old cpu/op  new cpu/op  delta
BM_convers   181ms ± 1%    62ms ± 9%  -65.82%  (p=0.016 n=4+5)

Tested on all possible input values (not adding tests, since they
take a long time).
2021-10-28 13:59:01 -04:00
..