Antonio Sanchez 3580a38298 Use native _Float16 for AVX512FP16 and update vectorization.
This allows us to do faster native scalar operations. Also
updates the half/quarter packets to use the native type if available.

Benchmark improvement:
```
Comparing ./2910_without_float16 to ./2910_with_float16
Benchmark                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------------------
BM_CalcMat<float>/10000/768/500                      -0.0041         -0.0040      58276392      58039442      58273420      58039582
BM_CalcMat<_Float16>/10000/768/500                   +0.0073         +0.0073     642506339     647214446     642481384     647188303
BM_CalcMat<Eigen::half>/10000/768/500                -0.3170         -0.3170      92511115      63182101      92506771      63179258
BM_CalcVec<float>/10000/768/500                      +0.0022         +0.0022       5198157       5209469       5197913       5209334
BM_CalcVec<_Float16>/10000/768/500                   +0.0025         +0.0026      10133324      10159111      10132641      10158507
BM_CalcVec<Eigen::half>/10000/768/500                -0.7760         -0.7760      45337937      10156952      45336532      10156389
OVERALL_GEOMEAN                                      -0.2677         -0.2677             0             0             0             0
```
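To help read the table, here is a minimal sketch of how the relative columns and the `OVERALL_GEOMEAN` row can be reproduced from the raw timings. It assumes the output came from Google Benchmark's `compare.py`, which, as far as I understand it, reports `(new - old) / |old|` per row and the geometric mean of the `new / old` ratios minus one overall. The names `Row`, `relative_change`, `geomean_change` and `kRows` are hypothetical and are not part of Eigen or Google Benchmark.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One row of the comparison table: benchmark name plus old and new wall times.
// (Hypothetical helper type, for illustration only.)
struct Row {
  const char* name;
  double time_old;
  double time_new;
};

// Relative change as shown in the "Time" column; negative means the new build is faster.
// Assumption: this mirrors the formula used by compare.py.
double relative_change(double old_value, double new_value) {
  assert(old_value != 0.0);
  return (new_value - old_value) / std::fabs(old_value);
}

// Geometric mean of the new/old ratios, minus one (the OVERALL_GEOMEAN row).
double geomean_change(const std::vector<Row>& rows) {
  assert(!rows.empty());
  double log_sum = 0.0;
  for (const Row& r : rows) {
    log_sum += std::log(r.time_new / r.time_old);
  }
  return std::exp(log_sum / static_cast<double>(rows.size())) - 1.0;
}

// Values copied from the "Time Old" and "Time New" columns of the table above.
const std::vector<Row> kRows = {
    {"BM_CalcMat<float>", 58276392.0, 58039442.0},
    {"BM_CalcMat<_Float16>", 642506339.0, 647214446.0},
    {"BM_CalcMat<Eigen::half>", 92511115.0, 63182101.0},
    {"BM_CalcVec<float>", 5198157.0, 5209469.0},
    {"BM_CalcVec<_Float16>", 10133324.0, 10159111.0},
    {"BM_CalcVec<Eigen::half>", 45337937.0, 10156952.0},
};
```

Under these assumptions, `relative_change(45337937.0, 10156952.0)` comes out at about -0.776, so `BM_CalcVec<Eigen::half>` takes roughly 78% less time with the change, while the `float` and `_Float16` rows move by less than 1% in either direction.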

Fixes #2910.
2025-03-18 10:46:32 -07:00