Mark D Ryan 90a53ca6fd Fix the Packet16h version of ptranspose
The AVX512 version of ptranspose for PacketBlock<Packet16h,16> was
reordering the PacketBlock argument incorrectly.  This led to errors in
the multiplication of matrices composed of 16-bit floats on AVX512
machines, if at least one of the matrices was using RowMajor order.  This
error is responsible for one TensorFlow unit test failure on AVX512
machines:

//tensorflow/python/kernel_tests:batch_matmul_op_test
2018-06-16 15:13:06 -07:00
..
2016-01-27 22:48:40 +01:00
2017-11-10 14:11:22 +01:00
LU
2017-08-17 21:58:39 +02:00
2017-11-27 22:11:57 +01:00
QR
2017-08-17 21:58:39 +02:00
2015-10-30 12:02:52 +01:00
2016-01-27 22:48:40 +01:00
SVD
2017-08-17 21:58:39 +02:00
2016-01-27 22:48:40 +01:00