Antonio Sanchez 9ee9ac81de Fix shfl* macros for CUDA/HIP
The `shfl*` functions are `__device__` only; the `#ifdef`s are adjusted so
that they are defined whenever the corresponding CUDA/HIP ones are.

Also changed the HIP/CUDA<9.0 versions to cast to int instead of
doing the conversion `half`<->`float`.

Fixes #2083
2020-12-04 17:18:32 +00:00
2020-12-04 17:18:32 +00:00
2017-11-10 14:11:22 +01:00
2017-11-27 22:11:57 +01:00
SVD
2017-08-17 21:58:39 +02:00