Antonio Sanchez 070d303d56 Add CUDA complex sqrt.
This is to support scalar `sqrt` of complex numbers `std::complex<T>` on
device, requested by Tensorflow folks.

Technically `std::complex` is not supported by NVCC on device
(though it is by clang), so the default `sqrt(std::complex<T>)` function only
works on the host. Here we create an overload to add back the
functionality.

Also modified the CMake file to add `--relaxed-constexpr` (or
equivalent) flag for NVCC to allow calling constexpr functions from
device functions, and added support for specifying compute architecture for
NVCC (was already available for clang).
2020-12-22 23:25:23 -08:00
..
2020-12-22 23:25:23 -08:00
2017-11-10 14:11:22 +01:00
2017-11-27 22:11:57 +01:00
SVD
2017-08-17 21:58:39 +02:00