This MR fixes a bunch of smaller issues, making the following changes:
* Template parameters in the documentation are documented with `\tparam` instead
of `\param`
* Superfluous semicolon warnings fixed
* Fixed the type of literals used to initialize float variables
Some checks used incorrect values, partly from copy-paste errors,
partly from the change in behaviour introduced in !398.
Modified results to match scipy, simplified tests by updating
`VERIFY_IS_CWISE_APPROX` to work for scalars.
This introduces new functions:
```
// returns kernel(args...) running on the CPU.
Eigen::run_on_cpu(Kernel kernel, Args&&... args);
// returns kernel(args...) running on the GPU.
Eigen::run_on_gpu(Kernel kernel, Args&&... args);
Eigen::run_on_gpu_with_hint(size_t buffer_capacity_hint, Kernel kernel, Args&&... args);
// returns kernel(args...) running on the GPU if using
// a GPU compiler, or CPU otherwise.
Eigen::run(Kernel kernel, Args&&... args);
Eigen::run_with_hint(size_t buffer_capacity_hint, Kernel kernel, Args&&... args);
```
Running on the GPU is accomplished by:
- Serializing the kernel inputs on the CPU
- Transferring the inputs to the GPU
- Passing the kernel and serialized inputs to a GPU kernel
- Deserializing the inputs on the GPU
- Running `kernel(inputs...)` on the GPU
- Serializing all output parameters and the return value
- Transferring the serialized outputs back to the CPU
- Deserializing the outputs and return value on the CPU
- Returning the deserialized return value
All inputs must be serializable (currently POD types, `Eigen::Matrix`
and `Eigen::Array`). The kernel must also be POD (though usually
contains no actual data).
Tested on CUDA 9.1, 10.2, 11.3, with g++-6, g++-8, g++-10 respectively.
This MR depends on !622, !623, !624.
The `Serializer<T>` class implements a binary serialization that
can write to (`serialize`) and read from (`deserialize`) a byte
buffer. Also added convenience routines for serializing
a list of arguments.
This will mainly be for testing, specifically to transfer data to
and from the GPU.
The original fails with nvcc+msvc - there's a static order of initialization
issue leading to registered tests being cleared. The test then fails on
```
VERIFY(EigenTest::all().size()>0);
```
since `EigenTest` no longer contains any tests. The singleton pattern
fixes this.
NVCC does not understand `__forceinline`, so we need to use `inline`
when compiling for GPU.
ICC specializes `std::complex` operators for `float` and `double`
by default, which cannot be used on device and conflict with Eigen's
workaround in CUDA/Complex.h. This can be prevented by defining
`_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`.
Added this define to the tests and to `Eigen/Core`, but this will
not work if the user includes `<complex>` before `<Eigen/Core>`.
ICC also seems to generate a duplicate `Map` symbol in
`PlainObjectBase`:
```
error: "Map" has already been declared in the current scope
static ConstMapType Map(const Scalar *data)
```
I tracked this down to `friend class Eigen::Map`. Putting the `friend`
statements at the bottom of the class seems to resolve this issue.
Fixes#2180
The macro `__cplusplus` is not defined correctly in MSVC unless building
with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the
specified c++ standard version number.
Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version
number both for MSVC and otherwise. This simplifies checks for supported
features.
Also replaced most instances of standard version checking via `__cplusplus`
with the existing `EIGEN_COMP_CXXVER` macro for better clarity.
Fixes: #2170
This provide several advantages:
- more flexibility in designing unit tests
- unit tests can be glued to speed up compilation
- unit tests are compiled with same predefined macros, which is a requirement for zapcc
There are two major changes (and a few minor ones which are not listed here...see PR discussion for details)
1. Eigen::half implementations for HIP and CUDA have been merged.
This means that
- `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
- `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
- `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`
After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.
2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
- `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
- `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
- `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`