Add Pascal support to JCUDF transcode (row_conversion):
rapidsai#8444 modified the JCUDF transcoding logic (in Java/JNI) to use `cudaMemcpyAsync()` and `cuda::barrier`, allowing asynchronous memcpy on GPUs that support it. This works for `__CUDA_ARCH__ >= 700`. On older GPUs (e.g. Pascal), however, JCUDF conversions fail with CUDA errors such as:

```
ai.rapids.cudf.CudfException: after reduction step 2: cudaErrorInvalidDeviceFunction: invalid device function
```

Rather than fail spectacularly on older GPUs, it is better to provide a more stable (if less efficient) fallback implementation via `memcpy()`. This commit adds code that conditionally uses `cudaMemcpyAsync()` or `memcpy()`, depending on the target GPU architecture (`__CUDA_ARCH__`).
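The following is a minimal sketch of the compile-time selection described above. It assumes a kernel that stages one tile of bytes in shared memory before writing it back out. The macro `ASYNC_MEMCPY_SUPPORTED`, the kernel `stage_and_copy`, the host helper `stage_copy`, and the tile size are all hypothetical and are not the actual row_conversion code. For the asynchronous path, the sketch also assumes libcu++'s device-side `cuda::memcpy_async()` paired with a block-scoped `cuda::barrier`. On Pascal and older, it falls back to a plain `memcpy()` followed by a block-wide sync.

```cuda
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

#include <cooperative_groups.h>

// Hypothetical guard: cuda::barrier requires sm_70 (Volta) or newer. The host
// pass (__CUDA_ARCH__ undefined) also takes the async branch so it parses it.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 700
#define ASYNC_MEMCPY_SUPPORTED
#endif

#ifdef ASYNC_MEMCPY_SUPPORTED
#include <cuda/barrier>  // only included where the barrier is usable
// A __shared__ barrier object has a non-trivial default initializer.
#pragma nv_diag_suppress static_var_with_dynamic_init
#endif

namespace cg = cooperative_groups;

constexpr size_t kTileBytes = 1024;  // hypothetical tile size per block

// Stages one tile of `src` in shared memory, then writes it to `dst`.
__global__ void stage_and_copy(const int8_t* src, int8_t* dst, size_t num_bytes) {
  __shared__ int8_t tile[kTileBytes];
  cg::thread_block block = cg::this_thread_block();

  const size_t offset = static_cast<size_t>(blockIdx.x) * kTileBytes;
  if (offset >= num_bytes) return;  // uniform per block: depends only on blockIdx
  const size_t remaining = num_bytes - offset;
  const size_t bytes = remaining < kTileBytes ? remaining : kTileBytes;

#ifdef ASYNC_MEMCPY_SUPPORTED
  // Volta and newer: collective asynchronous copy tracked by a block barrier.
  __shared__ cuda::barrier<cuda::thread_scope_block> bar;
  if (block.thread_rank() == 0) init(&bar, block.size());
  block.sync();  // the barrier must be initialized before anyone arrives on it
  cuda::memcpy_async(block, tile, src + offset, bytes, bar);
  bar.arrive_and_wait();  // the copy is complete and visible to the block
#else
  // Pascal and older: synchronous memcpy by one thread, then a block sync.
  if (block.thread_rank() == 0) memcpy(tile, src + offset, bytes);
  block.sync();
#endif

  // Each thread writes a strided slice of the staged tile to global memory.
  for (size_t i = block.thread_rank(); i < bytes; i += block.size()) {
    dst[offset + i] = tile[i];
  }
}

static void check(cudaError_t err, const char* what) {
  if (err != cudaSuccess) {
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    std::abort();
  }
}

// Hypothetical host helper: round-trips `input` through the staging kernel.
std::vector<int8_t> stage_copy(const std::vector<int8_t>& input) {
  const size_t n = input.size();
  std::vector<int8_t> output(n);
  if (n == 0) return output;

  int8_t* d_src = nullptr;
  int8_t* d_dst = nullptr;
  check(cudaMalloc(&d_src, n), "cudaMalloc(src)");
  check(cudaMalloc(&d_dst, n), "cudaMalloc(dst)");
  check(cudaMemcpy(d_src, input.data(), n, cudaMemcpyHostToDevice), "copy H2D");

  const unsigned blocks = static_cast<unsigned>((n + kTileBytes - 1) / kTileBytes);
  stage_and_copy<<<blocks, 128>>>(d_src, d_dst, n);
  check(cudaGetLastError(), "kernel launch");
  check(cudaMemcpy(output.data(), d_dst, n, cudaMemcpyDeviceToHost), "copy D2H");

  check(cudaFree(d_src), "cudaFree(src)");
  check(cudaFree(d_dst), "cudaFree(dst)");
  return output;
}
```

Because `__CUDA_ARCH__` is a compile-time macro, each target architecture in a fat binary gets its own branch. For example, building with `-gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70` would give Pascal the `memcpy()` path and Volta the asynchronous path. Guarding the `<cuda/barrier>` include with the same macro keeps pre-Volta device passes from seeing the barrier at all.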