Commit
Add Pascal support to JCUDF transcode (row_conversion) (#10329)
#8444 modified the JCUDF transcoding logic (in Java/JNI) to use `cuda::memcpy_async` and `cuda::barrier`, allowing asynchronous memcpy on GPUs that support it. While this works for `__CUDA_ARCH__ >= 700`, on older GPUs (e.g. Pascal) JCUDF conversions cause CUDA errors and failures. E.g.

```
ai.rapids.cudf.CudfException: after reduction step 2: cudaErrorInvalidDeviceFunction: invalid device function
```

The barrier-based `cuda::memcpy_async` is not supported on Pascal GPUs or prior. (They lack the hardware, apparently.)

For older GPUs, rather than fail spectacularly, it would be good to provide a more stable (if less efficient) fallback implementation via `memcpy()`.

This commit adds code to conditionally use `cuda::memcpy_async` or `memcpy()`, depending on the GPU in play.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Nghia Truong (https://github.com/ttnghia)

URL: #10329
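
A minimal sketch of the kind of compile-time switch described above, not the actual `row_conversion` code. It assumes the asynchronous copy in question is libcu++'s device-side `cuda::memcpy_async` paired with a block-scoped `cuda::barrier`, and that `<cuda/barrier>` should only be included when targeting `sm_70` or newer. The macro `ASYNC_MEMCPY_SUPPORTED`, the kernel `copy_tile`, the constant `TILE_BYTES`, and the helper `run_copy_check` are hypothetical names chosen for illustration.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

// Assumption: the barrier-based async copy is usable only for sm_70+.
// The host compilation pass (no __CUDA_ARCH__) is allowed to see the header too.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 700
#define ASYNC_MEMCPY_SUPPORTED  // hypothetical macro name
#include <cuda/barrier>
#endif

namespace cg = cooperative_groups;

constexpr int TILE_BYTES = 1024;  // hypothetical tile size

// Stages one tile through shared memory, then writes it back out.
__global__ void copy_tile(char const* src, char* dst)
{
  __shared__ char tile[TILE_BYTES];
  auto block = cg::this_thread_block();

#ifdef ASYNC_MEMCPY_SUPPORTED
  // Newer GPUs: cooperative asynchronous copy, completion tracked by a barrier.
  __shared__ cuda::barrier<cuda::thread_scope_block> bar;
  if (block.thread_rank() == 0) { init(&bar, block.size()); }
  block.sync();
  cuda::memcpy_async(block, tile, src, TILE_BYTES, bar);
  bar.arrive_and_wait();
#else
  // Older GPUs (e.g. Pascal): plain strided copy, less efficient but portable.
  for (int i = block.thread_rank(); i < TILE_BYTES; i += block.size()) {
    tile[i] = src[i];
  }
  block.sync();
#endif

  for (int i = block.thread_rank(); i < TILE_BYTES; i += block.size()) {
    dst[i] = tile[i];
  }
}

// Host-side round trip: returns true if the tile comes back unchanged.
bool run_copy_check()
{
  std::vector<char> in(TILE_BYTES), out(TILE_BYTES, 0);
  for (int i = 0; i < TILE_BYTES; ++i) { in[i] = static_cast<char>(i % 127); }

  char* d_in  = nullptr;
  char* d_out = nullptr;
  if (cudaMalloc(&d_in, TILE_BYTES) != cudaSuccess) { return false; }
  if (cudaMalloc(&d_out, TILE_BYTES) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }

  bool ok = cudaMemcpy(d_in, in.data(), TILE_BYTES, cudaMemcpyHostToDevice) == cudaSuccess;
  copy_tile<<<1, 128>>>(d_in, d_out);
  ok = ok && cudaDeviceSynchronize() == cudaSuccess;
  ok = ok && cudaMemcpy(out.data(), d_out, TILE_BYTES, cudaMemcpyDeviceToHost) == cudaSuccess;

  cudaFree(d_in);
  cudaFree(d_out);
  return ok && in == out;
}
```

Because the choice is made per target architecture at compile time, a fat binary built for both `sm_60` and `sm_70` would carry both variants of the kernel, and the driver picks whichever matches the GPU in play. The fallback branch here uses a hand-written strided loop in place of `memcpy()` purely to keep the sketch cooperative across the block.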