Add Pascal support to JCUDF transcode (row_conversion) (#10329)
#8444 modified JCUDF transcoding logic (in Java/JNI) to use `cudaMemcpyAsync()`
and `cuda::barrier` to allow for asynchronous memcpy on GPUs that support it.
While this works for `__CUDA_ARCH__ >= 700` (Volta and newer), on older GPUs
(e.g. Pascal) JCUDF conversions fail with CUDA errors such as:
```
ai.rapids.cudf.CudfException: after reduction step 2: cudaErrorInvalidDeviceFunction:
invalid device function
```
The asynchronous copy path (`cudaMemcpyAsync()` with `cuda::barrier`) is not
supported on Pascal or earlier GPUs, which apparently lack the required hardware.
Rather than fail spectacularly on older GPUs, it would be good to provide
a more stable (if less efficient) fallback implementation via `memcpy()`.

This commit adds code to conditionally use `cudaMemcpyAsync()` or `memcpy()`,
depending on the GPU in play.
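
A minimal sketch of this kind of compile-time selection follows. It is not the
code from this commit. It assumes the device-side asynchronous copy is libcu++'s
`cuda::memcpy_async` (the commit message does not name it), and the names
`ASYNC_MEMCPY_SUPPORTED`, `staged_copy`, and `copy_via_gpu` are hypothetical.
Error checking is omitted for brevity.

```cuda
// Illustrative sketch only; all names below are hypothetical.
#include <cooperative_groups.h>
#include <cuda_runtime.h>

#include <cstddef>
#include <cstring>
#include <vector>

// The host pass leaves __CUDA_ARCH__ undefined; device passes define it per target.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 700
#define ASYNC_MEMCPY_SUPPORTED
#include <cuda/barrier>  // only included where the barrier can actually be used
#endif

namespace cg = cooperative_groups;

// Stages n ints through dynamic shared memory, then writes them back out.
// Assumes a single block and that n ints fit in shared memory.
__global__ void staged_copy(const int* in, int* out, int n)
{
  extern __shared__ int tile[];
  auto block = cg::this_thread_block();

#ifdef ASYNC_MEMCPY_SUPPORTED
  // Asynchronous path: the whole block issues one copy tracked by a barrier.
  __shared__ cuda::barrier<cuda::thread_scope_block> bar;
  if (block.thread_rank() == 0) { init(&bar, block.size()); }
  block.sync();
  cuda::memcpy_async(block, tile, in, sizeof(int) * n, bar);
  bar.arrive_and_wait();
#else
  // Fallback path: plain per-element memcpy() followed by a block-wide sync.
  for (int i = threadIdx.x; i < n; i += blockDim.x) {
    memcpy(tile + i, in + i, sizeof(int));
  }
  block.sync();
#endif

  for (int i = threadIdx.x; i < n; i += blockDim.x) { out[i] = tile[i]; }
}

// Host helper: round-trips a vector through the kernel above.
std::vector<int> copy_via_gpu(const std::vector<int>& host_in)
{
  const int n             = static_cast<int>(host_in.size());
  const std::size_t bytes = sizeof(int) * host_in.size();
  int* d_in               = nullptr;
  int* d_out              = nullptr;
  cudaMalloc(&d_in, bytes);
  cudaMalloc(&d_out, bytes);
  cudaMemcpy(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice);
  staged_copy<<<1, 128, bytes>>>(d_in, d_out, n);
  std::vector<int> host_out(host_in.size());
  cudaMemcpy(host_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return host_out;
}
```

Because `__CUDA_ARCH__` is evaluated once per target architecture, a binary
built for both `sm_60` and `sm_70` would carry both variants of the kernel,
and each GPU would run the variant compiled for it.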

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Nghia Truong (https://github.com/ttnghia)

URL: #10329
mythrocks authored Feb 21, 2022
1 parent 527d4ee commit 4d262ae
Showing 1 changed file with 107 additions and 58 deletions.