Add Pascal support to JCUDF transcode (row_conversion):
rapidsai#8444 modified JCUDF transcoding logic (in Java/JNI) to use cudaMemcpyAsync()
and cuda::barrier to allow for asynchronous memcpy on GPUs that support it.
While this works for __CUDA_ARCH__ >= 700, on older GPUs (e.g. Pascal)
JCUDF conversions fail with CUDA errors. For example:
```
ai.rapids.cudf.CudfException: after reduction step 2: cudaErrorInvalidDeviceFunction:
invalid device function
```
For older GPUs, rather than fail spectacularly, it would be good to provide
a more stable (if less efficient) fallback implementation, via `memcpy()`.

This commit adds code to conditionally use `cudaMemcpyAsync()` or `memcpy()`,
depending on the GPU in play.
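
A minimal sketch of how such a conditional copy could look, not the code from this commit. It assumes the asynchronous path is libcu++'s `cuda::memcpy_async` paired with `cuda::barrier`, and that the choice is made at compile time from `__CUDA_ARCH__`. The names `ASYNC_MEMCPY_SUPPORTED`, `kThreads`, `copy_through_shared` and `run_copy_demo` are hypothetical and exist only for illustration.

```cuda
#include <cstring>
#include <cuda_runtime.h>

// Assumption: treat the async path as available in the host pass and in
// device passes for compute capability 7.0 (Volta) or newer.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 700
#define ASYNC_MEMCPY_SUPPORTED
#endif

// Only pull in the barrier header where it is going to be used.
#ifdef ASYNC_MEMCPY_SUPPORTED
#include <cuda/barrier>
#endif

constexpr int kThreads = 128;  // hypothetical block size for the demo

// Hypothetical kernel: each thread stages one element through shared memory,
// then reads an element written by a different thread.
__global__ void copy_through_shared(const int* in, int* out)
{
  __shared__ int tile[kThreads];

#ifdef ASYNC_MEMCPY_SUPPORTED
  // nvcc may warn about dynamic initialization of this __shared__ object;
  // the barrier is explicitly initialized by thread 0 below.
  __shared__ cuda::barrier<cuda::thread_scope_block> bar;
  if (threadIdx.x == 0) { init(&bar, blockDim.x); }
  __syncthreads();

  // Asynchronous copy whose completion is tracked by the barrier.
  cuda::memcpy_async(&tile[threadIdx.x], &in[threadIdx.x], sizeof(int), bar);
  bar.arrive_and_wait();
#else
  // Fallback for older architectures: plain synchronous copy.
  memcpy(&tile[threadIdx.x], &in[threadIdx.x], sizeof(int));
  __syncthreads();
#endif

  out[threadIdx.x] = tile[kThreads - 1 - threadIdx.x];
}

// Hypothetical host driver: returns true if the output is the reversed input.
bool run_copy_demo()
{
  int h_in[kThreads], h_out[kThreads];
  for (int i = 0; i < kThreads; ++i) { h_in[i] = i; }

  int* d_in  = nullptr;
  int* d_out = nullptr;
  if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) { return false; }
  if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }

  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  copy_through_shared<<<1, kThreads>>>(d_in, d_out);
  bool ok = (cudaDeviceSynchronize() == cudaSuccess);
  ok = ok && (cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) == cudaSuccess);

  cudaFree(d_in);
  cudaFree(d_out);

  for (int i = 0; ok && i < kThreads; ++i) { ok = (h_out[i] == h_in[kThreads - 1 - i]); }
  return ok;
}
```

Because the guard is evaluated separately for each device compilation pass, building the same source for, say, `sm_60` and `sm_70` would give the Pascal image the `memcpy()` body and the Volta image the barrier-based body. Both paths end with a block-wide synchronization before any thread reads from `tile`.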
mythrocks committed Feb 18, 2022
1 parent b28bad6 commit 61b155b
Showing 1 changed file with 107 additions and 61 deletions.