Add Pascal support to JCUDF transcode (row_conversion) #10329
rapidsai#8444 modified JCUDF transcoding logic (in Java/JNI) to use `cudaMemcpyAsync()` and `cuda::barrier` to allow for asynchronous memcpy on GPUs that support it. While this works for `__CUDA_ARCH__ >= 700`, on older GPUs (e.g. Pascal) JCUDF conversions cause CUDA errors and failures. E.g.:

```
ai.rapids.cudf.CudfException: after reduction step 2: cudaErrorInvalidDeviceFunction: invalid device function
```

`cudaMemcpyAsync()` is not supported on Pascal GPUs or earlier. (They lack the hardware, apparently.)

For older GPUs, rather than fail spectacularly, it would be good to provide a more stable (if less efficient) fallback implementation via `memcpy()`. This commit adds code to conditionally use `cudaMemcpyAsync()` or `memcpy()`, depending on the GPU in play.
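A minimal sketch of this kind of per-architecture dispatch follows. It is not the code from this PR: the macro `SKETCH_ASYNC_MEMCPY_SUPPORTED`, the kernel `stage_copy_kernel`, the tile size and the helpers `round_trip()` and `device_major()` are all hypothetical. It assumes the asynchronous path is expressed with libcu++'s device-side `cuda::memcpy_async` plus a block-scoped `cuda::barrier`, that the fallback is a plain synchronous copy into shared memory, and that the split is made at compile time on `__CUDA_ARCH__ >= 700`.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical macro: defined only in the device compilation pass for sm_70+.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
#define SKETCH_ASYNC_MEMCPY_SUPPORTED
#endif

// Only pull in cuda::barrier for the passes that actually use it.
#ifdef SKETCH_ASYNC_MEMCPY_SUPPORTED
#include <cuda/barrier>
#endif

namespace cg = cooperative_groups;

constexpr int kTileBytes = 1024;  // illustrative tile size, not taken from cudf

// Stages one tile of `src` in shared memory, then writes it out to `dst`.
__global__ void stage_copy_kernel(char const* src, char* dst, int n_bytes)
{
  __shared__ char tile[kTileBytes];
  int const tile_start = blockIdx.x * kTileBytes;
  int const bytes      = min(kTileBytes, n_bytes - tile_start);
  if (bytes <= 0) { return; }  // same value for every thread in the block

#ifdef SKETCH_ASYNC_MEMCPY_SUPPORTED
  // sm_70 and newer: block-wide asynchronous copy, completion tracked by a barrier.
  auto block = cg::this_thread_block();
  __shared__ cuda::barrier<cuda::thread_scope_block> bar;
  if (block.thread_rank() == 0) { init(&bar, block.size()); }
  block.sync();
  cuda::memcpy_async(block, tile, src + tile_start, bytes, bar);
  bar.arrive_and_wait();
#else
  // Pascal and older: plain synchronous copy, strided across the block's threads.
  for (int i = threadIdx.x; i < bytes; i += blockDim.x) { tile[i] = src[tile_start + i]; }
  __syncthreads();
#endif

  for (int i = threadIdx.x; i < bytes; i += blockDim.x) { dst[tile_start + i] = tile[i]; }
}

// Hypothetical helper: returns true if every byte survives the staged copy.
bool round_trip(int n_bytes)
{
  std::vector<char> h_src(n_bytes), h_dst(n_bytes, 0);
  for (int i = 0; i < n_bytes; ++i) { h_src[i] = static_cast<char>(i % 127); }

  char* d_src = nullptr;
  char* d_dst = nullptr;
  if (cudaMalloc(&d_src, n_bytes) != cudaSuccess) { return false; }
  if (cudaMalloc(&d_dst, n_bytes) != cudaSuccess) { cudaFree(d_src); return false; }
  cudaMemcpy(d_src, h_src.data(), n_bytes, cudaMemcpyHostToDevice);

  int const blocks = (n_bytes + kTileBytes - 1) / kTileBytes;
  stage_copy_kernel<<<blocks, 128>>>(d_src, d_dst, n_bytes);
  cudaError_t err = cudaGetLastError();  // launch errors, e.g. no code for this arch
  if (err == cudaSuccess) { err = cudaDeviceSynchronize(); }

  if (err == cudaSuccess) {
    cudaMemcpy(h_dst.data(), d_dst, n_bytes, cudaMemcpyDeviceToHost);
  } else {
    std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
  }
  cudaFree(d_src);
  cudaFree(d_dst);
  return err == cudaSuccess && h_dst == h_src;
}

// Hypothetical helper: compute capability major of device 0 (6 = Pascal, 7 = Volta/Turing).
int device_major()
{
  cudaDeviceProp prop{};
  return cudaGetDeviceProperties(&prop, 0) == cudaSuccess ? prop.major : -1;
}
```

Because the branch is chosen per device compilation pass, the binary needs code for both architectures, e.g. `nvcc -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 sketch.cu`. The driver then loads the sm_60 fallback on a Pascal GPU and the sm_70 async path on Volta or newer; a build that targets only sm_70 fails at kernel launch on Pascal instead.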
Note that this is based on @hyperbolic2346's work, over on his branch. This was tested with a replica of the JCUDF logic made in …
Codecov Report

```diff
@@               Coverage Diff               @@
##           branch-22.04   #10329   +/-   ##
=============================================
  Coverage        10.63%   10.63%
=============================================
  Files              122      122
  Lines            20940    20940
=============================================
  Hits              2228     2228
  Misses           18712    18712
```
🚀
Ah, shoot. It appears that this change introduces compiler warnings into the build. Best not merge this until the warnings are sorted out.
I have managed to run tests, at last, against Pascal hardware:
Some test logs:
This should be safe to merge now.
@gpucibot merge
This change has been merged now. Thanks for the reviews, chaps.
😍