Add Pascal support to JCUDF transcode (row_conversion) #10329

mythrocks · 2022-02-18T05:51:16Z

#8444 modified JCUDF transcoding logic (in Java/JNI) to use cudaMemcpyAsync()
and cuda::barrier to allow for asynchronous memcpy on GPUs that support it.
While this works for __CUDA_ARCH__ >= 700, for older GPUs (e.g. Pascal),
JCUDF conversions cause CUDA errors and failures. E.g.

ai.rapids.cudf.CudfException: after reduction step 2: cudaErrorInvalidDeviceFunction:
invalid device function

cudaMemcpyAsync() is not supported on Pascal GPUs or prior. (They lack the hardware,
apparently.)
For older GPUs, rather than fail spectacularly, it would be good to provide
a more stable (if less efficient) fallback implementation, via memcpy().

This commit adds code to conditionally use cudaMemcpyAsync() or memcpy(),
depending on the GPU in play.

rapidsai#8444 modified JCUDF transcoding logic (in Java/JNI) to use cudaMemcpyAsync() and cuda::barrier to allow for asynchronous memcpy on GPUs that support it. While this works for __CUDA_ARCH__ >= 700, for older GPUs (e.g. Pascal), JCUDF conversions cause CUDA errors and failures. E.g. ``` ai.rapids.cudf.CudfException: after reduction step 2: cudaErrorInvalidDeviceFunction: invalid device function ``` For older GPUs, rather than fail spectacularly, it would be good to provide a more stable (if less efficient) fallback implementation, via `memcpy()`. This commit adds code to conditionally use `cudaMemcpyAsync()` or `memcpy()`, depending on the GPU in play.

mythrocks · 2022-02-18T06:00:13Z

Note that this is based on @hyperbolic2346's work, over on his branch.

This was tested with a replica of the JCUDF logic made in libcudf, with a gtest using libcudftestutil. Those tests don't currently reside in java/, but are available on my impl branch for perusal. The test can be moved to java/src/test, once it supports the use of libcudftestutil.

codecov · 2022-02-18T07:12:39Z

Codecov Report

Merging #10329 (8a95f30) into branch-22.04 (b28bad6) will not change coverage.
The diff coverage is n/a.

@@              Coverage Diff              @@
##           branch-22.04   #10329   +/-   ##
=============================================
  Coverage         10.63%   10.63%           
=============================================
  Files               122      122           
  Lines             20940    20940           
=============================================
  Hits               2228     2228           
  Misses            18712    18712

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b28bad6...8a95f30. Read the comment docs.

hyperbolic2346 · 2022-02-18T15:38:45Z

🚀

mythrocks · 2022-02-18T19:47:48Z

Ah, shoot. It appears that the compilation introduces warnings to the build. Best not merge this until the warnings are sorted out.

java/src/main/native/src/row_conversion.cu

mythrocks · 2022-02-21T20:34:27Z

I have managed to run tests, at last, against Pascal hardware:

local-mithunr@ipp1-0837:~/work$ nvidia-smi
Mon Feb 21 20:32:58 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P5000        Off  | 00000000:01:00.0 Off |                  Off |
| 26%   38C    P8     6W / 180W |      6MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       819      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

Some test logs:

local-mithunr@ipp1-0837:~/work$ ./ROW_CONVERSION
Running main() from gmock_main.cc
[==========] Running 28 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 14 tests from ColumnToRowTests
[ RUN      ] ColumnToRowTests.Single
[       OK ] ColumnToRowTests.Single (611 ms)
[ RUN      ] ColumnToRowTests.SimpleString
[       OK ] ColumnToRowTests.DoubleString (0 ms)
[ RUN      ] ColumnToRowTests.BigStrings
[       OK ] ColumnToRowTests.ManyStrings (2906 ms)
[ RUN      ] ColumnToRowTests.Simple
[       OK ] ColumnToRowTests.Simple (1 ms)
[ RUN      ] ColumnToRowTests.SingleByteCol
[       OK ] ColumnToRowTests.SingleByteCol (1 ms)
[ RUN      ] ColumnToRowTests.Tall
[       OK ] ColumnToRowTests.Tall (64 ms)
[ RUN      ] ColumnToRowTests.Wide
[       OK ] ColumnToRowTests.Wide (38 ms)
[ RUN      ] ColumnToRowTests.SingleByteWide
[       OK ] ColumnToRowTests.SingleByteWide (37 ms)
[ RUN      ] ColumnToRowTests.Non2Power
[       OK ] ColumnToRowTests.Non2Power (36 ms)
[ RUN      ] ColumnToRowTests.Big
[       OK ] ColumnToRowTests.Big (450 ms)
[ RUN      ] ColumnToRowTests.Bigger
[       OK ] ColumnToRowTests.Bigger (2059 ms)
[ RUN      ] ColumnToRowTests.Biggest
[       OK ] ColumnToRowTests.Biggest (4096 ms)
[----------] 14 tests from ColumnToRowTests (10304 ms total)

[----------] 14 tests from RowToColumnTests
[ RUN      ] RowToColumnTests.Single
[       OK ] RowToColumnTests.Single (1 ms)
[ RUN      ] RowToColumnTests.Simple
[       OK ] RowToColumnTests.Simple (0 ms)
[ RUN      ] RowToColumnTests.Seeded
[       OK ] RowToColumnTests.Seeded (2 ms)
[ RUN      ] RowToColumnTests.SingleByteCol
[       OK ] RowToColumnTests.SingleByteCol (0 ms)
[ RUN      ] RowToColumnTests.Tall
[       OK ] RowToColumnTests.Tall (58 ms)
[ RUN      ] RowToColumnTests.Wide
[       OK ] RowToColumnTests.Wide (36 ms)
[ RUN      ] RowToColumnTests.SingleByteWide
[       OK ] RowToColumnTests.SingleByteWide (37 ms)
[ RUN      ] RowToColumnTests.AllTypes
[       OK ] RowToColumnTests.AllTypes (2 ms)
[ RUN      ] RowToColumnTests.AllTypesLarge
[       OK ] RowToColumnTests.AllTypesLarge (23 ms)
[ RUN      ] RowToColumnTests.Non2Power
[       OK ] RowToColumnTests.Non2Power (36 ms)
[ RUN      ] RowToColumnTests.Big
[       OK ] RowToColumnTests.Big (436 ms)
[ RUN      ] RowToColumnTests.Bigger
[       OK ] RowToColumnTests.Bigger (1997 ms)
[ RUN      ] RowToColumnTests.Biggest
[       OK ] RowToColumnTests.Biggest (9924 ms)
[ RUN      ] RowToColumnTests.MythConditional
[       OK ] RowToColumnTests.MythConditional (0 ms)
[----------] 14 tests from RowToColumnTests (12552 ms total)
[----------] Global test environment tear-down
[==========] 28 tests from 2 test suites ran. (22856 ms total)
[  PASSED  ] 28 tests.

This should be safe to merge now.

mythrocks · 2022-02-21T20:55:58Z

@gpucibot merge

mythrocks · 2022-02-21T20:56:23Z

This change has been merged now. Thanks for the reviews, chaps.

hyperbolic2346 · 2022-02-21T21:04:27Z

😍

mythrocks requested a review from a team as a code owner February 18, 2022 05:51

mythrocks self-assigned this Feb 18, 2022

github-actions bot added the Java Affects Java cuDF API. label Feb 18, 2022

mythrocks added bug Something isn't working non-breaking Non-breaking change labels Feb 18, 2022

mythrocks added the DO NOT MERGE Hold off on merging; see PR for details label Feb 18, 2022

revans2 approved these changes Feb 18, 2022

View reviewed changes

Add back nv_diag_suppress for dyanmic init warning.

8a95f30

mythrocks removed the DO NOT MERGE Hold off on merging; see PR for details label Feb 18, 2022

ttnghia reviewed Feb 18, 2022

View reviewed changes

java/src/main/native/src/row_conversion.cu Show resolved Hide resolved

ttnghia approved these changes Feb 19, 2022

View reviewed changes

rapids-bot bot merged commit 4d262ae into rapidsai:branch-22.04 Feb 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Pascal support to JCUDF transcode (row_conversion) #10329

Add Pascal support to JCUDF transcode (row_conversion) #10329

mythrocks commented Feb 18, 2022

mythrocks commented Feb 18, 2022

codecov bot commented Feb 18, 2022 •

edited

Loading

hyperbolic2346 commented Feb 18, 2022

mythrocks commented Feb 18, 2022

mythrocks commented Feb 21, 2022

mythrocks commented Feb 21, 2022

mythrocks commented Feb 21, 2022

hyperbolic2346 commented Feb 21, 2022

Add Pascal support to JCUDF transcode (row_conversion) #10329

Add Pascal support to JCUDF transcode (row_conversion) #10329

Conversation

mythrocks commented Feb 18, 2022

mythrocks commented Feb 18, 2022

codecov bot commented Feb 18, 2022 • edited Loading

Codecov Report

hyperbolic2346 commented Feb 18, 2022

mythrocks commented Feb 18, 2022

mythrocks commented Feb 21, 2022

mythrocks commented Feb 21, 2022

mythrocks commented Feb 21, 2022

hyperbolic2346 commented Feb 21, 2022

codecov bot commented Feb 18, 2022 •

edited

Loading