Add Pascal support to JCUDF transcode (row_conversion) #10329

Merged: 2 commits into rapidsai:branch-22.04 on Feb 21, 2022

mythrocks
Contributor

#8444 modified the JCUDF transcoding logic (in Java/JNI) to use cuda::memcpy_async()
and cuda::barrier, allowing asynchronous memcpy on GPUs that support it.
While this works for __CUDA_ARCH__ >= 700, on older GPUs (e.g. Pascal)
JCUDF conversions cause CUDA errors and failures. E.g.

ai.rapids.cudf.CudfException: after reduction step 2: cudaErrorInvalidDeviceFunction:
invalid device function

cuda::barrier, which the asynchronous copy synchronizes on, is not supported on
Pascal GPUs or prior. (They lack the hardware, apparently.)
For older GPUs, rather than fail spectacularly, it would be good to provide
a more stable (if less efficient) fallback implementation, via memcpy().

This commit adds code to conditionally use cuda::memcpy_async() or memcpy(),
depending on the GPU in play.
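
For illustration only, here is a minimal sketch of the kind of conditional copy described above. It is not the code from this PR: the macro `SKETCH_ASYNC_MEMCPY_SUPPORTED`, the kernel `staged_copy`, and the helper `roundtrip_ok` are hypothetical names, and the sketch assumes a single thread block whose data fits in shared memory. The barrier header is only included for the host pass and for device passes targeting compute capability 7.0 or newer; older targets compile the plain per-thread copy instead.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

#include <cstddef>
#include <vector>

// Hypothetical macro: set for the host pass and for device passes >= sm_70.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 700
#define SKETCH_ASYNC_MEMCPY_SUPPORTED
#include <cuda/barrier>
#endif

namespace cg = cooperative_groups;

// Stages `n` bytes through dynamic shared memory, then writes them to `out`.
__global__ void staged_copy(const char* in, char* out, std::size_t n)
{
  extern __shared__ char tile[];
  auto block = cg::this_thread_block();

#ifdef SKETCH_ASYNC_MEMCPY_SUPPORTED
  // Volta and newer: one collective asynchronous copy, completed via a barrier.
  // (nvcc may warn about dynamic initialization of the __shared__ barrier.)
  __shared__ cuda::barrier<cuda::thread_scope_block> bar;
  if (block.thread_rank() == 0) { init(&bar, block.size()); }
  block.sync();
  cuda::memcpy_async(block, tile, in, n, bar);
  bar.arrive_and_wait();
#else
  // Pascal and older: each thread copies a strided share of the bytes.
  for (std::size_t i = block.thread_rank(); i < n; i += block.size()) { tile[i] = in[i]; }
  block.sync();
#endif

  for (std::size_t i = block.thread_rank(); i < n; i += block.size()) { out[i] = tile[i]; }
}

// Round-trips `n` bytes through the kernel and reports whether they match.
// `n` must fit in the default dynamic shared memory limit.
bool roundtrip_ok(std::size_t n)
{
  std::vector<char> src(n), dst(n, 0);
  for (std::size_t i = 0; i < n; ++i) { src[i] = static_cast<char>(i % 127); }

  char* d_in  = nullptr;
  char* d_out = nullptr;
  if (cudaMalloc(&d_in, n) != cudaSuccess) { return false; }
  if (cudaMalloc(&d_out, n) != cudaSuccess) { cudaFree(d_in); return false; }

  bool ok = cudaMemcpy(d_in, src.data(), n, cudaMemcpyHostToDevice) == cudaSuccess;
  staged_copy<<<1, 128, n>>>(d_in, d_out, n);
  // A launch failure such as cudaErrorInvalidDeviceFunction surfaces here.
  ok = ok && cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess &&
       cudaMemcpy(dst.data(), d_out, n, cudaMemcpyDeviceToHost) == cudaSuccess;

  cudaFree(d_in);
  cudaFree(d_out);
  return ok && src == dst;
}
```

Calling `roundtrip_ok(1024)` on a machine with a CUDA device exercises whichever branch was compiled for that GPU's architecture. Because the choice is made per target at compile time, the binary still has to be built for the older architecture (e.g. sm_60) for the fallback to be available there.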

@mythrocks mythrocks requested a review from a team as a code owner February 18, 2022 05:51
@mythrocks mythrocks self-assigned this Feb 18, 2022
@github-actions github-actions bot added the Java Affects Java cuDF API. label Feb 18, 2022
@mythrocks mythrocks added bug Something isn't working non-breaking Non-breaking change labels Feb 18, 2022
@mythrocks
Contributor Author

Note that this is based on @hyperbolic2346's work, over on his branch.

This was tested with a replica of the JCUDF logic made in libcudf, with a gtest using libcudftestutil. Those tests don't currently reside in java/, but are available on my impl branch for perusal. The test can be moved to java/src/test, once it supports the use of libcudftestutil.

@codecov

codecov bot commented Feb 18, 2022

Codecov Report

Merging #10329 (8a95f30) into branch-22.04 (b28bad6) will not change coverage.
The diff coverage is n/a.

@@              Coverage Diff              @@
##           branch-22.04   #10329   +/-   ##
=============================================
  Coverage         10.63%   10.63%           
=============================================
  Files               122      122           
  Lines             20940    20940           
=============================================
  Hits               2228     2228           
  Misses            18712    18712           


@hyperbolic2346
Contributor

🚀

@mythrocks mythrocks added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Feb 18, 2022
@mythrocks
Contributor Author

Ah, shoot. It appears that this change introduces compilation warnings to the build. Best not to merge this until the warnings are sorted out.

@mythrocks mythrocks removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label Feb 18, 2022
@mythrocks
Contributor Author

I have managed to run tests, at last, against Pascal hardware:

local-mithunr@ipp1-0837:~/work$ nvidia-smi
Mon Feb 21 20:32:58 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P5000        Off  | 00000000:01:00.0 Off |                  Off |
| 26%   38C    P8     6W / 180W |      6MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       819      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

Some test logs:

local-mithunr@ipp1-0837:~/work$ ./ROW_CONVERSION
Running main() from gmock_main.cc
[==========] Running 28 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 14 tests from ColumnToRowTests
[ RUN      ] ColumnToRowTests.Single
[       OK ] ColumnToRowTests.Single (611 ms)
[ RUN      ] ColumnToRowTests.SimpleString
[       OK ] ColumnToRowTests.DoubleString (0 ms)
[ RUN      ] ColumnToRowTests.BigStrings
[       OK ] ColumnToRowTests.ManyStrings (2906 ms)
[ RUN      ] ColumnToRowTests.Simple
[       OK ] ColumnToRowTests.Simple (1 ms)
[ RUN      ] ColumnToRowTests.SingleByteCol
[       OK ] ColumnToRowTests.SingleByteCol (1 ms)
[ RUN      ] ColumnToRowTests.Tall
[       OK ] ColumnToRowTests.Tall (64 ms)
[ RUN      ] ColumnToRowTests.Wide
[       OK ] ColumnToRowTests.Wide (38 ms)
[ RUN      ] ColumnToRowTests.SingleByteWide
[       OK ] ColumnToRowTests.SingleByteWide (37 ms)
[ RUN      ] ColumnToRowTests.Non2Power
[       OK ] ColumnToRowTests.Non2Power (36 ms)
[ RUN      ] ColumnToRowTests.Big
[       OK ] ColumnToRowTests.Big (450 ms)
[ RUN      ] ColumnToRowTests.Bigger
[       OK ] ColumnToRowTests.Bigger (2059 ms)
[ RUN      ] ColumnToRowTests.Biggest
[       OK ] ColumnToRowTests.Biggest (4096 ms)
[----------] 14 tests from ColumnToRowTests (10304 ms total)

[----------] 14 tests from RowToColumnTests
[ RUN      ] RowToColumnTests.Single
[       OK ] RowToColumnTests.Single (1 ms)
[ RUN      ] RowToColumnTests.Simple
[       OK ] RowToColumnTests.Simple (0 ms)
[ RUN      ] RowToColumnTests.Seeded
[       OK ] RowToColumnTests.Seeded (2 ms)
[ RUN      ] RowToColumnTests.SingleByteCol
[       OK ] RowToColumnTests.SingleByteCol (0 ms)
[ RUN      ] RowToColumnTests.Tall
[       OK ] RowToColumnTests.Tall (58 ms)
[ RUN      ] RowToColumnTests.Wide
[       OK ] RowToColumnTests.Wide (36 ms)
[ RUN      ] RowToColumnTests.SingleByteWide
[       OK ] RowToColumnTests.SingleByteWide (37 ms)
[ RUN      ] RowToColumnTests.AllTypes
[       OK ] RowToColumnTests.AllTypes (2 ms)
[ RUN      ] RowToColumnTests.AllTypesLarge
[       OK ] RowToColumnTests.AllTypesLarge (23 ms)
[ RUN      ] RowToColumnTests.Non2Power
[       OK ] RowToColumnTests.Non2Power (36 ms)
[ RUN      ] RowToColumnTests.Big
[       OK ] RowToColumnTests.Big (436 ms)
[ RUN      ] RowToColumnTests.Bigger
[       OK ] RowToColumnTests.Bigger (1997 ms)
[ RUN      ] RowToColumnTests.Biggest
[       OK ] RowToColumnTests.Biggest (9924 ms)
[ RUN      ] RowToColumnTests.MythConditional
[       OK ] RowToColumnTests.MythConditional (0 ms)
[----------] 14 tests from RowToColumnTests (12552 ms total)
[----------] Global test environment tear-down
[==========] 28 tests from 2 test suites ran. (22856 ms total)
[  PASSED  ] 28 tests.

This should be safe to merge now.

@mythrocks
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 4d262ae into rapidsai:branch-22.04 Feb 21, 2022
@mythrocks
Contributor Author

This change has been merged now. Thanks for the reviews, chaps.

@hyperbolic2346
Contributor

😍
