Add an option to run cuIO benchmarks with pinned buffers as input #15830

Merged: 12 commits merged into rapidsai:branch-24.08 from the fea-pinned-bm-io branch, Jun 3, 2024


@vuule (Contributor) commented May 22, 2024

Description

Adds io_type::PINNED_BUFFER, which allows cuIO benchmarks to use a pinned buffer as an input. The output is still a std::vector in this case, same as with io_type::HOST_BUFFER.
Also stops the use of cudf::io::io_type in benchmarks, to allow benchmark-specific IO types such as this one.
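
The input/output split described above can be modeled with a small, hypothetical C++ sketch. It is not the code from this PR: the io_type values mirror the names used in this PR and its benchmark table, while memory_kind, input_memory, and output_memory are illustrative names. Treating DEVICE_BUFFER output as a std::vector is an assumption of the sketch, since the description only covers PINNED_BUFFER and HOST_BUFFER.

```cpp
#include <cstdint>

// Hypothetical benchmark-local IO type. Defining it in the benchmark code,
// rather than reusing cudf::io::io_type, leaves room for benchmark-only
// values such as PINNED_BUFFER.
enum class io_type : std::uint8_t { HOST_BUFFER, PINNED_BUFFER, DEVICE_BUFFER };

// Illustrative classification of where a buffer lives.
enum class memory_kind : std::uint8_t { pageable_host, pinned_host, device };

// Memory the reader benchmark takes its input from.
constexpr memory_kind input_memory(io_type type)
{
  switch (type) {
    case io_type::PINNED_BUFFER: return memory_kind::pinned_host;
    case io_type::DEVICE_BUFFER: return memory_kind::device;
    case io_type::HOST_BUFFER: break;
  }
  return memory_kind::pageable_host;
}

// Memory the writer benchmark produces its output into. Per the PR description,
// PINNED_BUFFER output is a std::vector (pageable host memory), as with
// HOST_BUFFER. Treating DEVICE_BUFFER the same way is an assumption here.
constexpr memory_kind output_memory(io_type /*type*/) { return memory_kind::pageable_host; }

// Compile-time checks of the two properties stated in the description.
static_assert(input_memory(io_type::PINNED_BUFFER) == memory_kind::pinned_host,
              "PINNED_BUFFER input is pinned host memory");
static_assert(output_memory(io_type::PINNED_BUFFER) == output_memory(io_type::HOST_BUFFER),
              "PINNED_BUFFER output is the same std::vector as HOST_BUFFER output");
```

Only the reader's input changes between HOST_BUFFER and PINNED_BUFFER in this model, which is what lets the two runs isolate the effect of pinned versus pageable input.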

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule added the feature request, tests, cuIO, and non-breaking labels May 22, 2024
@vuule self-assigned this May 22, 2024
github-actions (bot) added the libcudf label May 22, 2024
@vuule force-pushed the fea-pinned-bm-io branch from ad17413 to 286bc3e May 22, 2024 22:09
@vuule marked this pull request as ready for review May 22, 2024 23:10
@vuule requested a review from a team as a code owner May 22, 2024 23:10
@vuule requested review from vyasr and davidwendt May 22, 2024 23:10
@vuule (Contributor Author) commented May 22, 2024

CC @GregoryKimball @nvdbaranec

@vuule changed the base branch from branch-24.06 to branch-24.08 May 28, 2024 17:45
@vuule (Contributor Author) commented May 28, 2024

Parquet reader benchmarks (partial results) show a clear speedup over pageable input (HOST_BUFFER), with DEVICE_BUFFER included for reference:

|    io_type    | compression_type | cardinality | run_length | Samples |  CPU Time  | Noise |  GPU Time  | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|---------------|------------------|-------------|------------|---------|------------|-------|------------|-------|------------------|-------------------|-------------------|
| PINNED_BUFFER |           SNAPPY |           0 |          1 |      6x |  94.025 ms | 0.38% |  94.016 ms | 0.38% |       5710404236 |         1.365 GiB |       463.356 MiB |
|   HOST_BUFFER |           SNAPPY |           0 |          1 |      5x | 109.794 ms | 0.35% | 109.785 ms | 0.35% |       4890200847 |         1.365 GiB |       463.356 MiB |
| DEVICE_BUFFER |           SNAPPY |           0 |          1 |      7x |  74.802 ms | 0.31% |  74.794 ms | 0.31% |       7178027874 |         1.365 GiB |       463.356 MiB |
| PINNED_BUFFER |             NONE |           0 |          1 |    285x |  52.570 ms | 1.84% |  52.561 ms | 1.84% |      10214162353 |       976.374 MiB |       472.458 MiB |
|   HOST_BUFFER |             NONE |           0 |          1 |    235x |  63.742 ms | 8.74% |  63.733 ms | 8.74% |       8423736635 |       976.374 MiB |       472.458 MiB |
| DEVICE_BUFFER |             NONE |           0 |          1 |    486x |  30.752 ms | 1.12% |  30.743 ms | 1.12% |      17462916129 |       976.374 MiB |       472.458 MiB |
| PINNED_BUFFER |           SNAPPY |        1000 |          1 |    303x |  49.492 ms | 0.87% |  49.483 ms | 0.87% |      10849505601 |       799.405 MiB |       149.632 MiB |
|   HOST_BUFFER |           SNAPPY |        1000 |          1 |     80x |  54.009 ms | 1.21% |  54.000 ms | 1.21% |       9941990707 |       799.451 MiB |       149.632 MiB |
| DEVICE_BUFFER |           SNAPPY |        1000 |          1 |     21x |  43.121 ms | 0.50% |  43.113 ms | 0.50% |      12452696478 |       799.405 MiB |       149.632 MiB |
| PINNED_BUFFER |             NONE |        1000 |          1 |    330x |  45.322 ms | 1.40% |  45.313 ms | 1.40% |      11847938588 |       660.763 MiB |       157.620 MiB |
|   HOST_BUFFER |             NONE |        1000 |          1 |    307x |  48.737 ms | 0.99% |  48.728 ms | 0.99% |      11017642711 |       660.763 MiB |       157.620 MiB |
| DEVICE_BUFFER |             NONE |        1000 |          1 |     14x |  37.741 ms | 0.35% |  37.732 ms | 0.35% |      14228494999 |       660.763 MiB |       157.620 MiB |
| PINNED_BUFFER |           SNAPPY |           0 |         32 |    240x |  46.794 ms | 0.89% |  46.785 ms | 0.89% |      11475211343 |       980.738 MiB |        64.295 MiB |
|   HOST_BUFFER |           SNAPPY |           0 |         32 |    305x |  49.157 ms | 1.48% |  49.148 ms | 1.48% |      10923512449 |       980.742 MiB |        64.295 MiB |
| DEVICE_BUFFER |           SNAPPY |           0 |         32 |     12x |  43.601 ms | 0.48% |  43.592 ms | 0.48% |      12315840649 |       980.738 MiB |        64.295 MiB |
| PINNED_BUFFER |             NONE |           0 |         32 |    325x |  46.055 ms | 0.92% |  46.046 ms | 0.92% |      11659360859 |       918.591 MiB |       413.967 MiB |
|   HOST_BUFFER |             NONE |           0 |         32 |     80x |  56.248 ms | 1.16% |  56.238 ms | 1.16% |       9546324040 |       918.591 MiB |       413.967 MiB |
| DEVICE_BUFFER |             NONE |           0 |         32 |    208x |  27.393 ms | 0.77% |  27.385 ms | 0.77% |      19604901646 |       918.591 MiB |       413.967 MiB |
| PINNED_BUFFER |           SNAPPY |        1000 |         32 |    383x |  39.060 ms | 1.18% |  39.052 ms | 1.18% |      13747741214 |       557.858 MiB |        24.034 MiB |
|   HOST_BUFFER |           SNAPPY |        1000 |         32 |     13x |  39.797 ms | 0.48% |  39.787 ms | 0.47% |      13493467556 |       557.865 MiB |        24.034 MiB |
| DEVICE_BUFFER |           SNAPPY |        1000 |         32 |    394x |  37.948 ms | 1.85% |  37.939 ms | 1.85% |      14150749943 |       557.858 MiB |        24.034 MiB |
| PINNED_BUFFER |             NONE |        1000 |         32 |    112x |  35.622 ms | 1.25% |  35.613 ms | 1.25% |      15074930029 |       533.921 MiB |        30.799 MiB |
|   HOST_BUFFER |             NONE |        1000 |         32 |    409x |  36.558 ms | 1.97% |  36.549 ms | 1.97% |      14689106036 |       533.921 MiB |        30.799 MiB |
| DEVICE_BUFFER |             NONE |        1000 |         32 |    272x |  33.879 ms | 1.07% |  33.870 ms | 1.07% |      15850926773 |       533.921 MiB |        30.799 MiB |

On the other hand, I'm not observing a clear signal with the multithreaded Parquet benchmark, even in the single-threaded cases. This is something we'll want to investigate as we look further into multithreaded scaling.
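
To put numbers on the single-threaded signal, the GPU times in the table can be reduced to a per-configuration speedup of PINNED_BUFFER over HOST_BUFFER input. The following minimal C++ sketch copies the GPU times from the table; the gpu_times_ms struct and the helper functions are hypothetical names used only for this illustration.

```cpp
#include <array>
#include <cstddef>

// GPU times (ms) copied from the Parquet reader table, in table order:
// (SNAPPY,0,1), (NONE,0,1), (SNAPPY,1000,1), (NONE,1000,1),
// (SNAPPY,0,32), (NONE,0,32), (SNAPPY,1000,32), (NONE,1000,32),
// where each tuple is (compression_type, cardinality, run_length).
struct gpu_times_ms {
  double pinned;  // PINNED_BUFFER
  double host;    // HOST_BUFFER
  double device;  // DEVICE_BUFFER
};

constexpr std::array<gpu_times_ms, 8> parquet_read_times{{
  {94.016, 109.785, 74.794},
  {52.561, 63.733, 30.743},
  {49.483, 54.000, 43.113},
  {45.313, 48.728, 37.732},
  {46.785, 49.148, 43.592},
  {46.046, 56.238, 27.385},
  {39.052, 39.787, 37.939},
  {35.613, 36.549, 33.870},
}};

// Speedup of pinned over pageable input: values above 1.0 mean the
// PINNED_BUFFER run was faster than the HOST_BUFFER run.
constexpr double pinned_speedup(gpu_times_ms const& t) { return t.host / t.pinned; }

// True if, in every configuration, PINNED_BUFFER is faster than HOST_BUFFER
// but slower than DEVICE_BUFFER.
constexpr bool pinned_between_host_and_device(std::array<gpu_times_ms, 8> const& rows)
{
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (!(rows[i].device < rows[i].pinned && rows[i].pinned < rows[i].host)) { return false; }
  }
  return true;
}

static_assert(pinned_between_host_and_device(parquet_read_times),
              "pinned input sits between pageable and device input in every row");
```

For example, the uncompressed row with cardinality 0 and run length 1 gives 63.733 / 52.561 ≈ 1.21, while the SNAPPY row with cardinality 1000 and run length 32 gives only 39.787 / 39.052 ≈ 1.02. In this data, the two configurations with the smallest encoded files show the smallest benefit from pinned input.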

@vuule requested a review from davidwendt May 31, 2024 21:52
@vuule added the 5 - Ready to Merge label May 31, 2024
@vuule (Contributor Author) commented Jun 3, 2024

/merge

rapids-bot (bot) merged commit e66f4f5 into rapidsai:branch-24.08 Jun 3, 2024
70 checks passed
Labels: 5 - Ready to Merge, cuIO, feature request, libcudf, non-breaking, tests
3 participants