Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

orc multithreaded benchmark #16009

Merged
merged 16 commits into from
Jun 14, 2024

Conversation

zpuller
Copy link
Contributor

@zpuller zpuller commented Jun 12, 2024

Description

Addresses: #15973

Adds multithreaded benchmarks for the ORC reader. Based off of the parquet equivalent in #15585

# Benchmark Results

## orc_multithreaded_read_decode_mixed

### [0] NVIDIA RTX 5880 Ada Generation

| cardinality | total_data_size | num_threads | num_cols | run_length | Samples | CPU Time  | Noise | GPU Time  | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-----------------|-------------|----------|------------|---------|-----------|-------|-----------|-------|------------------|-------------------|-------------------|
|        1000 |       536870912 |           1 |        4 |          8 |    338x | 44.348 ms | 1.18% | 44.343 ms | 1.18% |      12107185968 |       939.341 MiB |        39.557 MiB |
|        1000 |      1073741824 |           1 |        4 |          8 |     80x | 77.634 ms | 0.65% | 77.629 ms | 0.65% |      13831742649 |         1.834 GiB |        79.072 MiB |
|        1000 |       536870912 |           2 |        4 |          8 |    341x | 43.921 ms | 1.20% | 43.916 ms | 1.20% |      12224889363 |       825.333 MiB |        39.568 MiB |
|        1000 |      1073741824 |           2 |        4 |          8 |     80x | 75.418 ms | 0.70% | 75.414 ms | 0.70% |      14237999015 |         1.611 GiB |        79.113 MiB |
|        1000 |       536870912 |           4 |        4 |          8 |     80x | 42.682 ms | 1.18% | 42.678 ms | 1.18% |      12579566132 |       883.436 MiB |        39.587 MiB |
|        1000 |      1073741824 |           4 |        4 |          8 |      9x | 74.056 ms | 0.48% | 74.052 ms | 0.48% |      14499873867 |         1.724 GiB |        79.136 MiB |
|        1000 |       536870912 |           8 |        4 |          8 |     25x | 42.198 ms | 0.50% | 42.194 ms | 0.49% |      12723960975 |       940.562 MiB |        39.600 MiB |
|        1000 |      1073741824 |           8 |        4 |          8 |      8x | 73.933 ms | 0.49% | 73.929 ms | 0.49% |      14524042443 |         1.781 GiB |        79.175 MiB |

## orc_multithreaded_read_decode_fixed_width

### [0] NVIDIA RTX 5880 Ada Generation

| cardinality | total_data_size | num_threads | num_cols | run_length | Samples | CPU Time  | Noise | GPU Time  | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-----------------|-------------|----------|------------|---------|-----------|-------|-----------|-------|------------------|-------------------|-------------------|
|        1000 |       536870912 |           1 |        4 |          8 |     13x | 40.149 ms | 0.04% | 40.144 ms | 0.04% |      13373482726 |       643.390 MiB |        59.821 MiB |
|        1000 |      1073741824 |           1 |        4 |          8 |    211x | 71.216 ms | 0.67% | 71.211 ms | 0.67% |      15078297784 |         1.257 GiB |       119.650 MiB |
|        1000 |       536870912 |           2 |        4 |          8 |    378x | 39.662 ms | 1.31% | 39.658 ms | 1.31% |      13537590893 |       643.392 MiB |        59.833 MiB |
|        1000 |      1073741824 |           2 |        4 |          8 |    209x | 71.693 ms | 0.71% | 71.688 ms | 0.71% |      14978085376 |         1.257 GiB |       119.642 MiB |
|        1000 |       536870912 |           4 |        4 |          8 |    377x | 39.731 ms | 1.30% | 39.726 ms | 1.30% |      13514305239 |       643.394 MiB |        59.856 MiB |
|        1000 |      1073741824 |           4 |        4 |          8 |      8x | 70.766 ms | 0.08% | 70.761 ms | 0.08% |      15174115364 |         1.030 GiB |       119.665 MiB |
|        1000 |       536870912 |           8 |        4 |          8 |    379x | 39.486 ms | 1.27% | 39.482 ms | 1.27% |      13597888468 |       647.399 MiB |        59.928 MiB |
|        1000 |      1073741824 |           8 |        4 |          8 |    207x | 72.686 ms | 2.04% | 72.681 ms | 2.04% |      14773317833 |         1.143 GiB |       119.711 MiB |

## orc_multithreaded_read_decode_string

### [0] NVIDIA RTX 5880 Ada Generation

| cardinality | total_data_size | num_threads | num_cols | run_length | Samples | CPU Time  | Noise | GPU Time  | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-----------------|-------------|----------|------------|---------|-----------|-------|-----------|-------|------------------|-------------------|-------------------|
|        1000 |       536870912 |           1 |        4 |          8 |     80x | 22.933 ms | 2.13% | 22.928 ms | 2.13% |      23415352877 |       661.948 MiB |        10.879 MiB |
|        1000 |      1073741824 |           1 |        4 |          8 |    160x | 34.167 ms | 1.41% | 34.162 ms | 1.41% |      31430436877 |         1.293 GiB |        21.757 MiB |
|        1000 |       536870912 |           2 |        4 |          8 |    560x | 22.533 ms | 2.18% | 22.528 ms | 2.18% |      23830839172 |       609.407 MiB |        10.941 MiB |
|        1000 |      1073741824 |           2 |        4 |          8 |     80x | 34.311 ms | 1.54% | 34.307 ms | 1.54% |      31298288990 |         1.188 GiB |        21.758 MiB |
|        1000 |       536870912 |           4 |        4 |          8 |     23x | 22.179 ms | 0.11% | 22.175 ms | 0.11% |      24211151047 |       624.177 MiB |        10.947 MiB |
|        1000 |      1073741824 |           4 |        4 |          8 |     15x | 33.793 ms | 0.08% | 33.789 ms | 0.08% |      31777989791 |         1.190 GiB |        21.881 MiB |
|        1000 |       536870912 |           8 |        4 |          8 |    679x | 22.006 ms | 1.74% | 22.002 ms | 1.74% |      24401381631 |       624.524 MiB |        10.951 MiB |
|        1000 |      1073741824 |           8 |        4 |          8 |    160x | 33.320 ms | 1.57% | 33.316 ms | 1.57% |      32229227026 |         1.207 GiB |        21.894 MiB |

## orc_multithreaded_read_decode_list

### [0] NVIDIA RTX 5880 Ada Generation

| cardinality | total_data_size | num_threads | num_cols | run_length | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-----------------|-------------|----------|------------|---------|------------|--------|------------|--------|------------------|-------------------|-------------------|
|        1000 |       536870912 |           1 |        4 |          8 |     96x |  74.437 ms |  0.68% |  74.433 ms |  0.68% |       7212831148 |       600.751 MiB |        60.245 MiB |
|        1000 |      1073741824 |           1 |        4 |          8 |      7x |  80.994 ms |  0.49% |  80.990 ms |  0.49% |      13257745936 |         1.173 GiB |       120.549 MiB |
|        1000 |       536870912 |           2 |        4 |          8 |     80x |  79.234 ms |  4.57% |  79.229 ms |  4.57% |       6776190522 |       600.950 MiB |        60.250 MiB |
|        1000 |      1073741824 |           2 |        4 |          8 |    166x |  90.437 ms | 17.19% |  90.432 ms | 17.19% |      11873413959 |         1.173 GiB |       120.489 MiB |
|        1000 |       536870912 |           4 |        4 |          8 |     80x |  78.613 ms |  2.98% |  78.608 ms |  2.98% |       6829702014 |       602.764 MiB |        60.323 MiB |
|        1000 |      1073741824 |           4 |        4 |          8 |    127x | 118.629 ms | 22.67% | 118.624 ms | 22.67% |       9051644873 |         1.174 GiB |       120.499 MiB |
|        1000 |       536870912 |           8 |        4 |          8 |    112x | 133.950 ms |  4.45% | 133.945 ms |  4.45% |       4008135293 |       603.471 MiB |        60.353 MiB |
|        1000 |      1073741824 |           8 |        4 |          8 |     90x | 167.850 ms | 15.93% | 167.844 ms | 15.93% |       6397248426 |         1.177 GiB |       120.646 MiB |

## orc_multithreaded_read_decode_chunked_mixed

### [0] NVIDIA RTX 5880 Ada Generation

| cardinality | total_data_size | num_threads | num_cols | run_length | input_limit | output_limit | Samples | CPU Time  | Noise | GPU Time  | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-----------------|-------------|----------|------------|-------------|--------------|---------|-----------|-------|-----------|-------|------------------|-------------------|-------------------|
|        1000 |       536870912 |           1 |        4 |          8 |   671088640 |    671088640 |    333x | 45.009 ms | 1.10% | 45.005 ms | 1.10% |      11929261073 |       939.341 MiB |        39.557 MiB |
|        1000 |      1073741824 |           1 |        4 |          8 |   671088640 |    671088640 |     96x | 81.524 ms | 0.61% | 81.519 ms | 0.61% |      13171640865 |         1.834 GiB |        79.072 MiB |
|        1000 |       536870912 |           2 |        4 |          8 |   671088640 |    671088640 |    339x | 44.183 ms | 0.96% | 44.179 ms | 0.96% |      12152252271 |       825.333 MiB |        39.568 MiB |
|        1000 |      1073741824 |           2 |        4 |          8 |   671088640 |    671088640 |      7x | 79.051 ms | 0.02% | 79.046 ms | 0.02% |      13583676002 |         1.611 GiB |        79.113 MiB |
|        1000 |       536870912 |           4 |        4 |          8 |   671088640 |    671088640 |     12x | 43.276 ms | 0.09% | 43.272 ms | 0.09% |      12407024794 |       883.436 MiB |        39.587 MiB |
|        1000 |      1073741824 |           4 |        4 |          8 |   671088640 |    671088640 |     19x | 78.019 ms | 0.49% | 78.014 ms | 0.49% |      13763433041 |         1.724 GiB |        79.136 MiB |
|        1000 |       536870912 |           8 |        4 |          8 |   671088640 |    671088640 |     80x | 42.803 ms | 1.22% | 42.799 ms | 1.22% |      12543864010 |       911.993 MiB |        39.600 MiB |
|        1000 |      1073741824 |           8 |        4 |          8 |   671088640 |    671088640 |    193x | 77.856 ms | 0.59% | 77.852 ms | 0.59% |      13792063986 |         1.837 GiB |        79.175 MiB |

## orc_multithreaded_read_decode_chunked_fixed_width

### [0] NVIDIA RTX 5880 Ada Generation

| cardinality | total_data_size | num_threads | num_cols | run_length | input_limit | output_limit | Samples | CPU Time  | Noise | GPU Time  | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-----------------|-------------|----------|------------|-------------|--------------|---------|-----------|-------|-----------|-------|------------------|-------------------|-------------------|
|        1000 |       536870912 |           1 |        4 |          8 |   671088640 |    671088640 |    112x | 40.497 ms | 1.23% | 40.493 ms | 1.23% |      13258480947 |       643.390 MiB |        59.821 MiB |
|        1000 |      1073741824 |           1 |        4 |          8 |   671088640 |    671088640 |      7x | 75.440 ms | 0.09% | 75.435 ms | 0.09% |      14234033611 |         1.648 GiB |       119.651 MiB |
|        1000 |       536870912 |           2 |        4 |          8 |   671088640 |    671088640 |     80x | 39.793 ms | 1.36% | 39.789 ms | 1.36% |      13493067216 |       643.392 MiB |        59.833 MiB |
|        1000 |      1073741824 |           2 |        4 |          8 |   671088640 |    671088640 |     69x | 74.499 ms | 0.50% | 74.494 ms | 0.50% |      14413864845 |         1.336 GiB |       119.642 MiB |
|        1000 |       536870912 |           4 |        4 |          8 |   671088640 |    671088640 |    381x | 39.273 ms | 1.11% | 39.269 ms | 1.11% |      13671742653 |       643.394 MiB |        59.856 MiB |
|        1000 |      1073741824 |           4 |        4 |          8 |   671088640 |    671088640 |    204x | 73.755 ms | 0.60% | 73.751 ms | 0.60% |      14559012350 |         1.648 GiB |       119.665 MiB |
|        1000 |       536870912 |           8 |        4 |          8 |   671088640 |    671088640 |     80x | 39.490 ms | 1.31% | 39.486 ms | 1.31% |      13596333864 |       631.980 MiB |        59.928 MiB |
|        1000 |      1073741824 |           8 |        4 |          8 |   671088640 |    671088640 |    203x | 73.907 ms | 1.34% | 73.903 ms | 1.34% |      14529071322 |         1.454 GiB |       119.711 MiB |

## orc_multithreaded_read_decode_chunked_string

### [0] NVIDIA RTX 5880 Ada Generation

| cardinality | total_data_size | num_threads | num_cols | run_length | input_limit | output_limit | Samples | CPU Time  | Noise | GPU Time  | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-----------------|-------------|----------|------------|-------------|--------------|---------|-----------|-------|-----------|-------|------------------|-------------------|-------------------|
|        1000 |       536870912 |           1 |        4 |          8 |   671088640 |    671088640 |     80x | 23.022 ms | 1.96% | 23.017 ms | 1.96% |      23324556592 |       661.948 MiB |        10.879 MiB |
|        1000 |      1073741824 |           1 |        4 |          8 |   671088640 |    671088640 |     80x | 37.687 ms | 1.37% | 37.682 ms | 1.37% |      28494755419 |         1.659 GiB |        21.757 MiB |
|        1000 |       536870912 |           2 |        4 |          8 |   671088640 |    671088640 |     80x | 22.703 ms | 2.30% | 22.699 ms | 2.30% |      23652118769 |       609.407 MiB |        10.941 MiB |
|        1000 |      1073741824 |           2 |        4 |          8 |   671088640 |    671088640 |     80x | 37.581 ms | 1.42% | 37.577 ms | 1.42% |      28574723179 |         1.658 GiB |        21.758 MiB |
|        1000 |       536870912 |           4 |        4 |          8 |   671088640 |    671088640 |    544x | 22.296 ms | 1.56% | 22.293 ms | 1.56% |      24082840350 |       631.319 MiB |        10.947 MiB |
|        1000 |      1073741824 |           4 |        4 |          8 |   671088640 |    671088640 |     14x | 36.990 ms | 0.14% | 36.985 ms | 0.14% |      29031484389 |         1.554 GiB |        21.881 MiB |
|        1000 |       536870912 |           8 |        4 |          8 |   671088640 |    671088640 |    676x | 22.114 ms | 1.22% | 22.110 ms | 1.22% |      24281965280 |       627.616 MiB |        10.951 MiB |
|        1000 |      1073741824 |           8 |        4 |          8 |   671088640 |    671088640 |     80x | 37.409 ms | 1.40% | 37.405 ms | 1.40% |      28706077426 |         1.562 GiB |        21.894 MiB |

## orc_multithreaded_read_decode_chunked_list

### [0] NVIDIA RTX 5880 Ada Generation

| cardinality | total_data_size | num_threads | num_cols | run_length | input_limit | output_limit | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-----------------|-------------|----------|------------|-------------|--------------|---------|------------|--------|------------|--------|------------------|-------------------|-------------------|
|        1000 |       536870912 |           1 |        4 |          8 |   671088640 |    671088640 |     80x |  74.780 ms |  0.67% |  74.776 ms |  0.67% |       7179747067 |       600.751 MiB |        60.245 MiB |
|        1000 |      1073741824 |           1 |        4 |          8 |   671088640 |    671088640 |    175x |  86.040 ms |  0.56% |  86.035 ms |  0.56% |      12480222210 |         1.576 GiB |       120.549 MiB |
|        1000 |       536870912 |           2 |        4 |          8 |   671088640 |    671088640 |    186x |  80.668 ms |  4.14% |  80.664 ms |  4.14% |       6655685080 |       600.951 MiB |        60.250 MiB |
|        1000 |      1073741824 |           2 |        4 |          8 |   671088640 |    671088640 |    143x | 105.217 ms | 21.56% | 105.212 ms | 21.56% |      10205531345 |         1.576 GiB |       120.489 MiB |
|        1000 |       536870912 |           4 |        4 |          8 |   671088640 |    671088640 |    128x |  80.087 ms |  3.05% |  80.082 ms |  3.05% |       6704042147 |       602.764 MiB |        60.323 MiB |
|        1000 |      1073741824 |           4 |        4 |          8 |   671088640 |    671088640 |    135x | 111.556 ms | 21.88% | 111.551 ms | 21.88% |       9625546746 |         1.489 GiB |       120.499 MiB |
|        1000 |       536870912 |           8 |        4 |          8 |   671088640 |    671088640 |    112x | 134.677 ms |  4.14% | 134.672 ms |  4.14% |       3986513604 |       603.471 MiB |        60.353 MiB |
|        1000 |      1073741824 |           8 |        4 |          8 |   671088640 |    671088640 |     80x | 178.735 ms | 14.17% | 178.730 ms | 14.17% |       6007630497 |         1.520 GiB |       120.646 MiB |

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@zpuller zpuller requested a review from a team as a code owner June 12, 2024 16:51
@zpuller zpuller requested review from mythrocks and PointKernel June 12, 2024 16:52
Copy link

copy-pr-bot bot commented Jun 12, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels Jun 12, 2024
@zpuller
Copy link
Contributor Author

zpuller commented Jun 12, 2024

Ah I forgot to sign the commit

@zpuller
Copy link
Contributor Author

zpuller commented Jun 12, 2024

/ok to test

@vuule vuule added tests Unit testing for project non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Jun 12, 2024
@vuule
Copy link
Contributor

vuule commented Jun 12, 2024

/ok to test

Signed-off-by: Zach Puller <[email protected]>
@vuule
Copy link
Contributor

vuule commented Jun 12, 2024

/ok to test

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a small concern with the use of nvtx otherwise looks great!

cpp/benchmarks/io/orc/orc_reader_multithreaded.cpp Outdated Show resolved Hide resolved
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some non-blocking cleanups. LGTM

Thanks for the contribution!

cpp/benchmarks/io/orc/orc_reader_multithreaded.cpp Outdated Show resolved Hide resolved
zpuller and others added 2 commits June 12, 2024 15:36
@PointKernel
Copy link
Member

/ok to test

@mythrocks
Copy link
Contributor

mythrocks commented Jun 13, 2024

Ah I forgot to sign the commit

we-dont-do-that-around-here.gif
I stand corrected. Thanks, @zpuller.

That does apply to the Spark RAPIDS projects, though.

Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! A couple of minor nitpicks.

cpp/benchmarks/io/orc/orc_reader_multithreaded.cpp Outdated Show resolved Hide resolved
cpp/benchmarks/io/orc/orc_reader_multithreaded.cpp Outdated Show resolved Hide resolved
cpp/benchmarks/io/orc/orc_reader_multithreaded.cpp Outdated Show resolved Hide resolved
cpp/benchmarks/io/orc/orc_reader_multithreaded.cpp Outdated Show resolved Hide resolved
@mythrocks
Copy link
Contributor

/ok to test

@mythrocks
Copy link
Contributor

/ok to test

@mythrocks
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 34227d3 into rapidsai:branch-24.08 Jun 14, 2024
75 checks passed
@mythrocks
Copy link
Contributor

Thanks for doing this one, @zpuller. I've merged this change.

@zpuller zpuller deleted the orc_multithreaded_benchmark branch September 18, 2024 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change tests Unit testing for project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants