Enable fractional null probability for hashing benchmark #13967

Merged 9 commits on Aug 30, 2023

Blonck
Contributor

@Blonck Blonck commented Aug 25, 2023

Previously, the `HASHING_NVBENCH` benchmark treated the `nulls` parameter as a boolean: any value other than 0.0 resulted in a null probability of 1.0.

Now the `nulls` parameter directly sets the null probability. For instance, a value of 0.1 makes roughly 10% of the generated data null. Setting `nulls` to 0.0 produces data without a null bitmask.

Additionally, `bytes_per_second` is added to the benchmark.

This patch relates to #13735.
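
To illustrate the behaviour change described above, here is a minimal, self-contained C++ sketch. It does not use the cudf data generator. The names `null_probability_from_axis`, `null_probability_as_bool`, and `make_validity` are hypothetical and exist only for this example, and the `static_cast<bool>` merely mimics the narrowing described in this PR rather than reproducing the actual benchmark code.

```cpp
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Hypothetical helper: map the benchmark's `nulls` axis value to an optional
// null probability. An empty optional means "do not build a null mask".
std::optional<double> null_probability_from_axis(double nulls)
{
  if (nulls > 0.0) { return nulls; }
  return std::nullopt;
}

// The old behaviour, for contrast: the value is narrowed to bool, so every
// non-zero setting collapses to a null probability of 1.0.
double null_probability_as_bool(double nulls)
{
  bool const has_nulls = static_cast<bool>(nulls);
  return has_nulls ? 1.0 : 0.0;
}

// Hypothetical stand-in for a data generator: returns a validity vector
// (true = valid row), or an empty vector when no mask is requested.
std::vector<bool> make_validity(std::size_t num_rows,
                                std::optional<double> null_probability,
                                unsigned seed = 42)
{
  if (!null_probability.has_value()) { return {}; }
  std::mt19937 gen(seed);
  std::bernoulli_distribution is_null(*null_probability);
  std::vector<bool> validity(num_rows);
  for (std::size_t i = 0; i < num_rows; ++i) { validity[i] = !is_null(gen); }
  return validity;
}
```

With `nulls = 0.1`, the boolean variant yields a probability of 1.0 (every row null), while the fractional variant marks about one row in ten as null and `nulls = 0.0` skips the mask entirely.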

@rapids-bot
rapids-bot bot commented Aug 25, 2023

Pull requests from external contributors require approval from a rapidsai organization member with write permissions or greater before CI can begin.

@github-actions github-actions bot added the libcudf label Aug 25, 2023
@Blonck
Contributor Author

Blonck commented Aug 25, 2023

Hi, I wasn't sure about this one, hence the draft.

To me, it looked like the nulls parameter was accidentally cast to bool, so either all or none of the values in the table were invalid, even though the benchmark was configured for nulls = [0, 0.1]. Also, if nulls == 0.0, a null bitmask is no longer generated.

Hope that makes sense. If not, I will remove my changes and only add the bytes_per_second calculation 😄.

@davidwendt
Contributor

> Hi, I wasn't sure about this one, hence the draft.
>
> To me, it looked like the nulls parameter was accidentally cast to bool, so either all or none of the values in the table were invalid, even though the benchmark was configured for nulls = [0, 0.1]. Also, if nulls == 0.0, a null bitmask is no longer generated.
>
> Hope that makes sense. If not, I will remove my changes and only add the bytes_per_second calculation 😄.

Yes. It looks like you fixed a bug here.

@Blonck Blonck marked this pull request as ready for review August 25, 2023 19:18
@Blonck Blonck requested a review from a team as a code owner August 25, 2023 19:18
@PointKernel PointKernel added the feature request and non-breaking labels Aug 25, 2023
@PointKernel
Member

/ok to test

@PointKernel PointKernel left a comment

There are several typos; otherwise this looks great. @Blonck Can you please also paste the console output of those benchmarks in the PR for reference?

@Blonck
Contributor Author

Blonck commented Aug 26, 2023

Sure @PointKernel, here is the log. Please note that I'm currently using WSL to compile and run the code. Therefore, the performance metrics might not be fully representative. I'm uncertain about the extent to which WSL/Windows affects performance. At least it significantly impacts compile time :).

# Devices

## [0] `NVIDIA GeForce RTX 4070 Ti`
* SM Version: 890 (PTX Version: 860)
* Number of SMs: 60
* SM Default Clock Rate: 18446744072024 MHz
* Global Memory: 11032 MiB Free / 12281 MiB Total
* Global Memory Bus Peak: 504 GB/sec (192-bit DDR @10501MHz)
* Max Shared Memory: 100 KiB/SM, 48 KiB/Block
* L2 Cache Size: 49152 KiB
* Maximum Active Blocks: 24/SM
* Maximum Active Threads: 1536/SM, 1024/Block
* Available Registers: 65536/SM, 65536/Block
* ECC Enabled: No

# Log

RMM memory resource = pool
Run:  [1/12] hashing [Device=0 num_rows=65536 nulls=0 hash_name=murmurhash3_x86_32]
Pass: Cold: 0.171495ms GPU, 0.204221ms CPU, 0.50s total GPU, 1.16s total wall, 2928x 
Run:  [2/12] hashing [Device=0 num_rows=16777216 nulls=0 hash_name=murmurhash3_x86_32]
Pass: Cold: 1.457714ms GPU, 1.524122ms CPU, 3.76s total GPU, 4.37s total wall, 2576x 
Run:  [3/12] hashing [Device=0 num_rows=65536 nulls=0.1 hash_name=murmurhash3_x86_32]
Pass: Cold: 0.382451ms GPU, 0.410197ms CPU, 0.50s total GPU, 0.79s total wall, 1312x 
Run:  [4/12] hashing [Device=0 num_rows=16777216 nulls=0.1 hash_name=murmurhash3_x86_32]
Pass: Cold: 1.614988ms GPU, 1.670856ms CPU, 0.96s total GPU, 1.09s total wall, 592x 
Run:  [5/12] hashing [Device=0 num_rows=65536 nulls=0 hash_name=md5]
Pass: Cold: 0.394157ms GPU, 0.425295ms CPU, 0.50s total GPU, 0.76s total wall, 1280x 
Run:  [6/12] hashing [Device=0 num_rows=16777216 nulls=0 hash_name=md5]
Warn: Current measurement timed out (15.00s) while over noise threshold (2.04% > 0.50%)
Pass: Cold: 12.810189ms GPU, 12.892377ms CPU, 14.67s total GPU, 15.00s total wall, 1145x 
Run:  [7/12] hashing [Device=0 num_rows=65536 nulls=0.1 hash_name=md5]
Pass: Cold: 0.444732ms GPU, 0.474608ms CPU, 0.51s total GPU, 0.74s total wall, 1136x 
Run:  [8/12] hashing [Device=0 num_rows=16777216 nulls=0.1 hash_name=md5]
Warn: Current measurement timed out (15.00s) while over noise threshold (4.29% > 0.50%)
Pass: Cold: 13.082924ms GPU, 13.164643ms CPU, 14.67s total GPU, 15.00s total wall, 1121x 
Run:  [9/12] hashing [Device=0 num_rows=65536 nulls=0 hash_name=spark_murmurhash3_x86_32]
Pass: Cold: 0.156733ms GPU, 0.191434ms CPU, 0.50s total GPU, 1.21s total wall, 3216x 
Run:  [10/12] hashing [Device=0 num_rows=16777216 nulls=0 hash_name=spark_murmurhash3_x86_32]
Pass: Cold: 1.478015ms GPU, 1.533411ms CPU, 1.77s total GPU, 2.05s total wall, 1200x 
Run:  [11/12] hashing [Device=0 num_rows=65536 nulls=0.1 hash_name=spark_murmurhash3_x86_32]
Pass: Cold: 0.402708ms GPU, 0.437331ms CPU, 0.50s total GPU, 0.79s total wall, 1248x 
Run:  [12/12] hashing [Device=0 num_rows=16777216 nulls=0.1 hash_name=spark_murmurhash3_x86_32]
Pass: Cold: 1.618771ms GPU, 1.678639ms CPU, 1.37s total GPU, 1.56s total wall, 848x 

# Benchmark Results

## hashing

### [0] NVIDIA GeForce RTX 4070 Ti

| num_rows | nulls |        hash_name         | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | GlobalMem BW | BWUtil |
|----------|-------|--------------------------|---------|------------|--------|------------|--------|--------------|--------|
|    65536 |     0 |       murmurhash3_x86_32 |   2928x | 204.221 us | 88.92% | 171.495 us | 87.16% |  10.697 GB/s |  2.12% |
| 16777216 |     0 |       murmurhash3_x86_32 |   2576x |   1.524 ms | 17.96% |   1.458 ms | 16.59% | 321.973 GB/s | 63.88% |
|    65536 |   0.1 |       murmurhash3_x86_32 |   1312x | 410.197 us | 43.65% | 382.451 us | 45.51% |   4.398 GB/s |  0.87% |
| 16777216 |   0.1 |       murmurhash3_x86_32 |    592x |   1.671 ms | 10.91% |   1.615 ms |  9.95% | 266.156 GB/s | 52.80% |
|    65536 |     0 |                      md5 |   1280x | 425.295 us | 41.19% | 394.157 us | 42.69% |   9.310 GB/s |  1.85% |
| 16777216 |     0 |                      md5 |   1145x |  12.892 ms |  2.18% |  12.810 ms |  2.04% |  73.309 GB/s | 14.54% |
|    65536 |   0.1 |                      md5 |   1136x | 474.608 us | 36.73% | 444.732 us | 38.77% |   7.908 GB/s |  1.57% |
| 16777216 |   0.1 |                      md5 |   1121x |  13.165 ms |  4.33% |  13.083 ms |  4.29% |  68.761 GB/s | 13.64% |
|    65536 |     0 | spark_murmurhash3_x86_32 |   3216x | 191.434 us | 61.58% | 156.733 us | 59.51% |  11.704 GB/s |  2.32% |
| 16777216 |     0 | spark_murmurhash3_x86_32 |   1200x |   1.533 ms | 32.23% |   1.478 ms | 31.72% | 317.550 GB/s | 63.00% |
|    65536 |   0.1 | spark_murmurhash3_x86_32 |   1248x | 437.331 us | 44.13% | 402.708 us | 45.15% |   4.177 GB/s |  0.83% |
| 16777216 |   0.1 | spark_murmurhash3_x86_32 |    848x |   1.679 ms | 10.75% |   1.619 ms |  9.30% | 265.534 GB/s | 52.68% |
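
For reading the table above: the BWUtil column is consistent with GlobalMem BW divided by the 504 GB/sec peak listed in the device section (for example, 321.973 / 504 ≈ 63.88%). The sketch below spells out that arithmetic. The function names are hypothetical, and this is an illustration under that assumption, not nvbench's implementation.

```cpp
#include <cstdint>

// Bytes moved divided by elapsed time, reported in GB/s (1 GB = 1e9 bytes).
double bandwidth_gb_per_s(std::uint64_t bytes, double seconds)
{
  return static_cast<double>(bytes) / seconds / 1e9;
}

// Achieved bandwidth as a percentage of the device's peak bandwidth.
double bandwidth_utilization_pct(double achieved_gb_per_s, double peak_gb_per_s)
{
  return 100.0 * achieved_gb_per_s / peak_gb_per_s;
}
```

For instance, `bandwidth_utilization_pct(321.973, 504.0)` gives a value close to the 63.88% reported for the first 16777216-row run.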

@davidwendt
Contributor

/ok to test

@PointKernel PointKernel left a comment

LGTM

@PointKernel
Member

/ok to test

@copy-pr-bot
copy-pr-bot bot commented Aug 29, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Code review suggestions.

Co-authored-by: David Wendt
@davidwendt
Contributor

/ok to test

@PointKernel
Member

/ok to test

@PointKernel
Member

/merge

@rapids-bot rapids-bot bot merged commit c73ff70 into rapidsai:branch-23.10 Aug 30, 2023
Labels
feature request, libcudf, non-breaking
3 participants