Closes #3711: Add Parquet fixed length string benchmark #3712

Merged: 2 commits, Aug 30, 2024

bmcdonald3 (Contributor)

Add a Parquet string read benchmark that compares the performance of
reading Parquet files whose strings all have a fixed number of characters,
with and without the fixed-length optimization. It tests (1) single-file
reads, (2) multi-file reads with larger files, (3) multi-file reads with
smaller files, and (4) all of the above with the fixed-length
optimization.

Here are results showing a ~2x speedup from the fixed-length string
optimization, collected on a Cray XC with a Lustre file system:

| test              |    sec |
|:------------------|-------:|
| single-file       | 39.388 |
| fixed-single      | 20.664 |
| scaled-five       | 17.283 |
| fixed-scaled-five |  8.814 |
| five              | 85.332 |
| fixed-five        | 43.743 |
| scaled-ten        |  8.998 |
| fixed-scaled-ten  |  4.794 |
| ten               | 86.761 |
| fixed-ten         | 45.498 |
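
As a quick sanity check on the ~2x figure, the per-test speedup can be computed directly from the timings reported here. This is only a sketch: the pairing of each baseline test with its `fixed-` counterpart is inferred from the test names, and the `speedups` helper is a hypothetical name, not part of the benchmark itself.

```python
# Timings in seconds, copied from the results reported in this PR.
timings = {
    "single-file": 39.388,
    "fixed-single": 20.664,
    "scaled-five": 17.283,
    "fixed-scaled-five": 8.814,
    "five": 85.332,
    "fixed-five": 43.743,
    "scaled-ten": 8.998,
    "fixed-scaled-ten": 4.794,
    "ten": 86.761,
    "fixed-ten": 45.498,
}

# Assumed pairing (inferred from the names): baseline test -> fixed-length variant.
pairs = [
    ("single-file", "fixed-single"),
    ("scaled-five", "fixed-scaled-five"),
    ("five", "fixed-five"),
    ("scaled-ten", "fixed-scaled-ten"),
    ("ten", "fixed-ten"),
]


def speedups(timings, pairs):
    """Return baseline_time / fixed_time for each (baseline, fixed) pair."""
    return {base: timings[base] / timings[fixed] for base, fixed in pairs}


ratios = speedups(timings, pairs)
for name, ratio in ratios.items():
    print(f"{name:12s} {ratio:.2f}x")
```

Every pair lands a little under 2x, which matches the "~2x" summary.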

Closes #3711

bmcdonald3 requested a review from stress-tess on August 28, 2024 at 23:33
ajpotts (Contributor) left a comment:

Looks good. Cool results. Thanks!

stress-tess (Member) left a comment:

This looks great!! thanks ben!

stress-tess added this pull request to the merge queue on Aug 30, 2024
Merged via the queue into Bears-R-Us:master with commit 4673453 on Aug 30, 2024
10 checks passed
Linked issue: Add a fixed length string Parquet benchmark