Add a fixed length string Parquet benchmark #3711

Closed
bmcdonald3 opened this issue Aug 28, 2024 · 0 comments · Fixed by #3712
@bmcdonald3
Contributor

Add a Parquet string read benchmark that compares read performance, with and without the fixed length optimization, for Parquet files whose strings all have the same number of characters. The benchmark covers (1) single-file reads, (2) multi-file reads with larger files, (3) multi-file reads with smaller files, and (4) each of the above with the fixed length optimization enabled.

The results below, collected on a Cray XC with a Lustre file system, show a roughly 2x speedup from the fixed length string optimization:

| test              |    sec |
|:------------------|-------:|
| single-file       | 39.388 |
| fixed-single      | 20.664 |
| scaled-five       | 17.283 |
| fixed-scaled-five |  8.814 |
| five              | 85.332 |
| fixed-five        | 43.743 |
| scaled-ten        |  8.998 |
| fixed-scaled-ten  |  4.794 |
| ten               | 86.761 |
| fixed-ten         | 45.498 |

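
As a quick check on the ~2x figure, the per-test speedups can be computed from the timings reported above. This is only an illustrative sketch: the pairing of each baseline test with its `fixed-` counterpart is inferred from the test names, and the `speedups` helper is a hypothetical name, not part of the benchmark itself.

```python
# Timings in seconds, copied from the results reported in this issue.
timings = {
    "single-file": 39.388,
    "fixed-single": 20.664,
    "scaled-five": 17.283,
    "fixed-scaled-five": 8.814,
    "five": 85.332,
    "fixed-five": 43.743,
    "scaled-ten": 8.998,
    "fixed-scaled-ten": 4.794,
    "ten": 86.761,
    "fixed-ten": 45.498,
}

# (baseline, fixed-length) pairs. Assumption: the pairing is inferred
# from the test names; the issue does not state it explicitly.
pairs = [
    ("single-file", "fixed-single"),
    ("scaled-five", "fixed-scaled-five"),
    ("five", "fixed-five"),
    ("scaled-ten", "fixed-scaled-ten"),
    ("ten", "fixed-ten"),
]


def speedups(timings, pairs):
    """Return baseline time divided by fixed-length time for each pair."""
    return {base: timings[base] / timings[fixed] for base, fixed in pairs}


ratios = speedups(timings, pairs)
for name, ratio in ratios.items():
    print(f"{name:<12} {ratio:.2f}x")
```

Each ratio falls between roughly 1.88x and 1.96x, which is consistent with the ~2x speedup stated above.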
@bmcdonald3 bmcdonald3 self-assigned this Aug 28, 2024
github-merge-queue bot pushed a commit that referenced this issue Aug 30, 2024
* Add Parquet fixed length string benchmark
* Change default size