This PR switches from reading the byte size of each string in a Parquet file one at a time to reading the sizes in batches, which results in a significant performance improvement.
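As a rough illustration of the idea (not the code in this PR), here is a minimal Python sketch contrasting the two access patterns. `CountingReader` is a hypothetical in-memory stand-in for a Parquet string-column reader, and the function names and `batch_size` parameter are assumptions made for the example. The point is only that the batched loop makes far fewer reader calls for the same result.

```python
from typing import List


class CountingReader:
    """Hypothetical stand-in for a string-column reader; counts read calls."""

    def __init__(self, values: List[bytes]) -> None:
        self._values = values
        self._pos = 0
        self.calls = 0

    def read(self, n: int) -> List[bytes]:
        """Return up to n values, or an empty list once the column is exhausted."""
        self.calls += 1
        chunk = self._values[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk


def byte_sizes_one_at_a_time(reader: CountingReader) -> List[int]:
    """Old pattern: one reader call per string."""
    sizes: List[int] = []
    while True:
        chunk = reader.read(1)
        if not chunk:
            break
        sizes.append(len(chunk[0]))
    return sizes


def byte_sizes_batched(reader: CountingReader, batch_size: int = 1024) -> List[int]:
    """New pattern: one reader call per batch of strings."""
    sizes: List[int] = []
    while True:
        chunk = reader.read(batch_size)
        if not chunk:
            break
        sizes.extend(len(value) for value in chunk)
    return sizes


values = [b"a", b"bb", b"", b"dddd", b"eeeee"]
slow = CountingReader(values)
fast = CountingReader(values)
sizes_slow = byte_sizes_one_at_a_time(slow)
sizes_fast = byte_sizes_batched(fast, batch_size=2)
```

Both functions return the same sizes; only the number of calls into the reader differs, which is where the per-call overhead is saved.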
Here are results from a Cray XC with a Lustre filesystem:
Batch byte calculation:

| test | sec |
| --- | --- |
| single-file | 4.981 |
| fixed-single | 4.037 |
| scaled-five | 2.135 |
| fixed-scaled-five | 1.744 |
| five | 10.525 |
| fixed-five | 9.068 |
| scaled-ten | 1.094 |
| fixed-scaled-ten | 0.971 |
| ten | 11.532 |
| fixed-ten | 9.747 |

Old:

| test | sec |
| --- | --- |
| single-file | 7.907 |
| fixed-single | 4.026 |
| scaled-five | 4.021 |
| fixed-scaled-five | 1.754 |
| five | 17.076 |
| fixed-five | 4.997 |
| scaled-ten | 1.782 |
| fixed-scaled-ten | 0.978 |
| ten | 17.802 |
| fixed-ten | 9.499 |

Note that none of the "fixed" rows should be impacted by this change, since it only affects byte calculation, which is skipped when using the fixed-length optimization.
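
To put a number on the improvement, the following sketch computes the old/new time ratio for each non-"fixed" test, using only the timings reported above. The dictionary and variable names are illustrative.

```python
# Timings in seconds, copied from the results above (non-"fixed" rows only).
old_sec = {
    "single-file": 7.907,
    "scaled-five": 4.021,
    "five": 17.076,
    "scaled-ten": 1.782,
    "ten": 17.802,
}
batched_sec = {
    "single-file": 4.981,
    "scaled-five": 2.135,
    "five": 10.525,
    "scaled-ten": 1.094,
    "ten": 11.532,
}

# Speedup = old time / new time; values above 1.0 mean the batched version is faster.
speedups = {test: old_sec[test] / batched_sec[test] for test in old_sec}

for test, ratio in speedups.items():
    print(f"{test}: {ratio:.2f}x")
```

On these numbers, every non-"fixed" test runs more than 1.5x faster with batched byte calculation.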