Change the default dictionary policy in Parquet writer from ALWAYS to ADAPTIVE #15570
CC: @GregoryKimball |
There's also the Python side to be changed. |
Thanks, I have pushed the ^suggested change. This is only a testing branch as you guessed. |
Do you think this would be a "breaking" change? @mhaseeb123 @vuule |
Probably not, functionality wise at least. 🤔 |
/ok to test |
/ok to test |
Looks good! Just a few nits and a question.
/ok to test |
/ok to test |
Thank you for iterating on this.
Looks great!
Co-authored-by: Bradley Dice <[email protected]>
/ok to test |
/merge |
Description
This PR changes the default dictionary policy in parquet from `ALWAYS` to `ADAPTIVE` and adds an argument `max_dictionary_size` to control the `ADAPTIVE`-ness of the dictionary policy. This change prevents a silent fallback to `UNCOMPRESSED` when writing parquet files with `ZSTD` compression, leading to better performance for several use cases.

Partially closes #15501.
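
As a rough illustration of how a size limit might steer an adaptive policy, here is a minimal Python sketch. It is not the writer's implementation: the `DictionaryPolicy` enum, the `use_dictionary` helper, the `NEVER` member, the byte-size comparison, and the 1 MiB limit are all assumptions made for this example. Only the names `ALWAYS`, `ADAPTIVE`, and `max_dictionary_size` come from the description above.

```python
from enum import Enum


class DictionaryPolicy(Enum):
    # Hypothetical stand-in for the writer's policy enum, not the library's type.
    NEVER = 0
    ADAPTIVE = 1
    ALWAYS = 2


def use_dictionary(policy, dictionary_bytes, max_dictionary_size):
    """Return True if a column chunk would be dictionary-encoded in this model."""
    if policy is DictionaryPolicy.NEVER:
        return False
    if policy is DictionaryPolicy.ALWAYS:
        # ALWAYS keeps the dictionary no matter how large it grows.
        return True
    # ADAPTIVE (assumed rule): keep the dictionary only while it fits the limit.
    return dictionary_bytes <= max_dictionary_size


limit = 1024 * 1024  # assumed 1 MiB limit, chosen only for this example
print(use_dictionary(DictionaryPolicy.ADAPTIVE, 512 * 1024, limit))
print(use_dictionary(DictionaryPolicy.ADAPTIVE, 4 * limit, limit))
print(use_dictionary(DictionaryPolicy.ALWAYS, 4 * limit, limit))
```

In this model, raising `max_dictionary_size` makes `ADAPTIVE` behave more like `ALWAYS`, and lowering it makes the writer give up on dictionaries sooner.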
Checklist