Compression when writing bools with parquet #2579
Comments
I verified the The specific error we are receiving is
I was able to get the write to work properly by removing the use of I think to fix this we should add a check if the column is of type bool and only set
One thing to note is that we may have to remove RLE compression altogether for the multi-column case. I am going to look into ways to avoid this if possible. However, it does appear that if no encoding is explicitly set, one will be inferred. We may want to just switch to this in all cases.
I ran a bunch of tests this morning and confirmed that if no encoding is specified, Parquet automatically selects the "best" encoding: file sizes with an explicit encoding are identical to those without one. As a result, I believe it will be best to remove the specification in our code to use
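For anyone wanting to repeat the size comparison described above, here is a rough sketch, assuming pyarrow is doing the writing. `encoding_kwargs` and `written_size` are hypothetical helpers made up for illustration, not code from this project, and the encoding name is left as a parameter because the one used in our code is not shown in this thread.

```python
import os
import tempfile


def encoding_kwargs(columns, encoding=None):
    """Build keyword arguments for pyarrow.parquet.write_table.

    With encoding=None nothing is pinned, so the writer uses its defaults.
    Otherwise every listed column is pinned to `encoding`; pyarrow expects
    dictionary encoding to be off for columns named in column_encoding.
    """
    if encoding is None:
        return {}
    return {
        "use_dictionary": False,
        "column_encoding": {name: encoding for name in columns},
    }


def written_size(table, **kwargs):
    """Write `table` to a temporary Parquet file and return its size in bytes."""
    import pyarrow.parquet as pq  # imported lazily; requires pyarrow

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.parquet")
        pq.write_table(table, path, **kwargs)
        return os.path.getsize(path)
```

Calling `written_size(table, **encoding_kwargs(table.column_names))` and `written_size(table, **encoding_kwargs(table.column_names, "PLAIN"))` on the same `pyarrow.Table` gives the two sizes to compare; `"PLAIN"` here is only a placeholder, and whether a given encoding is accepted for a boolean column depends on the installed pyarrow version.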
While working on #2539, it seems like we don't have compression support for writing boolean arrays with Parquet. We should look into whether that is supported by pyarrow and, if it is, figure out how to add it.