Make Parquet writer nullable option application to single table writes #12933
This behavior change makes sense. Is it "breaking"? Code looks good.
The Parquet writer used to throw when called with the …
/merge
Requires: #12933

This PR adds a `nullability` parameter to the Parquet writer. When it is `True`, all columns are written as nullable in the schema. When it is `False`, all columns are written as `not null` in the schema; however, if a column contains null values, the parameter is ignored for that column.

Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Bradley Dice (https://github.com/bdice)

URL: #12952
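A minimal Python model of the per-column rule described above, assuming each column is a plain list in which `None` marks a null. `resolve_nullability` is a hypothetical helper for illustration only, not part of the cudf API; it just shows which schema flag each column would end up with for either value of `nullability`.

```python
from typing import List, Optional, Sequence


def resolve_nullability(columns: Sequence[Sequence[Optional[int]]],
                        nullability: bool) -> List[bool]:
    """Return the schema nullability flag per column (model only, not the cudf API)."""
    flags: List[bool] = []
    for values in columns:
        has_nulls = any(v is None for v in values)
        if nullability:
            # True: every column is declared nullable in the schema.
            flags.append(True)
        else:
            # False: declare NOT NULL, except that a column that actually holds
            # nulls ignores the request and stays nullable.
            flags.append(has_nulls)
    return flags


cols = [[1, 2, 3], [4, None, 6]]
print(resolve_nullability(cols, True))   # [True, True]
print(resolve_nullability(cols, False))  # [False, True]
```

Note that `False` is only a request: the second column keeps its nulls, so it stays nullable rather than producing an invalid schema.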
Description
When writing multiple tables into a single Parquet file, users can run into issues if a column has no nulls in the first table but has some in later tables. The `nullable` member of `column_in_metadata` was originally added to address this and allow users to enforce the nullability of columns across multiple tables. Because of this, the `nullable` option was only applied to chunked writes.

Recently, a different use for the option has been identified: tables are stored in individual Parquet files, which are later read back and concatenated. Without the option to enforce nullability, the Parquet files can end up with different nullabilities, i.e. different schemas, causing the concatenation to fail.
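To make the failure mode concrete, here is a sketch that models each written file's schema as a map from column name to a nullability flag. It assumes, as a simplification, that without an override a column is declared nullable only when that table contains a null, and it uses an optional per-column override to stand in for the `nullable` member of `column_in_metadata`. `infer_schema`, `can_concatenate`, and the strict "identical schemas only" concatenation rule are hypothetical, not the libcudf or cudf API.

```python
from typing import Dict, List, Optional, Sequence

# A "table" is modeled as column name -> values, with None marking a null.
Table = Dict[str, Sequence[Optional[int]]]


def infer_schema(table: Table,
                 nullable_override: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Return column name -> nullability flag for one written file (simplified model)."""
    overrides = nullable_override or {}
    schema: Dict[str, bool] = {}
    for name, values in table.items():
        if name in overrides:
            # User-specified nullability (the role column_in_metadata's nullable plays).
            schema[name] = overrides[name]
        else:
            # Assumed default: nullable only if this particular table holds a null.
            schema[name] = any(v is None for v in values)
    return schema


def can_concatenate(schemas: List[Dict[str, bool]]) -> bool:
    """Hypothetical strict reader: tables concatenate only if every schema matches."""
    return all(s == schemas[0] for s in schemas)


first: Table = {"x": [1, 2, 3]}      # no nulls
second: Table = {"x": [4, None, 6]}  # one null

inferred = [infer_schema(t) for t in (first, second)]
forced = [infer_schema(t, {"x": True}) for t in (first, second)]

print(inferred, can_concatenate(inferred))  # [{'x': False}, {'x': True}] False
print(forced, can_concatenate(forced))      # [{'x': True}, {'x': True}] True
```

Without an override the two files disagree on `x`, so the strict reader refuses to concatenate them; forcing `x` to be nullable on both writes yields identical schemas.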
This PR allows the `nullable` option to apply to single writes as well. The write call throws if the user tries to write a column that contains nulls as non-nullable.
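The single-write check can be sketched the same way: a requested `False` on a column that contains nulls raises an error instead of producing a schema that contradicts the data. `apply_nullable_option` is hypothetical, the fallback to "nullable if the data has nulls" is an assumed default, and `ValueError` is an arbitrary stand-in, since the source does not say which exception the writer throws.

```python
from typing import Dict, Optional, Sequence

# A "table" is modeled as column name -> values, with None marking a null.
Table = Dict[str, Sequence[Optional[int]]]


def apply_nullable_option(table: Table,
                          nullable: Dict[str, bool]) -> Dict[str, bool]:
    """Resolve per-column nullability for a single (non-chunked) write."""
    schema: Dict[str, bool] = {}
    for name, values in table.items():
        has_nulls = any(v is None for v in values)
        requested = nullable.get(name)
        if requested is None:
            # No user setting: fall back to the (assumed) data-driven default.
            schema[name] = has_nulls
        elif not requested and has_nulls:
            # Non-nullable was requested, but the data contains nulls: reject the write.
            raise ValueError(f"column {name!r} contains nulls but was marked non-nullable")
        else:
            schema[name] = requested
    return schema


print(apply_nullable_option({"a": [1, 2]}, {"a": False}))  # {'a': False}
try:
    apply_nullable_option({"a": [1, None]}, {"a": False})
except ValueError as err:
    print(err)  # column 'a' contains nulls but was marked non-nullable
```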