Parquet writer seems too stringent with its checks to write columns as not nullable.
The following cases fail to write when `set_nullability(false)` is called on the input metadata:
- Column with no nulls;
- Sliced column that only has nulls outside of the slice.
This behavior limits the use of the `nullable` option.
… as non-nullable (#13675)
Issue #7654, #13010
Writers have a strict check for nullability when applying the user metadata's nullability options, because checking the actual number of nulls is (was) not cheap.
Since we now create all columns with a known number of nulls, the `null_count` check became cheap and we have no reason to prevent columns without nulls from being written as non-nullable.
This PR changes the condition to allow this case.
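As a rough illustration of the change in condition, the sketch below models the stricter and the relaxed rule side by side. The `column_info` struct and both function names are hypothetical and exist only for this example; they are not taken from the writer's actual code, which may be organized differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the column properties the writer inspects;
// this is not the library's column type.
struct column_info {
  bool has_null_mask;      // a validity mask is allocated (the column is "nullable")
  std::size_t null_count;  // number of null rows, assumed to be known up front
};

// Stricter rule: any column that carries a null mask is rejected when the
// metadata requests non-nullable output, even if no row is actually null.
bool accepts_non_nullable_strict(column_info const& col)
{
  return !col.has_null_mask;
}

// Relaxed rule: only columns that actually contain nulls are rejected.
bool accepts_non_nullable_relaxed(column_info const& col)
{
  // A column without a mask cannot contain nulls.
  assert(col.has_null_mask || col.null_count == 0);
  return col.null_count == 0;
}
```

Under these assumptions, a column with a mask but zero nulls is rejected by the strict rule and accepted by the relaxed one, which is the first case from the issue.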
The PR does not address the issue with sliced columns, where it's not possible to write a sliced column as non-nullable, even if the slice has no nulls. That check is still not cheap :)
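To see why the sliced case is harder, consider a simplified host-side model in which a slice is a window into its parent's validity mask. The `sliced_column` struct and `null_count_in_slice` function are hypothetical and only meant to illustrate the cost; the real column representation and counting mechanism are assumed to differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a sliced column: a window into a parent's validity
// mask (true = valid row). This is not the library's column view type.
struct sliced_column {
  std::vector<bool> const* parent_validity;  // mask shared with the parent column
  std::size_t offset;                        // first row of the slice
  std::size_t size;                          // number of rows in the slice
};

// Counts nulls inside the slice only. The cost is linear in the slice size,
// unlike reading a null count that was stored when the column was created.
std::size_t null_count_in_slice(sliced_column const& col)
{
  assert(col.parent_validity != nullptr);
  assert(col.offset + col.size <= col.parent_validity->size());
  std::size_t nulls = 0;
  for (std::size_t i = col.offset; i < col.offset + col.size; ++i) {
    if (!(*col.parent_validity)[i]) { ++nulls; }
  }
  return nulls;
}
```

In this model, a parent with nulls only in its first and last rows yields a null-free slice over the middle rows, but confirming that requires scanning the mask range rather than reading a precomputed value.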
Authors:
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- Mark Harris (https://github.com/harrism)
- Nghia Truong (https://github.com/ttnghia)
URL: #13675
Repro code:
To test the case where the column has no nulls, modify `valids` to always return `true`.