cuDF's Parquet writer writes an incorrect file when columns have a null mask and an offset.
In [1]: import cudf

In [2]: df = cudf.DataFrame({'a': [1, None, 3, 4, 5]})

In [3]: df2 = df[2:]

In [4]: df2
Out[4]:
   a
2  3
3  4
4  5

In [5]: df2.to_parquet("sliced.parquet")

In [6]: cudf.read_parquet("sliced.parquet")
Out[6]:
      a
0     3
1  <NA>
2     5
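The above example demonstrates that the validity of the output was written without taking the input's offset into account. The sliced column starts at offset 2, but the null mask appears to have been read from bit 0, so the null at index 1 of the original column shows up at index 1 of the output, where the value 4 should be.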
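To make the suspected mechanism concrete, here is a minimal pure-Python sketch of reading a validity bitmask with and without the slice offset. It is not cuDF's writer code: the helper names `pack_validity`, `read_validity`, and `apply_validity` are hypothetical, and the only assumption is an LSB-first bitmask in which bit i set means row i is valid.

```python
def pack_validity(values):
    """Pack a list into an LSB-first validity bitmask (bit i set = row i is valid)."""
    mask = 0
    for i, value in enumerate(values):
        if value is not None:
            mask |= 1 << i
    return mask


def read_validity(mask, offset, size):
    """Return validity flags for `size` rows, starting at bit `offset` of the mask."""
    return [bool((mask >> (offset + i)) & 1) for i in range(size)]


def apply_validity(values, validity):
    """Replace every row flagged invalid with None."""
    return [v if ok else None for v, ok in zip(values, validity)]


data = [1, None, 3, 4, 5]
mask = pack_validity(data)  # the slice shares this mask with the parent column

# The slice df[2:] is modelled as the same buffers plus an offset and a size.
offset, size = 2, 3
sliced_values = [3, 4, 5]

# Honouring the offset reads bits 2..4, which are all valid.
with_offset = apply_validity(sliced_values, read_validity(mask, offset, size))

# Ignoring the offset reads bits 0..2, so the parent's null at row 1 leaks in.
without_offset = apply_validity(sliced_values, read_validity(mask, 0, size))

print(with_offset)
print(without_offset)
```

Reading the mask from bit 0 reproduces the pattern seen in the file read back above: the second row becomes null even though the slice itself contains no nulls.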
This is possibly an issue with other cuDF writers as well. I haven't tried to repro with ORC, although the code suggests this might be the case.