[BUG] Parquet writer does not apply offset to nullmask #6642

Closed
devavret opened this issue Nov 2, 2020 · 0 comments · Fixed by #6889
Labels: bug (Something isn't working), cuIO (cuIO issue)

devavret commented Nov 2, 2020

cuDF's Parquet writer writes an incorrect file when a column has both a null mask and a non-zero offset.

In [1]: import cudf

In [2]: df = cudf.DataFrame({'a':[1,None,3,4,5]})

In [3]: df2 = df[2:]

In [4]: df2
Out[4]: 
   a
2  3
3  4
4  5

In [5]: df2.to_parquet("sliced.parquet")

In [6]: cudf.read_parquet("sliced.parquet")
Out[6]: 
      a
0     3
1  <NA>
2     5

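The example above shows that the writer emitted the validity bits without taking the input's offset into account: it read the null mask from bit 0 instead of from the slice's offset (2). The null that belongs to row 1 of the original column therefore lands at index 1 of the output, where the value should be 4.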
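To make the failure mode concrete, here is a minimal pure-Python sketch of the same situation. It is not cuDF's writer code. It assumes an Arrow-style validity bitmask (least-significant bit first, 1 means valid), and the helper names `pack_validity`, `bit_is_set`, and `read_slice` are hypothetical, introduced only for illustration.

```python
def pack_validity(values):
    """Pack a validity bitmask, LSB first: bit i is 1 when values[i] is not None."""
    mask = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            mask[i // 8] |= 1 << (i % 8)
    return bytes(mask)


def bit_is_set(mask, i):
    """Return True when bit i of the mask is set."""
    return (mask[i // 8] >> (i % 8)) & 1 == 1


def read_slice(data, mask, offset, size, apply_offset):
    """Materialize a sliced view of (data, mask).

    The data is always read at offset + i. The mask is read at offset + i
    only when apply_offset is True; otherwise it is read at i, which models
    the behavior reported in this issue.
    """
    out = []
    for i in range(size):
        bit = offset + i if apply_offset else i
        out.append(data[offset + i] if bit_is_set(mask, bit) else None)
    return out


values = [1, None, 3, 4, 5]
data = [0 if v is None else v for v in values]  # placeholder 0 under the null
mask = pack_validity(values)                    # bits 1,0,1,1,1 -> 0b00011101

# Same slice as df[2:] in the report: offset 2, three rows.
buggy = read_slice(data, mask, offset=2, size=3, apply_offset=False)
fixed = read_slice(data, mask, offset=2, size=3, apply_offset=True)

print(buggy)  # [3, None, 5]
print(fixed)  # [3, 4, 5]
```

Reading the mask from bit 0 reproduces the `3, <NA>, 5` output from the report, while shifting the bit index by the offset gives the expected `3, 4, 5`.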

This may affect other cuDF writers as well. I haven't tried to reproduce it with ORC, although the code suggests the same problem might exist there.

devavret added the bug and Needs Triage labels Nov 2, 2020
devavret added the cuIO label and removed the Needs Triage label Nov 2, 2020
kaatish self-assigned this Nov 24, 2020
rapids-bot bot pushed a commit that referenced this issue Dec 13, 2020
Fixes #6642

Authors:
  - Kumar Aatish <[email protected]>
  - skirui-source <[email protected]>

Approvers:
  - Vukasin Milovanovic
  - Devavret Makkar
  - Ram (Ramakrishna Prabhu)

URL: #6889