Fix null hive-partition behavior in dask-cudf parquet #12866

Merged on Mar 10, 2023 (17 commits)

@rjzamora commented Feb 28, 2023

Description

This PR includes a few simple changes to fix the handling of null hive partitions in dask_cudf.
Depends on dask/dask#10007
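
For readers unfamiliar with hive partitioning, here is a minimal pure-Python sketch of how null partition values are usually encoded in directory names. It assumes the conventional Hive placeholder `__HIVE_DEFAULT_PARTITION__`; the helper `parse_hive_partitions` is hypothetical and is not part of cudf or dask_cudf, and this is not the code changed by this PR.

```python
from pathlib import PurePosixPath

# Directory name conventionally used by Hive-style writers for null partition
# values (an assumption for this sketch, not taken from the PR itself).
HIVE_NULL = "__HIVE_DEFAULT_PARTITION__"


def parse_hive_partitions(path, null_label=HIVE_NULL):
    """Return {column: value} for each `key=value` directory in `path`.

    Hypothetical helper for illustration only.
    """
    partitions = {}
    # Only the directories matter; the file name itself is skipped.
    for part in PurePosixPath(path).parent.parts:
        key, sep, value = part.partition("=")
        if sep and key:
            # Map the placeholder directory back to a real null.
            partitions[key] = None if value == null_label else value
    return partitions


print(parse_hive_partitions(
    "data/year=2023/id=__HIVE_DEFAULT_PARTITION__/part.0.parquet"
))
```

A reader that skips the placeholder mapping would surface the literal string instead of a null, which is the kind of mismatch a null-partition fix has to avoid.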

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rjzamora added the labels 2 - In Progress, Python, dask, improvement, and non-breaking on Feb 28, 2023
@rjzamora marked this pull request as ready for review on March 6, 2023 at 17:54
@rjzamora requested review from a team as code owners on March 6, 2023 at 17:54
@rjzamora requested review from wence- and isVoid on March 6, 2023 at 17:54
@rjzamora changed the title from "[WIP] Fix null hive-partition behavior in dask-cudf parquet" to "Fix null hive-partition behavior in dask-cudf parquet" on Mar 6, 2023
@rjzamora added the labels 3 - Ready for Review and 4 - Needs Dask Reviewer, and removed the label 2 - In Progress, on Mar 7, 2023

@bdice left a comment

Looks good, with some minor comments.

python/cudf/cudf/io/parquet.py (review comment outdated and resolved)

@bdice left a comment

Looks good to me. I don't know enough about hive partitioning to verify the test, but the code appears fine.

@rjzamora added the label 5 - Ready to Merge, and removed the labels 3 - Ready for Review and 4 - Needs Dask Reviewer, on Mar 10, 2023
@rjzamora

/merge

@rapids-bot merged commit 4da6b19 into rapidsai:branch-23.04 on Mar 10, 2023
@rjzamora deleted the null-hive-partition branch on March 10, 2023 at 18:42
@rapids-bot pushed a commit that referenced this pull request on Mar 15, 2023
…12930)

This is a follow-up "fix" for #12866.
While that PR enables the writing/reading of null hive partitions using `dask_cudf`, it does not preserve the type of integer partition columns containing nulls. This PR should address the remaining issue.
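
The remaining issue can be illustrated with a small pure-Python sketch: when partition values are recovered from directory names, a null among otherwise integer values should not stop the column from being treated as integer. The function `coerce_partition_values` is hypothetical and only illustrates the idea; it is not the approach taken in #12930.

```python
def _is_int(text):
    """Return True if `text` parses as a base-10 integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def coerce_partition_values(raw_values):
    """Convert raw partition strings to ints when all non-null values are ints.

    Nulls (None) are kept as None instead of forcing a fallback to strings.
    Hypothetical helper for illustration only.
    """
    non_null = [v for v in raw_values if v is not None]
    if non_null and all(_is_int(v) for v in non_null):
        return [None if v is None else int(v) for v in raw_values]
    # Mixed or non-integer values are returned unchanged.
    return list(raw_values)


print(coerce_partition_values(["1", None, "3"]))
print(coerce_partition_values(["a", None, "3"]))
```

In a dataframe library the equivalent step would typically rely on a nullable integer dtype, so that the null does not push the column to a floating-point or object type.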

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #12930
Labels: 5 - Ready to Merge, dask, improvement, non-breaking, Python