Fix read_parquet bug for extended dtypes from remote storage #9638

Merged 2 commits into rapidsai:branch-21.12 from fix-remote-parquet-bug on Nov 11, 2021

@rjzamora rjzamora commented Nov 9, 2021

This fixes a `read_parquet` bug discovered while iterating on #9589.

Without this fix, the optimized read_parquet code path will fail when the pandas metadata includes index-column information. It may also fail when the data includes list or struct columns (depending on the engine that wrote the parquet file).
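
To illustrate why index-column information matters, here is a minimal sketch that is not the code from this PR. It assumes the layout of the JSON blob that pandas stores under the `pandas` key of a Parquet file's schema metadata, where `index_columns` holds either column names (for an index serialized as data) or a dict describing a `RangeIndex`. The helper `columns_to_read` and the sample metadata are hypothetical and use only the standard library; the point is that a reader fetching only selected columns may also need the index columns listed in the metadata.

```python
import json

# Hypothetical sample of the JSON stored under the "pandas" key of a
# Parquet file's schema metadata (trimmed to the fields used here).
PANDAS_METADATA = json.dumps(
    {
        "index_columns": ["__index_level_0__"],
        "columns": [
            {"name": "a", "field_name": "a", "pandas_type": "int64"},
            {"name": None, "field_name": "__index_level_0__", "pandas_type": "int64"},
        ],
    }
)


def columns_to_read(requested, pandas_metadata_json):
    """Return the requested columns plus any index columns stored as data.

    Hypothetical helper for illustration only; it is not part of cuDF.
    """
    meta = json.loads(pandas_metadata_json)
    needed = list(requested)
    for entry in meta.get("index_columns", []):
        # A string names an index that was written as a real column.
        # A dict (e.g. {"kind": "range", ...}) describes a RangeIndex that
        # lives only in the metadata, so there is no column to fetch.
        if isinstance(entry, str) and entry not in needed:
            needed.append(entry)
    return needed


print(columns_to_read(["a"], PANDAS_METADATA))
```

With this sample metadata, requesting only `a` still pulls in `__index_level_0__`, since the index was written as a regular column.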

@rjzamora added the bug, 2 - In Progress, Python, and non-breaking labels on Nov 9, 2021
@rjzamora requested a review from a team as a code owner on Nov 9, 2021, 19:46
@rjzamora self-assigned this on Nov 9, 2021
@rjzamora requested reviews from trxcllnt and shwina on Nov 9, 2021, 19:46
@rjzamora added the 3 - Ready for Review and 4 - Needs cuDF (Python) Reviewer labels and removed the 2 - In Progress label on Nov 9, 2021
codecov bot commented Nov 9, 2021

Codecov Report

Merging #9638 (77807e7) into branch-21.12 (ab4bfaa) will decrease coverage by 0.09%.
The diff coverage is n/a.

❗ Current head 77807e7 differs from pull request most recent head 75e3fb6. Consider uploading reports for the commit 75e3fb6 to get more accurate results

@@               Coverage Diff                @@
##           branch-21.12    #9638      +/-   ##
================================================
- Coverage         10.79%   10.69%   -0.10%     
================================================
  Files               116      117       +1     
  Lines             18869    19849     +980     
================================================
+ Hits               2036     2123      +87     
- Misses            16833    17726     +893     

| Impacted Files | Coverage Δ |
|---|---|
| `python/dask_cudf/dask_cudf/sorting.py` | 92.90% <0.00%> (-1.21%) ⬇️ |
| `python/cudf/cudf/io/csv.py` | 0.00% <0.00%> (ø) |
| `python/cudf/cudf/io/hdf.py` | 0.00% <0.00%> (ø) |
| `python/cudf/cudf/io/orc.py` | 0.00% <0.00%> (ø) |
| `python/cudf/cudf/__init__.py` | 0.00% <0.00%> (ø) |
| `python/cudf/cudf/_version.py` | 0.00% <0.00%> (ø) |
| `python/cudf/cudf/core/abc.py` | 0.00% <0.00%> (ø) |
| `python/cudf/cudf/api/types.py` | 0.00% <0.00%> (ø) |
| `python/cudf/cudf/io/dlpack.py` | 0.00% <0.00%> (ø) |
| `python/cudf/cudf/core/frame.py` | 0.00% <0.00%> (ø) |
| ... and 67 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3280be2...75e3fb6.

@rjzamora

@gpucibot merge

@rapids-bot merged commit 1e4afd1 into rapidsai:branch-21.12 on Nov 11, 2021
@rjzamora deleted the fix-remote-parquet-bug branch on Nov 11, 2021, 14:35
@vyasr added the 4 - Needs Review label and removed the 4 - Needs cuDF (Python) Reviewer label on Feb 23, 2024
Labels: 3 - Ready for Review, 4 - Needs Review, bug, non-breaking, Python