Avoid passing NativeFileDatasource to pyarrow in read_parquet #9608

Merged

rjzamora
Member

@rjzamora rjzamora commented Nov 4, 2021

Closes #9599

Saves the input NativeFile objects before they are converted to NativeFileDatasource, and uses the saved objects (rather than the converted datasources) to read and parse metadata with pyarrow.
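For readers unfamiliar with the pattern, here is a minimal, library-free sketch of the idea described above. It is not the code from this PR: `DatasourceWrapper` and `prepare_sources` are hypothetical names, and `io.BytesIO` stands in for a pyarrow NativeFile. The point is only that the original file objects are kept alongside the wrapped ones, so the metadata step never receives the wrapper.

```python
import io


class DatasourceWrapper:
    """Hypothetical stand-in for a reader-side wrapper such as NativeFileDatasource."""

    def __init__(self, file_obj):
        self.file_obj = file_obj


def prepare_sources(sources):
    """Return (reader_sources, metadata_sources) for a list of paths or file objects."""
    reader_sources, metadata_sources = [], []
    for src in sources:
        # Keep the original input so the metadata step can use it directly.
        metadata_sources.append(src)
        if isinstance(src, io.IOBase):
            # Wrap file objects only for the reader; paths pass through unchanged.
            reader_sources.append(DatasourceWrapper(src))
        else:
            reader_sources.append(src)
    return reader_sources, metadata_sources


buf = io.BytesIO(b"PAR1")  # assumption: a plain buffer standing in for a NativeFile
reader_sources, metadata_sources = prepare_sources([buf, "data.parquet"])
```

In this sketch the metadata parser would be handed `metadata_sources`, while the reader would be handed `reader_sources`.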

@rjzamora added the bug (Something isn't working), Python (Affects Python cuDF API.), Cython, and non-breaking (Non-breaking change) labels Nov 4, 2021
@rjzamora self-assigned this Nov 4, 2021
@rjzamora requested a review from a team as a code owner November 4, 2021 16:57
@rjzamora added the 3 - Ready for Review (Ready for review by team) label Nov 4, 2021
@codecov

codecov bot commented Nov 4, 2021

Codecov Report

Merging #9608 (959b392) into branch-21.12 (ab4bfaa) will decrease coverage by 0.13%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-21.12    #9608      +/-   ##
================================================
- Coverage         10.79%   10.65%   -0.14%     
================================================
  Files               116      117       +1     
  Lines             18869    19738     +869     
================================================
+ Hits               2036     2104      +68     
- Misses            16833    17634     +801     
Impacted Files Coverage Δ
python/dask_cudf/dask_cudf/sorting.py 92.90% <0.00%> (-1.21%) ⬇️
python/cudf/cudf/io/csv.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/hdf.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/_version.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/abc.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/dlpack.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
... and 66 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@rjzamora
Member Author

rjzamora commented Nov 8, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit eda31b6 into rapidsai:branch-21.12 Nov 8, 2021
@rjzamora rjzamora deleted the fix-pyarrow-metadata-bug branch November 8, 2021 14:42
Linked issue: [BUG] cudf.read_parquet fails to read from remote storage with use_python_file_object=True