Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Fix columns ordering issue in parquet reader #10066

Merged
merged 4 commits into from
Jan 19, 2022

Conversation

galipremsagar
Copy link
Contributor

Fixes: #10062

This PR fixes issue where the order of columns and parquet metadata columns(i.e., meta['columns']) can differ and both are not guaranteed to be in the same order always. In this PR, removed the code that has this assumption and created a new dict that contains the metadata of columns which are later used to update the column metadata in dataframe.

@galipremsagar galipremsagar added bug Something isn't working 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer cuIO cuIO issue non-breaking Non-breaking change labels Jan 18, 2022
@galipremsagar galipremsagar requested a review from a team as a code owner January 18, 2022 16:40
@galipremsagar galipremsagar self-assigned this Jan 18, 2022
@github-actions github-actions bot added the Python Affects Python cuDF API. label Jan 18, 2022
@galipremsagar galipremsagar requested review from vuule and removed request for marlenezw January 18, 2022 16:51
Copy link
Contributor

@rgsl888prabhu rgsl888prabhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small suggestion, rest looks good

python/cudf/cudf/_lib/parquet.pyx Outdated Show resolved Hide resolved
@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer labels Jan 18, 2022
@vuule
Copy link
Contributor

vuule commented Jan 18, 2022

rerun tests

@galipremsagar
Copy link
Contributor Author

rerun tests

@codecov
Copy link

codecov bot commented Jan 18, 2022

Codecov Report

Merging #10066 (fc48e89) into branch-22.02 (967a333) will decrease coverage by 0.07%.
The diff coverage is n/a.

❗ Current head fc48e89 differs from pull request most recent head 8183a1c. Consider uploading reports for the commit 8183a1c to get more accurate results
Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.02   #10066      +/-   ##
================================================
- Coverage         10.49%   10.41%   -0.08%     
================================================
  Files               119      119              
  Lines             20305    20541     +236     
================================================
+ Hits               2130     2139       +9     
- Misses            18175    18402     +227     
Impacted Files Coverage Δ
python/custreamz/custreamz/kafka.py 29.16% <0.00%> (-0.63%) ⬇️
python/dask_cudf/dask_cudf/sorting.py 92.66% <0.00%> (-0.25%) ⬇️
python/dask_cudf/dask_cudf/core.py 70.85% <0.00%> (-0.17%) ⬇️
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/parquet.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/dtypes.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/scalar.py 0.00% <0.00%> (ø)
... and 31 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e416188...8183a1c. Read the comment docs.

@galipremsagar
Copy link
Contributor Author

rerun tests

1 similar comment
@vuule
Copy link
Contributor

vuule commented Jan 19, 2022

rerun tests

@vuule
Copy link
Contributor

vuule commented Jan 19, 2022

dask_cudf.tests.test_groupby.test_groupby_basic[True-var] failed about five times in a row. I know it can pass since it passed once on my PR, and on #10067. Might be better to admin merge these instead to retrying so many times.

@galipremsagar
Copy link
Contributor Author

galipremsagar commented Jan 19, 2022

dask_cudf.tests.test_groupby.test_groupby_basic[True-var] failed about five times in a row. I know it can pass since it passed once on my PR, and on #10067. Might be better to admin merge these instead to retrying so many times.

This was a cupy flaky issue, #10071 has a fix for it. I'll get it merged ASAP.

@galipremsagar galipremsagar removed the request for review from vuule January 19, 2022 14:58
@galipremsagar
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 8e88adc into rapidsai:branch-22.02 Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Exception TypeError: 'NoneType' object is not subscriptable
4 participants