Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove internal columns usage #10315

Merged

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Feb 17, 2022

This PR uses the feature introduced in #10263 to remove the usage of DataFrame.columns where possible.

Currently this removes a large number of internal uses but not all of them to verify that CI does indeed fail as expected since some changes were made on #10263 after the last test of CI by reviewer request. I will convert this from a draft once I see CI fail and once all the necessary changes have been made for tests to pass.

@vyasr vyasr added 2 - In Progress Currently a work in progress Python Affects Python cuDF API. Performance Performance related issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 17, 2022
@vyasr vyasr added this to the CuDF Python Refactoring milestone Feb 17, 2022
@vyasr vyasr self-assigned this Feb 17, 2022
@vyasr vyasr requested a review from a team as a code owner February 17, 2022 00:26
@vyasr vyasr requested review from isVoid and charlesbluca February 17, 2022 00:26
@vyasr vyasr marked this pull request as draft February 17, 2022 00:26
@vyasr vyasr added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Feb 17, 2022
@vyasr vyasr marked this pull request as ready for review February 17, 2022 02:34
@vyasr
Copy link
Contributor Author

vyasr commented Feb 17, 2022

See the failures on 0d459e7 such as this run as verification that CI will fail as expected now.

@vyasr vyasr requested review from isVoid and charlesbluca February 17, 2022 03:29
@codecov
Copy link

codecov bot commented Feb 17, 2022

Codecov Report

Merging #10315 (ae168f4) into branch-22.04 (c163886) will decrease coverage by 0.00%.
The diff coverage is 0.00%.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.04   #10315      +/-   ##
================================================
- Coverage         10.62%   10.62%   -0.01%     
================================================
  Files               122      122              
  Lines             20961    20973      +12     
================================================
  Hits               2228     2228              
- Misses            18733    18745      +12     
Impacted Files Coverage Δ
python/cudf/cudf/core/_base_index.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/_internals/where.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/dataframe.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/df_protocol.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <ø> (ø)
python/cudf/cudf/core/groupby/groupby.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/indexed_frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/multiindex.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/reshape.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/window/rolling.py 0.00% <0.00%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c163886...ae168f4. Read the comment docs.

@vyasr vyasr force-pushed the refactor/remove_internal_columns_usage branch from 3cb216e to 102ccef Compare February 22, 2022 21:09
Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't spot any major issues. Do you think _set_column_names_like can be added to an overload of _from_columns_like_self in dataframe?

python/cudf/cudf/core/dataframe.py Show resolved Hide resolved
python/cudf/cudf/core/groupby/groupby.py Outdated Show resolved Hide resolved
@vyasr
Copy link
Contributor Author

vyasr commented Feb 23, 2022

Can't spot any major issues. Do you think _set_column_names_like can be added to an overload of _from_columns_like_self in dataframe?

That's definitely a good idea for a follow-up PR, but I think it's out of scope for this one.

@vyasr vyasr requested a review from isVoid February 23, 2022 22:53
@vyasr
Copy link
Contributor Author

vyasr commented Feb 24, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit aa746ae into rapidsai:branch-22.04 Feb 24, 2022
@vyasr vyasr deleted the refactor/remove_internal_columns_usage branch March 9, 2022 17:21
rapids-bot bot pushed a commit that referenced this pull request Mar 10, 2022
A quick fix to set column names in the `_from_columns_like_self` factory. Since dataframe column names are slightly more complex than simple Frame, this fix makes sure column name metadata (such as multi level names) are properly copied to the result dataframe.

Addresses this comment: #10315 (comment)

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10400
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Performance Performance related issue Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants