Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove _source_index from MultiIndex #9191

Merged
merged 13 commits into from
Sep 16, 2021

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Sep 8, 2021

This PR is the first in a sequence refactoring the MultiIndex class, which is currently in a halfway state of making use of some facilities common to other Frames without using them consistently. This PR removes the internal _source_data parameter, which creates a thin DataFrame wrapper around a MultiIndex to provide access to those methods. Since MultiIndex is now clearly defined as a Frame, this PR moves us towards a much simpler and more uniform design where we use inherited Frame methods directly. This approach is significantly easier to maintain than treating a MultiIndex as being "backed" by a DataFrame. This approach also comes with various performance benefits. While relatively cheap, the internal conversion of a MultiIndex to a DataFrame is not free, and in cases where we need to return a modified MultiIndex the conversion back from a DataFrame is significantly more complex. Moreover, in a number of cases we were using less efficient methods than those we could use directly on the MultiIndex.

@vyasr vyasr added 3 - Ready for Review Ready for review by team Python Affects Python cuDF API. tech debt improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 8, 2021
@vyasr vyasr added this to the CuDF Python Refactoring milestone Sep 8, 2021
@vyasr vyasr self-assigned this Sep 8, 2021
@vyasr vyasr requested a review from a team as a code owner September 8, 2021 00:10
@vyasr
Copy link
Contributor Author

vyasr commented Sep 8, 2021

rerun tests

@codecov
Copy link

codecov bot commented Sep 8, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@42a70f3). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head e629b37 differs from pull request most recent head f5493ae. Consider uploading reports for the commit f5493ae to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.10    #9191   +/-   ##
===============================================
  Coverage                ?   10.84%           
===============================================
  Files                   ?      115           
  Lines                   ?    18761           
  Branches                ?        0           
===============================================
  Hits                    ?     2035           
  Misses                  ?    16726           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 42a70f3...f5493ae. Read the comment docs.

python/cudf/cudf/core/multiindex.py Show resolved Hide resolved
python/cudf/cudf/core/multiindex.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/multiindex.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/multiindex.py Show resolved Hide resolved
python/cudf/cudf/core/multiindex.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/multiindex.py Show resolved Hide resolved
python/cudf/cudf/core/multiindex.py Show resolved Hide resolved
@vyasr vyasr requested a review from galipremsagar September 13, 2021 21:23
Copy link
Contributor

@thomcom thomcom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job refactoring the dreaded MultiIndex, I know all about it. :D Looks pretty much good, I'm going to double check a few tests I thought of that might have issues, then approve assuming nothing comes up.

python/cudf/cudf/core/multiindex.py Show resolved Hide resolved
@thomcom thomcom self-requested a review September 14, 2021 22:12
@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Sep 15, 2021
@galipremsagar
Copy link
Contributor

rerun tests

@galipremsagar
Copy link
Contributor

@gpucibot merge

@vyasr
Copy link
Contributor Author

vyasr commented Sep 15, 2021

rerun tests

@rapids-bot rapids-bot bot merged commit a3e0280 into rapidsai:branch-21.10 Sep 16, 2021
rapids-bot bot pushed a commit that referenced this pull request Sep 22, 2021
This PR is a follow-up to #9191 to further simplify the MultiIndex class. It removes various unused functions, inlines and simplifies a number of other single-use internal functions, and moves more methods to Frame that can be shared. It also makes numerous other miscellaneous improvements to the code in MultiIndex to simplify further rewrites.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Marlene  (https://github.com/marlenezw)
  - H. Thomson Comer (https://github.com/thomcom)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Ashwin Srinath (https://github.com/shwina)

URL: #9243
@vyasr vyasr deleted the refactor/multiindex_part1 branch January 14, 2022 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants