Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move more methods into SingleColumnFrame #8253

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented May 14, 2021

Continuation of #8115 and #8166. Moves more logic out of the Index/Series classes into the new common parent class to reduce code duplication and ensure feature parity.

@vyasr vyasr added 3 - Ready for Review Ready for review by team code quality Python Affects Python cuDF API. non-breaking Non-breaking change labels May 14, 2021
@vyasr vyasr self-assigned this May 14, 2021
@vyasr vyasr requested a review from a team as a code owner May 14, 2021 21:00
@vyasr vyasr requested review from isVoid and brandon-b-miller May 14, 2021 21:00
@vyasr vyasr added the improvement Improvement / enhancement to an existing function label May 14, 2021
@codecov
Copy link

codecov bot commented May 14, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@92a432d). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.06    #8253   +/-   ##
===============================================
  Coverage                ?   82.88%           
===============================================
  Files                   ?      105           
  Lines                   ?    17910           
  Branches                ?        0           
===============================================
  Hits                    ?    14845           
  Misses                  ?     3065           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 92a432d...ea9331f. Read the comment docs.

@vyasr vyasr changed the title Move more methods into SingleColumnFrame Move more methods into S ingleColumnFrame May 17, 2021
@vyasr vyasr changed the title Move more methods into S ingleColumnFrame Move more methods into SingleColumnFrame May 17, 2021
Copy link
Contributor

@brandon-b-miller brandon-b-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general. In addition I would love to have more insight into how MultiIndex is (completely) shadowed away from methods in SingleColumnFrame. Could you elaborate if you don't mind?

@vyasr
Copy link
Contributor Author

vyasr commented May 18, 2021

LGTM in general. In addition I would love to have more insight into how MultiIndex is (completely) shadowed away from methods in SingleColumnFrame. Could you elaborate if you don't mind?

The short answer is that I'm not doing anything, because nothing has been done in the past to fix this. MultiIndex in its current state works, but is fragile and there are lots of things that probably break in unexpected ways (like binary ops, for example). The critical methods to ensure that creating a DataFrame/Series with a MultiIndex have been correctly overridden, but try to do anything beyond that right now and you're playing with fire.

The longer answer is that I have a plan for how to resolve this issue, but it will take some more careful refactoring. In the long term, I think we should aim for a structure that looks more like this:

class Frame
class SingleColumnFrame(Frame)
class Index(Frame)
class MultiColumnFrame(Frame) # Maybe worthwhile, _if_ we can find good ways to share DataFrame and MultiIndex code
class MultiIndex(Index, MultiColumnFrame) # _If_ we did this, we would have to figure out the right order
class RangeIndex(Index)
class SingleColumnIndex(Index, SingleColumnFrame) # We would have to figure out the right order
# all the other index types inherit from Index

This inheritance would resolve your concerns. However, before we can do that we need to make it a lot easier to inherit from Index types, which is a huge pain right now because of the way we need to override __new__. After #8254 is merged I plan to make a subsequent PR that renames Index to something like BaseIndex, then remove its __new__ method and puts it into a new class Index(BaseIndex). This class will sit at the bottom of the hierarchy rather than the top, but I'll use a little metaclass hacking to hide that detail from users (basically overriding __instancecheck__ and __subclasscheck__. Assuming that works, it should then be possible to substantially simplify all the constructor logic for Index types, and I won't feel so apprehensive about more substantial modifications to the inheritance hierarchy.

@vyasr
Copy link
Contributor Author

vyasr commented May 18, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 8834ed6 into rapidsai:branch-21.06 May 18, 2021
@vyasr vyasr added this to the cuDF Python Refactoring milestone Jul 22, 2021
@vyasr vyasr deleted the refactor/reduce_index_series_duplication branch January 14, 2022 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants