-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor cleanup of unused Python functions #9974
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #9974 +/- ##
================================================
- Coverage 10.49% 10.43% -0.06%
================================================
Files 119 119
Lines 20305 20444 +139
================================================
+ Hits 2130 2134 +4
- Misses 18175 18310 +135
Continue to review full report at Codecov.
|
# Note that both Series and DataFrame return Series objects from this | ||
# calculation, necessitating the unfortunate circular reference to the | ||
# child class here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this somehow worse than the previous code referring to Series
within the Series.hash_values
? It doesn't seem like a problem to me - the comment block might be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because Series inherits from IndexedFrame. This is code in a parent class that relies on knowing a specific child exists, how to instantiate it, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying this is circular (or awkward) from the perspective of Python imports, or from a perspective of object-oriented purity? This implementation looks correct and feels only slightly awkward to me. This could be phrased as, "A _ThingContainer
has derived classes SingleThingContainer
and MultiThingContainer
. A _ThingContainer
has a method collapseThings
that returns a SingleThingContainer
, which can be called by its derived classes," with which I have no qualms.
Regardless, I don't think there is a good way around it. If this were indicative of a design flaw that we could fix, then I might say we should keep the comment (as a sort of TODO that indicates the circularity should be removed with some improved design). However, I hesitate to call this "unfortunate" if it is necessary.
(This is not a big deal - happy to close the conversation and let you resolve however you wish. Just wanted to clarify my thoughts.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My issue with this is that your description isn't quite correct. The result of DataFrame.hash_values
is reduced in dimensionality from the input because it results in a Series
, whereas Series.hash_values
produces an output of the same dimensionality because a Series
is in this case treated like a single column, not a single row. SingleThingContainer.collapseThings()
does not actually collapse things. So yes, this is about object-oriented purity, but it is also something of a design flaw because in some sense we're saying that IndexedFrame.hash_values
is actually semantically different for different subclasses of IndexedFrame
.
From a purist perspective, this discussion is actually making me want to undo this one change in this PR and move the method back to the two child classes for this reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting viewpoint. What you're describing sounds more like a choice of how to leverage a type system (e.g. principles of covariance / contravariance) with the class hierarchy and methods' return types.
Here's why I think this is fine to keep in IndexedFrame
, if you wish:
- Many methods of
IndexedFrame
(or its parentFrame
) have covariant return types and return an instance ofself.__class__
(e.g. round) - The return type of
hash_values
is invariant to the choice of subclass. Other return-type-invariant methods likeFrame.__len__
can return anint
or astr
or any type that doesn't depend on the choice of subclass. It just happens that the invariant return type ofIndexedFrame.hash_values
isSeries
, a child class ofIndexedFrame
.
It is partly a question of what semantics we choose to adhere to in IndexedFrame
. Regardless, not all methods of IndexedFrame
have covariant return types (and it would be silly to require that), so I'm not sure if any reasoning from the type system would justify removing this from IndexedFrame
, especially since the underlying Cython hash doesn't care what subclass of Frame
it receives. Methods with invariant return types are perfectly fine to include, and I think this is one of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to keep here. I don't think the following is necessarily true:
IndexedFrame.hash_values is actually semantically different for different subclasses of IndexedFrame.
hash_values
hashes each row of a Frame. A Series
is viewed as Frame
with N
rows, rather than a Frame
with N
columns. I would consider it surprising behaviour if Series.hash_values
instead returned a scalar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@gpucibot merge |
This PR just removes some unused internal functions and inlines some single-use functions that were defined at the wrong levels of the class hierarchy (largely
Frame
internal methods that were exclusively called in a singleDataFrame
method).