-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add downcast as method to DataFrame and Series #51641
Conversation
Do we have an idea how often anything other than "infer" is used? im not wild about a try/except for the non-infer cases, and without it users might as well just user .astype |
Fair point. I don't think very often, but not sure. We could just try it with infer and see if there are any reports that people are using it? |
you mean just enable "infer" in the new method and see if anyone asks for the other options? that seems fine |
Yes exactly |
Removed the argument |
if using_copy_on_write(): | ||
result = self.copy(deep=False) | ||
else: | ||
result = self.copy(deep=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would have expected this to be handled within the Manager method. am i wrong to be surprised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn’t really matter, when we supported dict like inputs this was better, but can move it to the manager now
|
||
|
||
class TestDowncast: | ||
def test_downcast(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parametrize over frame_or_series?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes it more complicated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the alternative is to implement an analogous test in the series tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just do
if frame_or_series is Series:
obj = obj["A"]
expected = expected["A"]
result = ...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a series test already, should have commented. Are you ok with that?
couple comments, otherwise LGTM. id like to make sure we have buy-in from the rest of the team on doing this and deprecating the keyword |
Sounds good to me. Should we ping on the issue? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this what to_numeric does ?
we rejected this method in the past for that reason
what's the rationale here for adding?
+1 on the deprecation itself |
to_numeric doesn't work well on DataFrames, but also it changes the size of your dtype if possible, downcast here only coerces from float to int if possible |
sure but maybe that's a better option eg enhancing to_numeric to accept a frame and also a less strict option |
Hm I think calling to_numeric on an already numeric object is a bit counterintuitive? Should we rather mirror the functionality from to_numeric here and centralise down casting logic here? |
that's another option yeah just shouldnt be different very similar options - |
@jbrockmendel thoughts? |
This might be confusing for a method called "downcast"? When I saw this issue, I assumed it would be about a method that would actually downcast things like int64 -> int32 etc (that's also what the Adding the actual downcasting logic that to_numeric has might make sense, then. But, personally, I am not sure this is worth a completely new DataFrame/Series method. And if we want something like this, I am wondering if it would be more interesting to have something more general like a "infer dtypes" (or "optimize" dtypes) method (although it would then be a but unfortunate that we already have both an |
Yeah I agree that we should add the down casting logic from to_numeric. Right now all of them are doing something different. Goal is to centralise this here and also get rid of the inconsistent behavior in fillna and friends. |
How to centralize/standardize downcasting in a minimal set of methods is hard, as is naming. The "real" motivation is getting the downcasting out of fillna etc. I think the lazy thing to do is deprecate that first then see if anyone complains, then add this if necessary. That avoids the difficult part (choosing a name). |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
i've stumbled upon the annoying downcast-inside-fillna/where again. Gonna go ahead and deprecate at least some of that. |
Sounds good, let's close this for now till we get actual complaints. Should I take care of downcast keywords? |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.@jbrockmendel adding downcast as first step towards getting rid of the keyword for fillna and related