ENH: Add downcast as method to DataFrame and Series #51641

phofl · 2023-02-26T00:14:09Z

xref DEPR: Deprecate downcast keyword for fillna #40988
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@jbrockmendel adding downcast as first step towards getting rid of the keyword for fillna and related

jbrockmendel · 2023-02-26T01:51:18Z

Do we have an idea how often anything other than "infer" is used? I'm not wild about a try/except for the non-infer cases, and without it users might as well just use .astype

phofl · 2023-02-26T17:41:34Z

Fair point. I don't think very often, but not sure. We could just try it with infer and see if there are any reports that people are using it?

jbrockmendel · 2023-02-26T22:12:50Z

We could just try it with infer and see if there are any reports that people are using it?

you mean just enable "infer" in the new method and see if anyone asks for the other options? that seems fine

phofl · 2023-02-27T00:32:15Z

Yes exactly

phofl · 2023-02-27T00:38:02Z

Removed the argument

jbrockmendel · 2023-03-05T18:19:20Z

pandas/core/generic.py

+        if using_copy_on_write():
+            result = self.copy(deep=False)
+        else:
+            result = self.copy(deep=True)


i would have expected this to be handled within the Manager method. am i wrong to be surprised?

Doesn’t really matter. This was better when we supported dict-like inputs, but I can move it to the manager now

jbrockmendel · 2023-03-05T18:20:16Z

pandas/tests/frame/methods/test_downcast.py

+
+
+class TestDowncast:
+    def test_downcast(self):


parametrize over frame_or_series?

I think this makes it more complicated

the alternative is to implement an analogous test in the series tests

just do

    if frame_or_series is Series:
        obj = obj["A"]
        expected = expected["A"]
    result = ...

?

I added a series test already, should have commented. Are you ok with that?
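A rough sketch of the frame_or_series pattern suggested above, for illustration only. The proposed method was never merged, so `coerce_float_to_int` below is a hypothetical stand-in for it, and the plain loop over constructors stands in for the parametrized fixture a pytest-based suite would use.

```python
import numpy as np
import pandas as pd


def coerce_float_to_int(obj):
    # Hypothetical stand-in for the proposed method: cast float data to
    # int64 only when every value is a whole number (NaN blocks the cast).
    def _coerce(ser):
        if ser.dtype.kind == "f" and (ser % 1 == 0).all():
            return ser.astype("int64")
        return ser

    if isinstance(obj, pd.DataFrame):
        return pd.DataFrame({col: _coerce(obj[col]) for col in obj.columns})
    return _coerce(obj)


def check_downcast(frame_or_series):
    obj = pd.DataFrame({"A": [1.0, 2.0], "B": [1.5, 2.0]})
    expected = pd.DataFrame(
        {"A": np.array([1, 2], dtype="int64"), "B": [1.5, 2.0]}
    )
    if frame_or_series is pd.Series:
        # Reuse the same data for the Series case by selecting one column.
        obj = obj["A"]
        expected = expected["A"]
        compare = pd.testing.assert_series_equal
    else:
        compare = pd.testing.assert_frame_equal
    result = coerce_float_to_int(obj)
    compare(result, expected)


for klass in (pd.DataFrame, pd.Series):
    check_downcast(klass)
```

One test body covers both containers this way, at the cost of a branch inside the test; a separate Series test avoids the branch but duplicates the setup.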

jbrockmendel · 2023-03-05T18:23:21Z

couple comments, otherwise LGTM. I'd like to make sure we have buy-in from the rest of the team on doing this and deprecating the keyword

phofl · 2023-03-05T19:31:54Z

Sounds good to me. Should we ping on the issue?

jreback

isn't this what to_numeric does?
we rejected this method in the past for that reason

what's the rationale here for adding?

jreback · 2023-03-05T20:25:45Z

+1 on the deprecation itself

phofl · 2023-03-05T20:42:59Z

to_numeric doesn't work well on DataFrames, and it also shrinks the size of your dtype if possible; downcast here only coerces from float to int if possible
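To make the distinction concrete, here is a small sketch using only public pandas calls. The float-to-int branch is written with a plain `astype` as an approximation of what "coerce float to int if possible" means; it is not the code from this PR.

```python
import pandas as pd

ser = pd.Series([1.0, 2.0, 3.0])  # float64, all whole numbers

# to_numeric's downcast keyword picks the smallest integer dtype that fits.
shrunk = pd.to_numeric(ser, downcast="integer")
print(shrunk.dtype)  # int8

# Float -> int only, keeping the 64-bit width (illustrative approximation).
same_width = ser.astype("int64") if (ser % 1 == 0).all() else ser
print(same_width.dtype)  # int64

# to_numeric takes 1-d input, so a DataFrame has to go column by column.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
per_column = df.apply(pd.to_numeric, downcast="integer")
```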

jreback · 2023-03-05T20:56:37Z

sure but maybe that's a better option

eg enhancing to_numeric to accept a frame and also a less strict option

phofl · 2023-03-05T20:58:44Z

Hm I think calling to_numeric on an already numeric object is a bit counterintuitive? Should we rather mirror the functionality from to_numeric and centralise the downcasting logic here?

jreback · 2023-03-05T21:00:58Z

that's another option

yeah, we just shouldn't have different but very similar options

phofl · 2023-03-05T23:23:34Z

@jbrockmendel thoughts?

jorisvandenbossche · 2023-03-13T22:42:26Z

downcast here only coerces from float to int if possible

This might be confusing for a method called "downcast"? When I saw this issue, I assumed it would be about a method that would actually downcast things like int64 -> int32 etc (that's also what the downcast keyword in to_numeric does).
The current docstring in the PR also isn't very clear that this is the only thing it does.

Adding the actual downcasting logic that to_numeric has might make sense, then.

But, personally, I am not sure this is worth a completely new DataFrame/Series method.

And if we want something like this, I am wondering if it would be more interesting to have something more general like an "infer dtypes" (or "optimize" dtypes) method (although it would then be a bit unfortunate that we already have both an infer_objects and convert_dtypes that both do something a bit different). "Inferring dtypes" could have options to enable several kinds of dtype optimizations, like float->int, smaller bitwidths for float/int, inferring object, ... (and later maybe things like converting to categorical for certain characteristics such as low cardinality strings)
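For context, a short sketch of how the two existing methods mentioned above already differ on the same input. The frame and its column names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": pd.Series([1, 2], dtype=object),  # integers stored as object
        "b": [1.0, 2.0],                       # whole-number floats
    }
)

# infer_objects only touches object columns: "a" becomes int64, "b" stays float64.
inferred = df.infer_objects()
print(inferred.dtypes)

# convert_dtypes moves to nullable dtypes and prefers integers when the
# floats are whole numbers: both columns become Int64.
converted = df.convert_dtypes()
print(converted.dtypes)
```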

phofl · 2023-03-13T22:46:25Z

Yeah I agree that we should add the downcasting logic from to_numeric.

Right now all of them are doing something different. Goal is to centralise this here and also get rid of the inconsistent behavior in fillna and friends.

jbrockmendel · 2023-03-14T20:22:48Z

How to centralize/standardize downcasting in a minimal set of methods is hard, as is naming.

The "real" motivation is getting the downcasting out of fillna etc. I think the lazy thing to do is deprecate that first then see if anyone complains, then add this if necessary. That avoids the difficult part (choosing a name).

github-actions · 2023-04-14T00:05:07Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

jbrockmendel · 2023-05-05T02:42:55Z

The "real" motivation is getting the downcasting out of fillna etc. I think the lazy thing to do is deprecate that first then see if anyone complains, then add this if necessary. That avoids the difficult part (choosing a name).

i've stumbled upon the annoying downcast-inside-fillna/where again. Gonna go ahead and deprecate at least some of that.

phofl · 2023-05-05T10:33:14Z

Sounds good, let's close this for now till we get actual complaints. Should I take care of downcast keywords?

phofl added 9 commits February 22, 2023 22:59

ENH: Add downcast as method to df and Series

b7ba887

CoW implementation

bcad0a6

Merge remote-tracking branch 'upstream/main' into downcast

a8404bd

Add whatsnew

f6d7ef0

Revert

bacc5a1

Revert

c44949d

Fix copy

b9850b7

Add gh ref

e67bd06

Fix mypy

19ccd66

Remove dtype arg

49fca8d

phofl and others added 3 commits February 27, 2023 11:47

Fix docstring

1e79841

Merge remote-tracking branch 'upstream/main' into downcast

60c61aa

Merge branch 'main' into downcast

b006219

jbrockmendel reviewed Mar 5, 2023

jreback reviewed Mar 5, 2023

github-actions bot added the Stale label Apr 14, 2023

phofl closed this May 5, 2023

phofl deleted the downcast branch August 28, 2023 21:09
