ENH: standardize fill_value behavior across the API #15533
Comments
In a way, a lot of the validation is probably occurring now, but at a lower level and with no consistency in error messages. Many of the routines expect certain types for filling; in other words, filling floats needs a compatible float/int (or it would raise). So a friendly high-level check would be nice. The hard part about this issue is not the code changes but the tests :> Also, collecting these tests into a standard place would be fine as well (this is tricky because we like to keep them with the types, e.g. in
I'm hopeful I can figure out how to implement this. Why are the tests the hard part? I assume you mean figuring out where to put them, which does sound challenging... I might collect them in a separate file instead, just for the time being.
When a In other cases (e.g. Should It wouldn't be too hard to add a separate check to prevent this sort of input from reaching
(sorry about the close/open, fat-fingered the wrong button there)
You might be able to add a check here; internal routines just have implicit guarantees (or, better yet, explicit guarantees that are stated in the docstring).
FYI, this is legal in
Which is counter-factual w.r.t. a (separate)
(to fix this you could do
Yeah, you probably have to check inclusion rather than exclusion, e.g. is_scalar, is_dict_like, is_list_like.
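A minimal sketch of an inclusion-style check, using the public predicates in pandas.api.types. classify_fill_value is a hypothetical helper, not an existing pandas function, and which of the three shapes a given method should actually accept is left open. Anything matching none of them is rejected with one uniform error message instead of falling through to a default branch.

```python
from pandas.api.types import is_dict_like, is_list_like, is_scalar


def classify_fill_value(fill_value):
    """Hypothetical helper: accept only input shapes we explicitly allow.

    Checking inclusion (is this one of the allowed kinds?) rather than
    exclusion (is this not a string?) means unexpected objects are
    rejected instead of being silently treated as some other type.
    """
    if is_scalar(fill_value):
        return "scalar"
    if is_dict_like(fill_value):
        return "dict"
    if is_list_like(fill_value):  # strings are not list-like here
        return "list"
    # One consistent error message for everything else.
    raise ValueError(
        "fill_value must be a scalar, dict-like or list-like, "
        f"got {type(fill_value).__name__}"
    )
```

Note that a DataFrame or Series also passes is_dict_like, so a method that should refuse them would need an explicit extra check.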
Yeah. Funny little bug with that:
In
The type checker naively assumes that if you passed it an object, it must have been a string!
@ResidentMario, note that imports from
@ResidentMario, is the description at the top of this issue still up to date with how you are trying to implement things in #15587?
Hmm. This list is incomplete, and I think there have been a couple of changes there:
A big question right now is whether or not in the case of a
Yes, this is a special case at the moment; you can simply use
Yes.
I think a reasonable way to do this is to:
This would give nice behavior by default: filling things that can take that value, and providing error checking otherwise (with an option for The current situation is effectively
I suppose we could also make the default
OK, so then:
I suggest also adding a new
Problem
In the PR for #15486, I found that type validation for the fill_value parameters strewn across a large number of pandas API methods is done ad hoc. This results in a wide variety of accepted inputs. I think it would be good to standardize this so that all of these methods share the same behavior, the one currently used by fillna.

Implementation Details
Part of the point of providing a fill_value is to avoid an otherwise slow type conversion (using .fillna().astype()). Even so, accepting fill values in other formats is a useful convenience. The implementation would roughly be:

1. Before executing the rest of the method body, check whether the fill_value is valid (using a centralized maybe_fill method). If it is not, raise a ValueError.
2. If it is, check whether incorporating the fill_value would result in an upcast of the column dtype.
3. If it would not, follow a code path where the column never gets type-converted. If it would, follow that same code path, then do something like a fillna operation at the end before returning.

Target Implementation
The same as what fillna currently does, which is as follows.

Invalid:

- A categorical fill for a category not in the categories will raise a ValueError.
- Sparse matrices refuse upcasting.

Valid, upcast:

- An int fill will promote bool dtypes to int.
- A float fill will promote int and bool dtypes to float (this is what already happens with np.nan).
- An object (str) fill will promote lesser dtypes to object.
- An int, float, or bool fill into a datetime dtype will be treated as a UNIX-like timestamp and promoted to datetime.
- An object fill will promote a datetime dtype to object.

Valid, no-cast:
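The bool/int/float/object/datetime rules above can be condensed into one function. This is a minimal sketch, assuming a hypothetical helper target_fill_dtype (not an existing pandas function) that works on NumPy dtypes only; the categorical and sparse "Invalid" cases are left out. When no cast is needed it simply returns the column's own dtype.

```python
import datetime

import numpy as np

# Ordering of the numeric kinds: a fill of a "higher" kind upcasts the column.
_NUMERIC_RANK = {"b": 0, "i": 1, "f": 2}


def target_fill_dtype(dtype, fill_value):
    """Hypothetical helper: the dtype a column should have after filling.

    Returns the input dtype unchanged when no cast is needed.
    """
    dtype = np.dtype(dtype)

    # Classify the fill value; bool must come before int (bool subclasses int).
    if isinstance(fill_value, (bool, np.bool_)):
        fill_kind = "b"
    elif isinstance(fill_value, (int, np.integer)):
        fill_kind = "i"
    elif isinstance(fill_value, (float, np.floating)):
        fill_kind = "f"
    elif isinstance(fill_value, (datetime.datetime, np.datetime64)):
        fill_kind = "M"
    else:
        fill_kind = "O"  # str and any other object

    if dtype.kind == "O":
        return dtype  # object columns can already hold anything
    if dtype.kind == "M":
        # bool/int/float fills are read as epoch timestamps; others go to object
        return dtype if fill_kind in ("b", "i", "f", "M") else np.dtype(object)
    if dtype.kind not in _NUMERIC_RANK:
        raise TypeError(f"this sketch does not handle dtype {dtype}")
    if fill_kind in ("O", "M"):
        return np.dtype(object)  # object fill promotes lesser dtypes to object
    if _NUMERIC_RANK[fill_kind] <= _NUMERIC_RANK[dtype.kind]:
        return dtype  # no cast: the column can hold the fill as-is
    return np.dtype("int64") if fill_kind == "i" else np.dtype("float64")
```

For example, target_fill_dtype("bool", 1) gives int64 and target_fill_dtype("int64", np.nan) gives float64. An int fill into a narrow dtype such as int8 is reported as no-cast here; a fuller version would also check that the value actually fits.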
Current Implementation
...is ad hoc. The following are the methods that currently accept a fill_value input, along with where they deviate from the model above.

- Series.combine, DataFrame.combine, Series.to_sparse: these are unique usages of fill_value that aren't compatible with the rest.
- Series.unstack, DataFrame.unstack: any fill_value is allowed. You can pass an object if you'd like, or even another DataFrame (yo dawg...).
- DataFrame.align: any fill_value is allowed.
- DataFrame.reindex_axis: lists and dicts are allowed, objects are not.
- DataFrame.asfreq, Series.asfreq: any fill_value is allowed.
- pd.pivot_table: ...
- Series.add, DataFrame.add: ...
- Series.subtract, DataFrame.subtract: ...

There are probably others; there are a lot of these.
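For comparison with the list above, Series.reindex (which also takes fill_value) shows the payoff described under Implementation Details: omitting fill_value introduces NaN and upcasts an int64 column to float64, so getting back to int64 needs the extra .fillna().astype() round trip, while a compatible fill_value keeps the column int64 throughout. This is a small illustration of existing behavior, not part of the proposal itself.

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # int64
new_index = [0, 1, 2, 3]  # label 3 is new, so it has to be filled

# Without fill_value the gap becomes NaN, which forces an upcast to float64.
upcast = s.reindex(new_index)

# Recovering an int64 column needs the extra conversion the issue mentions.
roundtrip = upcast.fillna(0).astype("int64")

# With a compatible fill_value the column never leaves int64.
direct = s.reindex(new_index, fill_value=0)

print(upcast.dtype, roundtrip.dtype, direct.dtype)  # float64 int64 int64
```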