ENH: fill_value argument for shift #15527

ResidentMario · 2017-02-27T23:40:08Z

closes ENH: fill_value argument for shift #15486
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

ResidentMario · 2017-02-27T23:40:48Z

Tests+ are a WIP.

jreback · 2017-02-27T23:47:10Z

@ResidentMario tip. write tests first! This validates that you are actually testing the right thing.

ResidentMario · 2017-02-28T00:00:07Z

Will try to follow TDD when here going forward.

Right now, this looks like it works for non-datetimes, but with datetimes the fill value is treated as UNIX time. So e.g. pd.Series(pd.date_range('1/1/2015', '1/30/2015')).shift(1, fill_value=0) replaces the first instance with 1970-01-01, not 0.

What behavior should be followed in that case? Trying to read the fill_value input as a datetime maybe?

ResidentMario · 2017-02-28T02:26:51Z

It seems to me that the behavior the emulate here w.r.t. df.shift(fill_value=0) is that of df.iloc[:, 0] = 0. Which does the following by column:

int and object columns get 0 (int) and retain their dtype.
float columns get 0.0 (float) and retain their dtype.
datetime columns get 1970-01-01 (datetime; UNIX timestamp) and retain their dtype.
period columns get 0 (int) and become object dtype. (!)
categorical columns fail out with ValueError: Cannot setitem on a Categorical with a new category, set the categories first.
sparse columns fail with a NotImplementedError.

jreback · 2017-02-28T02:38:48Z

no the fill value has to be a valid value for that dtype (this is checked already internally)

ResidentMario · 2017-02-28T04:14:27Z

The tests for shift are kind of all over the place (?). For the moment, I'm keeping the one's I'm writing in tests/frame/test_timeseries::TestDataFrameTimeSeriesMethods.test_shift_fillna (for DataFrame.shift) and tests/series/test_timeseries::TestSeriesTimeSeriesMethods.test_shift_fillna (for Series.shift).

As you stated above, the implementation is from a surface level as simple as passing the fill_value argument down to the requisite shift methods (several exist, one for each "kind" of axis), where in most cases there's a _maybe_upcast function which accepts the new fill_value as a keyword argument.

A couple of things I'm seeing in testing, however. I'm finding that in this implementation non-valid values actually get converted to np.nan: such is the fate of ts.shift(1, fill_value="Hello World"), which results in a first value of NaT. I'd have expected a ValueError instead, however...is this the desired behavior?

The other thing is that Categorical and Sparse don't make a call to _maybe_upcast internally. Should something else be done in these cases instead? Maybe these column types should raise a NotImplemented when passed a fill_value instead?

jreback · 2017-02-28T13:29:09Z

yes .shift tests are a bit of a mixed bag (always glad to consolidate things though - generally separate PR for things like that is good).

So we certainly can raise on a not-None fill_value for certain types (categorical; sparse in theory could work though, but don't spend too much time on, you can simply open an issue)

and to your last part, we are prob not validating things as much as we should so help there would be great as well!

ResidentMario · 2017-02-28T16:52:34Z

Would it be OK to make reworking the fill_value validation a separate issue? This seems like a broader issue, as other functions which have their own validation code paths. For example, with Series.add:

pd.Series([0, 1]).add(pd.Series([np.nan, np.nan]), fill_value='Hello')

Raises:

ValueError: invalid literal for int() with base 10: 'Hello'

On:

.../series.py in _binop(self, other, func, level, fill_value)
   1639             # one but not both
   1640             mask = this_mask ^ other_mask
-> 1641             this_vals[this_mask & mask] = fill_value
   1642             other_vals[other_mask & mask] = fill_value
   1643

Whereas this implementation results in an upcast to object type:

 >>> pd.Series([0, 1]).shift(1, fill_value="he")
 0    he
 1     0
 dtype: object

It would be good to write up something like a _maybe_fill method that would handle this check across all of the APIs that implement a fill_value and make this behavior consistent.

With this PR, I suggest:

Raise a NotImplemented error for categorical/sparse.
Write the tests where they are, for now.
Merge the current working implementation for non-categorical/non-sparse.
File an issue for consolidating shift tests.
File an issue for a _maybe_fill consolidator.
File an issue for extending fill_value to sparse.

jreback · 2017-02-28T16:59:51Z

@ResidentMario absolutely!

in general, 1 PR -> 1 'issue'. It makes it easier to reason about / review.

So for example I would leave this PR, and do another that does the validation as you suggest. You can then rebase this one on your new one (until it is merged), then rebase on master.

Very happy with independent (or even dependent PR's as I suggest above), just indicate if that is the case in the top of the PR.

and certainly filing issues is good as well; for bugs / features even if you plan to do them. It keeps things organized.

jreback · 2017-08-01T22:53:18Z

closing as stale but good idea. master has changed a bit since this PR, might be easier now.

ResidentMario added 2 commits February 27, 2017 18:33

Enable fill-value in Series.shift, DataFrame.shift.

4aa01fc

flake8 fixes.

8ef9305

jreback added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Feb 28, 2017

ResidentMario mentioned this pull request Feb 28, 2017

ENH: standardize fill_value behavior across the API #15533

Open

linebp mentioned this pull request Apr 20, 2017

ENH: fill_value argument for shift #15486

Closed

jreback closed this Aug 1, 2017

luyaor mentioned this pull request Jul 26, 2018

ENH: fill_value argument for shift #15486 #20460

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: fill_value argument for shift #15527

ENH: fill_value argument for shift #15527

ResidentMario commented Feb 27, 2017 •

edited

Loading

ResidentMario commented Feb 27, 2017 •

edited

Loading

jreback commented Feb 27, 2017 •

edited

Loading

ResidentMario commented Feb 28, 2017 •

edited

Loading

ResidentMario commented Feb 28, 2017

jreback commented Feb 28, 2017

ResidentMario commented Feb 28, 2017 •

edited

Loading

jreback commented Feb 28, 2017

ResidentMario commented Feb 28, 2017

jreback commented Feb 28, 2017 •

edited

Loading

jreback commented Aug 1, 2017

ENH: fill_value argument for shift #15527

ENH: fill_value argument for shift #15527

Conversation

ResidentMario commented Feb 27, 2017 • edited Loading

ResidentMario commented Feb 27, 2017 • edited Loading

jreback commented Feb 27, 2017 • edited Loading

ResidentMario commented Feb 28, 2017 • edited Loading

ResidentMario commented Feb 28, 2017

jreback commented Feb 28, 2017

ResidentMario commented Feb 28, 2017 • edited Loading

jreback commented Feb 28, 2017

ResidentMario commented Feb 28, 2017

jreback commented Feb 28, 2017 • edited Loading

jreback commented Aug 1, 2017

ResidentMario commented Feb 27, 2017 •

edited

Loading

ResidentMario commented Feb 27, 2017 •

edited

Loading

jreback commented Feb 27, 2017 •

edited

Loading

ResidentMario commented Feb 28, 2017 •

edited

Loading

ResidentMario commented Feb 28, 2017 •

edited

Loading

jreback commented Feb 28, 2017 •

edited

Loading