ENH: Added max_gap keyword for series.interpolate #25141

Closed
36 commits
5e4b2ee
Added maxgap keyword for series.interpolate
cchwala Feb 4, 2019
b752602
minor pep8 fixes
cchwala Feb 4, 2019
839b11a
fixed parameter order
cchwala Feb 4, 2019
fcdc4e4
Merge remote-tracking branch 'upstream/master' into interpolate_maxgap
cchwala Mar 26, 2019
20b70b7
Merge remote-tracking branch 'upstream/master' into interpolate_maxgap
cchwala Jun 11, 2019
3cb371e
Changed parameter name from `maxgap` to `max_gap`
cchwala Jun 11, 2019
8c6ff7a
Moved code to derive indices of "NaNs to preserve" in separate function
cchwala Jun 11, 2019
4aaf8dc
Tests for errors extended and moved to own function
cchwala Jun 11, 2019
1f0406f
added blank lines in docstring as requested
cchwala Jun 11, 2019
eaacefd
Added test which fails for method='pad'
cchwala Jun 11, 2019
f274d16
Merge remote-tracking branch 'upstream/master' into interpolate_maxgap
cchwala Aug 30, 2019
c72acdb
manually add black code formating
cchwala Aug 30, 2019
e0aee3a
First WIP but working version to fix issue with `pad` and `limit_area`
cchwala Sep 5, 2019
af15eaf
fix: do not decide based on dimension but on crucial kwargs which int…
cchwala Sep 5, 2019
12d2e5b
some clean up
cchwala Sep 5, 2019
c25d1f8
Make it work with NaT and test for that
cchwala Sep 17, 2019
4d40722
Added comment on why two interpolate fill functions are needed
cchwala Sep 17, 2019
255518e
fix typo
cchwala Sep 17, 2019
2015e84
Added tests for DataFrames
cchwala Sep 17, 2019
4d7b0f1
Added failing test for https://github.com/pandas-dev/pandas/issues/12918
cchwala Sep 17, 2019
cbf7388
Now using 1D pad and backfill functions in `interpolate_1d_fill()`
cchwala Sep 17, 2019
5128b9d
Merge remote-tracking branch 'upstream/master' into interpolate_maxgap
cchwala Nov 11, 2019
3c55e1e
Additional required adjustments after merge with upstream/master
cchwala Nov 19, 2019
f9e4044
Merge remote-tracking branch 'upstream/master' into interpolate_maxgap
cchwala Nov 19, 2019
d1bbcd6
Removed test for bug with pad which should be solved in a separate PR
cchwala Nov 19, 2019
21b3091
removed trailing whitespaces
cchwala Nov 19, 2019
c96c604
fixed formating for black and flake8
cchwala Nov 19, 2019
bd84fc9
updated docstring for interpolat with max_gap
cchwala Nov 19, 2019
908ffe5
added max_gap info and example to documentation
cchwala Nov 20, 2019
380ef7c
added info to whatsnew file
cchwala Nov 20, 2019
5a1718a
flake8
cchwala Nov 20, 2019
16755bd
update docs with info on limit_direction and method pad
cchwala Nov 20, 2019
b58d721
better test for https://github.com/pandas-dev/pandas/issues/26796
cchwala Nov 20, 2019
aa58ffa
typo, black, flake8
cchwala Nov 20, 2019
ae16124
update to doc
cchwala Nov 20, 2019
28b442c
fix wrong behavior when combining max_gap and limit_direction
cchwala Nov 20, 2019
25 changes: 23 additions & 2 deletions doc/source/user_guide/missing_data.rst
@@ -339,6 +339,10 @@ Interpolation

The ``limit_area`` keyword argument was added.

.. versionadded:: 1.0.0

The ``max_gap`` keyword argument was added.

Both Series and DataFrame objects have :meth:`~DataFrame.interpolate`
that, by default, performs linear interpolation at missing data points.

@@ -481,8 +485,9 @@ filled since the last valid observation:

.. ipython:: python

ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan,
np.nan, 13, np.nan, np.nan])
ser = pd.Series([np.nan, np.nan, 2, np.nan, np.nan,
3, np.nan, np.nan, np.nan,
13, np.nan, np.nan])
ser

# fill all consecutive values in a forward direction
@@ -491,8 +496,24 @@ filled since the last valid observation:
# fill one consecutive value in a forward direction
ser.interpolate(limit=1)

To interpolate only across runs of consecutive ``NaN`` values up to a certain
length, use the ``max_gap`` keyword, introduced in v1.0.0.
Any ``NaN`` gap longer than ``max_gap`` is left unmodified.
This is useful, for example, when interpolation with the ``scipy`` methods
should be restricted to short NaN-gaps because the expected variation over
longer NaN-gaps makes interpolated values unreliable.

.. ipython:: python

ser
# interpolate in forward direction, but only NaN-gaps of at most 2 consecutive NaN values
ser.interpolate(max_gap=2)
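
The `max_gap` keyword is what this PR proposes, so it cannot be assumed to exist in an installed pandas. As a rough sketch of the documented behaviour using only established calls (`isna`, `diff`, `groupby`, `interpolate`, `mask`), the helper below interpolates everything and then restores `NaN` inside gaps longer than the threshold. The name `interpolate_short_gaps` is hypothetical and is not part of this diff. How the PR itself treats leading and trailing gaps is not shown here, so that part is an assumption.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(ser, max_gap):
    """Hypothetical helper: linear interpolation that skips long NaN runs."""
    isna = ser.isna()
    # Give every run of consecutive equal values in `isna` its own id.
    run_id = isna.astype(int).diff().ne(0).cumsum()
    # Length of the NaN run each element belongs to (0 for valid values).
    gap_len = isna.astype(int).groupby(run_id).transform("sum")
    too_long = isna & (gap_len > max_gap)
    # Interpolate everything, then put NaN back inside the long gaps.
    return ser.interpolate().mask(too_long)


ser = pd.Series([np.nan, np.nan, 2, np.nan, np.nan,
                 3, np.nan, np.nan, np.nan,
                 13, np.nan, np.nan])
result = interpolate_short_gaps(ser, max_gap=2)
print(result)
```

With this series, the two-`NaN` gap between 2 and 3 is filled, while the three-`NaN` gap before 13 stays untouched. The leading `NaN` values remain because the default direction is forward.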

By default, ``NaN`` values are filled in a ``forward`` direction. Use
``limit_direction`` parameter to fill ``backward`` or from ``both`` directions.
Note that for the methods ``pad``, ``ffill``, ``backfill`` and ``bfill``,
``limit_direction`` must not be set to a conflicting direction, because these
fill methods work in only one direction.

.. ipython:: python

2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -125,6 +125,7 @@ Other enhancements
- Roundtripping DataFrames with nullable integer or string data types to parquet
(:meth:`~DataFrame.to_parquet` / :func:`read_parquet`) using the `'pyarrow'` engine
now preserve those data types with pyarrow >= 1.0.0 (:issue:`20612`).
- :meth:`Series.interpolate` added the ``max_gap`` keyword to limit interpolation to NaN-gaps of a certain length (:issue:`25141`)

Build Changes
^^^^^^^^^^^^^
@@ -300,6 +301,7 @@ Performance improvements
Bug fixes
~~~~~~~~~

- ``limit_area`` and ``limit_direction`` now work in :meth:`Series.interpolate` if ``method`` is ``pad`` (:issue:`25141`)
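
To illustrate what `limit_area='inside'` means for a forward fill, here is a minimal sketch that relies only on `Series.ffill`, `Series.notna` and `Series.where`. The helper name `ffill_inside` is hypothetical. It shows the intended semantics (fill only `NaN` values surrounded by valid values), not the implementation in this PR.

```python
import numpy as np
import pandas as pd


def ffill_inside(ser):
    """Hypothetical helper: forward fill, but only between the first and last valid value."""
    valid_pos = np.flatnonzero(ser.notna().to_numpy())
    if len(valid_pos) == 0:
        # Nothing to fill from.
        return ser.copy()
    pos = np.arange(len(ser))
    inside = (pos >= valid_pos[0]) & (pos <= valid_pos[-1])
    # Keep the filled values inside the valid range, NaN elsewhere.
    return ser.ffill().where(inside)


s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])
inside_filled = ffill_inside(s)
print(inside_filled)
```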

Categorical
^^^^^^^^^^^
67 changes: 65 additions & 2 deletions pandas/core/generic.py
@@ -6894,7 +6894,9 @@ def replace(
Update the data in place if possible.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
If limit is specified, consecutive NaNs will be filled in this
direction.
direction. If the methods 'pad' or 'ffill' are used it must be
None or 'forward'. If 'backfill' or 'bfill' are used it must be
None or 'backward'.
limit_area : {`None`, 'inside', 'outside'}, default None
If limit is specified, consecutive NaNs will be filled with this
restriction.
@@ -6906,6 +6908,13 @@ def replace(

.. versionadded:: 0.23.0

max_gap : int, optional
Maximum number of consecutive NaN values up to which a NaN-gap
will be interpolated. All longer NaN-gaps will be left unchanged.
Must be greater than 0.

.. versionadded:: 1.0.0

downcast : optional, 'infer' or None, defaults to None
Downcast dtypes if possible.
**kwargs
@@ -6990,6 +6999,36 @@ def replace(
8 4.71
dtype: object

Similar to the examples above: filling in ``NaN`` in a Series
by padding, but here filling only NaN-gaps no wider than a specific
gap width, using the kwarg ``max_gap``.

>>> s = pd.Series([np.nan, "single_one", np.nan,
... "fill_two_more", np.nan, np.nan, np.nan,
... 4.71, np.nan])
>>> s
0 NaN
1 single_one
2 NaN
3 fill_two_more
4 NaN
5 NaN
6 NaN
7 4.71
8 NaN
dtype: object
>>> s.interpolate(method='pad', max_gap=2)
0 NaN
1 single_one
2 single_one
3 fill_two_more
4 NaN
5 NaN
6 NaN
7 4.71
8 4.71
dtype: object

Filling in ``NaN`` in a Series via polynomial interpolation or splines:
Both 'polynomial' and 'spline' methods require that you also specify
an ``order`` (int).
@@ -7045,8 +7084,9 @@ def interpolate(
axis=0,
limit=None,
inplace=False,
limit_direction="forward",
limit_direction=None,
limit_area=None,
max_gap=None,
downcast=None,
**kwargs,
):
@@ -7085,6 +7125,28 @@ def interpolate(
"column to a numeric dtype."
)

# Set `limit_direction` depending on `method`
if (method == "pad") or (method == "ffill"):
if (limit_direction == "backward") or (limit_direction == "both"):
raise ValueError(
"`limit_direction` must not be `%s` for method `%s`"
% (limit_direction, method)
)

Review comment (Contributor): You can use f-strings for this and the one on L 7140

Reply (Contributor Author): okay

else:
limit_direction = "forward"
elif (method == "backfill") or (method == "bfill"):
if (limit_direction == "forward") or (limit_direction == "both"):
raise ValueError(
"`limit_direction` must not be `%s` for method `%s`"
% (limit_direction, method)
)
else:
limit_direction = "backward"
else:
# Set default
if limit_direction is None:
limit_direction = "forward"
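
The branching above can be read as a small pure function. The sketch below restates it with f-strings, as suggested in the review thread. The function name `resolve_limit_direction` is hypothetical and only serves to make the logic testable in isolation. It is not how the PR structures the code.

```python
def resolve_limit_direction(method, limit_direction=None):
    """Hypothetical stand-alone version of the `limit_direction` check."""
    if method in ("pad", "ffill"):
        # Forward-fill methods cannot work backward or in both directions.
        if limit_direction in ("backward", "both"):
            raise ValueError(
                f"`limit_direction` must not be `{limit_direction}` "
                f"for method `{method}`"
            )
        return "forward"
    if method in ("backfill", "bfill"):
        # Backward-fill methods cannot work forward or in both directions.
        if limit_direction in ("forward", "both"):
            raise ValueError(
                f"`limit_direction` must not be `{limit_direction}` "
                f"for method `{method}`"
            )
        return "backward"
    # All other methods keep the previous default.
    return "forward" if limit_direction is None else limit_direction


print(resolve_limit_direction("pad"))
print(resolve_limit_direction("linear", "both"))
```

Changing the signature default to `None` is what lets the function tell "not set" apart from an explicit `"forward"`, which matters for `bfill` and `backfill`.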

# create/use the index
if method == "linear":
# prior default
@@ -7120,6 +7182,7 @@ def interpolate(
limit=limit,
limit_direction=limit_direction,
limit_area=limit_area,
max_gap=max_gap,
inplace=inplace,
downcast=downcast,
**kwargs,
48 changes: 39 additions & 9 deletions pandas/core/internals/blocks.py
@@ -1083,6 +1083,7 @@ def interpolate(
values=None,
inplace=False,
limit=None,
max_gap=None,
limit_direction="forward",
limit_area=None,
fill_value=None,
@@ -1117,6 +1118,8 @@ def check_int_bool(self, inplace):
axis=axis,
inplace=inplace,
limit=limit,
max_gap=max_gap,
limit_area=limit_area,
fill_value=fill_value,
coerce=coerce,
downcast=downcast,
@@ -1133,6 +1136,7 @@ def check_int_bool(self, inplace):
values=values,
axis=axis,
limit=limit,
max_gap=max_gap,
limit_direction=limit_direction,
limit_area=limit_area,
fill_value=fill_value,
@@ -1147,6 +1151,8 @@ def _interpolate_with_fill(
axis=0,
inplace=False,
limit=None,
max_gap=None,
limit_area=None,
fill_value=None,
coerce=False,
downcast=None,
@@ -1169,16 +1175,38 @@ def _interpolate_with_fill(
# We only get here for non-ExtensionBlock
fill_value = convert_scalar(self.values, fill_value)

values = missing.interpolate_2d(
values,
method=method,
axis=axis,
limit=limit,
fill_value=fill_value,
dtype=self.dtype,
)
# We have to distinguish two cases:
# 1. When kwargs `max_gap` or `limit_area` are used: They are not
# supported by `missing.interpolate_2d()`. Using these kwargs only
# works by applying the fill along a certain axis.
# 2. All other cases: Then, `missing.interpolate_2d()` can be used.
if (max_gap is not None) or (limit_area is not None):

def func(x):
return missing.interpolate_1d_fill(
x,
method=method,
axis=axis,
limit=limit,
max_gap=max_gap,
limit_area=limit_area,
fill_value=fill_value,
dtype=self.dtype,
)

interp_values = np.apply_along_axis(func, axis, values)

else:
interp_values = missing.interpolate_2d(
values,
method=method,
axis=axis,
limit=limit,
fill_value=fill_value,
dtype=self.dtype,
)

blocks = [self.make_block_same_class(values, ndim=self.ndim)]
blocks = [self.make_block_same_class(interp_values, ndim=self.ndim)]
return self._maybe_downcast(blocks, downcast)
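
Case 1 in the comment above applies a 1-D fill to each slice along an axis via `np.apply_along_axis`. As a self-contained illustration of that pattern, the sketch below uses a hypothetical `pad_1d_max_gap` function as a stand-in for `missing.interpolate_1d_fill`, whose body is not part of this diff. It handles float arrays only and assumes that leading gaps are never filled.

```python
import numpy as np


def pad_1d_max_gap(x, max_gap):
    """Hypothetical 1-D forward fill that skips NaN runs longer than `max_gap`."""
    out = x.copy()
    n = len(x)
    i = 0
    while i < n:
        if np.isnan(x[i]):
            # Find the end of this NaN run: it spans x[i:j].
            j = i
            while j < n and np.isnan(x[j]):
                j += 1
            # Fill only short runs that have a valid value right before them.
            if i > 0 and (j - i) <= max_gap:
                out[i:j] = x[i - 1]
            i = j
        else:
            i += 1
    return out


values = np.array([[1.0, np.nan, 3.0],
                   [np.nan, np.nan, np.nan],
                   [np.nan, np.nan, 6.0],
                   [4.0, 5.0, np.nan]])
# Apply the 1-D fill to every column (axis=0); extra kwargs go to the function.
filled = np.apply_along_axis(pad_1d_max_gap, 0, values, max_gap=1)
print(filled)
```

Here the first column keeps its two-`NaN` gap, the second column keeps its leading gap, and the single-`NaN` gaps in the third column are padded. `np.apply_along_axis` loops in Python over the slices, so this route trades speed for the ability to reuse a 1-D routine.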

def _interpolate(
@@ -1189,6 +1217,7 @@ def _interpolate(
fill_value=None,
axis=0,
limit=None,
max_gap=None,
limit_direction="forward",
limit_area=None,
inplace=False,
@@ -1227,6 +1256,7 @@ def func(x):
x,
method=method,
limit=limit,
max_gap=max_gap,
limit_direction=limit_direction,
limit_area=limit_area,
fill_value=fill_value,