Pandas interpolation enhancement request : specifying the maximum gap to interpolate. #12187

delie0512 · 2016-01-31T02:01:50Z

Currently, Pandas interpolation interpolates all gaps, regardless of there size and the limit parameter is used to limit the number of replacement : if there is a gap of 3 values and limit=2, pandas replaces the first 2 values.

I have difficulty understanding why someone would want to do interpolation on only a few missing items in a consecutive series of missing.

Personally, depending on the length of the gap, I would like to decide to interpolate the whole gap or none of it. For example, in an hourly time-series, interpolation of missing hours up to a maximum of 3 consecutive hours:
gaps <= 3 would be interpolated
gaps > 3 remain untouched.

I would appreciated a option for interpolation such as R "na.approx" in which a "maxgap" parameters is available. maxgap: maximum number of consecutive NAs to fill. Any longer gaps will be left unchanged.

jreback · 2016-01-31T15:32:33Z

you can do something this. Not sure how useful this is generally as IIRC this is first request about this type of filling

In [79]: s = Series([0]+[np.nan]*2+[3,4,5]+[np.nan]*4+[10],index=pd.date_range('20130101',periods=11,freq='T'))

In [80]: s
Out[80]: 
2013-01-01 00:00:00     0
2013-01-01 00:01:00   NaN
2013-01-01 00:02:00   NaN
2013-01-01 00:03:00     3
2013-01-01 00:04:00     4
2013-01-01 00:05:00     5
2013-01-01 00:06:00   NaN
2013-01-01 00:07:00   NaN
2013-01-01 00:08:00   NaN
2013-01-01 00:09:00   NaN
2013-01-01 00:10:00    10
Freq: T, dtype: float64

In [81]: indexer = s.groupby(pd.Grouper(freq='3T')).filter(lambda x: x.isnull().sum()==len(x))

In [82]: indexer
Out[82]: 
2013-01-01 00:06:00   NaN
2013-01-01 00:07:00   NaN
2013-01-01 00:08:00   NaN
Freq: T, dtype: float64

In [83]: s2 = s.interpolate()                   

In [84]: s.interpolate().loc[indexer.index]
Out[84]: 
2013-01-01 00:06:00    6
2013-01-01 00:07:00    7
2013-01-01 00:08:00    8
Freq: T, dtype: float64

In [86]: s2 = s.copy()

In [87]: s2.loc[indexer.index] = s.interpolate()

In [88]: s2
Out[88]: 
2013-01-01 00:00:00     0
2013-01-01 00:01:00   NaN
2013-01-01 00:02:00   NaN
2013-01-01 00:03:00     3
2013-01-01 00:04:00     4
2013-01-01 00:05:00     5
2013-01-01 00:06:00     6
2013-01-01 00:07:00     7
2013-01-01 00:08:00     8
2013-01-01 00:09:00   NaN
2013-01-01 00:10:00    10
Freq: T, dtype: float64

In [89]: s.resample('3T',how='sum')
Out[89]: 
2013-01-01 00:00:00     0
2013-01-01 00:03:00    12
2013-01-01 00:06:00   NaN
2013-01-01 00:09:00    10
Freq: 3T, dtype: float64

In [90]: s2.resample('3T',how='sum')
Out[90]: 
2013-01-01 00:00:00     0
2013-01-01 00:03:00    12
2013-01-01 00:06:00    21
2013-01-01 00:09:00    10
Freq: 3T, dtype: float64

delie0512 · 2016-02-01T03:06:17Z

Based on the original values of "s" above, the results that I looking for "interpolating gaps up to 3 hours" in a hourly time-series would be as follows. The first gap (2 values) is interpolated, but for the second gap ( 4 values) no interpolation is performed at all. I could write code to do what I want, but I would prefer using an existing tool for which the user can specific the maximum gap to interpolate (maxgap).

2013-01-01 00:00:00     0
2013-01-01 00:01:00     1 interpolated, since gap = =2 <=maxgap
2013-01-01 00:02:00     2 interpolated, since gap = =2 <=maxgap
2013-01-01 00:03:00     3
2013-01-01 00:04:00     4
2013-01-01 00:05:00     5
2013-01-01 00:06:00   NaN   not interpolated, since gap = 4 > maxgap
2013-01-01 00:07:00   NaN   not interpolated, since gap = 4 > maxgap
2013-01-01 00:08:00   NaN   not interpolated, since gap = 4 > maxgap
2013-01-01 00:09:00   NaN   not interpolated, since gap = 4 > maxgap
2013-01-01 00:10:00    10
Freq: T, dtype: float64

kawochen · 2016-02-01T05:49:11Z

this is a bit like consolidating fillna and interpolate

tomchor · 2016-05-24T15:35:04Z

I've actually had to do something like this before. See this answer to check out how it was achieved.

I can actually say this is pretty useful. I know a number of quality-control techniques that use this kind of reasoning: if gap < maxgap do nothing, do something else otherwise (or vice-versa).

I would recommend something like passing an extra maxgap and mingap keywords to interpolate, that control maximum number of consecutive gaps to consider when interpolating and the minimum. The OP's original example would become something like

interpolated = df.interpolate(maxgap=3, mingap=None)

@delie0512 @jreback

jreback · 2016-05-24T19:39:28Z

yeah iterative grouping is usefule, see the last example here: http://pandas.pydata.org/pandas-docs/stable/cookbook.html#grouping

I suppose if someone wanted to contribute a maxgaps would take it.

AdamRJensen · 2018-07-06T08:26:23Z

This seems like a very real issue, as from my perspective the whole point of having the "limit" argument is to avoid filling spaces which are too large, i.e. the assumption for filling does not hold.

Is anyone still looking at this issue?

Another very similar issue is #16457 and on stackoverflow: https://stackoverflow.com/questions/43077166/interpolate-only-if-single-nan/43079055#43079055
https://stackoverflow.com/questions/43082316/mask-only-where-consecutive-nans-exceeds-x#

notuntoward · 2018-08-02T10:54:03Z

I've had to do this many, many times, and adding this capability would be a good improvement.

The most common use case is when you can justify interpolating short missing data gaps within a signal's temporal autcorrelation range, but cannot justify interpolating long gaps.

Another use case is when data is being upsampled using pd.resample(), which inserts NaNs for timestamps to be interpolated. Say you're upsampling by a factor of three. In this case, you can do the upsample by interpolating across at most 2 consecutive NaNs. Three or more consecutive NaNs indicate the presence of missing data in the original time series; you don't want to interpolate across that.

donglai96 · 2019-11-19T04:28:41Z

Is someone still working on this problem?

veenstrajelmer · 2021-07-28T08:48:07Z

I read this was already implemented, but as of today this still seems to be not the case. It would be a valuable improvement I think, would it be possible to make this available?

jreback · 2021-07-28T11:07:55Z

@veenstrajelmer this is an open issue and we would likely take a PR for implementation

temeddix · 2021-12-23T03:01:34Z

Hope this becomes possible soon...

jreback · 2021-12-23T03:27:43Z

community contributions are how things get done

fdelcab · 2023-03-06T15:17:41Z

Hi community, do we already have a solution for this issue?

delie0512 · 2023-03-08T15:36:44Z

I dont think so. You can use xarray.DataArray.interpolate_na instead.

delie0512 changed the title ~~Pandas interpolation enhanvcment request : specifying the maximum gap to interpolate.~~ Pandas interpolation enhancement request : specifying the maximum gap to interpolate. Jan 31, 2016

jreback added Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate API Design Enhancement Difficulty Intermediate labels Jan 31, 2016

delie0512 closed this as completed Feb 1, 2016

delie0512 reopened this Feb 1, 2016

jreback added this to the Next Major Release milestone May 24, 2016

dnowacki-usgs mentioned this issue Aug 30, 2018

Improving interpolate_na()'s limit argument pydata/xarray#2392

Closed

cchwala mentioned this issue Feb 4, 2019

ENH: Added max_gap keyword for series.interpolate #25141

Closed

4 tasks

jbrockmendel removed Effort Medium labels Oct 21, 2019

mroeschke removed API Design Groupby labels Apr 23, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

Ockenfuss mentioned this issue Mar 23, 2023

Interpolate_na: Rework 'limit' argument documentation/implementation pydata/xarray#7665

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pandas interpolation enhancement request : specifying the maximum gap to interpolate. #12187

Pandas interpolation enhancement request : specifying the maximum gap to interpolate. #12187

delie0512 commented Jan 31, 2016

jreback commented Jan 31, 2016

delie0512 commented Feb 1, 2016

kawochen commented Feb 1, 2016

tomchor commented May 24, 2016 •

edited

Loading

jreback commented May 24, 2016

AdamRJensen commented Jul 6, 2018 •

edited

Loading

notuntoward commented Aug 2, 2018 •

edited

Loading

donglai96 commented Nov 19, 2019

veenstrajelmer commented Jul 28, 2021

jreback commented Jul 28, 2021

temeddix commented Dec 23, 2021

jreback commented Dec 23, 2021

fdelcab commented Mar 6, 2023

delie0512 commented Mar 8, 2023

Pandas interpolation enhancement request : specifying the maximum gap to interpolate. #12187

Pandas interpolation enhancement request : specifying the maximum gap to interpolate. #12187

Comments

delie0512 commented Jan 31, 2016

jreback commented Jan 31, 2016

delie0512 commented Feb 1, 2016

kawochen commented Feb 1, 2016

tomchor commented May 24, 2016 • edited Loading

jreback commented May 24, 2016

AdamRJensen commented Jul 6, 2018 • edited Loading

notuntoward commented Aug 2, 2018 • edited Loading

donglai96 commented Nov 19, 2019

veenstrajelmer commented Jul 28, 2021

jreback commented Jul 28, 2021

temeddix commented Dec 23, 2021

jreback commented Dec 23, 2021

fdelcab commented Mar 6, 2023

delie0512 commented Mar 8, 2023

tomchor commented May 24, 2016 •

edited

Loading

AdamRJensen commented Jul 6, 2018 •

edited

Loading

notuntoward commented Aug 2, 2018 •

edited

Loading