In time series analyses, I may have missing values throughout my data that I would like to fill. In the general case, I can use Series.fillna to fill missing values with things like a scalar, the preceding valid value (forward fill), or the next valid value (backward fill).
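For reference, the three fill strategies mentioned above might look like this in pandas. This is only a small sketch; the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in the middle.
s = pd.Series([1.0, np.nan, 3.0])

scalar_filled = s.fillna(0.0)  # replace missing values with a scalar
forward_filled = s.ffill()     # carry the preceding valid value forward
backward_filled = s.bfill()    # pull the next valid value backward
```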
When my data has a temporal trend, the effectiveness of these techniques can break down. In such situations, I often want to interpolate between the valid values to replace my missing values with a statistical approach rather than fill missing values with a single scalar, forward, or backward fill.
Pandas provides such functionality via an interpolate API that delegates to numpy.interp and the scipy.interpolate sub-package to support a variety of interpolation techniques (including the standard forward fill described above).
I'd like to be able to conduct linear interpolation like I can do in pandas. For linear interpolation, pandas delegates to numpy.interp, which has a corresponding implementation in cupy (meaning this may be possible in the Python layer).
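To make the request concrete, here is a rough sketch of the pandas behavior I'm after, next to the same result computed directly with numpy.interp. The sample values are made up, and the interior-gap case shown here is the simple one where the two agree.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a trend and gaps between valid values.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 6.0])

# pandas: default method is linear, treating values as equally spaced.
pandas_result = s.interpolate()

# The same thing by hand: interpolate over integer positions,
# using only the valid points as the known (x, y) pairs.
values = s.to_numpy()
positions = np.arange(len(values))
valid = ~np.isnan(values)
numpy_result = np.interp(positions, positions[valid], values[valid])
```

Since cupy exposes an `interp` function as well, the second half is the part that could presumably be ported to the GPU.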
Adds Series and DataFrame level functions for linear interpolation of missing values, built around CuPy's `interp` method.
The pandas `interpolate` API supports somewhat varied functionality for filling `NaN`s. It currently does not work for actual `<NA>` values (see the pandas issue [here](pandas-dev/pandas#40252)). That said, one might expect both kinds of missing data to be treated equally for the purposes of interpolation, and this PR does that.
While `cp.interp` is great for getting us off the ground, it only supports linear interpolation, and its results aren't exactly what pandas produces. In particular, pandas will not fill `NaN`s at the start of the series, because the default value of `limit_direction` is `forward`. The default `limit` is `None`, which from my experimentation means 'unlimited', so the `NaN`s at the end WILL get filled. This means we need to figure out where the first valid (non-`NaN`) value is and mask out everything before it with `NaN`s.
Closes #8685.
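The masking idea can be sketched roughly as follows. This is not the code in this PR: it uses NumPy's `np.interp` as a stand-in for `cp.interp` so it runs without a GPU, and `linear_interpolate` is a hypothetical helper name chosen for illustration.

```python
import numpy as np

def linear_interpolate(values):
    """Hypothetical helper: linear interpolation that leaves leading NaNs alone."""
    values = np.asarray(values, dtype="float64")
    positions = np.arange(len(values))
    valid = ~np.isnan(values)
    if not valid.any():
        # Nothing to interpolate from; return the input unchanged.
        return values.copy()

    # np.interp clamps to the edge values, so both leading and
    # trailing NaNs come back filled at this point.
    filled = np.interp(positions, positions[valid], values[valid])

    # Re-mask everything before the first valid value, so the start of
    # the series stays NaN as described above for the pandas defaults.
    first_valid = int(np.argmax(valid))
    filled[:first_valid] = np.nan
    return filled

result = linear_interpolate([np.nan, np.nan, 1.0, np.nan, 3.0, np.nan])
```

Here the two leading `NaN`s are preserved, the interior gap is interpolated, and the trailing `NaN` is filled with the last valid value.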
Authors:
- https://github.com/brandon-b-miller
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Ashwin Srinath (https://github.com/shwina)
URL: #8767