BUG: Interpolate over time does not work with Int64 or Float64 #40252

pspeter · 2021-03-05T13:19:01Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
pd.DataFrame({"a": [1, None, 2]}, index=pd.to_datetime([1,2,3], unit="d")).convert_dtypes().interpolate(method="time")

Traceback (most recent call last):
  File "C:\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-4eba02001155>", line 1, in <module>
    pd.DataFrame({"a": [1, None, 2]}, index=pd.to_datetime([1,2,3], unit="d")).convert_dtypes().interpolate(method="time")
  File "C:\venv\lib\site-packages\pandas\core\generic.py", line 7222, in interpolate
    new_data = obj._mgr.interpolate(
  File "C:\venv\lib\site-packages\pandas\core\internals\managers.py", line 593, in interpolate
    return self.apply("interpolate", **kwargs)
  File "C:\venv\lib\site-packages\pandas\core\internals\managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\venv\lib\site-packages\pandas\core\internals\blocks.py", line 1931, in interpolate
    values=values.fillna(value=fill_value, method=method, limit=limit),
  File "C:\venv\lib\site-packages\pandas\core\arrays\base.py", line 655, in fillna
    value, method = validate_fillna_kwargs(value, method)
  File "C:\venv\lib\site-packages\pandas\util\_validators.py", line 367, in validate_fillna_kwargs
    method = clean_fill_method(method)
  File "C:\venv\lib\site-packages\pandas\core\missing.py", line 82, in clean_fill_method
    raise ValueError(f"Invalid fill method. Expecting {expecting}. Got {method}")

ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got time

Problem description

Without the convert_dtypes() this works without any problems.

Expected Output

              a
1970-01-02  1.0
1970-01-03  1.5
1970-01-04  2.0

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : f2c8480
python : 3.8.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English
pandas : 1.2.3
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 49.2.1
Cython : None
pytest : 6.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.21.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Adds Series- and DataFrame-level functions for linear interpolation of missing values, built around CuPy's `interp` method. The pandas `interpolate` API supports fairly varied functionality for filling `NaN`s, but it currently does not work for actual `<NA>` values (see pandas issue pandas-dev/pandas#40252). One might expect both kinds of missing data to be treated equally for the purposes of interpolation, and this PR does that.

`cp.interp` is great for getting us off the ground, but it only supports linear interpolation, and its results aren't exactly what pandas produces. In particular, pandas will not fill `NaN`s at the start of the series, because the default value of `limit_direction` is `forward`. The default `limit` is `None`, which from my experimentation means 'unlimited', so the `NaN`s at the end WILL get filled. This means we need to figure out where the leading `NaN`s are and mask out that part of the series with `NaN`s.

Closes #8685.

Authors:
- https://github.com/brandon-b-miller

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Ashwin Srinath (https://github.com/shwina)

URL: #8767
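To make the masking step in that description concrete, here is a minimal sketch. It is not the code from the PR: it uses NumPy's `np.interp` as a CPU stand-in for `cp.interp`, and the helper name `interpolate_like_pandas` is hypothetical. It only illustrates the default pandas behaviour described above, where leading `NaN`s stay unfilled and trailing `NaN`s are filled.

```python
import numpy as np


def interpolate_like_pandas(values):
    """Linear interpolation that leaves leading NaNs untouched (hypothetical helper)."""
    values = np.asarray(values, dtype="float64")
    positions = np.arange(len(values))
    known = ~np.isnan(values)
    if not known.any():
        # Nothing to interpolate from; return the input unchanged.
        return values.copy()

    # np.interp fills every position, clamping to the edge values outside the
    # range of known points, so leading and trailing NaNs both get filled here.
    filled = np.interp(positions, positions[known], values[known])

    # Re-mask everything before the first known value to mimic the default
    # limit_direction="forward" behaviour described above.
    first_valid = int(np.argmax(known))
    filled[:first_valid] = np.nan
    return filled


out = interpolate_like_pandas([np.nan, 1.0, np.nan, 3.0, np.nan])
print(out)
```

The leading `NaN` is preserved, the interior gap is interpolated, and the trailing gap is filled with the last known value.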

janlugt · 2022-04-07T18:05:47Z

Ran into this today on pandas 1.4.1, so the bug still exists. My assumption would be that Float64 with pd.NA's works the same as a float64 with np.nan's, but I guess Float64 is still experimental, and that assumption is not always true.

ba05 · 2022-06-04T03:01:59Z

Ran into this issue with pandas 1.4.2 when plotting with matplotlib. Had to cast the Series with .astype('float'). Original values were Float64 with some NaNs mixed in.

chukarsten · 2022-07-28T03:14:08Z

We also ran into this multiple times today!

tamargrey · 2023-02-15T16:45:14Z

Still running into this in pandas 1.5.3 with both Int64 and boolean dtypes - ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear

lrodriguezesc · 2023-06-14T21:38:27Z

Having this issue too, any news?

hadif1999 · 2023-07-03T20:46:19Z

I had the same issue when using method="linear". It was solved when I changed the dtypes to "float64" (not "Float64", which still returned the error). Anyway, it's a ridiculous bug.
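For anyone landing here, the cast-to-float workaround mentioned in the comments above can be sketched as follows. This is only an illustration built on the frame from the original report, assuming a pandas version where `astype("float64")` converts `pd.NA` to `NaN`; it sidesteps the bug and does not fix it. The variable names are illustrative.

```python
import pandas as pd

# Frame from the original report: convert_dtypes() turns column "a" into
# a nullable extension dtype, which is what triggers the error.
df = pd.DataFrame(
    {"a": [1, None, 2]},
    index=pd.to_datetime([1, 2, 3], unit="D"),
).convert_dtypes()

# Workaround: cast to NumPy-backed float64 first (pd.NA becomes NaN),
# then interpolate over the DatetimeIndex.
result = df.astype("float64").interpolate(method="time")
print(result)
```

The result should match the expected output in the report. Note that the column stays `float64` afterwards; calling `convert_dtypes()` again is one way to return to a nullable dtype if that matters downstream.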

lithomas1 · 2023-08-04T19:02:36Z

@jbrockmendel
Looks like we're almost there with your PR adding interpolate to ExtensionArray methods. I get this error now

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/thomasli/pandas/pandas/core/generic.py", line 8115, in interpolate
    new_data = obj._mgr.interpolate(
  File "/Users/thomasli/pandas/pandas/core/internals/base.py", line 265, in interpolate
    return self.apply_with_block(
  File "/Users/thomasli/pandas/pandas/core/internals/managers.py", line 355, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/thomasli/pandas/pandas/core/internals/blocks.py", line 1469, in interpolate
    new_values = self.array_values.interpolate(
TypeError: ExtensionArray.interpolate() missing 1 required keyword-only argument: 'fill_value'

jbrockmendel · 2023-08-04T20:42:11Z

Definitely my fault. fill_value should be removed from the signature

mgreshake · 2023-08-30T13:52:40Z

@jbrockmendel Seems that fill_value is still part of the signature in version 2.1.0

jbrockmendel · 2023-08-30T15:55:15Z

A PR to fix it will be welcome.

mdruiter · 2024-11-20T11:32:24Z

Can be closed as fixed, right?

pspeter added the Bug and Needs Triage labels Mar 5, 2021

jbrockmendel added the Missing-data and NA - MaskedArrays labels and removed the Needs Triage label Mar 24, 2021

mzeitlin11 mentioned this issue May 19, 2021

ENH: Better error message when interpolating dtype integer with method="linear" #41565

Open

brandon-b-miller mentioned this issue Jul 22, 2021

Linear Interpolation of nans via cupy rapidsai/cudf#8767

Merged

chukarsten mentioned this issue Aug 4, 2022

Woodwork 0.17.2 compatibility :( alteryx/evalml#3626

Merged

gsheni mentioned this issue Sep 9, 2022

Statically set woodwork typing in tests alteryx/evalml#3697

Merged

tamargrey mentioned this issue Feb 15, 2023

TimeSeriesImputer: Add nullable type handling for X and y when interpolate is impute method alteryx/evalml#4001

Closed

andrewgsavage mentioned this issue Sep 26, 2023

ExtensionArray.interpolate() method and tests #55297

Closed
