ENH: Better error message when interpolating dtype integer with method="linear" #41565

Open
AartGoossens opened this issue May 19, 2021 · 6 comments
Labels
Enhancement Error Reporting Incorrect or improved errors from pandas Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate NA - MaskedArrays Related to pd.NA and nullable extension arrays

Comments

@AartGoossens

Is your feature request related to a problem?

When interpolating a column with a nullable integer dtype using method="linear", the error message is not very descriptive (a reproducible example is at the bottom of this issue):

ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear

As "linear" is the default interpolation method, I think the error message should be more descriptive, even though IntegerArray is still experimental according to the docs.

Describe the solution you'd like

I think the error message should include a hint about the likely cause: the array you're trying to interpolate has an integer dtype. Something like this could work:

ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear. Are you trying to interpolate an integer column?
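
To illustrate the kind of check that could produce this message, here is a minimal sketch. It is not the pandas implementation: `check_interpolate_method` and `FILL_METHODS` are hypothetical names, and the check is written as a standalone helper rather than as a patch to pandas internals. It assumes the hint should only fire for nullable (extension) integer dtypes.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import is_integer_dtype

# Fill methods named in the existing error message.
FILL_METHODS = {"pad", "ffill", "backfill", "bfill"}


def check_interpolate_method(series: pd.Series, method: str = "linear") -> None:
    """Hypothetical pre-check that raises the more descriptive error."""
    if method in FILL_METHODS:
        return
    # Only nullable (extension) integer dtypes get the extra hint.
    if isinstance(series.dtype, ExtensionDtype) and is_integer_dtype(series.dtype):
        raise ValueError(
            "Invalid fill method. Expecting pad (ffill) or backfill (bfill). "
            f"Got {method}. Are you trying to interpolate an integer column?"
        )
```

Calling `check_interpolate_method(s, "linear")` on an Int64 Series would raise the extended message, while float Series and the pad/backfill methods pass through unchanged.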

I would be happy to create a PR with the improvement myself.

API breaking implications

As this only requires a change to the error message it does not break any API.

Describe alternatives you've considered

An alternative solution would be to implement linear interpolation for integer arrays, but I suspect that would be incompatible with, or unexpected in, some use cases of integer arrays.

Additional context

Reproducible example:

import pandas as pd

# Nullable integer Series with one missing value
s = pd.Series([1, None, 3], dtype=pd.Int64Dtype())

# Raises the ValueError quoted above
s.interpolate(method="linear")
@AartGoossens AartGoossens added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels May 19, 2021
@mzeitlin11
Member

Thanks for the clear report @AartGoossens! There are currently some issues with interpolation with the new nullable data types (#40252). As you say, implementing interpolation should likely be the end goal, but for now a more explicit error message would be an improvement.

@mzeitlin11 mzeitlin11 added Error Reporting Incorrect or improved errors from pandas Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate NA - MaskedArrays Related to pd.NA and nullable extension arrays and removed Needs Triage Issue that has not been reviewed by a pandas team member labels May 19, 2021
@mzeitlin11 mzeitlin11 added this to the Contributions Welcome milestone May 19, 2021
@AartGoossens
Author

@mzeitlin11 Does the "Contributions Welcome" milestone mean that it's OK if I create a PR with the suggested change?

@mzeitlin11
Member

Yep! (though even without a milestone contributions are always welcome :)

@MichaelTiemannOSC
Contributor

I just got this message from an attempt to use linear interpolation with ExtensionArrays. I don't think the error message is the problem; rather, it's the wholesale lack of any implementation of linear interpolation for ExtensionArrays in version 1.4.0. Linear interpolation works great for floats (despite the gratuitous error message), but for ExtensionArrays the code just calls fillna, which doesn't have a linear method. That's a big gap!
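
Since linear interpolation does work for floats, one possible stopgap is to round-trip through float64. The sketch below is a workaround under stated assumptions, not something pandas does internally: `interpolate_nullable_int` is a hypothetical helper name, and it assumes that rounding the interpolated values back to integers is acceptable for the data at hand.

```python
import numpy as np
import pandas as pd


def interpolate_nullable_int(s: pd.Series) -> pd.Series:
    """Hypothetical helper: interpolate an Int64 Series via float64."""
    # Convert pd.NA to NaN so the float interpolation path can be used.
    as_float = pd.Series(
        s.to_numpy(dtype="float64", na_value=np.nan), index=s.index
    )
    filled = as_float.interpolate(method="linear")
    # Round before casting back: Int64 rejects non-integer float values.
    # Any NaN left over (e.g. a leading gap) becomes pd.NA again.
    return filled.round().astype("Int64")


s = pd.Series([1, None, 3], dtype=pd.Int64Dtype())
result = interpolate_nullable_int(s)
print(result)
```

For the example Series from this issue, the missing middle value is filled with 2 and the result keeps the Int64 dtype. Whether rounding is the right choice depends on the use case, which is part of why built-in support is not straightforward.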

@tamargrey

Still running into the error supposedly fixed by #40252 mentioned above in pandas 1.5.3 with Int64 and boolean dtypes - ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill).

Any updates on whether there are plans to support nullable dtypes in interpolate?

@tamargrey

Also running into this error when the category dtype is used. I suspect this is one case where the error is appropriate, since linear interpolation of categorical data doesn't make sense to me.
