TimeSeriesImputer should not allow interpolate as strategy for boolean or categorical targets #4053

Open
tamargrey opened this issue Mar 6, 2023 · 0 comments

tamargrey commented Mar 6, 2023

Currently the target_impute_strategy is applied to any kind of target data, independent of whether or not the strategy makes sense for that kind of data. This is only problematic for the interpolate strategy, as the other two can be used with any data.

Interpolate, however, should only be used with numeric values. Data with the category dtype will raise an error from pandas. Data with boolean values, under the nullable type handling, will become Double with floating point values imputed, which doesn't make sense (this was actually happening prior to the nullable type handling as well).
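To illustrate why floating point imputation does not make sense for a boolean target, here is a minimal pandas-only sketch. It does not call TimeSeriesImputer; the float encoding of the boolean values (True as 1.0, False as 0.0) is an assumption made to mimic the conversion described above.

```python
import numpy as np
import pandas as pd

# Assumption: a boolean target with one missing value, encoded as floats
# (True -> 1.0, False -> 0.0) to mimic the conversion to a floating point type.
y = pd.Series([1.0, np.nan, 0.0])

# The default linear interpolation fills the gap with the midpoint of its neighbors.
imputed = y.interpolate()
print(imputed.tolist())  # [1.0, 0.5, 0.0]
```

The imputed 0.5 is neither True nor False, so the result can no longer be read as a boolean target.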

We should consider either not allowing interpolate to be used for non-numeric data (in which case we could remove y from the _integer_nullable_incompatibilities) or using one of the other interpolate methods listed at https://pandas.pydata.org/docs/reference/api/pandas.Series.interpolate.html.
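The first option could look roughly like the sketch below. The function name `check_interpolate_target` and the error message are hypothetical and not part of the existing code; only the pandas dtype helpers are real. Note that pandas treats boolean dtypes as numeric in `is_numeric_dtype`, so they have to be excluded explicitly.

```python
import pandas as pd
from pandas.api import types as ptypes


def check_interpolate_target(y: pd.Series, strategy: str) -> None:
    """Hypothetical guard: reject "interpolate" for non-numeric targets."""
    if strategy != "interpolate":
        # Other strategies work with any kind of data, so nothing to check.
        return
    # is_numeric_dtype returns True for boolean dtypes, so exclude them explicitly.
    is_numeric = ptypes.is_numeric_dtype(y.dtype) and not ptypes.is_bool_dtype(y.dtype)
    if not is_numeric:
        raise ValueError(
            f"The interpolate strategy requires a numeric target, got dtype {y.dtype}."
        )


# Numeric targets pass; boolean and categorical targets are rejected.
check_interpolate_target(pd.Series([1.0, None, 3.0]), "interpolate")
for bad in (
    pd.Series([True, None, False], dtype="boolean"),
    pd.Series(["a", None, "b"], dtype="category"),
):
    try:
        check_interpolate_target(bad, "interpolate")
    except ValueError as err:
        print(err)
```

Where such a check would live (for example, at component initialization or at fit time) is left open here.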

This will not be seen in AutoML search, because we use the default impute strategies in _make_component_list_from_actions, so interpolate will not be the target_strategy.
