The following code uses the mean and median strategies with boolean data. The SimpleImputer converts the values to floats and imputes the mean or median of the data, which may well be a floating-point value that cannot be converted back to BooleanNullable, as the SimpleImputer currently attempts to do. Note that this is not currently reachable from AutoMLSearch, because the Imputer component keeps this from happening.
import pandas as pd
import pytest
import woodwork as ww
from evalml.pipelines.components import SimpleImputer

for strategy in ["mean", "median"]:
    # One fully populated boolean column and one with a missing value.
    X_train = pd.DataFrame(
        {
            "fully_bool": pd.Series([True, False, True, True, True]),
            "one_nan": pd.Series([True, False, pd.NA, False, True]),
        },
    )
    X_train.ww.init(
        logical_types={
            "fully_bool": "Boolean",
            "one_nan": "BooleanNullable",
        },
    )
    imp = SimpleImputer(impute_strategy=strategy)
    imp.fit(X_train)
    # transform fails when converting the imputed column back to boolean.
    with pytest.raises(
        ww.exceptions.TypeConversionError,
        match="Error converting datatype for one_nan from type object to type boolean.",
    ):
        imp.transform(X_train)
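To make the failure concrete, the arithmetic can be sketched in plain Python, independent of evalml and woodwork. This is only an illustration of the numbers involved, not the SimpleImputer's actual code path; `None` stands in for `pd.NA`, and the variable names are made up for the example.

```python
from statistics import mean, median

# The "one_nan" column from the reproduction, with None standing in for pd.NA.
one_nan = [True, False, None, False, True]

# Drop the missing entry and cast to float, as a numeric imputer would.
observed = [float(v) for v in one_nan if v is not None]

fill_mean = mean(observed)      # (1 + 0 + 0 + 1) / 4
fill_median = median(observed)  # midpoint of the two middle values, 0 and 1

# Both fill values are 0.5, which is neither 0.0 nor 1.0,
# so there is no boolean value to convert them back to.
print(fill_mean, fill_median)
```

For this column both strategies produce 0.5, so the conversion back to BooleanNullable has nothing valid to map the imputed entry to.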
We should handle this situation. We have several options for how to do this:
Explicitly disallow the "mean" and "median" strategies for boolean values in the simple imputer. This would require adding logic that is, I assume, the reason we have a separate Imputer component in the first place.
Implicitly disallow the "mean" and "median" strategies for boolean data in the simple imputer, and note the limitations in the docstring. This might also be a good time to make it clearer that this component expects all columns to be of the same type.
Change those columns' types to Double in the new_schema prior to initializing woodwork, as we do when converting IntegerNullable to Double. This doesn't make much sense to me, as it implies a continuous relationship between boolean values, but if there's a use case for this that I'm missing, we can consider it.
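As a rough illustration of the first option, the check could look something like the sketch below. `check_strategy`, `NUMERIC_ONLY_STRATEGIES`, and `BOOLEAN_LOGICAL_TYPES` are hypothetical names, not part of evalml, and the column types are passed as plain strings rather than read from a woodwork schema.

```python
# Hypothetical helper, not part of evalml: rejects numeric-only strategies
# for boolean columns before any fitting happens.
NUMERIC_ONLY_STRATEGIES = {"mean", "median"}
BOOLEAN_LOGICAL_TYPES = {"Boolean", "BooleanNullable"}


def check_strategy(strategy, logical_types):
    """Raise ValueError if `strategy` is numeric-only and any column is boolean.

    `logical_types` maps column name -> logical type name (as strings).
    """
    if strategy not in NUMERIC_ONLY_STRATEGIES:
        return
    offending = sorted(
        col for col, ltype in logical_types.items() if ltype in BOOLEAN_LOGICAL_TYPES
    )
    if offending:
        raise ValueError(
            f"Cannot use the {strategy!r} strategy with boolean columns: {offending}"
        )


schema = {"fully_bool": "Boolean", "one_nan": "BooleanNullable"}
check_strategy("most_frequent", schema)  # allowed: not a numeric-only strategy

try:
    check_strategy("mean", schema)
except ValueError as err:
    message = str(err)  # names both offending columns
```

Failing before fitting, rather than at transform time, would surface the problem as a clear usage error instead of a TypeConversionError from woodwork.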