
Oversampler: Add nullable type handling for nullable y #3974

Closed
tamargrey opened this issue Feb 2, 2023 · 3 comments · Fixed by #4046 or #4068

@tamargrey (Contributor)

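The following code raises `ValueError: Unknown label type: 'unknown'`:

    import woodwork as ww
    from evalml.pipelines.components import Oversampler

    # X_y_binary: the binary-classification (X, y) fixture from evalml's test suite
    X, y = X_y_binary
    y = ww.init_series(y, logical_type="BooleanNullable")
    sn = Oversampler()
    _ = sn.fit_transform(X, y)  # raises ValueError: Unknown label type: 'unknown'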

This error does not currently surface during AutoML search, because the ReplaceNullableTypes component runs before the Oversampler. We should still consider adding nullable type handling to the component class itself so that it supports nullable types independently.

Note that this is likely related to #3923, #3922, and #3910, which all stem from sklearn's `type_of_target` being unable to assign a proper target type to nullable data.
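For sklearn versions affected by this, one possible workaround is to cast the nullable target to a NumPy-backed dtype before it reaches `type_of_target`. The sketch below assumes the nullable target arrives as a pandas nullable dtype (`"boolean"` or `"Int64"`, the dtypes behind Woodwork's BooleanNullable and IntegerNullable); `to_numpy_backed_target` is a hypothetical helper, not part of evalml or sklearn.

```python
import pandas as pd
from sklearn.utils.multiclass import type_of_target


def to_numpy_backed_target(y: pd.Series) -> pd.Series:
    """Hypothetical helper: cast a pandas nullable target to a NumPy dtype."""
    if not pd.api.types.is_extension_array_dtype(y.dtype):
        return y  # already NumPy-backed; nothing to convert
    if y.isna().any():
        # A NumPy bool/int dtype cannot hold missing labels.
        raise ValueError("target contains missing values")
    if pd.api.types.is_bool_dtype(y.dtype):
        return y.astype("bool")  # "boolean" -> bool
    if pd.api.types.is_integer_dtype(y.dtype):
        return y.astype("int64")  # "Int64" and friends -> int64
    return y  # other extension dtypes are left untouched in this sketch


y = pd.Series([True, False, True], dtype="boolean")
print(type_of_target(to_numpy_backed_target(y)))
```

Since this thread reports the Oversampler error resolved by sklearn 1.2.2, a conversion like this would only matter when older sklearn versions must be supported.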

@tamargrey (Contributor, Author)

This is not fixed by updating to sklearn 1.2.1.

@tamargrey (Contributor, Author)

As part of implementing component-specific handling for the Oversampler, we need to remove the nullable type logic from `BaseSampler._prepare_data`.

Also worth noting: that logic did not even preserve the Woodwork types, so type inference had to be rerun afterward. This is unnecessary computation, and it could cause a bug if we lost column types that influenced which sampler was chosen.
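The alternative to rerunning inference is to record the types before any conversion and reapply them afterward. The sketch below shows the idea with plain pandas dtypes standing in for Woodwork logical types; `resample_rows_keep_dtypes` is a hypothetical helper, and its row-duplication step is a stand-in for a real oversampler, not evalml's implementation.

```python
import pandas as pd


def resample_rows_keep_dtypes(X: pd.DataFrame, row_positions) -> pd.DataFrame:
    """Hypothetical helper: pick rows by position (as naive oversampling would)
    and restore the original column dtypes instead of re-inferring them."""
    # Record the types up front, before anything can drop them.
    original_dtypes = X.dtypes.to_dict()
    # Round-trip through NumPy, as sampling libraries typically do; with mixed
    # columns this yields an object array and loses "boolean"/"category".
    values = X.to_numpy()[list(row_positions)]
    resampled = pd.DataFrame(values, columns=X.columns)
    # Reapply the recorded dtypes rather than running inference again.
    return resampled.astype(original_dtypes)


X = pd.DataFrame(
    {
        "n": [1, 2, 3],
        "flag": pd.array([True, False, True], dtype="boolean"),
        "cat": pd.Categorical(["a", "b", "a"]),
    }
)
X_resampled = resample_rows_keep_dtypes(X, [0, 1, 2, 1])  # duplicate row 1
print(X_resampled.dtypes)
```

The design choice is that type information flows through the transformation explicitly, so nothing downstream depends on inference reproducing the original types.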

@tamargrey changed the title from "Oversampler can raise ValueError: Unknown label type: when nullable y passed in" to "Add nullable type handling for nullable y to Oversampler" on Feb 17, 2023
@tamargrey changed the title from "Add nullable type handling for nullable y to Oversampler" to "Oversampler: Add nullable type handling for nullable y" on Feb 17, 2023
@tamargrey (Contributor, Author)

The Oversampler's nullable type incompatibility is fixed by upgrading to sklearn 1.2.2, but we should still remove the nullable type logic in `_prepare_data`, which is now doubly unnecessary.

1 participant