-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle new oversampler nullable type incompatibility in X #4068
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4068 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 349 349
Lines 37559 37583 +24
=======================================
+ Hits 37441 37465 +24
Misses 118 118
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
dccb304
to
219d78c
Compare
), | ||
], | ||
) | ||
def test_oversampler_category_dtype_incompatibility( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually the same bug as the nullable type incompatibility, and it's why we convert categorical columns to "object" in base_sampler. There is a fix in the works scikit-learn/scikit-learn#25814 (because of the wonderful @thomasjpfan who has been a key contributor to our ability to support nullable types so widely), so I wanted to add this xfailing test to remind us to also remove that logic at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eccabay converting caegory
to object
was a bug fix of yours from a while back, so I just want to make sure you get a chance to take a look at this test in case I'm missing something here.
Note - this doesn't need to get in for the upcoming release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a solid approach to me. Thanks for being so thorough in the documentation of the tests and what we need to undo when it succeeds.
38671f8
to
70b7935
Compare
70b7935
to
19e53a2
Compare
closes #3974
This incompatibility only shows up when string categorical columns are present with the nullable pandas dtypes, but I chose not to add a special
_handle_nullable_types
implementation since it's dataset dependent vs the time series imputer that needed a different handling but that would be applied to any dataset. This is a bit of a gray area, though, so I'm open to changing it.