[Blocked] Use Scrub for data cleaning #218
base: main
Fixes issue #138: NA handling in text columns
- Add skrub>=0.3.0 dependency to handle mixed string/NA data
- Integrate TableVectorizer in TabPFNClassifier to properly process text columns with NA values
- Add a test to verify the solution works as expected
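A common reason mixed string/NA columns break is that an encoder sorts the column's unique values, and Python cannot order a string against None or NaN. The stdlib-only sketch below reproduces that failure and shows the general remedy of mapping missing entries to a dedicated category before ordinal encoding. It is an illustration under stated assumptions, not how skrub's TableVectorizer or this PR is implemented; `encode_text_column`, `naive_categories` and the `MISSING` sentinel are hypothetical names.

```python
import math

# Hypothetical sentinel for missing text; assumed not to occur in real data.
MISSING = "__missing__"


def is_missing(value):
    """Treat None and float NaN as missing (a simplifying assumption)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def naive_categories(values):
    """Sort the unique values directly, as a naive encoder might."""
    return sorted(set(values))


def encode_text_column(values):
    """Ordinal-encode a text column, mapping missing entries to MISSING first."""
    cleaned = [MISSING if is_missing(v) else str(v) for v in values]
    categories = sorted(set(cleaned))  # now uniformly strings, so sorting works
    index = {category: i for i, category in enumerate(categories)}
    return [index[c] for c in cleaned], categories


column = ["red", float("nan"), "blue", None, "red"]

try:
    naive_categories(column)
except TypeError as exc:
    # Comparing str with None/float raises TypeError.
    print("naive encoding fails:", exc)

codes, categories = encode_text_column(column)
print(categories)  # ['__missing__', 'blue', 'red']
print(codes)       # [2, 0, 1, 0, 2]
```

Treating "missing" as its own category keeps the information that a value was absent, which a model can use, instead of silently dropping or imputing it.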
Okay, we encountered a problem: skrub 0.3.0 requires scipy 1.9.3, which isn't compatible with TabPFN.
Does it fail without
I've simplified the implementation so that it relies only on TableVectorizer, without needing the extra function. I also bumped the scikit-learn minimum version to 1.2.1 for compatibility with skrub. Note that scikit-learn 1.2.1 was released in January 2023, so it is already more than two years old and should be a reasonable minimum. The same applies to pandas 1.5.3.
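pip already enforces version floors at install time, but a runtime sanity check can help when diagnosing environment issues. The sketch below checks the scikit-learn 1.2.1 and pandas 1.5.3 minimums using only the standard library (`importlib.metadata.version`). The `MINIMUMS` dict and the helpers `parse_release`, `meets_minimum` and `check_installed` are hypothetical, not part of TabPFN, and the parser deliberately ignores pre-release suffixes, which is a simplification compared with `packaging.version`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions discussed in this PR; the dict itself is illustrative.
MINIMUMS = {"scikit-learn": "1.2.1", "pandas": "1.5.3"}


def parse_release(v):
    """Return the leading numeric release segment, e.g. '1.2.1' -> (1, 2, 1).

    Pre-release/local suffixes are ignored (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unparseable version: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True if installed >= minimum, padding with zeros so 1.2 == 1.2.0."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums=MINIMUMS):
    """List human-readable problems; an empty list means all floors are met."""
    problems = []
    for dist, floor in minimums.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist} is not installed (need >= {floor})")
            continue
        if not meets_minimum(installed, floor):
            problems.append(f"{dist} {installed} is older than {floor}")
    return problems


print(check_installed())
```

Zero-padding the release tuples matters: without it, Python's tuple comparison would rank 1.2.1 below 1.2.1.0 even though they denote the same release.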
Fix #138: NA handling in text columns
Fix #163
Summary
Test plan