Support using CountVectorizer & TfidfVectorizer in cuml.pipeline.Pipeline #5034
Can one of the admins verify this patch? Admins can comment
Pull requests from external contributors require approval from a
ok to test
Thank you very much for the contribution!
For some reason the Branch checker job is stuck in a couple of PRs; I will work on solving that, or admin-merge later today.
@gpucibot merge
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## branch-23.02 #5034 +/- ##
===============================================
Coverage ? 81.83%
===============================================
Files ? 200
Lines ? 14892
Branches ? 0
===============================================
Hits ? 12187
Misses ? 2705
Partials ? 0
@gpucibot merge
…Pipeline` (rapidsai#5034)

It's not possible to use the vectorizers with a pipeline, because the pipeline calls the method `fit_transform` with 3 arguments and only 2 are supported.

This fix is similar to the implementation in scikit-learn: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/feature_extraction/text.py#L2095

And has already been implemented for the `HashingVectorizer`: https://github.com/rapidsai/cuml/blob/eeb4d7c47803ea8d4fe5f8bcfbc39394fd1b9bee/python/cuml/feature_extraction/_vectorizers.py#L867

Authors:
- Lasse Hyldahl Jensen (https://github.com/lasse-it)

Approvers:
- Carl Simon Adorf (https://github.com/csadorf)
- Dante Gama Dessavre (https://github.com/dantegd)
- Victor Lafargue (https://github.com/viclafargue)

URL: rapidsai#5034
It's not possible to use the vectorizers with a pipeline, because the pipeline calls the method `fit_transform` with 3 arguments and only 2 are supported.
This fix is similar to the implementation in scikit-learn: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/feature_extraction/text.py#L2095
The same approach has already been implemented for the `HashingVectorizer`: cuml/python/cuml/feature_extraction/_vectorizers.py, line 867 in 50716cf
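To illustrate the signature mismatch described above, here is a minimal sketch in plain Python. The classes `TwoArgVectorizer` and `PipelineFriendlyVectorizer` and the helper `run_first_step` are hypothetical stand-ins invented for this example; they are not cuML's or scikit-learn's actual code. The sketch assumes only that a pipeline passes both `X` and `y` to a step's `fit_transform`, and that the fix consists of accepting (and ignoring) an optional `y`.

```python
# Toy stand-ins with hypothetical names; not cuML's or scikit-learn's real classes.


class TwoArgVectorizer:
    """Mimics the old signature: fit_transform(self, raw_documents)."""

    def fit_transform(self, raw_documents):
        # Build a sorted vocabulary, then count each token per document.
        vocab = sorted({tok for doc in raw_documents for tok in doc.split()})
        return [[doc.split().count(tok) for tok in vocab] for doc in raw_documents]


class PipelineFriendlyVectorizer(TwoArgVectorizer):
    """Accepts and ignores y, so a pipeline can pass it through."""

    def fit_transform(self, raw_documents, y=None):
        return super().fit_transform(raw_documents)


def run_first_step(step, X, y=None):
    """Simplified stand-in for how a pipeline invokes a transformer step."""
    return step.fit_transform(X, y)  # self, X, y: three arguments in total


docs = ["gpu text gpu", "text pipeline"]
labels = [0, 1]

# The old signature rejects the extra positional argument.
try:
    run_first_step(TwoArgVectorizer(), docs, labels)
    old_signature_failed = False
except TypeError:
    old_signature_failed = True

# With y=None in the signature, the same call succeeds.
counts = run_first_step(PipelineFriendlyVectorizer(), docs, labels)
```

Defaulting `y` to `None` keeps existing two-argument calls working while letting a pipeline hand the labels through to later steps.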