Basic implementation of OrdinalEncoder. #5646
- Implement `OrdinalEncoder`.
- Implement dask version.
- Fix dask transformers with DataFrame input by using `dask_cudf` to construct return df.
I think the failure with dask is not related to this PR.
I don't think the value should be ignored, but I am not sure how consistently it is being used within the dask implementation. To the best of my understanding, it is implemented correctly here.
I really only found some superficial issues and would like to see one more test to explicitly test array inputs. Other than that, I ran some local tests to check whether the output_types are actually respected and it seems to work as expected. Nice job!
I think an updated cudf is causing the device ordinal error:
After the first review, I picked some fixes from
This is ready for another review. :-)
Thank you for addressing my comments!
/merge
Some other scikit-learn features are not available yet, for instance `encoded_missing_value`, `min_frequency`, and `max_categories`.

The implementation is mostly based on the existing one hot encoder and label encoder.

I'm a bit confused by the `output_type` parameter and not sure how strictly it's enforced. I looked around, and it seems some estimators can ignore this parameter in their returns. It would be great if there were a guideline on how to handle this parameter, along with #5645.

Close #4456.
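For readers unfamiliar with ordinal encoding, the following is a minimal plain-Python sketch of the idea: each column gets its own sorted list of categories, and every value is replaced by its index in that list. It is not the code from this PR and does not use cuML, cudf, or dask. The class name `TinyOrdinalEncoder` and the sample rows are hypothetical, and the `categories_` attribute is named only to mirror the role of the scikit-learn attribute of the same name.

```python
class TinyOrdinalEncoder:
    """Illustrative stand-in only; not the cuML or scikit-learn class."""

    def fit(self, rows):
        n_features = len(rows[0])
        # One sorted category list per column.
        self.categories_ = [
            sorted({row[j] for row in rows}) for j in range(n_features)
        ]
        # Per-column lookup from category value to its integer code.
        self._lookup = [
            {cat: code for code, cat in enumerate(cats)}
            for cats in self.categories_
        ]
        return self

    def transform(self, rows):
        # An unseen category raises KeyError in this sketch.
        return [
            [self._lookup[j][value] for j, value in enumerate(row)]
            for row in rows
        ]

    def inverse_transform(self, codes):
        # Map each integer code back to the category it was assigned to.
        return [
            [self.categories_[j][code] for j, code in enumerate(row)]
            for row in codes
        ]


# Hypothetical sample data: two categorical columns.
rows = [["red", "S"], ["green", "M"], ["red", "L"]]
enc = TinyOrdinalEncoder().fit(rows)
codes = enc.transform(rows)
print(codes)
print(enc.inverse_transform(codes))
```

The sketch deliberately leaves out handling for unknown or missing values, infrequent-category grouping, and any notion of `output_type`, which are the areas this description flags as unsupported or open.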