Using MultiLabelBinarizer #79
I don't quite understand the inner workings of … I'd be happy to introduce …
Basically, what I need is to transform a column that holds an iterable in each row into a k-hot-encoding type mapping.
That prints:
And then you can easily use any scikit-learn classifier from there.
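Here is a minimal sketch of the k-hot encoding step with scikit-learn's MultiLabelBinarizer. It assumes a toy "iterColumn" whose rows are collections of string labels. The data and target values are made up for illustration. Each distinct label becomes one 0/1 scalar column, in sorted order, and the resulting matrix can be passed to an ordinary classifier.

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: each row of "iterColumn" is a collection of labels
iter_column = [("red", "green"), ("blue",), ("green", "blue"), ("red",), ()]
y = [1, 0, 1, 0, 0]

mlb = MultiLabelBinarizer()
k_hot = mlb.fit_transform(iter_column)  # one 0/1 column per distinct label

print(list(mlb.classes_))  # ['blue', 'green', 'red']
print(k_hot.tolist())      # [[0, 1, 1], [1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]

# The k-hot matrix contains scalar features only, so a regular classifier accepts it
clf = DecisionTreeClassifier(random_state=0).fit(k_hot, y)
print(clf.predict(mlb.transform([("green",)])))  # [1]
```

An empty collection simply encodes to an all-zero row.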
Thanks - I think I've got the basic idea of … In a nutshell, "iterColumn" is a collection-type feature/column, and the …
Collection-type features are a bit problematic from the PMML perspective, because PMML (typically) operates on scalar-type features only. I guess the same "features should be scalars" limitation applies to the Scikit-Learn framework as well: you can have collection-type features in the incoming dataset, but you must transform them into scalar-type features at the very beginning of your Scikit-Learn pipeline.
I will need to think about possible technical solutions. I could probably introduce collection-type feature support into the JPMML family of software pretty easily, but it would be pretty difficult to get it approved by DMG.org (which is responsible for maintaining the PMML standard).
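The sketch below shows one way to do the collection-to-scalar transformation at the very beginning of a Scikit-Learn pipeline. MultiLabelBinarizer is designed for label (target) data, so its methods take a single argument (fit(y), transform(y)). Pipeline steps are instead expected to follow the fit(X, y) convention. The MultiLabelTransformer class here is a hypothetical adapter for illustration only. It does not address the PMML side: a custom Python class like this is presumably not something SkLearn2PMML can convert.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.tree import DecisionTreeClassifier


class MultiLabelTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical adapter that exposes MultiLabelBinarizer through the
    fit(X, y=None) / transform(X) interface expected of pipeline steps."""

    def fit(self, X, y=None):
        # X is a sequence of collections (one per row); y is ignored here
        self.mlb_ = MultiLabelBinarizer().fit(X)
        return self

    def transform(self, X):
        # Each collection becomes one row of 0/1 scalar indicator columns
        return self.mlb_.transform(X)


pipeline = Pipeline([
    ("k_hot", MultiLabelTransformer()),  # collection -> scalar features first
    ("classifier", DecisionTreeClassifier(random_state=0)),
])

# Hypothetical toy data
X = [("red", "green"), ("blue",), ("green", "blue"), ("red",), ()]
y = [1, 0, 1, 0, 0]
pipeline.fit(X, y)
print(pipeline.predict([("green",), ("blue", "red")]))  # [1 0]
```

TransformerMixin supplies fit_transform() on top of fit() and transform(), which is all Pipeline needs from an intermediate step.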
Coming back to your original question (how to deal with columns that contain multiple categorical features), the temporary workaround would be to employ the following two-stage workflow:
SkLearn2PMML/JPMML-SkLearn is currently able to handle the second stage; you would need to maintain a separate Python/Java solution for handling the first stage. Despite the bad situation/outlook, let's keep this issue open - it will remind me to think more about it.
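Here is a minimal sketch of such a two-stage split. It assumes that stage 1 means expanding the collection column into one 0/1 scalar column per known value, and that stage 2 is an ordinary Scikit-Learn estimator trained on scalar columns only. The function name expand_collection_column, the fixed vocabulary, and the toy data are all hypothetical. Stage 1 uses nothing but membership tests, so the same logic could be re-implemented in Java in front of a PMML scoring engine.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def expand_collection_column(df, column, vocabulary):
    """Stage 1 (outside the model pipeline): replace a collection-type column
    with one 0/1 scalar column per vocabulary value. Values that are not in
    the vocabulary are silently ignored."""
    out = df.drop(columns=[column]).copy()
    for value in vocabulary:
        out[f"{column}_{value}"] = df[column].apply(lambda items: int(value in items))
    return out


# The vocabulary must be fixed at training time and reused at scoring time
VOCABULARY = ["blue", "green", "red"]

train = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "iterColumn": [["red", "green"], ["blue"], ["green", "blue"], ["red"], []],
})
y = [1, 0, 1, 0, 0]

# Stage 2: a regular estimator that sees scalar columns only
X_train = expand_collection_column(train, "iterColumn", VOCABULARY)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y)

new = pd.DataFrame({"age": [40], "iterColumn": [["green", "purple"]]})
print(model.predict(expand_collection_column(new, "iterColumn", VOCABULARY)))  # [1]
```

Keeping the vocabulary as an explicit constant, rather than re-deriving it from each batch, keeps the column layout identical between training and scoring.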
Another issue, where the original dataset contains collection-type features: jpmml/jpmml-sklearn#62
Thanks for the quick response!
@IdoZehori I ran into the same problem. Could you tell me how you finally dealt with it?
Hey,
The problem I've encountered occurs when I try to perform k-hot encoding with sklearn's MultiLabelBinarizer: I get the following error.
How do you suggest dealing with columns that contain multiple categorical features?