Error on a pipeline with OneHotEncoder and xgboost #22
Comments
First of all, what is your XGBoost package version? If you upgrade to XGBoost 1.5.X or newer, you should be able to use XGBoost's new native One-Hot-Encoding (OHE) support. It's much more memory-efficient than dealing with an external one-hot encoding step.
Even better, consider upgrading to XGBoost 1.6.X or newer, which should let you use XGBoost's new native multi-category categorical splits.
So, please upgrade your XGBoost package (and the SkLearn2PMML package as well!) to the latest version, and simplify your Scikit-Learn pipeline to the following:
mapper = DataFrameMapper(
    [(col, None) for col in numerical_cols] +
    [([col], None) for col in categorical_cols]
)
Just a sidenote - Scikit-Learn is willing to fit all kinds of pipelines, without checking whether the sequence of computational steps makes any sense. As long as your "number of columns" is good, you'll be getting predictions. However, the Scikit-Learn to PMML converter tries to understand the logic of each computational step. Therefore, if something does not make sense to it, it'll complain (e.g. by raising an exception). You should heed those complaints, and try to make your pipeline more information-rich.
Looks like the converter was unable to figure out the list of category values for some categorical feature.
Internal note - it's interesting that the converter is complaining about a missing
Could it be that your dataset contains a column with a
You can make your pipeline more robust by collecting and storing category values using SkLearn2PMML domain decorator classes:
from sklearn2pmml.decoration import CategoricalDomain, ContinuousDomain
mapper = DataFrameMapper(
    [(col, ContinuousDomain()) for col in numerical_cols] +
    [([col], CategoricalDomain()) for col in categorical_cols]
)
At minimum, this should give you a different, more informative error.
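To make the idea of "collecting and storing category values" concrete, here is a conceptual sketch in plain Python. It is not the SkLearn2PMML implementation: `SimpleCategoricalDomain` and its methods are hypothetical names invented for this illustration. It only shows the general pattern of recording the distinct values seen during fitting and reporting unseen or missing values later.

```python
def _is_missing(value):
    # Treat None and float NaN as missing (NaN is the only float unequal to itself).
    return value is None or (isinstance(value, float) and value != value)


class SimpleCategoricalDomain:
    """Hypothetical helper for illustration; not the sklearn2pmml class."""

    def __init__(self):
        self.values_ = None

    def fit(self, column):
        # Record distinct non-missing values in order of first appearance.
        seen = []
        for value in column:
            if not _is_missing(value) and value not in seen:
                seen.append(value)
        self.values_ = seen
        return self

    def check(self, column):
        # Report values that were never seen during fit, plus the missing count.
        if self.values_ is None:
            raise RuntimeError("call fit() before check()")
        unseen = []
        missing = 0
        for value in column:
            if _is_missing(value):
                missing += 1
            elif value not in self.values_ and value not in unseen:
                unseen.append(value)
        return {"unseen": unseen, "missing": missing}


domain = SimpleCategoricalDomain().fit(["red", "green", None, "red"])
report = domain.check(["green", "purple", float("nan")])
```

With the list of valid values stored explicitly, a downstream consumer does not have to infer it from the data, and a column that is entirely missing shows up as an empty `values_` list rather than as an obscure failure later on.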
Leaving this issue open as a reminder to improve error diagnostics in this area. The current Java exception is devoid of any debugging information, because it is raised for a condition that is supposed to never trigger (a required attribute has not been set in the JPMML-Converter library stack).
Hello,
I trained a PMMLPipeline with OneHotEncoder and XGBClassifier using the following code snippet.
The pipeline seemed to work and I was able to use it to do predictions.
However, I got an error when I tried to convert the pipeline into a PMML file:
sklearn2pmml(pipeline, "testing.pmml", with_repr=True)
Can someone give me some advice on what I might have done wrong? Thanks.