Support for DaskXGBClassifier to PMML #1

Open
apoorv22 opened this issue Apr 15, 2020 · 3 comments

@apoorv22

Is there currently a way, or a plan for the future, to convert a DaskXGBClassifier to PMML?

@vruusmann
Member

vruusmann commented Apr 15, 2020

Where does this class come from? Is it from the Dask-ML project, or the Dask-XGBoost project?

In either case, there's probably a thin Python wrapper around the native XGBoost binary model file. If that wrapper can be saved to a pickle file, then conversion should be trivial.
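The condition above is whether the trained estimator can be pickled. As far as we know, sklearn2pmml serializes the pipeline to a pickle file before handing it to its Java-based converter, so a pickle round trip is a quick, library-agnostic way to check that precondition. The helper `is_picklable` below is hypothetical (not part of sklearn2pmml or XGBoost); it is a sketch only.

```python
import pickle
import threading


def is_picklable(obj) -> bool:
    """Return True if obj can be pickled and then unpickled again."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, TypeError, AttributeError):
        # PicklingError: e.g. lambdas; TypeError: e.g. locks and sockets;
        # AttributeError: e.g. locally defined functions or classes.
        return False
    return True


if __name__ == "__main__":
    print(is_picklable({"max_depth": 1, "n_estimators": 100}))  # True
    print(is_picklable(threading.Lock()))  # False
```

Calling `is_picklable(model)` on the fitted DaskXGBClassifier would show whether the wrapper itself pickles; an estimator that still holds live resources (such as locks or network connections) would fail this check.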

@apoorv22 Can you share a reproducible example of how you're training an XGBoost model using Dask? Specifically, how is the feature matrix defined and composed? I assume there's no Scikit-Learn pipeline involved.

apoorv22 changed the title from "Support for DaskXBGClassifier to PMML" to "Support for DaskXGBClassifier to PMML" on Apr 19, 2020
@apoorv22
Author

Hi @vruusmann,
This class is from dask-ml; refer to this.

Here is a reproducible example:

```python
from distributed import Client
from sklearn2pmml import PMMLPipeline, make_pmml_pipeline, sklearn2pmml
from xgboost.dask import DaskXGBClassifier
import pandas as pd
import dask.dataframe as dd

client = Client('localhost:9787')
df = pd.read_csv('/home/user/gender_voice_scikit_label_dataset.csv')

col_names = list(df)
dependant_var = 'label'
df = df[col_names]
# Feature columns: every column except the label
col_names.remove(dependant_var)
# npartitions must be a positive integer; -1 is not a valid partition count
ddf = dd.from_pandas(df, npartitions=4)
xgb_model = DaskXGBClassifier(max_depth=1, learning_rate=0.1, n_estimators=100,
                              verbosity=1, objective='binary:logistic', booster='gbtree', n_jobs=1,
                              nthread=None, gamma=0, min_child_weight=1, max_delta_step=0,
                              colsample_bytree=1, colsample_bylevel=1, subsample=1,
                              reg_alpha=1, reg_lambda=0,
                              random_state=29, seed=29, missing=None)

# Training: pass only the feature columns as X, so the label does not leak into the features
model = xgb_model.fit(X=ddf[col_names], y=ddf[dependant_var])
ppl = PMMLPipeline([
    ('classifier', model)])

# Wrap the trained model and attempt the PMML conversion
try:
    pipeline = make_pmml_pipeline(ppl, active_fields=col_names, target_fields=[dependant_var])
    sklearn2pmml(pipeline, '/home/user/scr.pmml', debug=True)
except Exception as e:
    print(e)
```
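A route that sidesteps the Scikit-Learn wrapper would be to export the native booster on its own. To our understanding, JPMML-XGBoost reads a native model file together with an XGBoost feature map (fmap): a text file with one tab-separated `<index> <name> <type>` line per feature, where `q` marks a quantitative column, `i` a binary indicator and `int` an integer. The sketch below builds that text from a list such as `col_names`; `make_feature_map` is a hypothetical helper, and defaulting every column to `q` assumes all features are numeric, as they appear to be in this dataset.

```python
def make_feature_map(feature_names, feature_types=None):
    """Return XGBoost feature-map text, one tab-separated line per feature.

    Each line is index, name, type. Type codes: 'q' quantitative,
    'i' binary indicator, 'int' integer. Defaults every feature to 'q'.
    """
    if feature_types is None:
        feature_types = ["q"] * len(feature_names)
    if len(feature_types) != len(feature_names):
        raise ValueError("feature_names and feature_types must have the same length")
    lines = []
    for index, (name, ftype) in enumerate(zip(feature_names, feature_types)):
        # The fmap format is whitespace-delimited, so names must be non-empty
        # and must not contain spaces or tabs.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid feature name for fmap: {name!r}")
        lines.append(f"{index}\t{name}\t{ftype}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_feature_map(["meanfreq", "sd", "median"]), end="")
```

The feature order must match the column order the booster was trained on (here, the order of `col_names`). The booster itself can be obtained with `model.get_booster()` and written with its `save_model` method.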

Dataset: gender_voice_scikit_label_dataset.zip

vruusmann transferred this issue from jpmml/sklearn2pmml on May 30, 2020
@noahisch

Have you tried converting the DaskXGBClassifier to a regular XGBClassifier first, and then converting that to PMML?
See dmlc/xgboost#6547.
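The suggestion above amounts to moving the trained booster out of the Dask wrapper into a plain, locally usable estimator. Below is a minimal sketch, assuming the Dask estimator exposes `get_booster()` and the target class provides `load_model(path)`; as far as we know, both hold for the XGBoost Scikit-Learn wrappers. `to_local_estimator` is a hypothetical helper. Whether the resulting XGBClassifier carries every fitted attribute that sklearn2pmml expects may depend on the XGBoost version.

```python
import os
import tempfile


def to_local_estimator(dask_model, local_cls):
    """Copy the trained booster of a Dask estimator into a new local estimator.

    Assumes dask_model.get_booster() returns an object with save_model(path),
    and that local_cls() instances implement load_model(path).
    """
    booster = dask_model.get_booster()
    with tempfile.TemporaryDirectory() as tmp:
        # A .json suffix asks XGBoost (1.0 and later, as we understand it)
        # to use its JSON model format.
        path = os.path.join(tmp, "model.json")
        booster.save_model(path)
        local_model = local_cls()
        local_model.load_model(path)
    # The temporary file is deleted here; the model is already loaded in memory.
    return local_model
```

Usage would look like `local_clf = to_local_estimator(model, xgboost.XGBClassifier)`, followed by `PMMLPipeline([('classifier', local_clf)])`. The class is passed in rather than imported so the helper itself has no hard dependency on a particular XGBoost version.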
