[FEATURE] convert dask -> non-dask model #6547
Comments
@pseudotensor I don't see any technical hurdle in converting a Dask model to an ordinary XGBoost model JSON file. We already have a callback function that serializes the model at every boosting iteration, and the callback should work with Dask XGBoost: xgboost/tests/python/test_with_dask.py Lines 1017 to 1033 in 2231940
Could you please share the script where the predictions differ?
A prediction difference is a serious bug; so far I haven't been able to reproduce it. But if you have an MRE, I will not hesitate to fix it and push another patch release.
Yes, @trivialfis, I'm not saying exactly that the predictions are off, just that the way we access the tree structure via the internal format leads to very different predictions between dask and non-dask. I'll get a repro of some kind ASAP.
To answer the original question: the booster returned by the dask train function is exactly the same as a single-node booster. But if you want to convert a pickled model between the dask and single-node scikit-learn interfaces, I don't think that's possible at this point. They are different Python classes, and a pickle records the class it was created from.
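To illustrate why a pickle cannot simply be reopened as a different class, here is a small standard-library sketch. `OrderedDict` and `dict` are only stand-ins for two distinct model wrapper classes; nothing here touches XGBoost itself.

```python
import pickle
from collections import OrderedDict

# OrderedDict and dict are stand-ins for two distinct model wrapper classes.
payload = pickle.dumps(OrderedDict(rounds=10))

# The serialized bytes name the originating class by module and class name...
names_class = b"collections" in payload and b"OrderedDict" in payload

# ...so loading always rebuilds that class, never a similar-looking one.
restored = pickle.loads(payload)
restored_type = type(restored)
```

Moving the underlying booster between wrappers avoids this, since the booster itself is the same kind of object in both cases.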
I think it was deduced that the primary problem here is that the dask model does not use ntree_limit, so predictions can be off when trying to select the best_iteration model. If that is solved, this issue is no longer needed, so I'll close. Thanks!
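A toy sketch of why ignoring a tree limit changes predictions. This is plain Python, not XGBoost code: `predict`, the three stump functions, and `best_iteration` are all made up for illustration, with one tree per boosting round assumed.

```python
def predict(trees, x, ntree_limit=None):
    """Sum the outputs of the first `ntree_limit` trees (all trees if None)."""
    used = trees if ntree_limit is None else trees[:ntree_limit]
    return sum(tree(x) for tree in used)


# Hypothetical boosted ensemble: each "tree" is a single-split stump.
trees = [
    lambda x: 0.5 if x > 0 else -0.5,
    lambda x: 0.25 if x > 1 else -0.25,
    lambda x: 0.125 if x > 2 else -0.125,  # trained after the best round
]

best_iteration = 1  # 0-based index of the best round, so two trees are kept

full = predict(trees, 1.5)                                  # uses all 3 trees
best = predict(trees, 1.5, ntree_limit=best_iteration + 1)  # uses 2 trees
```

Here `full` and `best` differ because the third tree still contributes when no limit is applied, which is the kind of mismatch described above.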
Actually, re-opening. dask is heavily feature-incomplete, esp. for the scikit-learn API. But most critically, dask cannot do pred_contribs etc. So it would be good to be able to convert to non-dask so that normal operations can be applied at prediction time. @hcho3, thanks, I'll try your suggestion.
Also, the non-sklearn dask API fails to work for pred_contribs etc., even though it is supposed to work, so conversion also becomes important for getting that working.
FYI this seems to work:
and other variations of this work too, e.g. going to sklearn model and having it load the booster:
Could you please help take a look at #6582?
rapidsai/cuml#3140 (comment)
Related to this general issue, I think the xgboost team does a much better job and ensures dask models can be pickled etc.
But I encountered some differences between non-dask and dask trees that make reading them incompatible. Specifically, we get poor accuracy when reading the tree structure to do predictions, but only for dask models.
I know xgboost can be used with treelite etc. Is it expected that dask models can be used too?
In general, if there is any complication, is there a way to convert a dask model (scikit-learn and raw API) into non-dask form so non-dask tools can be used?
@trivialfis