xgb.cv which round of prediction=TRUE #1188

Closed
jacquespeeters opened this issue May 11, 2016 · 6 comments

@jacquespeeters

jacquespeeters commented May 11, 2016

Hey,

I'm using R, and I used prediction=TRUE with xgb.cv().

However, the documentation doesn't make clear which round the predictions come from. Basically, are the returned predictions based:

  • on the last iteration (which may be under- or overfitting)
  • on the best iteration

I'm not the only one wondering (#92), but I couldn't find an answer.

I looked directly at the function's code, https://github.com/dmlc/xgboost/blob/master/R-package/R/xgb.cv.R, but I couldn't work out a clear answer myself.

Therefore I ran a local test and found that the returned predictions reproduce the score (on my metric) of the last iteration! It seems to me that this parameter is extremely dangerous to use, and misleading. I don't know whether this was intended (I can understand that storing predictions at each step is quite memory-hungry).

Best regards,
Jacques
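
The check described above can be sketched in a few lines: compute the metric from the returned predictions, then see which round of the per-round evaluation log it matches. This is a plain-Python illustration, not xgboost code; `eval_log`, `observed`, and `matching_rounds` are hypothetical names, and the numbers are made up for the example.

```python
import math

def matching_rounds(observed, eval_log, tol=1e-9):
    """Return the 1-based rounds whose logged metric equals `observed`."""
    return [i + 1 for i, m in enumerate(eval_log)
            if math.isclose(m, observed, abs_tol=tol)]

# Made-up validation error per round (lower is better): the model improves,
# then starts to overfit.
eval_log = [0.40, 0.31, 0.26, 0.24, 0.25, 0.27, 0.30]

# Made-up metric computed from the predictions that were returned.
observed = 0.30

# 1-based round with the lowest logged error.
best = min(range(len(eval_log)), key=eval_log.__getitem__) + 1

print("rounds matching the returned predictions:", matching_rounds(observed, eval_log))
print("best round according to the log:", best)
```

If the matching round is the final one rather than the best one, the predictions came from the last iteration.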

@khotilov
Member

Yes, there is a bug which causes the predictions to be returned for the last round even when early.stop.round is specified, and I can fix that.

Note, though, that the way "early stopping" is implemented in xgb.cv is not really a standard CV procedure for a "model with early stopping". Its main practical use, as I see it, is to get a reasonable estimate for the number of iterations by using only a training sample. So, think carefully about whether and how the predictions from it could be used in your situation.

@jacquespeeters
Author

What's your idea for solving it?

Keeping the old predictions in memory doesn't seem like a good idea. Is it possible to train the model for, let's say, 25 iterations, then use only its first 15 iterations to make the predictions at the true early-stopping point?

Yeah, I use the "early stopping" implementation in xgb.cv only to determine a good total number of iterations. However, keeping the predictions might be useful for plotting ROC curves or feeding them to a meta-classifier, because retraining the model with the right number of iterations just to get the correct predictions is a bit annoying and time-consuming.
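
To illustrate the "train 25, predict with the first 15" idea: in an additive boosted model each round only appends a tree, so summing the first k trees gives the same prediction as a model trained for k rounds. The sketch below is a toy gradient-boosting loop with one-split stumps in plain Python, not xgboost's implementation; every name in it (`fit_stump`, `boost`, `predict`, `n_rounds`) and the data are assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump on 1-D inputs by least squares."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds; both sides stay non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds, lr=0.5):
    """Return one stump per boosting round (squared-error loss)."""
    trees, preds = [], [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + lr * tree(x) for p, x in zip(preds, xs)]
    return trees

def predict(trees, x, n_rounds=None, lr=0.5):
    """Sum the contributions of the first `n_rounds` trees (all trees if None)."""
    return sum(lr * t(x) for t in trees[:n_rounds])

# Made-up training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

trees_25 = boost(xs, ys, rounds=25)
truncated = predict(trees_25, 4.0, n_rounds=15)         # 25-round model, first 15 trees
retrained = predict(boost(xs, ys, rounds=15), 4.0)      # model trained for 15 rounds
print(truncated, retrained)
```

Under these assumptions the two values agree, so no per-round predictions need to be kept in memory, only the round at which to truncate.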

@jacquespeeters
Author

Wow, the same thing happens with xgb.train(...) when there is a validation dataset.

I trained my model with a validation dataset, then predicted on it, and got the result of the last iteration. Same behaviour as xgb.cv. And it is not intuitive at all!

Solving it would be ideal, but I don't know what your idea was or how hard it is. In any case, stating it in the manual would be a good start.

@khotilov
Member

khotilov commented Jun 4, 2016

With xgb.train, it's not the "same behaviour as xgb.cv". The xgb.train's result is a model which you would need to use with predict with the needed ntreelimit parameter to obtain the predictions. What's not intuitive about that? It is kind of what xgb.cv was supposed to do for you under the hood for its folds, but it wasn't taking the proper ntreelimit for predictions when early stopping was on. I am close to completing the R-interface refactoring to use callbacks #892 and it will be fixed there.

By the way, if you are using xgb.cv with early stopping for evaluating the model performance, you might be getting overly optimistic results. As I've mentioned before, it's good for estimating the number of iterations parameter, but it wouldn't be a proper performance evaluation, since the folds become somewhat "aware" of each other through the inter-folds optimum search that is used by the early stopping.
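
One way to see the optimism described above is a small simulation: if the round is chosen by minimizing the fold-averaged error and that same minimum is then reported, the reported number is biased downward even when no round is truly better than another. This is a plain-Python sketch under made-up assumptions (equal true error at every round, independent Gaussian noise per fold and round); it is not a model of xgb.cv itself.

```python
import random

random.seed(0)
n_folds, n_rounds, n_trials = 5, 50, 1000
true_error = 0.30   # assumed: every round has the same true error
noise_sd = 0.02     # assumed: noise of a single fold's error estimate

selected, fixed = [], []
for _ in range(n_trials):
    # Noisy error estimate for each fold at each round.
    folds = [[random.gauss(true_error, noise_sd) for _ in range(n_rounds)]
             for _ in range(n_folds)]
    # Fold-averaged error curve, as used to pick the stopping round.
    mean_by_round = [sum(f[r] for f in folds) / n_folds for r in range(n_rounds)]
    selected.append(min(mean_by_round))   # error reported at the "best" round
    fixed.append(mean_by_round[-1])       # error at a round fixed in advance

avg_selected = sum(selected) / n_trials
avg_fixed = sum(fixed) / n_trials
print(f"true error:                {true_error:.4f}")
print(f"average at a fixed round:  {avg_fixed:.4f}")
print(f"average at the best round: {avg_selected:.4f}")
```

The error at a round fixed in advance averages out to the true value, while the error at the selected round sits below it, which is why the selected round's score is not a proper performance estimate.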

@jacquespeeters
Author

I sometimes use xgb.train for finding the right hyper-parameters. When the training set is really big, instead of running a 5-fold CV and therefore training 5 models at the same time, xgb.train plus a validation dataset is stable enough.

I meant that it wasn't intuitive that the model returned by xgb.train is not the best one but the one from the last iteration; same for keeping predictions with xgb.cv. It is dangerous if users use XGBoost poorly, let's say when the best iteration is around 50 and early_stopping=500.

I do agree that using xgb.cv for evaluating model performance is a bit overly optimistic. However, it seems to me to be the "least bad" idea for finding hyper-parameters.

I wasn't aware of #892, my bad. All my respect to you; your work is greatly appreciated.

@JoshuaC3

JoshuaC3 commented Dec 8, 2016

Will prediction=TRUE with xgb.cv() be a Python option as well at some point?

I have this working to print out the predictions through a callback, but I cannot get it to return them as a variable. How do I go about saving them as a variable?
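
A general pattern for getting values out of a callback, rather than printing them, is to let the callback append to a container that the caller keeps a reference to. The sketch below shows that pattern in plain Python. `run_rounds` is a hypothetical stand-in for a training loop and is not xgboost's callback API; the callback signature and the placeholder prediction values are likewise assumptions.

```python
class PredictionCollector:
    """Callable callback that stores what it receives instead of printing it."""

    def __init__(self):
        self.history = []

    def __call__(self, round_index, predictions):
        # Copy the list so later changes by the caller do not alter the record.
        self.history.append((round_index, list(predictions)))


def run_rounds(n_rounds, callbacks):
    """Hypothetical training loop that invokes every callback once per round."""
    for i in range(n_rounds):
        predictions = [0.1 * (i + 1), 0.2 * (i + 1)]  # placeholder values
        for cb in callbacks:
            cb(i, predictions)


collector = PredictionCollector()
run_rounds(3, [collector])

# The collected values are now ordinary variables.
last_round, last_predictions = collector.history[-1]
print(last_round, last_predictions)
```

The same idea works with a closure over a list; a small class just makes the stored state easier to inspect afterwards.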

@tqchen closed this as completed Jul 4, 2018
@lock (bot) locked as resolved and limited conversation to collaborators Oct 25, 2018