[R] Add evaluation set and early stopping for xgboost()
#11065
Conversation
#' @param early_stopping_rounds If `NULL`, the early stopping function is not triggered.
#' If set to an integer `k`, training with a validation set will stop if the performance
#' doesn't improve for `k` rounds. Setting this parameter engages the [xgb.cb.early.stop()] callback.
#' @param early_stopping_rounds Number of boosting rounds after which training will be stopped
Is there a defined behavior for using multiple metrics or multiple evals in R? In Python, the last metric and the last validation dataset are used for early stopping.
It's the same in R. Updated the docs.
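For illustration, the same behavior can be seen with the established `xgb.train()` API. This is a hedged sketch: it assumes the current R interface where evaluation sets are passed through `evals` (named `watchlist` in older package versions), and that, as in Python, the last metric on the last dataset drives early stopping.

```r
library(xgboost)
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dvalid <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)

bst <- xgb.train(
  params = list(
    objective = "binary:logistic",
    eval_metric = "error",
    eval_metric = "auc"   # last metric listed...
  ),
  data = dtrain,
  nrounds = 50,
  evals = list(train = dtrain, valid = dvalid),  # ...on the last dataset ('valid')
  early_stopping_rounds = 3                      # ...is what early stopping monitors
)
```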
@@ -512,6 +516,9 @@ check.can.use.qdm <- function(x, params) {
      return(FALSE)
    }
  }
  if (NROW(eval_set)) {
What does this imply? If `eval_set` has a valid number of rows, then we can't use QDM?
Yes, because it then slices the DMatrix that gets created.
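The split-after-construction approach being discussed could be sketched roughly as follows. This is an assumption-laden illustration, not the PR's actual code: it uses `xgb.slice.DMatrix()` (the R package's DMatrix slicer in recent versions) and a hypothetical 20% evaluation fraction.

```r
library(xgboost)
data(agaricus.train, package = "xgboost")

# Build ONE DMatrix from the full data first...
dall <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# ...then slice it into train/eval parts afterwards.
n <- nrow(dall)
idx_eval <- sample(n, size = floor(0.2 * n))
deval  <- xgb.slice.DMatrix(dall, idx_eval)
dtrain <- xgb.slice.DMatrix(dall, setdiff(seq_len(n), idx_eval))
```

Slicing the already-constructed DMatrix is what makes this incompatible with QuantileDMatrix, since a QDM cannot be sliced after quantization.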
Does the slicing need to happen after the DMatrix is created?
Yes, because otherwise there'd be issues with things like needing to make sure categorical 'y' and features have the same encodings between the two sets, objects from package `Matrix` returning a different class when they are sliced, and so on.
Hmm, so from your perspective the DMatrix is more suitable for slicing than built-in classes...
We are planning to work on CV with shared quantile cuts for improved computational performance (sharing the quantiles between QDM folds). It's a minor information leak but can significantly increase performance, especially with external memory.
As a result, I have to consider how this can be implemented. If we double down on DMatrix slicing, it will prevent us from applying that optimization. It's very unlikely that we can slice an external-memory DMatrix. Also, the slice method in XGBoost is quite slow and memory-inefficient.
I can merge this PR as it is, but I think we might have more trouble when applying the optimization for CV.
Well, the alternative would be to:
- Restructure the code such that 'x' and 'y' get processed earlier.
- Make it so that characters in 'x' get converted to factor before passing them to `xgb.DMatrix`, and so that `data.frame`s get their subclasses (like `data.table`) removed beforehand.
- Add a custom slicer for `Matrix` classes. Either that, or additional castings of storage format, or pull in an extra dependency for efficient slicing.

But it'd end up being inefficient either way. The more so considering that on the R side, the slicing would happen with a vector of random indices on one of the following:
- A column-major matrix of 8-byte elements (likely slower than a CSR).
- A list of vectors (which is what a data frame is behind the scenes).
- A CSC matrix, which will first get converted to that format from CSR (`Matrix` doesn't slice CSR directly), with the slice then getting converted to COO.

It could in theory be more efficient to do the slicing in R for base `matrix` classes, but probably not so much for the others.
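The `Matrix`-package behavior referred to above can be illustrated with a small sketch. This is a hedged example: the exact class returned by a slice depends on the `Matrix` version, but row-subsetting a CSR ("RsparseMatrix") object is generally not performed in CSR form and the result typically comes back as a different class than the input.

```r
library(Matrix)

# Build a small sparse matrix and coerce it to CSR storage
m_csc <- Matrix(c(0, 1, 0, 2, 0, 3), nrow = 3, sparse = TRUE)  # CSC by default
m_csr <- as(m_csc, "RsparseMatrix")

# Row-slice the CSR matrix: Matrix routes this through other representations,
# so the result's class generally differs from the CSR input's class
sub <- m_csr[c(1, 3), , drop = FALSE]
class(m_csr)
class(sub)
```

This class instability is one reason a custom slicer (or extra coercions) would be needed on the R side.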
Any suggestions for the future implementation of the CV optimization mentioned previously? It's designed for the QDM and the external-memory version of QDM.
Sounds good and quite helpful for the CV function indeed. But I don't think it's very relevant here, since unlike `xgb.cv`: (a) it's only doing two slicing operations and only one quantization/binning; (b) it doesn't accept an arbitrary `xgb.DMatrix` - instead, it creates it internally from a small subset of allowed classes.
Hence, it doesn't need to consider special cases like external memory or distributed mode, and there isn't too much room for improvement in terms of speed savings.
Makes sense. We will have a second CV function in the future for high-level inputs (like `data.table`, iterators, etc.).
Just realized that
Will merge after the CI is back online.
ref #9810
This PR adds the evaluation set and early stopping functionality to the new `xgboost()` function.
Due to the data processing that this new interface does (e.g. encoding of factors for both 'x' and 'y'), it only allows specifying the evaluation data as a subset of the x/y data, either as a random fraction or through selected row indices, as otherwise taking lists of data would require a lot of hassle with encodings, variable reorderings, and so on.