[R] Add predict method for new xgboost()
#11041
Conversation
R-package/R/xgboost.R
Outdated
#' Output will be a numeric matrix of shape `[nrows, nfeatures+1, nfeatures+1]`, or shape
#' `[nrows, nscores, nfeatures+1, nfeatures+1]` (for objectives that produce more than one score
#' per observation).
#' - `"approxinteraction"`: similar to `"interaction"`, but uses a fast approximation for the
I think the `approxcontrib` method was added at the time to demonstrate that the traditional approach has a bias, whereas SHAP doesn't. It's not implemented as an optimization. If I am not mistaken, it exists mostly for academic reasons, as a point of comparison.
We are considering whether it should continue to exist, and we definitely don't want to "advertise" it as a faster alternative.
Removed the approx options.
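For reference, a minimal sketch of what the remaining `"interaction"` prediction type is documented to return. The output shape comes from the docs in the diff above; the call signature of the redesigned `xgboost()` interface and of its predict method is assumed here for illustration, not confirmed by this thread.

```r
library(xgboost)

# Small regression model via the redesigned interface (x/y signature assumed).
data(mtcars)
model <- xgboost(mtcars[, -1], mtcars$mpg, nrounds = 5)

# SHAP interaction values: per the documentation above, the result is an
# array of shape [nrows, nfeatures+1, nfeatures+1], where the extra slot
# holds the bias term.
shap_int <- predict(model, mtcars[, -1], type = "interaction")
dim(shap_int)  # expected: 32 x 11 x 11 for mtcars' 10 predictors
```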
R-package/R/xgb.Booster.R
Outdated
@@ -226,7 +234,7 @@ xgb.get.handle <- function(object) {
 #' - For normal predictions, the dimension is `[nrows, ngroups]`.
 #' - For `predcontrib=TRUE`, the dimension is `[nrows, ngroups, nfeats+1]`.
 #' - For `predinteraction=TRUE`, the dimension is `[nrows, ngroups, nfeats+1, nfeats+1]`.
-#' - For `predleaf=TRUE`, the dimension is `[nrows, niter, ngroups, num_parallel_tree]`.
+#' - For `predleaf=TRUE`, the dimension is `[nrows, niter * ngroups * num_parallel_tree]`.
Is this changed from array to matrix?
It's not reshaped internally anymore: the output now takes the shape as it comes from the C interface, and that flattened matrix is what it currently produces. Before those changes (made in a previous PR), the R code applied its own reshaping logic, which is where the old shape came from, but the docs had not been updated.
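To illustrate, a small sketch of the flattened leaf-index output as documented in the diff above, using `predict.xgb.Booster`. The `[nrows, niter * ngroups * num_parallel_tree]` shape comes from the updated docs; the rest of the setup is illustrative.

```r
library(xgboost)

# Multiclass model so that ngroups > 1.
data(iris)
dtrain <- xgb.DMatrix(as.matrix(iris[, -5]), label = as.numeric(iris$Species) - 1)
bst <- xgb.train(
  params = list(objective = "multi:softprob", num_class = 3),
  data = dtrain,
  nrounds = 4
)

# Leaf indices, in the shape the C interface returns them:
# a matrix of dimension [nrows, niter * ngroups * num_parallel_tree],
# i.e. 150 x (4 * 3 * 1) = 150 x 12 here.
leaf_idx <- predict(bst, dtrain, predleaf = TRUE)
dim(leaf_idx)
```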
@david-cortes Do you think the `strict_shape` parameter is better? (It makes the prediction output a 4-dim array from the C API for `predleaf`.)
Yes, I think it'd be nicer to have the output with 3 or more dimensions for leaf predictions from vector-valued trees in `predict.xgboost` - will update it - but I wouldn't say that having arrays with empty dimensions would be preferable for cases where the output is actually 2D, like binary classification.
For `predict.xgb.Booster`, I'd nevertheless prefer it to stick to what the C interface outputs, so that it behaves the same as the other language bindings and so that changes on the C side would not require changes in the R logic.
Considering that the next release will be a major version update (3.0), perhaps the logic could be changed on the C side to output a 3D array for vector-valued tree leaves instead? It currently does add those extra dimensions for contributions and interactions, for example.
It's hidden in the `strict_shape` parameter of the `predict` method. It's available for all prediction inputs (CSR, DMatrix, ...). We can make it the default for the R package if needed.
But yes, it might output an empty dimension. From my perspective it's easier to index: one doesn't have to worry about the details of the model (I have a fear of array indexing with conditional checks and calls to `np.squeeze`, `reshape(-1)`, `np.newaxis`, etc., and always wish the input to my code had a deterministic shape).
This is opinionated, but I'm open to change. I would love to learn from your experience about when varying the dimensions based on the model makes things easier or less confusing.
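As an illustration of the trade-off under discussion, a sketch contrasting the default output with `strict_shape = TRUE` on a multiclass booster. The `[nrows, ngroups]` default shape is from the docs above; how the strict output orders its axes is not stated in this thread, so only its dimensionality is inspected here.

```r
library(xgboost)

data(iris)
dtrain <- xgb.DMatrix(as.matrix(iris[, -5]), label = as.numeric(iris$Species) - 1)
bst <- xgb.train(
  params = list(objective = "multi:softprob", num_class = 3),
  data = dtrain,
  nrounds = 4
)

# Default: the shape depends on the model, here [nrows, ngroups] = 150 x 3.
dim(predict(bst, dtrain))

# strict_shape = TRUE: every dimension is always present, even dimensions
# of length 1, so the shape no longer depends on the model type.
dim(predict(bst, dtrain, strict_shape = TRUE))
```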
I would prefer to keep `strict_shape=FALSE` as the default. I find it confusing to get arrays with dimensions that are not applicable to the model type that one is using.
`strict_shape=TRUE` would also not be in line with how other packages behave - for example, VGAM determines the output dimensions conditionally on the `family` argument that the model used, returning 1D for binary classification and 2D for multi-class classification.
Although it looks like reshaping higher-dim arrays to 2D might actually be more common - taking VGAM again as a reference, it squeezes to 2D when predicting `type="terms"` for multinomial logistic regression, for example, and the same goes for other packages like MGLM - I'd nevertheless still prefer to keep XGBoost's outputs as higher-dim arrays when they represent multiple dimensions.
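To make the model-dependent shapes concrete, a sketch of what the proposed default (`strict_shape = FALSE`) returns for binary vs. multiclass models. The `[nrows, ngroups]` rule is from the documented behavior earlier in this thread; the rest is illustrative.

```r
library(xgboost)

data(iris)
X <- as.matrix(iris[, -5])

# Binary model: predictions come back as a plain numeric vector of length nrows.
bst_bin <- xgb.train(
  params = list(objective = "binary:logistic"),
  data = xgb.DMatrix(X, label = as.numeric(iris$Species == "setosa")),
  nrounds = 4
)
length(predict(bst_bin, X))  # 150, with no trailing dimension of size 1

# Multiclass model: predictions come back as an [nrows, ngroups] matrix.
bst_mc <- xgb.train(
  params = list(objective = "multi:softprob", num_class = 3),
  data = xgb.DMatrix(X, label = as.numeric(iris$Species) - 1),
  nrounds = 4
)
dim(predict(bst_mc, X))  # 150 x 3
```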
Changed it to produce multi-dimensional outputs for leaf predictions and updated the docs accordingly, for both `predict.xgboost` and `predict.xgb.Booster`.
ref #9810
closes #7947
closes #7935
closes #7906
This PR adds a predict method for the `xgboost` model class as produced by the redesigned `xgboost()` function. It's mostly just a wrapper over the old `predict.xgb.Booster`, with the difference that it takes a `type` argument to control which prediction type to make.
Along the way, it also adds docs for `print.xgboost`, which were missing, and makes some corrections to the docs of the outputs from `predict.xgb.Booster` to reflect current behavior.
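A short usage sketch of the new method. The `type = "interaction"` value appears in the documentation diff above; the exact `xgboost()` call signature and accepted `newdata` types are assumptions for illustration.

```r
library(xgboost)

# Fit a model with the redesigned interface (x/y signature assumed).
data(mtcars)
model <- xgboost(mtcars[, -1], mtcars$mpg, nrounds = 10)

# The new predict method uses a single `type` argument to select the kind of
# prediction, rather than the boolean flags of predict.xgb.Booster.
p_default  <- predict(model, mtcars[, -1])                        # regular predictions
p_interact <- predict(model, mtcars[, -1], type = "interaction")  # SHAP interaction values
```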