[RFC] Making R interface more idiomatic #7906

david-cortes · 2022-05-15T18:56:29Z

I notice that a version 2.0 of xgboost is planned, which, among other things, is expected to include support for categorical features in the R interface.

Given that this is a major version release and as such is expected to introduce potentially breaking changes, I think this is a good opportunity to make the R interface more in line with base R and core/popular R modeling packages. Many people (including myself) find the R interface of xgboost to be inconvenient and unidiomatic, but changing the interface for xgboost() from its current state would be a rather big breaking change and would probably break lots of user scripts that depend on xgboost().

In short, xgboost() does not work with the most common data types used in R (data.frame) and does not follow R conventions in terms of e.g. function arguments. For people who are familiar with base R and with other R packages, there are many ways in which the R interface of xgboost could be improved for a better end-user experience, such as:

Offering an x/y interface as well as a formula interface.
Accepting data frames as inputs and handling categorical/factor variables from data frames.
Accepting factor variables as "y".
Accepting non-standard evaluation for column names (e.g. passing the weight variable as a column name without quotes).
Using 1-based numbering for integers, as R does, instead of 0-based.
Controlling prediction types through a type argument.
Making the naming of function arguments more consistent with base R and core packages - for example, naming the weights as weights instead of weight, like base R does.
Changing default arguments, for example by not dumping the model to a file on disk by default.

Among many others.
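
As a point of reference for the conventions listed above, here is how base R's own glm() and predict() handle them. This is only a sketch on the built-in mtcars dataset, not xgboost code and not a proposed xgboost API; the model, the column name `w`, and the unit weights are chosen purely for illustration.

```r
# Base R conventions, shown with glm() on a built-in dataset.
df <- mtcars
df$am  <- factor(df$am, labels = c("automatic", "manual"))  # factor as the response "y"
df$cyl <- factor(df$cyl)                                    # categorical predictor
df$w   <- 1                                                 # illustrative weights column (all equal)

# Formula interface, data.frame input, and a `weights` argument that is
# looked up as an unquoted column name inside `data`.
fit <- glm(am ~ mpg + cyl, data = df, family = binomial(), weights = w)

# The kind of prediction is selected through the `type` argument.
p_link <- predict(fit, newdata = df, type = "link")      # linear predictor
p_resp <- predict(fit, newdata = df, type = "response")  # probabilities

# Factor levels are numbered starting from 1.
as.integer(df$am)[1:5]
```

The point of the comparison is that a user who knows glm() can guess most of these arguments without reading documentation, which is the experience the list above is asking for.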

Would this project accept big breaking PRs for the R interface (particularly for xgboost() and predict.xgb.Booster()) for the 2.0 release that would make it more similar to base R and other R packages?


RAMitchell · 2022-05-15T20:45:16Z

I don't think any of the currently active maintainers are big R users, so we welcome input. Could we just build a new interface behind a different namespace until it's ready? I don't think there's a need to immediately replace the old interface in a short space of time.

trivialfis · 2022-05-16T03:34:38Z

> Would this project accept big breaking PRs for the R interface (particularly for xgboost() and predict.xgb.Booster()) for the 2.0 release that would make it more similar to base R and other R packages?

I would like to welcome these changes. The concern about breaking changes can be handled by running reverse dependency checks.

mayer79 · 2022-06-09T10:27:21Z

I suggest keeping xgboost() and predict() as they are and giving the new functions different names instead, e.g. xgboost2() and predict2(). Too much code would break if the main functions were changed.

Otherwise, great work @david-cortes.

trivialfis added the feature-request label May 19, 2022

david-cortes mentioned this issue May 26, 2022

[R] Use type argument to control prediction types #7947

Closed

trivialfis added the type: r-package label Oct 31, 2022

trivialfis mentioned this issue Nov 3, 2022

Address all lintr warnings. #8012

Closed

david-cortes mentioned this issue Dec 3, 2024

[R] Add predict method for new xgboost() #11041

Merged

trivialfis closed this as completed in #11041 Dec 4, 2024
