[RFC] Making R interface more idiomatic #7906

Closed
david-cortes opened this issue May 15, 2022 · 3 comments · Fixed by #11041

Comments

@david-cortes
Contributor

I notice that there is a version 2.0 of xgboost in the plans which, among other things, is expected to include support for categorical features in the R interface.

Given that this is a major version release and as such is expected to introduce potentially breaking changes, I think this is a good opportunity to make the R interface more in line with base R and core/popular R modeling packages. Many people (including myself) find the R interface of xgboost to be inconvenient and unidiomatic, but changing the interface for xgboost() from its current state would be a rather big breaking change and would probably break lots of user scripts that depend on xgboost().

In short, xgboost() does not work with the most common data types used in R (data.frame) and does not follow R conventions in terms of e.g. function arguments. For people who are familiar with base R and with other R packages, there are many ways in which the R interface of xgboost could be improved for a better end-user experience, such as:

  • Offering an x/y interface as well as a formula interface.
  • Accepting data frames as inputs and handling categorical/factor variables from data frames.
  • Accepting factor variables as "y".
  • Accepting non-standard evaluation for column names (e.g. passing the weight variable as a column name without quotes).
  • Using 1-based numbering for integers, as R does, instead of 0-based.
  • Controlling prediction types through a type argument.
  • Making the naming of function arguments more consistent with base R and core packages - for example, naming the weights as weights instead of weight, like base R does.
  • Changing default arguments by, for example, not dumping the model to a file on disk by default.

Among many others.
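
To make a few of these points concrete, here is a rough base R sketch of the manual preparation that a matrix-only, 0-based interface tends to require from the user. The data frame, column names, and predicted indices are made up for illustration, and no xgboost function is called; it only shows the conversions that a more idiomatic interface could do internally.

```r
# Illustrative data: a data.frame with a factor predictor and a factor response.
df <- data.frame(
  x_num = c(1.5, 2.0, 0.3, 4.1),
  x_cat = factor(c("a", "b", "a", "c")),
  y     = factor(c("no", "yes", "no", "yes"))
)

# Predictors: expand the factor column into dummy columns to get a numeric matrix.
# The "- 1" drops the intercept column that model.matrix() adds by default.
x_mat <- model.matrix(~ x_num + x_cat - 1, data = df)

# Response: factor codes are 1-based in R, so shift them to 0-based labels.
y_lab <- as.integer(df$y) - 1L

# Going back: a 0-based class index must be shifted up before indexing levels().
pred_idx <- c(0L, 1L, 1L, 0L)  # hypothetical 0-based predicted classes
pred_fac <- factor(levels(df$y)[pred_idx + 1L], levels = levels(df$y))
```

With data frame and factor support built in, the user would pass `df` and a factor `y` directly and get factor predictions back, without the two index shifts above.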

Would this project accept big breaking PRs for the R interface (particularly for xgboost() and predict.xgb.Booster()) for the 2.0 release that would make it more similar to base R and other R packages?

@RAMitchell
Member

I don't think any of the currently active maintainers are big R users, so we welcome input. Could we just build a new interface behind a different namespace until it's ready? I don't think there's a need to immediately replace the old interface in a short space of time.

@trivialfis
Member

> Would this project accept big breaking PRs for the R interface (particularly for xgboost() and predict.xgb.Booster()) for the 2.0 release that would make it more similar to base R and other R packages?

I would like to welcome these changes. The concern about breaking changes can be handled by running reverse dependency checks.

@mayer79
Contributor

mayer79 commented Jun 9, 2022

I suggest keeping xgboost() and predict() as they are and giving the new functions different names instead, e.g. xgboost2() and predict2(). Too much code would break if the main functions were changed.

Otherwise, great work @david-cortes.
