Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] Bump requirement to 4.3.0. #9847

Merged
merged 5 commits into from
Dec 6, 2023
Merged

Conversation

trivialfis
Copy link
Member

@trivialfis
Copy link
Member Author

cc @david-cortes .

@david-cortes
Copy link
Contributor

Thanks for looking into this.

I don't think 4.3.2 is necessary though, just 4.3.

@jameslamb
Copy link
Contributor

I don't know the context of this PR (I'm sure I've missed a comment on one of the threads), so I apologize if I'm saying something that's already been considered but just in case... I don't think a version of {xgboost} with a requirement R (>= 4.3) could be released to CRAN currently.

r-oldrel on CRAN is still R 4.2.x.

See my comments in this related LightGBM discussion: microsoft/LightGBM#6213 (review).

@david-cortes
Copy link
Contributor

I don't know the context of this PR (I'm sure I've missed a comment on one of the threads), so I apologize if I'm saying something that's already been considered but just in case... I don't think a version of {xgboost} with a requirement R (>= 4.3) could be released to CRAN currently.

r-oldrel on CRAN is still R 4.2.x.

See my comments in this related LightGBM discussion: microsoft/LightGBM#6213 (review).

There's no requirement to have the latest version of a package running under oldrel. As an example, this package depends on R >= 4.3 and doesn't get any issue with the checks, which are running on an older version of the package for oldrel:
https://cran.r-project.org/web/packages/outliertree/index.html

@jameslamb
Copy link
Contributor

Oh interesting! I always thought it was necessary that packages on CRAN pass checks on all of these: https://cran.r-project.org/web/checks/check_flavors.html

Sorry for being a distraction, and thanks for teaching me something new 😊

@trivialfis
Copy link
Member Author

Need to spend some time on Windows to get MSVC to work with R 4.3. Hopefully can find a workaround somehow.

https://github.com/dmlc/xgboost/actions/runs/7096217610/job/19314372806?pr=9847

@david-cortes
Copy link
Contributor

Curious that it fails like that. The offending header has this disclaimer:

This definition uses an anonymous structure, which is defined in C11 (but
not C99). It is, however, supported at least by GCC, clang and icc. The
private_data_c member should never be used in code, but tells the compiler
about type punning when accessing the .r and .i elements, so is safer to use
when interfacing with Fortran COMPLEX*16 or directly C99 _Complex double
(PR#18430).

Is it perhaps a matter of using a different C++ standard?

@jameslamb
Copy link
Contributor

syntax error: missing ';' before identifier 'private_data_c' 

^ We encountered this in LightGBM when upgrading to R4.3 in that project's CI.

See microsoft/LightGBM#6061

We fixed it by adding -DR_LEGACY_COMPLEX for MSVC-based builds.

https://github.com/microsoft/LightGBM/blob/f5b6bd60d9d752c8e5a75b11ab771d0422214bb4/CMakeLists.txt#L386-L392

@trivialfis
Copy link
Member Author

Ah! Thank you both for looking into this, it has been most helpful!

@trivialfis trivialfis changed the title [R] Bump requirement to 4.3.2. [R] Bump requirement to 4.3.0. Dec 6, 2023
@trivialfis
Copy link
Member Author

Hi, please help review the PR when you are available.

Copy link
Contributor

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that you want to move to R >= 4.3.

The fix to the if args.compiler == "msvc" block in test_r_package.py especially looks like it might fix some test coverage that's accidentally not been working for a while (maybe those builds were falling back to MinGW?).

But could you link me to the discussion that led to wanting to enforce R >= 4.3 in DESCRIPTION? Sorry if I missed it, I know there's been a lot of activity on the R package here recently.

To the best of my understanding, making that change will have at least 2 negative side effects I can think of:

  • forcing all of {xgboost}'s strong reverse dependencies to either drop support for older version of R or move their {xgboost} support from from strong (Imports, Depends) to conditional (Suggests)
    • if they want to use newer versions of {xgboost}
  • preventing anyone using R < 4.3 from even installing the newer versions of the library

For example {shapviz} (link) and {modeltime} (link) currently have strong dependencies on {xgboost} and support R >= 3.6.0.

R 4.3.x has only been out for 8 months at this point (R release history).

If these concerns have already been considered, I apologize for the extra noise and for slowing you down unnecessarily.

@trivialfis
Copy link
Member Author

But could you link me to the discussion that led to wanting to enforce

This is required by the error handling work by @david-cortes in #9823 (comment) The upgrade was done in this PR: #9835

To the best of my understanding, making that change will have at least 2 negative side effects I can think of:

Considering that we are trying to create a new interface #9734 while maintaining the existing one for a period of time, I'm not too concerned about breaking change in the R version and would like to not block @david-cortes 's progress. We can review some of the changes in a case-by-case manner once the new interface by @david-cortes is ready.

If these concerns have already been considered, I apologize for the extra noise and for slowing you down unnecessarily.

Your input has been extremely valuable and I appreciate it!

@jameslamb
Copy link
Contributor

Ok that addresses my concerns, just wanted to be sure it had been considered before. Thank you!

@trivialfis trivialfis merged commit 4bc1f3a into dmlc:master Dec 6, 2023
26 of 28 checks passed
@trivialfis trivialfis deleted the r-version branch December 6, 2023 16:12
@david-cortes
Copy link
Contributor

To clarify: the requirement for R 4.3 specifically is for the altrep'd list class which can be used to supply custom serialization methods that only trigger on-demand. The rest of the changes probably wouldn't require a version higher than 4.0 or so.

Also, while today you might say that R 4.3 has only been around for 8 months, we aren't anywhere close to having the new R interface in a release-ready status, and by the time we're finished we might see 4.3 as being 'oldrel' already.

@jameslamb
Copy link
Contributor

Ahhh got it, thanks!

I think I didn't understand that PRs merged to master were considered to be contributing to that new R interface. I was assuming anything merged to master could end up being pushed to CRAN the next time XGBoost does a release, and that a new release could happen in days to weeks from now, not months.

Thanks for correcting my understanding.

@trivialfis
Copy link
Member Author

There are packages that use hardcoded results for tests, and the algorithm in XGB has changed significantly enough to break those packages in 2.0. As a result, we couldn't pass the reverse dependency check. It's unlikely that we will go through those packages for 2.0 and then do it again for the new interface.

@trivialfis
Copy link
Member Author

I did submit an update for 1.7 to keep XGB on CRAN though.

@mayer79
Copy link
Contributor

mayer79 commented Dec 11, 2023

One comment

Requiring R >= 4.3 is uncommonly strict. For instance, {ggplot2} requires R 3.4.4, {data.table} the even older 3.1.0. Companies often stick to a certain R version until there is sufficient pressure to update. This does not mean we should not do it, but there need to be very good reasons to do so.

(I had a second comment, but I think it is not relevant.)

@david-cortes
Copy link
Contributor

there need to be very good reasons to do so.

There are very good reasons to: R 4.3 introduces functionality that can be used to define custom serialization for more object types.

For xgboost, this means it will possible (after a few more PRs) to only trigger serialization on-demand, which has multiple benefits:

  • Reduced memory usage.
  • Faster fitting times (assuming a model that doesn't get immediately serialized).
  • No need to call functions like xgb.Booster.complete from the user side (in turn preventing mistakes such as de-serializing models at every predict call in a forked process, or timing the predict method incorrectly).
  • Reduced chances of generating inconsistencies between the R metadata and the C++ objects when something fails or gets interrupted.

@trivialfis
Copy link
Member Author

trivialfis commented Dec 12, 2023

For me, this is quite convincing

Reduced chances of generating inconsistencies between the R metadata and the C++ objects when something fails or gets interrupted.

The serialization has been a foot gun since forever.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants