[ci] simplify CI configurations, parallelize compilation, test CUDA on Ubuntu 22.04 #6458

Merged on May 23, 2024 (34 commits)

jameslamb (Collaborator) commented on May 17, 2024

Proposes the following changes to the CI setup:

In the CUDA jobs:

  • adds support for triggering a build by clicking a button in the UI (workflow dispatch)
  • removes code running "set up nvidia-docker and restart the docker daemon" on every CUDA build
    • to save a bit of CI time on each CUDA build
    • added an option to do this on the manual runs, so that it can be done by every maintainer
  • upgrades Ubuntu versions
    • to Ubuntu 20.04 for CUDA 11.x, Ubuntu 22.04 for CUDA 12.x (see Notes)
  • uses GitHub Actions container support directly instead of calling docker run in a script: block
  • writes miniforge to /tmp instead of a directory that's mounted in from the self-hosted runner
    • to avoid files being left behind on the self-hosted runner, which I found sometimes happened in failed builds, causing conflicts like "/home/github/miniforge already exists" on the next build
  • updates actions/checkout from v1 to v3
    • can't go all the way to v4 yet because the GLIBC in the container images used in this job isn't new enough

On most of the CI jobs:

  • stops setting unused variable GITHUB_ACTIONS=true
    • this is set automatically by GitHub Actions anyway (docs)
  • moves "read LightGBM's version out of VERSION.txt" into the 2 CI scripts that need it, instead of having it defined as inline shell code across most of the CI configs
  • sets CMAKE_BUILD_PARALLEL_LEVEL=4 environment variable (see Notes)
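
For illustration, the VERSION.txt change could look roughly like the sketch below inside a CI script. This is only a sketch under assumptions: `REPO_DIR` and the placeholder version string are hypothetical stand-ins so the snippet runs on its own, and the real scripts may read the file differently.

```shell
# Hypothetical stand-in for the repository checkout, so this sketch is self-contained.
REPO_DIR="$(mktemp -d)"
# Placeholder version string, for illustration only.
echo "0.0.0.99" > "${REPO_DIR}/VERSION.txt"

# Read the first line of VERSION.txt inside the script that needs it,
# instead of repeating this inline in every CI config.
LGB_VER="$(head -n 1 "${REPO_DIR}/VERSION.txt")"
echo "building lightgbm version ${LGB_VER}"
```

Keeping this in the scripts means a CI config only has to call the script, and the version lookup lives in one place per script.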

If any of these generate a lot of discussion, I'll split this up into smaller PRs. But thought the sum total was small enough to do as a single PR.

Notes for Reviewers

Why set CMAKE_BUILD_PARALLEL_LEVEL?

This environment variable is the equivalent of passing e.g. -j4 to cmake --build or make.
It tells the build tool (Ninja, in most of our builds here) to compile multiple objects at a time.

We set that in builds that separately invoke cmake, like here:

cmake --build build --target lightgbm -j4 || exit 1

But currently any builds that are just running sh build-python.sh or Rscript build_r.R are performing serial compilation.

Setting this to a value greater than 1 should speed up builds.
I chose 4 because we're already using -j4 in lots of places, and it seems to be working well.
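
A minimal sketch of the effect, assuming a build directory named `build` that has already been configured (the guard and the `BUILD_DIR` variable are illustrative, not taken from the CI scripts):

```shell
# Set once for the whole job; `cmake --build` reads this when no -j/--parallel flag is given.
export CMAKE_BUILD_PARALLEL_LEVEL=4

# Hypothetical build directory name; only build if it has already been configured.
BUILD_DIR="build"
if command -v cmake >/dev/null 2>&1 && [ -f "${BUILD_DIR}/CMakeCache.txt" ]; then
    # No explicit -j4 here: the environment variable supplies the job count.
    cmake --build "${BUILD_DIR}" --target lightgbm || echo "build failed" >&2
else
    echo "skipping build: cmake or a configured ${BUILD_DIR}/ directory was not found"
fi
```

Because the variable is exported, it is also visible to child processes, which is how builds driven through wrapper scripts can pick it up without changing their `cmake` invocations.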

References:

  • scikit-build-core docs recommending this (link)
  • CMake docs on CMAKE_BUILD_PARALLEL_LEVEL (link)

Why update Ubuntu versions?

It helps with the GitHub Actions Node 16/20 situation: #6453 (comment).

But more importantly, I think it's more likely to match the set of operating systems and library versions that lightgbm users are using in their environments.

Ubuntu 22.04 has been available for 2 years (Ubuntu release history) and all of RAPIDS CI uses Ubuntu 20.04 and 22.04:

https://github.com/rapidsai/shared-workflows/blob/19d17957e59cf81574f214e043adf8cff7db9447/.github/workflows/wheels-test.yaml#L81-L85

Other References

Some related PRs explaining the history of the CUDA jobs:

jameslamb changed the title from "WIP: [ci] reduce duplication of LGB_VER across CI configs" to "WIP: [ci] simplify some CI configurations" on May 17, 2024
jameslamb changed the title from "WIP: [ci] simplify some CI configurations" to "WIP: [ci] simplify CI configurations, parallelize compilation for more builds" on May 18, 2024
jameslamb changed the title from "WIP: [ci] simplify CI configurations, parallelize compilation for more builds" to "WIP: [ci] simplify CI configurations, parallelize compilation, test CUDA on Ubuntu 22.04" on May 18, 2024
jameslamb changed the title from "WIP: [ci] simplify CI configurations, parallelize compilation, test CUDA on Ubuntu 22.04" to "[ci] simplify CI configurations, parallelize compilation, test CUDA on Ubuntu 22.04" on May 18, 2024
jameslamb marked this pull request as ready for review on May 18, 2024 07:06

borchero (Collaborator) left a comment

Thank you for spending so much effort to improve the CI here @jameslamb! 🙏🏼

jameslamb (Collaborator, Author) commented

Sure, happy to do it! Thanks for all the reviews!
