-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CI] Engine CI Rework, Part 1 #9295
Conversation
+1 I see Mac native image failure:
Similar action on Linux is passing without any issues. CCing @Akirathan. I managed to reproduce the problem on my Mac. I am not yet sure why the failure happens, but I am assuming it is some mismatch in installed versions that triggers download which doesn't happen on Linux. Btw. why is Mac CI only "optional"? It should be passing regularly, should it not? |
Shouldn't Engine CI / Scala Tests (linux, x86_64) and similar jobs be required to pass? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More jobs (especially newly added) should in my opinion be required to pass.
@@ -108,6 +108,7 @@ impl RunContext { | |||
|
|||
/// Check that required programs are present (if not, installs them, if supported). Set | |||
/// environment variables for the build to follow. | |||
#[instrument(skip(self))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this mean?
This introduces a span coupled with the function execution.
Essentialy it logs function start and end, and groups logs from it under that span.
build/cli/src/arg/backend.rs
Outdated
// /// Run an SBT command. | ||
// Run { | ||
// #[clap(last = true)] | ||
// args: Vec<String>, | ||
// }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented out? Can we just remove that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, missed that one. Will remove.
enso-build-ci-gen-job-ci-check-backend-windows-x86_64: | ||
name: Engine (windows, x86_64) | ||
runs-on: | ||
- self-hosted | ||
- Windows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we could have a single job
for all platforms and use the matrix
to schedule runs on various configurations. But I assume this approach was easier to implement in the Rust generator? If so, that seems totally OK too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we could have a single job for all platforms and use the matrix to schedule runs on various configurations. But I assume this approach was easier to implement in the Rust generator? If so, that seems totally OK too.
IIRC it was needed when the generator was first implemented and while we stopped requiring this, there was not enough motivation to get rid of it.
As it is right now, it could probably be replaced with matrix
. Having this manually generated gives us more flexibility though, as it allows us e.g. to introduce cross-os dependencies between jobs. E.g. if we wanted Standard Library tests on MacOS to use Linux-built Engine distribution; or package IDE with Linux-generated WASM (while still building it on all supported platforms).
Also, this allows the generator to customize steps depending on os
. While I avoid doing this (as I want CI stuff to be as cross-platform as possible), having one more tool in the box is nice.
- self-hosted | ||
- Windows | ||
steps: | ||
- if: startsWith(runner.name, 'GitHub Actions') || startsWith(runner.name, 'Hosted Agent') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw. when is this condition not true? Sounds like it would be both for GH and self-hosted runners, so when not? Or am I misunderstanding something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true only for Github-hosted runners. Our self-hosted runners follow different naming convention (like "enso-org-engine-demeter-5"
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, I can't wait to be able to see this in action ❤️
As @JaroslavTulach mentioned, I think we have to make sure the newly split-off engine workflows: Scala Tests and Standard Library Tests are made required (excluding macOS, at least until we fix #9216). Otherwise we will start merging PRs with broken tests and that will be bad.
I added some general questions trying to understand the overall CI codebase in the context of these changes.
@JaroslavTulach @radeusgd |
But won't that mean that for a while we will be able to merge (even by accident) PRs that are not passing the tests (i.e. as soon as I merge in these changes)? Won't these changes only be in place once the PR with such changes lands? Thus they will start once this PR is merged. Then other pending PRs will just have to merge develop in to get the latest checks. That seems fine. I don't see reason to do this in two parts, it feels much safer to do 'atomically'. |
Yes, yes. By "after" I meant "immediately after". I might as well do "immediately before". |
I have created PR with the first set of changes for the Engine CI. The changes are small and effectively consist of: 1. Spltting the `verifyLicensePackages`. It is now run only on Linux. There are hardly any time benefits, as the actual job cost is dominated by the overhead of spinning a new job — but it is not expensive in the big picture. 2. Splitting the Scala Tests into separate job. This is probably the biggest "atomic" piece of work we have. 3. Splitting the Standard Library Tests into a separate job. The time is nicely split across the jobs now. The last run has: * 27 min for Scala tests; * 25 min for Standard Library tests; * 24 min for the "rest": the old job containing everything that has not been split. While total CPU time has increased (as jobs are not effectively reusing the same build context), the wall time has decreased significantly. Previously we had ~1 hour of wall time for the old monolithic job, so we are getting more than 2x speedup. The now-slowest Scala tests job is currently comparable with the native Rust tests (and they should improve when the old gui is gone) — which are the slowest job across all CI checks. The PR is pretty minimal. Several future improvements can be made: * Reorganizing and splitting other "heavy" jobs, like the native image generation. * Reusing the built Engine distribution. However, this is probably a lower priority than I initially thought. * Building package takes several minutes, so duplicating this job is not that expensive. * The package is OS-specific. * Scala tests don't really benefit from it, they'd need way more compilation artifacts. It'd make sense to reuse the distribution if we, for example, decided to split more jobs that actually benefit from it, like Standard Library tests. * Reusing the Rust build script binary. * As our self-hosted runners reuse environment, we effectively get this for free. Especially when Rust part of codebase is less frequently changed. * This is however significant cost for the GitHub-hosted runners, affecting our macOS runners. Reusing the binary does not save wall time for jobs that are run in parallel (as we have enough runners), but if we introduce job dependencies that'd force sequential execution of jobs on macOS, this would be a significant need.
Pull Request Description
I have created PR with the first set of changes for the Engine CI. The changes are small and effectively consist of:
verifyLicensePackages
. It is now run only on Linux. There are hardly any time benefits, as the actual job cost is dominated by the overhead of spinning a new job — but it is not expensive in the big picture.The time is nicely split across the jobs now. The last run has:
While total CPU time has increased (as jobs are not effectively reusing the same build context), the wall time has decreased significantly. Previously we had ~1 hour of wall time for the old monolithic job, so we are getting more than 2x speedup.
The now-slowest Scala tests job is currently comparable with the native Rust tests (and they should improve when the old gui is gone) — which are the slowest job across all CI checks.
The PR is pretty minimal. Several future improvements can be made:
It'd make sense to reuse the distribution if we, for example, decided to split more jobs that actually benefit from it, like Standard Library tests.
Important Notes
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
./run ide build
.