Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Engine CI Rework, Part 1 #9295

Merged
merged 11 commits into from
Mar 6, 2024
Merged

[CI] Engine CI Rework, Part 1 #9295

merged 11 commits into from
Mar 6, 2024

Conversation

mwu-tow
Copy link
Contributor

@mwu-tow mwu-tow commented Mar 5, 2024

Pull Request Description

I have created PR with the first set of changes for the Engine CI. The changes are small and effectively consist of:

  1. Spltting the verifyLicensePackages. It is now run only on Linux. There are hardly any time benefits, as the actual job cost is dominated by the overhead of spinning a new job — but it is not expensive in the big picture.
  2. Splitting the Scala Tests into separate job. This is probably the biggest "atomic" piece of work we have.
  3. Splitting the Standard Library Tests into a separate job.

The time is nicely split across the jobs now. The last run has:

  • 27 min for Scala tests;
  • 25 min for Standard Library tests;
  • 24 min for the "rest": the old job containing everything that has not been split.

While total CPU time has increased (as jobs are not effectively reusing the same build context), the wall time has decreased significantly. Previously we had ~1 hour of wall time for the old monolithic job, so we are getting more than 2x speedup.

The now-slowest Scala tests job is currently comparable with the native Rust tests (and they should improve when the old gui is gone) — which are the slowest job across all CI checks.

The PR is pretty minimal. Several future improvements can be made:

  • Reorganizing and splitting other "heavy" jobs, like the native image generation.
  • Reusing the built Engine distribution. However, this is probably a lower priority than I initially thought.
    • Building package takes several minutes, so duplicating this job is not that expensive.
    • The package is OS-specific.
    • Scala tests don't really benefit from it, they'd need way more compilation artifacts.
      It'd make sense to reuse the distribution if we, for example, decided to split more jobs that actually benefit from it, like Standard Library tests.
  • Reusing the Rust build script binary.
    • As our self-hosted runners reuse environment, we effectively get this for free. Especially when Rust part of codebase is less frequently changed.
    • This is however significant cost for the GitHub-hosted runners, affecting our macOS runners. Reusing the binary does not save wall time for jobs that are run in parallel (as we have enough runners), but if we introduce job dependencies that'd force sequential execution of jobs on macOS, this would be a significant need.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@mwu-tow mwu-tow self-assigned this Mar 5, 2024
@mwu-tow mwu-tow added the CI: No changelog needed Do not require a changelog entry for this PR. label Mar 5, 2024
@mwu-tow mwu-tow marked this pull request as ready for review March 5, 2024 21:40
@JaroslavTulach
Copy link
Member

JaroslavTulach commented Mar 6, 2024

Previously we had ~1 hour of wall time for the old monolithic job, so we are getting more than 2x speedup.

+1

I see Mac native image failure:

ERROR main_internal: enso_build_cli: error=Failed to get output of the command: Command:
	ENSO_DATA_DIRECTORY="/Users/runner/work/enso/enso/built-distribution/enso-engine-2024.1.1-dev-macos-amd64/enso-2024.1.1-dev" "/Users/runner/work/enso/enso/runner" "--run" "/Users/runner/work/enso/enso/engine/runner/src/test/resources/Factorial.enso" "6"


[WARN] [2024-03-05T21:12:22Z] [org.enso.editions.updater.EditionUpdater] Failed to fetch editions from [https://editions.release.enso.org/enso]: {}
       java.lang.UnsupportedOperationException: Preview Features not enabled, need to run with --enable-preview
       	at [email protected]/jdk.internal.misc.PreviewFeatures.ensureEnabled(PreviewFeatures.java:49)
       	at [email protected]/java.lang.Thread.ofVirtualWithoutLoom(Thread.java:680)
       	at [email protected]/java.util.concurrent.Executors.newVirtualThreadPerTaskExecutor(Executors.java:269)
       	at org.enso.downloader.http.HTTPDownload$.asyncDownload(HTTPDownload.scala:82)
       	at org.enso.downloader.http.HTTPDownload$.fetchString(HTTPDownload.scala:53)

  Exception in thread "main" org.graalvm.polyglot.PolyglotException: org.enso.editions.EditionResolutionError$CannotLoadEdition: Cannot load edition [2024.1.1-dev]: The edition was not found.
       	at org.enso.editions.EditionResolver.$anonfun$resolveParent$1(EditionResolver.scala:157)
       	at scala.util.Either$LeftProjection.map(Either.scala:614)
       	at org.enso.editions.EditionResolver.resolveParent(EditionResolver.scala:157)
       	at org.enso.editions.EditionResolver.resolveEdition(EditionResolver.scala:39)
       	at org.enso.editions.EditionResolver.resolve(EditionResolver.scala:29)
       	at org.enso.editions.updater.EditionManager.resolveEdition(EditionManager.scala:26)

Similar action on Linux is passing without any issues. CCing @Akirathan. I managed to reproduce the problem on my Mac. I am not yet sure why the failure happens, but I am assuming it is some mismatch in installed versions that triggers download which doesn't happen on Linux.

Btw. why is Mac CI only "optional"? It should be passing regularly, should it not?

@JaroslavTulach
Copy link
Member

JaroslavTulach commented Mar 6, 2024

Shouldn't Engine CI / Scala Tests (linux, x86_64) and similar jobs be required to pass?

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More jobs (especially newly added) should in my opinion be required to pass.

@@ -108,6 +108,7 @@ impl RunContext {

/// Check that required programs are present (if not, installs them, if supported). Set
/// environment variables for the build to follow.
#[instrument(skip(self))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?

This introduces a span coupled with the function execution.
Essentialy it logs function start and end, and groups logs from it under that span.

Comment on lines 50 to 54
// /// Run an SBT command.
// Run {
// #[clap(last = true)]
// args: Vec<String>,
// },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out? Can we just remove that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, missed that one. Will remove.

Comment on lines +127 to +131
enso-build-ci-gen-job-ci-check-backend-windows-x86_64:
name: Engine (windows, x86_64)
runs-on:
- self-hosted
- Windows
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like we could have a single job for all platforms and use the matrix to schedule runs on various configurations. But I assume this approach was easier to implement in the Rust generator? If so, that seems totally OK too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like we could have a single job for all platforms and use the matrix to schedule runs on various configurations. But I assume this approach was easier to implement in the Rust generator? If so, that seems totally OK too.

IIRC it was needed when the generator was first implemented and while we stopped requiring this, there was not enough motivation to get rid of it.

As it is right now, it could probably be replaced with matrix. Having this manually generated gives us more flexibility though, as it allows us e.g. to introduce cross-os dependencies between jobs. E.g. if we wanted Standard Library tests on MacOS to use Linux-built Engine distribution; or package IDE with Linux-generated WASM (while still building it on all supported platforms).

Also, this allows the generator to customize steps depending on os. While I avoid doing this (as I want CI stuff to be as cross-platform as possible), having one more tool in the box is nice.

- self-hosted
- Windows
steps:
- if: startsWith(runner.name, 'GitHub Actions') || startsWith(runner.name, 'Hosted Agent')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. when is this condition not true? Sounds like it would be both for GH and self-hosted runners, so when not? Or am I misunderstanding something?

Copy link
Contributor Author

@mwu-tow mwu-tow Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true only for Github-hosted runners. Our self-hosted runners follow different naming convention (like "enso-org-engine-demeter-5").

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, I can't wait to be able to see this in action ❤️

As @JaroslavTulach mentioned, I think we have to make sure the newly split-off engine workflows: Scala Tests and Standard Library Tests are made required (excluding macOS, at least until we fix #9216). Otherwise we will start merging PRs with broken tests and that will be bad.

I added some general questions trying to understand the overall CI codebase in the context of these changes.

@mwu-tow
Copy link
Contributor Author

mwu-tow commented Mar 6, 2024

@JaroslavTulach @radeusgd
Good point about adjusting required checks. I will do this after merging this, as this is repository-wide configuration setting. Applying it earlier, would block all other PRs (that are obviously missing the new checks).

@radeusgd
Copy link
Member

radeusgd commented Mar 6, 2024

@JaroslavTulach @radeusgd Good point about adjusting required checks. I will do this after merging this, as this is repository-wide configuration setting. Applying it earlier, would block all other PRs (that are obviously missing the new checks).

But won't that mean that for a while we will be able to merge (even by accident) PRs that are not passing the tests (i.e. as soon as I merge in these changes)?

Won't these changes only be in place once the PR with such changes lands? Thus they will start once this PR is merged. Then other pending PRs will just have to merge develop in to get the latest checks. That seems fine.

I don't see reason to do this in two parts, it feels much safer to do 'atomically'.

@mwu-tow
Copy link
Contributor Author

mwu-tow commented Mar 6, 2024

But won't that mean that for a while we will be able to merge (even by accident) PRs that are not passing the tests (i.e. as soon as I merge in these changes)?

Won't these changes only be in place once the PR with such changes lands? Thus they will start once this PR is merged. Then other pending PRs will just have to merge develop in to get the latest checks. That seems fine.

I don't see reason to do this in two parts, it feels much safer to do 'atomically'.

Yes, yes. By "after" I meant "immediately after". I might as well do "immediately before".

@mwu-tow mwu-tow added the CI: Ready to merge This PR is eligible for automatic merge label Mar 6, 2024
@mergify mergify bot merged commit e930738 into develop Mar 6, 2024
33 of 36 checks passed
@mergify mergify bot deleted the wip/mwu/splitLicenseCheck branch March 6, 2024 18:56
MichaelMauderer pushed a commit that referenced this pull request Mar 6, 2024
I have created PR with the first set of changes for the Engine CI. The changes are small and effectively consist of:
1. Spltting the `verifyLicensePackages`. It is now run only on Linux. There are hardly any time benefits, as the actual job cost is dominated by the overhead of spinning a new job — but it is not expensive in the big picture.
2. Splitting the Scala Tests into separate job. This is probably the biggest "atomic" piece of work we have.
3. Splitting the Standard Library Tests into a separate job.

The time is nicely split across the jobs now. The last run has:
* 27 min for Scala tests;
* 25 min for Standard Library tests;
* 24 min for the "rest": the old job containing everything that has not been split.

While total CPU time has increased (as jobs are not effectively reusing the same build context), the wall time has decreased significantly. Previously we had ~1 hour of wall time for the old monolithic job, so we are getting more than 2x speedup.

The now-slowest Scala tests job is currently comparable with the native Rust tests (and they should improve when the old gui is gone) — which are the slowest job across all CI checks.

The PR is pretty minimal. Several future improvements can be made:
* Reorganizing and splitting other "heavy" jobs, like the native image generation.
* Reusing the built Engine distribution. However, this is probably a lower priority than I initially thought.
* Building package takes several minutes, so duplicating this job is not that expensive.
* The package is OS-specific.
* Scala tests don't really benefit from it, they'd need way more compilation artifacts.
It'd make sense to reuse the distribution if we, for example, decided to split more jobs that actually benefit from it, like Standard Library tests.
* Reusing the Rust build script binary.
* As our self-hosted runners reuse environment, we effectively get this for free. Especially when Rust part of codebase is less frequently changed.
* This is however significant cost for the GitHub-hosted runners, affecting our macOS runners. Reusing the binary does not save wall time for jobs that are run in parallel (as we have enough runners), but if we introduce job dependencies that'd force sequential execution of jobs on macOS, this would be a significant need.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants