Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce gitflow-incremental-builder #13243

Merged
merged 1 commit into from
Mar 1, 2021
Merged

Introduce gitflow-incremental-builder #13243

merged 1 commit into from
Mar 1, 2021

Conversation

famod
Copy link
Member

@famod famod commented Nov 11, 2020

Relates to #11622
Fixes #6260

Note: master, release branches (e.g. 1.10) and backport-branches (without PR) are not built incrementally (yet)!

Some general remarks:

  • various details have been discussed here: https://quarkusio.zulipchat.com/#narrow/stream/187038-dev/topic/gitflow-incremental-builder.20(GIB)
  • for branches without a PR, I decided to use quarkusio/quarkus master and not the fork's master as the reference because no one on Zulip had a (strong) opinion about this and since some (most?) devs don't seem to push to their fork's master at all, fetching quarkusio/quarkus master seemed like the best compromise
  • fetch-depth: 0 is required to have a fully usable history and the reference branch available (in case the reference is not quarkusio/quarkus master, which is fetched explicitely)
  • the Maven profile is set up in a way that allows local usage as well, with a slightly different behaviour than in CI - e.g. I usually do want GIB to care about uncommitted and untracked changes locally, but not in CI
  • GIB cannot do anything about the gradle jobs, so those will always perform full builds
  • I tested this rather thoroughly but in my quarkus fork I could not test the real setup. So in the worst case (in case of errors after merging this) I/we have to be quick to fix things or revert this.

I'll also add some review comments myself.

Edit: List of TODOs for a better overview (updated on demand):

Idea for testing this closer to reality:
Someone with write access to quarkusio/quarkus could create a branch named e.g. gib-test in quarkusio/quarkus from this PR branch and could then branch off gib-test in his/her fork (!) and create a test PR.

@ghost ghost added the area/infra-automation anything related to CI, bots, etc. that are used to automated our infrastructure label Nov 11, 2020
@@ -95,6 +95,30 @@ jobs:
- name: Delete Local Artifacts From Cache
shell: bash
run: rm -r ~/.m2/repository/io/quarkus
- name: Get GIB arguments
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking long and hard for a possibility to have some kind of conditional logic in the env section but it seems this is not supported by GH Workflows.
That's why I ended up with this build step that creates an output that is then exposed as a job output to subsequent jobs.
I could have created a spearate job for this but then I would need to extend needs of each job.

fetch-depth: 0
- name: Add quarkusio remote
shell: bash
run: git remote add quarkusio https://github.com/quarkusio/quarkus.git
Copy link
Member Author

@famod famod Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes:

  • this entire block is now present for almost each job
  • I decided to only add the remote, not directly fetch from it, so that GIB can decide whether or not to fetch master at all (see the "Get GIB arguments" step further up for the cases where GIB disables itself)

@@ -154,7 +183,8 @@ jobs:
shell: bash
run: tar -xzf maven-repo.tgz -C ~
- name: Build with Maven
run: eval mvn $JVM_TEST_MAVEN_OPTS install -pl !integration-tests/gradle -pl !integration-tests/maven -pl !integration-tests/devtools ${{ matrix.java.maven_args}}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some nasty formatting/escaping problems with eval, especially with -Dgib.disableIfBranchRegex='master|\d+\.\d+|.*backport.*' (see further up).
AFAICT eval ist not required so I simply removed it from every mvn call.

do modules+=("integration-tests/$i"); done
IFS=,
eval mvn -pl "${modules[*]}" $NATIVE_TEST_MAVEN_OPTS
do modules+="integration-tests/$i,"; done
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally to eval (see futher up) I also had issues with IFS=,.
Since Maven doesn't mind a trailing ,, I simplified it all a bit.

pom.xml Outdated Show resolved Hide resolved
pom.xml Outdated Show resolved Hide resolved
@gastaldi gastaldi linked an issue Nov 11, 2020 that may be closed by this pull request
2 tasks
@gastaldi gastaldi requested review from gsmet and n1hility November 11, 2020 21:12
@famod
Copy link
Member Author

famod commented Nov 12, 2020

It just occured to me that if we use this locally, we will most likely run into a problem similar to this: #13218 (comment)

Of course this PR here is not about parallel builds, but incremental builds are also affected because in case an extension changes (or is added) in a way that is relevant for the platform descriptor generation, an incremetal build will not include quarkus-bom-quarkus-platform-descriptor since that module has no dependencies at all.

In CI we are safe since we have this initial single threaded build that will produce a consistent platform descriptor.
Therefore I don't think this problem should be blocking this PR, but we maybe should not advertise local usage just yet.

PS: I might be able to work around this via/in GIB but then again we also have local builds without GIB, so we need a general solution.

@aloubyansky
Copy link
Member

The descriptor in fact does depend on the presence of the extensions it includes. So we could introduce dependencies at least on the POMs of all the extensions.

@famod
Copy link
Member Author

famod commented Nov 12, 2020

The descriptor in fact does depend on the presence of the extensions it includes. So we could introduce dependencies at least on the POMs of all the extensions.

Yeah, that would be the hideously verbose but standard Maven solution. I'd like to continue this discussion in #13218.

Copy link
Member

@gsmet gsmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few questions.

.github/workflows/ci-actions.yml Outdated Show resolved Hide resolved
.github/workflows/ci-actions.yml Outdated Show resolved Hide resolved
@ghost ghost added the area/dependencies Pull requests that update a dependency file label Nov 18, 2020
@famod
Copy link
Member Author

famod commented Nov 18, 2020

Rebased & updated:

@famod
Copy link
Member Author

famod commented Nov 22, 2020

I'll finish this after #13218 was merged.

@famod
Copy link
Member Author

famod commented Nov 25, 2020

I've just rebased and had to adapt to 5244c9c.
Since that commit brought in the "external" repo quarkus-rest-testsuite (and a mvn invocation for it), GIB is not able to help and therefore this step will not be run incrementally (takes ~4mins currently).
/cc @FroMage @geoand

@geoand
Copy link
Contributor

geoand commented Nov 25, 2020

Thanks @famod

@famod
Copy link
Member Author

famod commented Nov 27, 2020

PrometheusMetricsRegistryTest fails in all "JVM Tests" jobs and I don't understand why:

[ERROR] Tests run: 8, Failures: 0, Errors: 1, Skipped: 7, Time elapsed: 6.509 s <<< FAILURE! - in io.quarkus.it.micrometer.prometheus.PrometheusMetricsRegistryTest
[ERROR] io.quarkus.it.micrometer.prometheus.PrometheusMetricsRegistryTest.testRegistryInjection  Time elapsed: 0.01 s  <<< ERROR!
java.lang.RuntimeException: 
java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    io/quarkus/resteasy/reactive/server/deployment/QuarkusRestProcessor.setupEndpoints(Lio/quarkus/deployment/Capabilities;Lio/quarkus/arc/deployment/BeanArchiveIndexBuildItem;Lio/quarkus/arc/deployment/BeanContainerBuildItem;Lio/quarkus/resteasy/reactive/common/runtime/QuarkusRestConfig;Ljava/util/Optional;Lio/quarkus/deployment/annotations/BuildProducer;Lio/quarkus/deployment/annotations/BuildProducer;Lio/quarkus/rest/server/runtime/QuarkusRestRecorder;Lio/quarkus/deployment/recording/RecorderContext;Lio/quarkus/deployment/builditem/ShutdownContextBuildItem;Lio/quarkus/vertx/http/runtime/HttpBuildTimeConfig;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/Optional;Ljava/util/Optional;Lio/quarkus/deployment/annotations/BuildProducer;Lio/quarkus/deployment/annotations/BuildProducer;Lio/quarkus/deployment/annotations/BuildProducer;Lio/quarkus/resteasy/reactive/common/deployment/ApplicationResultBuildItem;)V @1678: invokestatic
  Reason:
    Type 'org/jboss/resteasy/reactive/server/core/ServerSerialisers' (current frame, stack[6]) is not assignable to 'org/jboss/resteasy/reactive/common/core/Serialisers'

@ebullient do you have a clue what's going on? Thanks!

@ebullient
Copy link
Member

ebullient commented Nov 28, 2020

PrometheusMetricsRegistryTest fails in all "JVM Tests" jobs and I don't understand why:

[ERROR] Tests run: 8, Failures: 0, Errors: 1, Skipped: 7, Time elapsed: 6.509 s <<< FAILURE! - in io.quarkus.it.micrometer.prometheus.PrometheusMetricsRegistryTest
[ERROR] io.quarkus.it.micrometer.prometheus.PrometheusMetricsRegistryTest.testRegistryInjection  Time elapsed: 0.01 s  <<< ERROR!
java.lang.RuntimeException: 
java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    io/quarkus/resteasy/reactive/server/deployment/QuarkusRestProcessor.setupEndpoints(Lio/quarkus/deployment/Capabilities;Lio/quarkus/arc/deployment/BeanArchiveIndexBuildItem;Lio/quarkus/arc/deployment/BeanContainerBuildItem;Lio/quarkus/resteasy/reactive/common/runtime/QuarkusRestConfig;Ljava/util/Optional;Lio/quarkus/deployment/annotations/BuildProducer;Lio/quarkus/deployment/annotations/BuildProducer;Lio/quarkus/rest/server/runtime/QuarkusRestRecorder;Lio/quarkus/deployment/recording/RecorderContext;Lio/quarkus/deployment/builditem/ShutdownContextBuildItem;Lio/quarkus/vertx/http/runtime/HttpBuildTimeConfig;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/List;Ljava/util/Optional;Ljava/util/Optional;Lio/quarkus/deployment/annotations/BuildProducer;Lio/quarkus/deployment/annotations/BuildProducer;Lio/quarkus/deployment/annotations/BuildProducer;Lio/quarkus/resteasy/reactive/common/deployment/ApplicationResultBuildItem;)V @1678: invokestatic
  Reason:
    Type 'org/jboss/resteasy/reactive/server/core/ServerSerialisers' (current frame, stack[6]) is not assignable to 'org/jboss/resteasy/reactive/common/core/Serialisers'

@ebullient do you have a clue what's going on? Thanks!

Hmm. I don't know off the top of my head. The other micrometer tests are ok.. it's just that one?
Do I have a test/project dependency crossed (server/core vs. common/core)? That test isn't doing anything fancy with REST.. so I would guess I mis-specified a dependency first

@famod
Copy link
Member Author

famod commented Nov 28, 2020

The other micrometer tests are ok.. it's just that one?

The following tests are ok:

  • io.quarkus.it.micrometer.jmx.JmxMetricsRegistryTest
  • io.quarkus.it.micrometer.mpmetrics.MPMetricsTest
  • io.quarkus.it.micrometer.native_mode.NativeMeterRegistriesTest

But in the module micrometer-prometheus there are no other tests, only the failing PrometheusMetricsRegistryTest.

@ebullient
Copy link
Member

The following tests are ok:

* io.quarkus.it.micrometer.jmx.JmxMetricsRegistryTest
* io.quarkus.it.micrometer.mpmetrics.MPMetricsTest
* io.quarkus.it.micrometer.native_mode.NativeMeterRegistriesTest

But in the module micrometer-prometheus there are no other tests, only the failing PrometheusMetricsRegistryTest.

It was the other buckets I meant (they all set up the same registries, more or less. The JMX test does not use Resteasy, but the MP Metrics test does. The projects and tests between MP Metrics and Prometheus are pretty similar in structure, so I don't understand why just the Prometheus test would fail.. and I have no idea why one ServerSerialiser would be selected over the other ... (or what the incremental build touches that would be relevant here, as it clearly all passes in the full builds.. )

@famod
Copy link
Member Author

famod commented Nov 29, 2020

https://github.com/vackosar/gitflow-incremental-builder#gibbuildall works around the problem but is not very efficient for 700+ modules. Still digging...

@famod
Copy link
Member Author

famod commented Nov 29, 2020

I found a much better alternative to buildAll (but it's still a workaround):
-Dgib.forceBuildModules=resteasy-reactive (see also: https://github.com/vackosar/gitflow-incremental-builder#gibforcebuildmodules)
This will always trigger a build of independent-projects/resteasy-reactive/server/runtime which contains ServerSerialisers.

What I don't get is why is a rebuild of this module even needed? The module is already built in the "Initial JDK 11 Build" and is shared via the "persisted" Maven repo.
The parameters for mvn in that initial job are a bit different than the parameters in the JVM tests jobs but nothing that really caught my eye, except for --settings .github/mvn-settings.xml which is not set for the initial job.
Adding this settings file to the initial job does not change anything, though.

@geoand @FroMage any idea what's going on? I'm talking about #13243 (comment) and subsequent comments.

@maxandersen
Copy link
Member

I've created https://github.com/quarkusio/quarkus/tree/gib-test that is this branch but in quarkusio ..now just need to find a PR to try with ;)

@geoand
Copy link
Contributor

geoand commented Nov 30, 2020

@geoand @FroMage any idea what's going on? I'm talking about #13243 (comment) and subsequent comments.

I don't see why this would cause a problem :(.

@famod
Copy link
Member Author

famod commented Jan 19, 2021

@n1hility I'd love to hear your opinion about this! 🙂

@gsmet
Copy link
Member

gsmet commented Feb 11, 2021

@famod this needs a rebase as I merged your other PR (I think that's the one that created the conflict)

So... my take on this is that we should have some shadowing of the current CI for ~ a week to be able to see what's going on. So, I would probably duplicate the workflow first, have it run for a week and then decide depending on the result of it.

We will pay a CI price for a week but if GIB works great, it should be sustainable.

We don't update the ci-actions workflow that often now that the native test list is in a JSON file so I think it should be doable without too much pain.

@famod WDYT about this strategy?

@famod
Copy link
Member Author

famod commented Feb 11, 2021

Thanks for your feedback, @gsmet!

@famod WDYT about this strategy?

So the idea is to have a copy of the pre-GIB state of ci-actions.yml as...say... ci-actions-not-incremental.yml or so?
If the available resources can handle it and we mange to invest some "monitoring time", then this sounds like a good idea!
If it's too much then maybe only pick some very critical jobs? (e.g. leave out one or two JDK test jobs)

In the beginning, I actually aimed at making just a few jobs incremental as a start, but with that approach you wouldn't have a full side-by-side comparison.

IIRC, @ebullient mentioned that in a past project they had a permanent weekly (?) full build to make people more confident with the incremental approach. This might also be something to consider (additionally).

But before we merge this I have to complete the remaining tasks.

The one task with the new generated dependencies of devtools/bom-descriptor-json (#13218) is not so trivial.
If we don't do anything about it, a single change to any of the extensions modules will trigger all integration-tests modules because all (most?) IT modules depend on the descriptor and thus on all extensions.
This would be a massive setback for the incremenal build efficiency.
(When I say "depend on" here, I don't mean compile, runtime etc., I mean it in terms of the reactor build order.)

A simplistic solution would be to take devtools/bom-descriptor-json out of the reactor in IT jobs via a Maven profile (-pl !... doesn't work here). But this means that changes to bom-descriptor-json or quarkus-platform-descriptor-json-plugin would not trigger any ITs (and to some extend some of the plugin's upstream dependencies wouldn't trigger them either, depending on the actual dependency graph).

The alternative to this would be a feature in GIB that can remove certain dependency subpaths from the reactor when certain conditions are met. Doesn't seem easy to implement, especially given the API Maven provides for this.

Next best (but slightly weird) solution might be to disable GIB in case certain modules are changed. This is something CI would have to figure out. And of course you would end up with a full build.

/cc @aloubyansky What's your take on this?

PS: Sorry for the long text but this must be done right. CI/build consistency over efficiency.

@famod
Copy link
Member Author

famod commented Feb 11, 2021

I just had another idea reagarding the descriptor dependency graph problem:

If we move all those generated extension dependencies to a profile (that is active by default), we could disable it for incremental builds. So changes to the descriptor, its plugin or any of the plugin's upstream dependencies would trigger all ITs, but not a change to a single extension module!

/cc @aloubyansky ^ (for now this feels like the best solution but maybe I'm missing something)

PS: Just as a recap: The initial JDK build will always install a consistent descriptor artifact and the repo is shared with the subsequent jobs. That's why you could skip the descriptor module entirely in the IT jobs etc.

@gsmet
Copy link
Member

gsmet commented Feb 11, 2021

So the idea is to have a copy of the pre-GIB state of ci-actions.yml as...say... ci-actions-not-incremental.yml or so?
If the available resources can handle it and we mange to invest some "monitoring time", then this sounds like a good idea!
If it's too much then maybe only pick some very critical jobs? (e.g. leave out one or two JDK test jobs)

I would better create yours as a new one for now. And, yes, we could enable only one JDK for the incremental build.

As for CI resources, we will see, it will for sure slow things down but I'm not comfortable merging this without having some sort of shadowing for a while.

IIRC, @ebullient mentioned that in a past project they had a permanent weekly (?) full build to make people more confident with the incremental approach. This might also be something to consider (additionally).

I agree.

As for the rest, I will let you discuss it with @aloubyansky :).

@ghost ghost added the area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins label Feb 11, 2021
@famod
Copy link
Member Author

famod commented Feb 11, 2021

I've rebased and added another commit that shows my idea with the depdencies profile in bom-descriptor-json.
If we agree on this approach, I'll finalize the indentation logic in update-extension-dependencies.sh.

@aloubyansky
Copy link
Member

PS: Just as a recap: The initial JDK build will always install a consistent descriptor artifact and the repo is shared with the subsequent jobs. That's why you could skip the descriptor module entirely in the IT jobs etc.

That is true.

@famod
Copy link
Member Author

famod commented Feb 12, 2021

I think I will use the incremental property to control that new profile in bom-descriptor-json instead of adding that other new clunky property.
Also, I'll test skipping maven-install-plugin on incremental in that module to avoid an inconsistent descriptor being installed (hoping that the descriptor lookup in the ITs doesn't barf and is resorting to the descriptor in the local repo).

There might also be a problem with independent-projects not being rebuilt if e.g. .github/** contains changes, because those modules don't depend on the root pom.xml (which GIB is normally assigning those changes to).
If testing confirms this (90% sure), I'll have to introduce something in GIB to support this special case. Doing this in the workflow instead would be rather verbose and fragile.

So, it doesn't get boring! 😉 But better find the problems now than later.

@famod
Copy link
Member Author

famod commented Feb 15, 2021

There might also be a problem with independent-projects not being rebuilt if e.g. .github/** contains changes

Confirmed, as expected. I'll add disableIfChangedRegex (or similar) to GIB and release 3.12.2.

The other adjustments are being tested at the moment in another branch of my fork. "Shadow CI" is still left to do (as is a first iteration to CONTRIBUTING.md).

@famod
Copy link
Member Author

famod commented Feb 15, 2021

I kicked off a run of the "Shadow CI" in my fork (including GIB 3.12.2 that I've just released).
If all goes as planned I'll update CONTRIBUTING and this branch tomorrow and we should be good to go!

@famod
Copy link
Member Author

famod commented Feb 16, 2021

All TODOs are done! 🎉 (AFAICS)

/cc @aloubyansky @gsmet

@famod famod requested review from aloubyansky and gsmet February 16, 2021 12:54
@geoand
Copy link
Contributor

geoand commented Feb 16, 2021

EPIC!

@famod
Copy link
Member Author

famod commented Feb 16, 2021

FTR, Quarkus CI incremental does not include any Windows job and it has only a single JVM test job (JDK11). Should suffice for the temporary dual/shadow setup, IMO.

@famod
Copy link
Member Author

famod commented Feb 16, 2021

Websocket extension renaming got in the way, merge conflict resolved.

@famod
Copy link
Member Author

famod commented Feb 18, 2021

It's sad that stef wasn't found, but this has nothing to do with GIB 🤓

Error:  Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.652 s <<< FAILURE! - in io.quarkus.resteasy.reactive.jsonb.deployment.test.sse.SseTestCase
Error:  io.quarkus.resteasy.reactive.jsonb.deployment.test.sse.SseTestCase.testMultiFromMulti  Time elapsed: 0.013 s  <<< FAILURE!
java.lang.AssertionError: 

Expecting:
  []
to contain exactly (and in same order):
  ["hello", "stef"]
but could not find the following elements:
  ["hello", "stef"]

@famod
Copy link
Member Author

famod commented Feb 19, 2021

bom-descriptor-json/pom.xml keeps getting merge conflicts...

Relates to #11622

Note: master, release branches (e.g. 1.10) and backport-branches are not built incrementally!

Includes some efficiency teaks to bom-descriptor-json
and a "shadow CI" for a temporary parallel testing phase.
@gsmet
Copy link
Member

gsmet commented Mar 1, 2021

OK, let's merge this and see how it goes for a week!

Thanks a lot for your continued work on this, @famod ! Hopefully, it will drastically help with our CI feedback loop!

@gsmet gsmet merged commit 6e00598 into quarkusio:master Mar 1, 2021
@quarkus-bot quarkus-bot bot added this to the 1.13 - master milestone Mar 1, 2021
@famod famod deleted the gitflow-incremental-builder branch March 1, 2021 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dependencies Pull requests that update a dependency file area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins area/infra-automation anything related to CI, bots, etc. that are used to automated our infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overall CI improvement Improve efficiency of CI pipeline resources by partial rebuild
7 participants