Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make CI only build/test modules with any changes #10323

Merged
merged 9 commits into from
Feb 28, 2022

Conversation

nineinchnick
Copy link
Member

The gitflow-incremental-builder (GIB) Maven extension allows to only build modules that contain any changes. It compares the current git branch with the merge base and selects modules with any modified files.

I enable buildUpstream to make sure all (upstream) dependencies are built, but without running tests. I also enable buildDownstream to build and test all dependant (downstream) modules. If there are changes in the root module, everything would be built.

For example, when there would only be changes in the plugin/trino-hive-hadoop2 module, only the following would be built:

[INFO] Reactor Build Order:
[INFO] 
[INFO] trino-root                                                         [pom]
[INFO] trino-testing-services                                             [jar]
[INFO] trino-spi                                                          [jar]
[INFO] trino-client                                                       [jar]
[INFO] trino-parser                                                       [jar]
[INFO] trino-matching                                                     [jar]
[INFO] trino-plugin-toolkit                                               [jar]
[INFO] trino-array                                                        [jar]
[INFO] trino-geospatial-toolkit                                           [jar]
[INFO] trino-memory-context                                               [jar]
[INFO] trino-tpch                                                [trino-plugin]
[INFO] trino-main                                                         [jar]
[INFO] trino-testing                                                      [jar]
[INFO] trino-orc                                                          [jar]
[INFO] trino-memory                                              [trino-plugin]
[INFO] trino-benchmark                                                    [jar]
[INFO] trino-parquet                                                      [jar]
[INFO] trino-rcfile                                                       [jar]
[INFO] trino-hive                                                         [jar]
[INFO] trino-hive-hadoop2                                        [trino-plugin]
[INFO] trino-jdbc                                                         [jar]
[INFO] trino-proxy                                                        [jar]
[INFO] trino-verifier                                                     [jar]
[INFO] trino-benchto-benchmarks                                           [jar]
[INFO] trino-product-tests                                                [jar]
[INFO] trino-product-tests-launcher                                       [jar]
[INFO] trino-test-jdbc-compatibility-old-driver                           [jar]
[INFO] trino-test-jdbc-compatibility-old-server                           [jar]

where modules like trino-jdbc and trino-test-jdbc-compatibility-old-server and dependants, so their tests would be run.

Note this is only enabled in the ci profile and I'm adding this profile to most jobs. This does NOT affect Product Tests, which would always run with full coverage.

This would probably make the separate docs workflow obsolete.

This is a draft because I'll test different combinations of changes to be absolutely sure it won't reduce coverage.

@cla-bot cla-bot bot added the cla-signed label Dec 16, 2021
@nineinchnick nineinchnick marked this pull request as draft December 16, 2021 12:19
@martint
Copy link
Member

martint commented Dec 17, 2021

Note this is only enabled in the ci profile and I'm adding this profile to most jobs. This does NOT affect Product Tests, which would always run with full coverage.

At some point we're going to have to split product tests into modules so that only the relevant product tests run. Currently, product tests are monolithic -- if something changes in the cassandra connector, all product tests for hive, phoenix, etc will run.

@hashhar hashhar added the WIP label Dec 17, 2021
@kokosing
Copy link
Member

At some point we're going to have to split product tests into modules so that only the relevant product tests run.

I am not sure how that would look like. Can you please elaborate @martint ?

I was thinking about different approach. Currently all product tests environments configuration is known and stored in files, we could analyze that files and extract a list of features given environment is using. Like when you find a properties file with connector.name=hive then you know that it is testing hive. Then if you know what modules have changed, you can only run product tests that are using environments which are using such modules.

@martint
Copy link
Member

martint commented Dec 17, 2021

I am not sure how that would look like. Can you please elaborate @martint ?

By making product tests a framework for building product tests (i.e., the test definitions, runners, etc. might go there), and then splitting them into separate modules, one for each connector (the specific instantiations of tests for each connector would go here).

Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[INFO] trino-benchmark [jar]
[INFO] trino-proxy [jar]
[INFO] trino-verifier [jar]

In the example above, why are those two modules being built? None of those are downstream or upstream dependencies of trino-hive-hadoop2

.github/workflows/ci.yml Outdated Show resolved Hide resolved
pom.xml Outdated Show resolved Hide resolved
@nineinchnick
Copy link
Member Author

[INFO] trino-benchmark [jar]
[INFO] trino-proxy [jar]
[INFO] trino-verifier [jar]

In the example above, why are those two modules being built? None of those are downstream or upstream dependencies of trino-hive-hadoop2

I believe this is because trino-verifier requires trino-jdbc and the latter uses trino-hive-hadoop2 for tests. While including direct dependents with scope test is ok, transitive dependents should only be included for direct dependents with scope compile. But we should discuss this with maintainers of GIB, I think it's out of the scope of this PR. I reported two bugs so far and they've been very responsive and friendly.

@MiguelWeezardo
Copy link
Member

@electrum, @martint Would it be possible to take another look, and merge it if everything is good?

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits and some comments. Fine to merge as-is but let's ensure that we at-least track the actionable items in a follow-up issue.

Looks good to go otherwise.

.github/bin/build-matrix-from-impacted.py Outdated Show resolved Hide resolved
.github/bin/build-matrix-from-impacted.py Outdated Show resolved Hide resolved
.github/bin/build-matrix-from-impacted.py Show resolved Hide resolved
build(args.matrix, args.impacted, args.output)


def build(matrix_file, impacted_file, output_file):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump, let's create follow-up issues to track

.github/bin/build-matrix-from-impacted.py Outdated Show resolved Hide resolved
pom.xml Show resolved Hide resolved
<buildUpstreamMode>impacted</buildUpstreamMode>
<!-- skip tests and checks for upstream modules since they have not been modified but are still required to be built -->
<skipTestsForUpstreamModules>true</skipTestsForUpstreamModules>
<argsForUpstreamModules>-Dmaven.source.skip=true -Dair.check.skip-all</argsForUpstreamModules>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip javadocs too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also gib.skipIfPathMatches might be helpful to ensure that change in some files (e.g. .github/workflows?) causes gib to consider all modules as impacted.

Copy link
Member Author

@nineinchnick nineinchnick Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add it because of these comments: #10323 (comment)

From the gib docs:

This property is helpful for more complex project setups in which not all submodules depend on the root module (= are not direct or indirect children of the root module).

I believe we don't have this kind of modules, right?

Copy link
Member

@hashhar hashhar Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we don't have such modules as far as I can see (though nothing prevents such a module being added in future).
However in current impl we end up running GIB when github/workflows/ci.yml changes for example. And GIB says that no modules changed. We'd want the reverse behaviour for safety IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should treat this file as part of the root module and build everything.

.github/bin/git-fetch-base-ref.sh Show resolved Hide resolved
pom.xml Outdated Show resolved Hide resolved
pom.xml Show resolved Hide resolved
The gitflow-incremental-builder (GIB) Maven extension allows to
only build modules that contain any changes. It compares the current git branch
with the merge base and selects modules with any modified files.

All (upstream) dependencies of impacted modules are built, but without
running tests. All dependant (downstream) modules of impacted modules
are also build and tested. If there are changes in the root module,
everything would be built.

Note this is only enabled in the ci profile.

This does not affect product tests, which would always run with full coverage.
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

plugin/trino-hive-hadoop2/bin/common.sh Outdated Show resolved Hide resolved
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from me.

I don't see any remaining unresolved comments so I'll merge once CI is done.
We can keep iterating as needed.

@hashhar hashhar merged commit bd0de0a into trinodb:master Feb 28, 2022
@hashhar
Copy link
Member

hashhar commented Feb 28, 2022

Thanks a lot @nineinchnick (and others who assisted and reviewed).

Let's keep an eye open for any cases where we feel tests are skipped which shouldn't be and we can iterate.

@github-actions github-actions bot added this to the 372 milestone Feb 28, 2022
@nineinchnick
Copy link
Member Author

I messed up, the build on master failed with:

usage: build-matrix-from-impacted.py [-h] [-m MATRIX] [-i IMPACTED]
                                     [-o OUTPUT] [-v]
build-matrix-from-impacted.py: error: argument -i/--impacted: can't open 'gib-impacted.log': [Errno 2] No such file or directory: 'gib-impacted.log'
Error: Process completed with exit code 2.

I'll open up a PR to fix this ASAP

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants