-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Github Actions CI Builds: convert matrix strategy for unit and cluster tests to individual tests #7258
Github Actions CI Builds: convert matrix strategy for unit and cluster tests to individual tests #7258
Conversation
This is great! I can take care of the admin part when the PR is ready. |
Signed-off-by: Rohit Nayak <[email protected]>
Signed-off-by: Rohit Nayak <[email protected]>
Signed-off-by: Rohit Nayak <[email protected]>
Signed-off-by: Rohit Nayak <[email protected]>
Signed-off-by: Rohit Nayak <[email protected]>
Signed-off-by: Rohit Nayak <[email protected]>
Signed-off-by: Rohit Nayak <[email protected]>
67eb271
to
a1d35f1
Compare
Signed-off-by: Rohit Nayak <[email protected]>
Signed-off-by: Rohit Nayak <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rohit-nayak-ps this is awesome! Thank you for doing this.
Question: can you please clarify what the new role of config.json
is, if any? Previously it was used to determine which test runs on each matrix shard. Should we remove config.json
and just hard code inside each test what it's expected to do?
As followup to the above question, I think that in a future PR (not this one) we should further rename test files according to their purpose. For example, .github/workflows/cluster_endtoend_26.yml
can be renamed to .github/workflows/cluster_endtoend_online_ddl.yml
because it really only tests Online DDL. I know some other shards are more mixed and not all have a theme.
Signed-off-by: Rohit Nayak <[email protected]>
Great idea. I think it might be worth refactoring config.json so that we have a single config for both the runner and the workflow yaml generator. At that time we can also add a friendly name to each test shard (and maybe use a different term than shard!) and use that name for the CI workflow so that it makes more sense in the CI dashboard. You are right that some test shards do run multiple tests, not all of them related to a single module, but that can be addressed in some way too. I will look into it: |
shard -> (topic | section | segment | module | slice) |
Should I go ahead and merge it? (I have Admin rights) |
Probably @deepthi should also review this since she was involved with the test framework development. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
jobs: | ||
|
||
build: | ||
runs-on: ubuntu-latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rohit-nayak-ps
Can we add to the template something like
name: Run endtoend tests on {{.Name}}
It turns out that once the PR is merged the top-level name is not displayed any more, only the name from here is displayed and it defaults to build
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #7289
Please note that as of #7324, shard names are strings, not numbers. so you can have a shard named |
Signed-off-by: Rohit Nayak [email protected]
Description
Currently our tests use the matrix strategy feature of Github Workflows (https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-syntax-for-github-actions#jobsjob_idstrategymatrix). The advantage of a matrix is that you can define similar tests with a single yaml file per cluster. Unit tests and Cluster end to end tests, each have their own matrix. This was originally created assuming that the tests in a matrix are functionally related: if one of these tests fail it doesn't make sense to run the rest within that matrix.
However, if one of these child tests fail all the parent tests are canceled. github Actions only allows you to restart all tests in a workflow: it is not possible to run just one of the child tests. Unfortunately we often have flaky tests that fail periodically, usually within the cluster matrix. When this happens all cluster tests need to be rerun, even though all the others have passed. And it is possible, in the re-run, another flaky test fails, we end up having to run the whole set again.
This PR splits the tests within the clusters to individual ones. Each test now needs its own yaml file. A code-generator script has been written to create these yaml files. A new make file action
generate_ci_workflows
generates these files. We need to rebuild only if the templates change or if a new test "shard" is created. We do not need to rebuild tests if only the test/config.json is changed by adding a new test to an existing shard. Any new or modified files need to be committed into git.ALERT: Read before merging
Currently all tests are running to completion in CI. However the old tests appear to be setup as prerequisites in the repository settings (https://github.community/t/github-actions-pull-requests-always-include-some-outdated-checks/16157). Hence the tests are not marked as completed and one cannot merge any PRs unless the settings are updated. Only an admin can update this: we need to make sure those settings are updated correctly, otherwise all PRs will be stuck.
Checklist
Impacted Areas in Vitess
Components that this PR will affect: