Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Github Actions CI Builds: convert matrix strategy for unit and cluster tests to individual tests #7258

Merged
merged 10 commits into from
Jan 12, 2021

Conversation

rohit-nayak-ps
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps commented Jan 6, 2021

Signed-off-by: Rohit Nayak [email protected]

Description

Currently our tests use the matrix strategy feature of Github Workflows (https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-syntax-for-github-actions#jobsjob_idstrategymatrix). The advantage of a matrix is that you can define similar tests with a single yaml file per cluster. Unit tests and Cluster end to end tests, each have their own matrix. This was originally created assuming that the tests in a matrix are functionally related: if one of these tests fail it doesn't make sense to run the rest within that matrix.

However, if one of these child tests fail all the parent tests are canceled. github Actions only allows you to restart all tests in a workflow: it is not possible to run just one of the child tests. Unfortunately we often have flaky tests that fail periodically, usually within the cluster matrix. When this happens all cluster tests need to be rerun, even though all the others have passed. And it is possible, in the re-run, another flaky test fails, we end up having to run the whole set again.

This PR splits the tests within the clusters to individual ones. Each test now needs its own yaml file. A code-generator script has been written to create these yaml files. A new make file action generate_ci_workflows generates these files. We need to rebuild only if the templates change or if a new test "shard" is created. We do not need to rebuild tests if only the test/config.json is changed by adding a new test to an existing shard. Any new or modified files need to be committed into git.

ALERT: Read before merging

Currently all tests are running to completion in CI. However the old tests appear to be setup as prerequisites in the repository settings (https://github.community/t/github-actions-pull-requests-always-include-some-outdated-checks/16157). Hence the tests are not marked as completed and one cannot merge any PRs unless the settings are updated. Only an admin can update this: we need to make sure those settings are updated correctly, otherwise all PRs will be stuck.

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build (CI Actions)

@deepthi
Copy link
Member

deepthi commented Jan 7, 2021

This is great! I can take care of the admin part when the PR is ready.

@rohit-nayak-ps rohit-nayak-ps force-pushed the rn-make-tests-individual branch from 67eb271 to a1d35f1 Compare January 7, 2021 13:13
Signed-off-by: Rohit Nayak <[email protected]>
@rohit-nayak-ps rohit-nayak-ps changed the title WIP: convert matrix strategy for CI to individual tests Github Actions CI Builds: convert matrix strategy for unit and cluster tests to individual tests Jan 7, 2021
@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review January 7, 2021 17:57
@rohit-nayak-ps rohit-nayak-ps requested a review from sougou as a code owner January 7, 2021 17:57
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rohit-nayak-ps this is awesome! Thank you for doing this.

Question: can you please clarify what the new role of config.json is, if any? Previously it was used to determine which test runs on each matrix shard. Should we remove config.json and just hard code inside each test what it's expected to do?

As followup to the above question, I think that in a future PR (not this one) we should further rename test files according to their purpose. For example, .github/workflows/cluster_endtoend_26.yml can be renamed to .github/workflows/cluster_endtoend_online_ddl.yml because it really only tests Online DDL. I know some other shards are more mixed and not all have a theme.

@rohit-nayak-ps
Copy link
Contributor Author

Question: can you please clarify what the new role of config.json is, if any? Previously it was used to determine which test runs on each matrix shard. Should we remove config.json and just hard code inside each test what it's expected to do?

config.json is orthogonal to the workflow templates that this script generates. It is used by the vitess test runner whereas all this PR does is to change the workflow definitions for our CI.

As followup to the above question, I think that in a future PR (not this one) we should further rename test files according to their purpose. For example, .github/workflows/cluster_endtoend_26.yml can be renamed to .github/workflows/cluster_endtoend_online_ddl.yml because it really only tests Online DDL. I know some other shards are more mixed and not all have a theme.

Great idea. I think it might be worth refactoring config.json so that we have a single config for both the runner and the workflow yaml generator. At that time we can also add a friendly name to each test shard (and maybe use a different term than shard!) and use that name for the CI workflow so that it makes more sense in the CI dashboard. You are right that some test shards do run multiple tests, not all of them related to a single module, but that can be addressed in some way too. I will look into it:

@shlomi-noach
Copy link
Contributor

shard -> (topic | section | segment | module | slice)

@shlomi-noach
Copy link
Contributor

Should I go ahead and merge it? (I have Admin rights)

@rohit-nayak-ps
Copy link
Contributor Author

Probably @deepthi should also review this since she was involved with the test framework development.

Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@deepthi deepthi merged commit 5d863a7 into vitessio:master Jan 12, 2021
@deepthi deepthi deleted the rn-make-tests-individual branch January 12, 2021 00:16
jobs:

build:
runs-on: ubuntu-latest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rohit-nayak-ps
Can we add to the template something like

name: Run endtoend tests on {{.Name}}

It turns out that once the PR is merged the top-level name is not displayed any more, only the name from here is displayed and it defaults to build.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #7289

@deepthi deepthi mentioned this pull request Jan 12, 2021
8 tasks
@askdba askdba added this to the v9.0 milestone Jan 19, 2021
@shlomi-noach
Copy link
Contributor

Please note that as of #7324, shard names are strings, not numbers. so you can have a shard named "vreplication_basic" rather than 29.

@harshit-gangal harshit-gangal restored the rn-make-tests-individual branch June 21, 2021 06:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants