Improving CI/CD test coverage #4989

rafael · 2019-07-12T15:57:21Z

Expanding Vitess Test Coverage

Currently Vitess has a rich set of functional tests that are run as part of every commit to catch regressions early. However, they are not sufficient to assess the quality of the product for production rollout.

Some of the problems with the current approach are:

The impact of a commit on integration scenarios is not caught in a timely manner, so a regression leaves the code in an unhealthy state. Without timely detection of the regression, additional commits continue to land on an already unhealthy codebase.
Integrating with the latest codebase takes longer because its state is unknown and regressions, performance problems, and compatibility issues go uncaught.

Proposed Testing environments:

Continuous Integration

Frequency: For every commit
Scope: Unit tests and most functionality/integration testing

Currently most tests are run in Travis CI. As tests are converted to Golang, we will move them to use GitHub Actions.

Known Tasks:

Convert Tests to Golang (will be tracked outside this issue)
Run unit tests on all supported MySQL Flavors
Run local_example on Ubuntu/macOS/CentOS

We will try to run as many tests in the Continuous Integration suite as possible. GitHub Actions allows 20 concurrent tests to be run; we may be able to upgrade for more.
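To make the concurrency limit concrete, here is a minimal sketch of how a suite could be split into a fixed number of shards so that each concurrent job runs one slice. It is an illustration only, not how the Vitess test runner works: `shardTests` and the test names are hypothetical, and round-robin assignment is just one simple strategy.

```go
package main

import (
	"fmt"
	"sort"
)

// shardTests distributes test names across n shards round-robin.
// Names are sorted first so that every CI job computes the same assignment.
// (Hypothetical helper for illustration; not part of Vitess.)
func shardTests(tests []string, n int) [][]string {
	if n <= 0 {
		return nil
	}
	sorted := append([]string(nil), tests...)
	sort.Strings(sorted)
	shards := make([][]string, n)
	for i, t := range sorted {
		shards[i%n] = append(shards[i%n], t)
	}
	return shards
}

func main() {
	// Placeholder test names, chosen only for the example.
	tests := []string{"tabletmanager", "vtgate_buffer", "backup", "reparent", "sharding"}
	for i, s := range shardTests(tests, 2) {
		fmt.Printf("shard %d: %v\n", i, s)
	}
}
```

Round-robin ignores how long each test takes; if durations vary widely, assigning by recorded runtime would balance the shards better.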

Upgrade/Downgrade Testing

Frequency: Nightly
Scope: Reliability, Backward/Forward compatibility

We currently do not test for incompatibilities introduced by an upgrade, nor do we verify that users can downgrade one level if they need to back out of a failed upgrade. The scenarios need to be written down first; tests can then be written using GitHub Actions:

Document Supported Upgrade/Downgrade Scenarios
Author GitHub Actions tests that check out 2 versions and run the scenarios.
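As a rough sketch of what "check out 2 versions, test scenarios" could expand to, the snippet below derives an upgrade and a one-level downgrade scenario for each pair of consecutive releases. The `Scenario` type, the `adjacentScenarios` helper, and the release labels are all hypothetical; the supported scenarios themselves still need to be documented as described above.

```go
package main

import "fmt"

// Scenario describes one version transition to exercise.
// (Hypothetical type for illustration.)
type Scenario struct {
	Name string // "upgrade" or "downgrade"
	From string
	To   string
}

// adjacentScenarios returns, for each consecutive pair of releases
// (ordered oldest first), an upgrade and a one-level downgrade scenario.
func adjacentScenarios(releases []string) []Scenario {
	var out []Scenario
	for i := 1; i < len(releases); i++ {
		prev, next := releases[i-1], releases[i]
		out = append(out,
			Scenario{Name: "upgrade", From: prev, To: next},
			Scenario{Name: "downgrade", From: next, To: prev},
		)
	}
	return out
}

func main() {
	// Placeholder labels; real release names are not assumed here.
	for _, s := range adjacentScenarios([]string{"previous-release", "candidate"}) {
		fmt.Printf("%s: %s -> %s\n", s.Name, s.From, s.To)
	}
}
```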

Production Readiness

Frequency: Pick a new build every 2 weeks
Scope: Performance, Stress

Longer term, we should track performance regressions as part of automated testing. I suggest we scope this out after we have started upgrade/downgrade tests, as we might learn which scenarios we would like to test against. I am a bit nervous about skew: the GitHub Actions runners are virtualized, and there are no promises about exactly what hardware they provide. We may be best served by using physical hardware.
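One way to reason about the skew concern is to compare medians of repeated runs against a tolerance, so that a single noisy run on shared hardware does not trigger a false alarm. The sketch below is an assumption about how such a check might look, not an existing Vitess tool; `median`, `isRegression`, the sample timings, and the 10% tolerance are all illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of xs, or 0 for an empty slice.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// isRegression reports whether the candidate's median latency exceeds the
// baseline's median by more than tolerance (0.10 means 10%).
// (Hypothetical helper; the threshold is an illustrative choice.)
func isRegression(baseline, candidate []float64, tolerance float64) bool {
	b := median(baseline)
	if b == 0 {
		return false
	}
	return median(candidate) > b*(1+tolerance)
}

func main() {
	// Made-up timings in milliseconds; 250 stands in for a noisy outlier.
	baseline := []float64{100, 102, 98, 250, 101}
	candidate := []float64{120, 125, 118}
	fmt.Println("regression:", isRegression(baseline, candidate, 0.10))
}
```

A median only dampens outliers within a run set; it does not remove systematic differences between runner hardware, which is the argument for dedicated machines.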

morgo · 2019-07-12T16:17:37Z

I like it! I just have one suggestion: under regression testing, we should test all MySQL flavors and versions Vitess claims to support.

rafael · 2019-07-12T17:32:18Z

Oh that's a great idea. Just added that!

derekperkins · 2019-07-12T18:21:50Z

This is great. I think we should also add release tagging as a part of the production readiness cadence.

morgo · 2019-07-23T22:47:09Z

If everyone is okay with it, I would like to take a look at this after I've finished the release cycle documentation and the small documentation refactoring tasks I'm working on (~1-2 weeks' time).

rafael · 2019-07-24T19:25:11Z

@morgo yes! I think that makes sense. I see the release cycle/doc as a pre-req for the improvements in this issue.

morgo · 2019-07-25T17:34:45Z

I am going to look at CircleCI and AWS CodeBuild as part of the scope as well. @dkhenry suggested that if we parallelize the tests more, we can run the regression suite on every commit (versus nightly).

That makes sense to me. We can revert the plan to nightly if the cost/time is prohibitive.

morgo · 2019-11-05T22:52:33Z

I am planning to loop back on this. I just want to merge a couple of PRs that change the build/testing environment:

Work is also underway to remove Python from the tests. @arindamnayak, @ajeetj and @saurabh408 are all working on it :-)

What I plan to do first is move local_example to GitHub Actions as a true matrix build on supported flavors, since it is Python-less. Assuming we can run tests much faster on the new infrastructure, we can go with the "everything on commit" plan, which simplifies things.

aribalam · 2021-08-16T16:28:13Z

@GuptaManan100 I am willing to work on this project. As I understand it, there are 2 tasks.

All the functional tests that are run in Travis CI need to be migrated to GitHub Actions.
We need to create scenarios that require users to downgrade a level in case of a failed upgrade. Moreover, we need to create GitHub Actions that will enable us to check out 2 versions for testing.

Is there anything that needs to be known to better understand the project?

GuptaManan100 · 2021-08-16T16:31:24Z

The mentor for this project is going to be @harshit-gangal. He will be best able to answer your queries.

harshit-gangal · 2021-08-16T18:03:39Z

The Vitess test suite runs on GitHub Actions today; Travis CI is no longer used.
We need to test the real-world scenario of upgrading to a newer version in any order (vttablet or vtgate first) and then downgrading while serving traffic.
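The "any order" requirement can be sketched by enumerating every component ordering and building a step list for each. This is only an illustration: `Step`, `plan`, and `permutations` are hypothetical names, and downgrading in the reverse of the upgrade order is an assumption of this sketch, not something stated in this thread.

```go
package main

import "fmt"

// Step is one action applied to one component.
// (Hypothetical type for illustration.)
type Step struct {
	Component string
	Action    string // "upgrade" or "downgrade"
}

// plan upgrades the components in the given order, then downgrades them
// in reverse order. The reverse-order downgrade is an assumption.
func plan(order []string) []Step {
	steps := make([]Step, 0, 2*len(order))
	for _, c := range order {
		steps = append(steps, Step{Component: c, Action: "upgrade"})
	}
	for i := len(order) - 1; i >= 0; i-- {
		steps = append(steps, Step{Component: order[i], Action: "downgrade"})
	}
	return steps
}

// permutations returns every ordering of items.
func permutations(items []string) [][]string {
	if len(items) <= 1 {
		return [][]string{append([]string(nil), items...)}
	}
	var out [][]string
	for i := range items {
		rest := make([]string, 0, len(items)-1)
		rest = append(rest, items[:i]...)
		rest = append(rest, items[i+1:]...)
		for _, p := range permutations(rest) {
			out = append(out, append([]string{items[i]}, p...))
		}
	}
	return out
}

func main() {
	for _, order := range permutations([]string{"vttablet", "vtgate"}) {
		fmt.Println(plan(order))
	}
}
```

Each generated plan would still need traffic running against the cluster between steps for the test to reflect the scenario described here.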

aribalam · 2021-08-17T15:14:04Z

@harshit-gangal Great! :)
Could you give me a lead on where to start, so that I can understand the project better?

deepthi · 2021-08-17T18:11:28Z

This might help: #7344
Upgrade/downgrade scenarios that need testing are documented in that issue.

aribalam · 2021-08-24T11:53:35Z

Okay, so I went through #7344 and observed that only 2 of the scenarios still remain to be tested:

1. Upgrade/downgrade subset of vttablets.
2. Upgrade/downgrade vtgate, vtctld with 1.

Also, it mentions that there needs to be some more explicit testing for vtgate and vtctld in the end-to-end tests themselves.
Is there anything else that I failed to mention?
@deepthi @harshit-gangal

morgo self-assigned this Jul 23, 2019

morgo added the Type: Enhancement and Component: Build/CI labels Oct 26, 2019

morgo mentioned this issue Nov 6, 2019

Add GitHub action for local example #5414

Merged

morgo mentioned this issue Dec 12, 2019

Add matrix build for unit tests #5559

Merged

morgo removed their assignment Dec 3, 2020

GuptaManan100 added the LFX label Aug 13, 2021
