
Improving CI/CD test coverage #4989

Open
2 of 5 tasks
rafael opened this issue Jul 12, 2019 · 13 comments
Labels
Component: Build/CI, LFX, Type: Enhancement

Comments

@rafael
Member

rafael commented Jul 12, 2019

Expanding Vitess Test Coverage

Currently, Vitess has a rich set of functional tests that run on every commit to catch regressions early. However, they are not sufficient to assess whether the product is ready for production rollout.

Some of the problems with the current approach are:

  • The impact of a commit on integration scenarios is not caught in a timely manner, so a regression leaves the code in an unhealthy state. Without timely detection, additional commits continue to land on an already unhealthy codebase.
  • Integrating with the latest codebase takes longer because its state is unknown and regressions, performance problems, and compatibility issues go uncaught.

Proposed testing environments:

Continuous Integration

Frequency: For every commit
Scope: Unit tests and most functionality/integration testing

Currently most tests are run in Travis CI. As tests are converted to Golang, we will move them to use GitHub Actions.

Known Tasks:

  • Convert tests to Golang (will be tracked outside this issue)
  • Run unit tests on all supported MySQL flavors
  • Run local_example on Ubuntu/macOS/CentOS
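Both matrix tasks above boil down to a cross product: each CI job runs exactly one combination. The Go sketch below shows how such a matrix expands; the flavor names (`mysql57`, `mysql80`, `mariadb103`) and the `Job` type are hypothetical placeholders, not Vitess's official support list. In practice, GitHub Actions' `strategy.matrix` performs this expansion itself.

```go
package main

import "fmt"

// Job is one cell of a hypothetical CI matrix: a MySQL flavor paired with an OS image.
type Job struct {
	Flavor string
	OS     string
}

// buildMatrix returns the cross product of flavors and operating systems,
// one Job per (flavor, OS) pair, in a stable order (flavor-major).
func buildMatrix(flavors, oses []string) []Job {
	jobs := make([]Job, 0, len(flavors)*len(oses))
	for _, f := range flavors {
		for _, o := range oses {
			jobs = append(jobs, Job{Flavor: f, OS: o})
		}
	}
	return jobs
}

func main() {
	// Placeholder names; a real list would come from the documented support matrix.
	flavors := []string{"mysql57", "mysql80", "mariadb103"}
	oses := []string{"ubuntu", "macos", "centos"}
	for _, j := range buildMatrix(flavors, oses) {
		fmt.Printf("%s on %s\n", j.Flavor, j.OS)
	}
}
```

Three flavors on three operating systems yield nine jobs, which is why the concurrency limit matters as the matrix grows.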

We will try to run as many tests as possible in the Continuous Integration suite. GitHub Actions allows 20 tests to run concurrently; we may be able to upgrade for more.
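With a fixed number of concurrent jobs, total wall-clock time depends on how evenly tests are spread across them. The sketch below uses the longest-processing-time-first heuristic, a standard greedy approach swapped in here for illustration (the issue does not prescribe one). The `Target` type and its runtimes are hypothetical; real estimates would come from previous CI runs.

```go
package main

import (
	"fmt"
	"sort"
)

// Target is a hypothetical CI target with an estimated runtime in seconds.
type Target struct {
	Name    string
	Seconds int
}

// shard splits targets across n concurrent jobs: sort by descending runtime,
// then place each target in the job whose total runtime is currently smallest.
func shard(targets []Target, n int) [][]Target {
	if n < 1 {
		n = 1
	}
	sorted := append([]Target(nil), targets...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Seconds > sorted[j].Seconds })

	jobs := make([][]Target, n)
	totals := make([]int, n)
	for _, t := range sorted {
		best := 0
		for i := 1; i < n; i++ {
			if totals[i] < totals[best] {
				best = i
			}
		}
		jobs[best] = append(jobs[best], t)
		totals[best] += t.Seconds
	}
	return jobs
}

// makespan is the runtime of the slowest job, i.e. how long the whole
// sharded suite takes when every job runs in parallel.
func makespan(jobs [][]Target) int {
	worst := 0
	for _, job := range jobs {
		sum := 0
		for _, t := range job {
			sum += t.Seconds
		}
		if sum > worst {
			worst = sum
		}
	}
	return worst
}

func main() {
	// Hypothetical targets and timings, not measured Vitess numbers.
	targets := []Target{
		{"unit", 600}, {"e2e_vtgate", 900}, {"e2e_tabletmanager", 700},
		{"e2e_reparent", 500}, {"local_example", 300},
	}
	jobs := shard(targets, 20) // 20 = the concurrency limit mentioned above
	fmt.Println("wall-clock seconds:", makespan(jobs))
}
```

Once there are more jobs than targets, the makespan is bounded by the single slowest target, so splitting long suites into smaller targets is what lets more tests fit in each commit's CI run.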

Upgrade/Downgrade Testing

Frequency: Nightly
Scope: Reliability, Backward/Forward compatibility

We currently do not test for incompatibilities introduced by an upgrade, nor do we ensure that users can downgrade one level if they need to back out of a failed upgrade. The scenarios need to be written down first; then tests can be written using GitHub Actions:

  • Document supported upgrade/downgrade scenarios
  • Author GitHub Actions tests that check out 2 versions and test the scenarios.
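A nightly upgrade/downgrade job is essentially a fixed sequence of steps over two checked-out versions. The Go sketch below lists that sequence and rejects version pairs that are not one level apart; it assumes "one level" means adjacent major versions, and the step wording and `planUpgradeDowngrade` name are hypothetical rather than part of any existing Vitess tooling.

```go
package main

import (
	"errors"
	"fmt"
)

// planUpgradeDowngrade lists the steps of a hypothetical nightly job that
// checks out two release versions and exercises an upgrade followed by a
// one-level downgrade. It assumes "one level" means adjacent major versions.
func planUpgradeDowngrade(oldMajor, newMajor int) ([]string, error) {
	if newMajor-oldMajor != 1 {
		return nil, errors.New("expected newMajor to be exactly one above oldMajor")
	}
	oldV := fmt.Sprintf("v%d", oldMajor)
	newV := fmt.Sprintf("v%d", newMajor)
	return []string{
		"check out and build " + oldV,
		"check out and build " + newV,
		"start cluster on " + oldV,
		"run scenario tests",
		"upgrade cluster to " + newV,
		"run scenario tests",
		"downgrade cluster to " + oldV,
		"run scenario tests",
	}, nil
}

func main() {
	plan, err := planUpgradeDowngrade(10, 11)
	if err != nil {
		fmt.Println("invalid scenario:", err)
		return
	}
	for i, step := range plan {
		fmt.Printf("%d. %s\n", i+1, step)
	}
}
```

Running the same scenario tests after each transition is the point: a test that passes on the old version but fails after the upgrade or the downgrade pinpoints the incompatibility.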

Production Readiness

Frequency: Pick a new build every 2 weeks
Scope: Performance, Stress

Longer term, we should track performance regressions as part of automated testing. I suggest we scope this out after we have started the upgrade/downgrade tests, since we may learn which scenarios we want to test against. I am a bit nervous about skew: GitHub Actions runners are virtualized, and there are no guarantees about exactly what hardware they provide. We may be best served by using physical hardware.
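Whatever hardware is used, a regression check needs to compare a candidate build against a baseline without tripping on run-to-run noise. One simple approach, assumed here for illustration, is to compare medians of repeated runs against a tolerance. This dampens outliers from a noisy runner but does not remove systematic skew between different machines; the latency values below are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of a non-empty slice of samples.
// It sorts a copy so the caller's slice is left untouched.
func median(samples []float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// isRegression reports whether the candidate's median latency exceeds the
// baseline's median by more than tolerance (0.05 means 5%).
func isRegression(baseline, candidate []float64, tolerance float64) bool {
	return median(candidate) > median(baseline)*(1+tolerance)
}

func main() {
	// Hypothetical p99 latencies (ms) from five runs of each build.
	// The 30.0 outlier in the baseline does not move its median.
	baseline := []float64{12.1, 11.8, 12.4, 30.0, 12.0}
	candidate := []float64{13.5, 13.1, 13.9, 13.2, 13.4}
	fmt.Println("regression:", isRegression(baseline, candidate, 0.05))
}
```

Here the baseline median is 12.1 ms and the candidate median is 13.4 ms, more than 5% higher, so the candidate would be flagged.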

@morgo
Contributor

morgo commented Jul 12, 2019

I like it! I just have one suggestion: under regression testing, we should test all MySQL flavors and versions Vitess claims to support.

@rafael
Member Author

rafael commented Jul 12, 2019

Oh that's a great idea. Just added that!

@derekperkins
Member

This is great. I think we should also add release tagging as a part of the production readiness cadence.

@morgo self-assigned this Jul 23, 2019
@morgo
Contributor

morgo commented Jul 23, 2019

If everyone is okay with it, I would like to take a look at this after I've finished the release cycle documentation and the small documentation refactoring tasks I'm working on (~1-2 weeks' time).

@rafael
Member Author

rafael commented Jul 24, 2019

@morgo yes! I think that makes sense. I see the release cycle/docs as a prerequisite for the improvements in this issue.

@morgo
Contributor

morgo commented Jul 25, 2019

I am going to consider CircleCI and AWS CodeBuild as well. @dkhenry suggested that if we parallelize the tests more, we can run the regression suite on every commit (instead of nightly).

That makes sense to me. We can revert to the nightly plan if the cost or time is prohibitive.

@morgo added the Type: Enhancement and Component: Build/CI labels Oct 26, 2019
@morgo
Contributor

morgo commented Nov 5, 2019

I am planning to loop back on this. I just want to merge a couple of PRs that change the build/testing environment:

Work is also underway to remove Python from the tests. @arindamnayak, @ajeetj and @saurabh408 are all working on it :-)

My plan is to first move local_example to GitHub Actions as a true matrix build on supported flavors, since it is Python-less. Assuming we can run tests much faster on the new infrastructure, we can go with the "everything on commit" plan, which simplifies things.

@aribalam

@GuptaManan100 I am willing to work on this project. As I understand it, there are 2 tasks:

  1. All the functional tests that run in Travis CI need to be migrated to GitHub Actions.
  2. We need to create scenarios in which users downgrade one level after a failed upgrade. We also need to create GitHub Actions workflows that check out 2 versions for testing.

Is there anything else I should know to better understand the project?

@GuptaManan100
Member

The mentor for this project is going to be @harshit-gangal. He will be best able to answer your queries.

@harshit-gangal
Member

The Vitess test suite runs on GitHub Actions today; there is no Travis CI anymore.
We need to test the real-world scenario of upgrading to a newer version in any order (vttablet or vtgate first) and then downgrading, all while serving traffic.

@aribalam

@harshit-gangal Great! :)
Could you give me some pointers on how to get started, so I can understand the project better?

@deepthi
Member

deepthi commented Aug 17, 2021

This might help: #7344
Upgrade/downgrade scenarios that need testing are documented in that issue.

@aribalam

Okay, so I went through #7344 and observed that only 2 of the scenarios remain to be tested:

  1. Upgrade/downgrade subset of vttablets.
  2. Upgrade/downgrade vtgate and vtctld together with 1.

It also mentions that the end-to-end tests themselves need more explicit testing of vtgate and vtctld.
Is there anything else I missed?
@deepthi @harshit-gangal
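Upgrading only a subset of vttablets leaves the cluster in a mixed-version state, which vtgate and vtctld must tolerate. The Go sketch below models only the version assignment a test harness might deploy; the tablet aliases, version labels, and helper names are hypothetical, and the actual deployment and traffic checks are out of scope.

```go
package main

import (
	"fmt"
	"sort"
)

// partialUpgrade assigns newVersion to the first k tablets (in the given
// order) and leaves the rest on oldVersion, modeling a subset upgrade.
func partialUpgrade(tablets []string, k int, oldVersion, newVersion string) map[string]string {
	versions := make(map[string]string, len(tablets))
	for i, t := range tablets {
		if i < k {
			versions[t] = newVersion
		} else {
			versions[t] = oldVersion
		}
	}
	return versions
}

// versionsInUse lists the distinct versions present, sorted for stable output.
func versionsInUse(versions map[string]string) []string {
	seen := map[string]bool{}
	var out []string
	for _, v := range versions {
		if !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical tablet aliases and version labels.
	tablets := []string{"zone1-100", "zone1-101", "zone1-102"}
	v := partialUpgrade(tablets, 1, "v10", "v11")
	fmt.Println(versionsInUse(v))
}
```

A scenario test would then assert that queries through vtgate succeed while both versions are in use, and again after the upgraded subset is rolled back.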

7 participants