CI: increase overall test timeouts for all OnlineDDL tests #12584

Merged


shlomi-noach
Contributor

Description

Due to recent slowness of the GitHub CI runners, we're seeing some tests time out after 20min of running. These tests normally run for 5-6 min on a local dev env, and 20min used to give good margins. Not anymore.

This PR increases all Online DDL related tests timeouts to 30min.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

@vitess-bot
Contributor

vitess-bot bot commented Mar 9, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a test is added or modified, there should be documentation on top of the test to explain what the expected behavior is and what the test does.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

@shlomi-noach shlomi-noach added Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels Mar 9, 2023
@frouioui
Member

frouioui commented Mar 9, 2023

Same question as in #12583 (comment). I think we should backport this to release-14.0, release-15.0, and release-16.0, WDYT?

@shlomi-noach
Contributor Author

backporting this to 16, 15, 14

Member

@frouioui frouioui left a comment


I see that the different workflows you are changing have a step called Run cluster endtoend test, which has a timeout of 45 minutes by default:

Should we also increase this variable to leave more time for the tests to complete?

Otherwise, LGTM

@shlomi-noach
Contributor Author

Should we also increase this variable to leave more time for the tests to complete?

@frouioui so this timeout comes from the tests templates:

and

These affect all tests -- should we bump the timeout for everything? I don't mind, just making sure.

@frouioui
Member

frouioui commented Mar 9, 2023

@shlomi-noach, I looked at it a bit deeper, and test.go has a -timeout flag that defines the timeout for each test with a default value of 30 minutes. I think it is ok to remove the -timeout 30m argument you're adding, and leave that to test.go, but I don't have strong opinions.

These affect all tests -- should we bump the timeout for everything? I don't mind, just making sure.

All the shards you are changing, except schemadiff_vrepl, are composed of only one test, so I think it is okay to leave the CI timeout as it is (45 minutes). For shard schemadiff_vrepl, there are two tests: sidecardb and schemadiff_vrepl. I don't think the sum of the two will go above 45 minutes, but if it does we can change the timeout later.
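
The budget reasoning above can be sketched as simple arithmetic. This is only an illustration: the function names are hypothetical, and it assumes that both tests in the schemadiff_vrepl shard would get a 30-minute timeout (the thread does not state the sidecardb value). It compares the worst-case sum of per-test timeouts against the 45-minute CI step timeout.

```go
package main

import "fmt"

// worstCaseMinutes sums the per-test timeouts of a shard. This is an upper
// bound on how long the shard could run if every test hit its own timeout.
func worstCaseMinutes(perTestMinutes []int) int {
	total := 0
	for _, m := range perTestMinutes {
		total += m
	}
	return total
}

// fitsWithin reports whether the worst case stays within the CI step timeout.
func fitsWithin(perTestMinutes []int, stepMinutes int) bool {
	return worstCaseMinutes(perTestMinutes) <= stepMinutes
}

func main() {
	const stepMinutes = 45 // CI step timeout mentioned in the review

	// Assumed values: one 30m test per shard, or two 30m tests for the
	// two-test shard. The second figure is an assumption for illustration.
	shards := map[string][]int{
		"single-test shard": {30},
		"two-test shard":    {30, 30},
	}
	for name, timeouts := range shards {
		fmt.Printf("%s: worst case %dm, fits in %dm step: %v\n",
			name, worstCaseMinutes(timeouts), stepMinutes,
			fitsWithin(timeouts, stepMinutes))
	}
}
```

Under these assumptions, a single-test shard can never exceed the step timeout, while the two-test shard only exceeds it if both tests run close to their individual limits. That matches the decision to leave the 45-minute value alone for now.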

@frouioui
Member

frouioui commented Mar 9, 2023

Should we also increase this variable to leave more time for the tests to complete?

So I take back what I said, I think let's not increase it :)

@shlomi-noach
Contributor Author

shlomi-noach commented Mar 9, 2023

I think it is ok to remove the -timeout 30m argument you're adding

In https://github.com/vitessio/vitess/actions/runs/4360811844/jobs/7624094786

I see:

2023-03-08T03:51:43.1153959Z panic: test timed out after 20m0s

I otherwise know from experience that the timeout in test/config.json does indeed set an upper bound on your test, and it's been my practice to increase it (due to timeouts) as I was developing more long running tests.
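
As a small aside on reading that log line: Go formats a 20-minute duration as 20m0s, which is why a "20m" timeout value shows up that way in the panic message. A minimal sketch, where panicMessage is a hypothetical helper that only mimics the wording of the quoted line and is not part of any real tool:

```go
package main

import (
	"fmt"
	"time"
)

// panicMessage is a hypothetical helper that mimics the wording of the log
// line quoted above for a given timeout argument such as "20m".
func panicMessage(timeoutArg string) (string, error) {
	d, err := time.ParseDuration(timeoutArg)
	if err != nil {
		return "", err
	}
	// time.Duration's String method renders 20 minutes as "20m0s".
	return fmt.Sprintf("panic: test timed out after %v", d), nil
}

func main() {
	for _, arg := range []string{"20m", "30m"} {
		msg, err := panicMessage(arg)
		if err != nil {
			fmt.Println("invalid duration:", err)
			continue
		}
		fmt.Printf("-timeout %s -> %s\n", arg, msg)
	}
}
```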

@frouioui
Member

frouioui commented Mar 9, 2023

I otherwise know from experience that the timeout in test/config.json does indeed set an upper bound on your test, and it's been my practice to increase it (due to timeouts) as I was developing more long running tests.

Okay! That makes sense 🙏🏻

@shlomi-noach
Contributor Author

@frouioui oh, I think I understand better -- you're saying that 30min is the default, and that, previously "-timeout", "20m" overrode that to get a shorter timeout?

I didn't know that.

In any case, I prefer to keep the timeout explicit in test/config.json. This way we don't lose information and intention.
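
The default-versus-override behavior discussed in this thread can be sketched with Go's standard flag package. This is not the actual test.go code: resolveTimeout and the flag set name are hypothetical. It only assumes, as described above, a -timeout flag with a 30-minute default that an explicit argument overrides.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// resolveTimeout parses args against a hypothetical -timeout flag whose
// default is 30 minutes, and returns the effective timeout.
func resolveTimeout(args []string) (time.Duration, error) {
	fs := flag.NewFlagSet("runner", flag.ContinueOnError)
	timeout := fs.Duration("timeout", 30*time.Minute, "timeout for each test")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *timeout, nil
}

func main() {
	cases := [][]string{
		{},                  // no explicit argument: the default applies
		{"-timeout", "20m"}, // the earlier explicit value, shorter than the default
		{"-timeout", "30m"}, // explicit value equal to the default
	}
	for _, args := range cases {
		d, err := resolveTimeout(args)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("args=%q -> timeout=%v\n", args, d)
	}
}
```

With an explicit "30m" the effective value equals the default, so keeping the argument changes nothing at runtime; it only records the intent in the config.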

@shlomi-noach shlomi-noach added Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) and removed Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) labels Mar 9, 2023
@frouioui
Member

frouioui commented Mar 9, 2023

@shlomi-noach, yes, this is what I meant. It does make sense to keep the timeout explicit in the config.json file, though. That's why I don't really have strong opinions, I like both ways :)

@shlomi-noach shlomi-noach merged commit cbca36c into vitessio:main Mar 9, 2023
@shlomi-noach shlomi-noach deleted the ci-increase-onlineddl-timeouts branch March 9, 2023 08:31
@vitess-bot
Contributor

vitess-bot bot commented Mar 9, 2023

I was unable to backport this Pull Request to the following branches: release-14.0, release-15.0.


4 participants