connectivity: Add upgrade tests #1683

Merged: 2 commits merged into main from pr/upgrade-testing on Jun 16, 2023
Conversation

Member

@brb brb commented May 31, 2023

Please see each individual commit msg when reviewing.

Testing in cilium/cilium#25790 (red builds can be ignored, as they are due to Cilium issues which the upgrade tests have caught).

Related cilium/cilium#25037.

@brb brb temporarily deployed to ci May 31, 2023 13:45 — with GitHub Actions Inactive
@brb brb temporarily deployed to ci May 31, 2023 14:25 — with GitHub Actions Inactive
@brb brb temporarily deployed to ci May 31, 2023 14:35 — with GitHub Actions Inactive
@brb brb temporarily deployed to ci May 31, 2023 14:50 — with GitHub Actions Inactive
@brb brb force-pushed the pr/upgrade-testing branch from b274c58 to 3becbf2 Compare June 1, 2023 10:25
@brb brb temporarily deployed to ci June 1, 2023 10:25 — with GitHub Actions Inactive
@brb brb force-pushed the pr/upgrade-testing branch from 3becbf2 to 4466e28 Compare June 7, 2023 13:58
@brb brb temporarily deployed to ci June 7, 2023 13:58 — with GitHub Actions Inactive
@brb brb force-pushed the pr/upgrade-testing branch from 4466e28 to 9582828 Compare June 7, 2023 15:14
@brb brb temporarily deployed to ci June 7, 2023 15:14 — with GitHub Actions Inactive
@brb brb force-pushed the pr/upgrade-testing branch from 9582828 to 78d1f90 Compare June 8, 2023 14:07
@brb brb temporarily deployed to ci June 8, 2023 14:08 — with GitHub Actions Inactive
@brb brb changed the title WIP: connectivity: Upgrade tests connectivity: Add upgrade tests Jun 8, 2023
@brb brb added the area/CI Continuous Integration testing issue or flake label Jun 8, 2023
@brb brb marked this pull request as ready for review June 8, 2023 14:08
@brb brb requested a review from a team as a code owner June 8, 2023 14:08
@brb brb requested review from asauber and aanm June 8, 2023 14:08
Member

@aanm aanm left a comment

Can we also have instructions on which order we should run the commands?

connectivity/tests/upgrade.go Outdated Show resolved Hide resolved
connectivity/tests/upgrade.go Outdated Show resolved Hide resolved
@brb brb force-pushed the pr/upgrade-testing branch from 78d1f90 to 4c228f7 Compare June 15, 2023 13:34
@brb brb temporarily deployed to ci June 15, 2023 13:34 — with GitHub Actions Inactive
Member Author

brb commented Jun 15, 2023

Can we also have instructions on which order we should run the commands?

It's in the last commit msg 😎

@brb brb requested a review from aanm June 15, 2023 13:35
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 15, 2023
for pod, count := range restartCount {
	if prevCount, found := prevRestartCount[pod]; !found {
		t.Fatalf("Could not find Pod %s restart count", pod)
	} else if prevCount != count {
Member

What is it about Pod restarts here that implies that a connection was reset? Is there a one-shot connection which, if it fails, fails a livenessProbe for the Pod?

A link to the manifest or documentation about this would be helpful here.

Member Author

Yep, a Pod restart means that a long-lived connection was interrupted. I put some docs at the beginning of this file. Please let me know if more details are needed.

For now I reused the same migrate-svc deployments from the ginkgo test suite. In the future, we should at least rename them, document them, and extend their use cases (e.g., to test N/S traffic interruptions).

Member

@asauber asauber left a comment

Looks good for the most part. Commented with some suggested improvements.

brb added 2 commits June 16, 2023 07:36
Add an option to flush the conntrack (CT) table of each Cilium node
after running all connectivity tests. This is needed when running the
connectivity tests multiple times on the same cluster, because the L7
netpol tests might otherwise interfere with other tests [1].

[1]: cilium/cilium#17459

Signed-off-by: Martynas Pumputis <[email protected]>
The new test case checks that long-lived E/W LB flows are not
interrupted.

The upgrade test consists of two CLI invocations with the Cilium
upgrade in between:

* "cli connectivity test --include-upgrade-test --upgrade-test-setup":
  deploys migrate-svc (to be renamed) pods, and stores restart counters.
* Do Cilium upgrade.
* "cli connectivity test --include-upgrade-test --test post-upgrade":
  checks restart counters of migrate-svc pods, and compares them against
  the previously stored counters (a counter mismatch means an
  interruption in a flow).

Signed-off-by: Martynas Pumputis <[email protected]>
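
The store-then-compare idea from the commit message above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: the `RestartCounts` type, the helper names, the JSON file format, and the Pod names are all assumptions made for the example. Unlike the diff excerpt in the review thread, which treats a missing Pod as a separate fatal error, the sketch reports a missing Pod and a changed counter the same way.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// RestartCounts maps a Pod name to its restart counter.
// (Hypothetical type; the real test may store this differently.)
type RestartCounts map[string]int

// saveCounts persists the counters before the upgrade (the "setup" step).
// JSON is an assumed format chosen for the sketch.
func saveCounts(path string, counts RestartCounts) error {
	data, err := json.Marshal(counts)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// loadCounts reads back the counters written by saveCounts.
func loadCounts(path string) (RestartCounts, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	counts := RestartCounts{}
	if err := json.Unmarshal(data, &counts); err != nil {
		return nil, err
	}
	return counts, nil
}

// interruptedPods returns the sorted names of Pods in cur whose counter is
// missing from prev or differs from it. A changed counter is taken to mean
// that the Pod's long-lived connection was interrupted.
func interruptedPods(prev, cur RestartCounts) []string {
	var pods []string
	for pod, count := range cur {
		if prevCount, found := prev[pod]; !found || prevCount != count {
			pods = append(pods, pod)
		}
	}
	sort.Strings(pods)
	return pods
}

func main() {
	path := filepath.Join(os.TempDir(), "restart-counts-example.json")
	defer os.Remove(path)

	// Step 1: record the counters before the upgrade (made-up values).
	before := RestartCounts{"migrate-svc-client-0": 0, "migrate-svc-server-0": 1}
	if err := saveCounts(path, before); err != nil {
		fmt.Println("save failed:", err)
		return
	}

	// Step 2: the upgrade would happen here.

	// Step 3: reload the stored counters and compare with the current ones.
	prev, err := loadCounts(path)
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	after := RestartCounts{"migrate-svc-client-0": 1, "migrate-svc-server-0": 1}
	fmt.Println(interruptedPods(prev, after)) // prints [migrate-svc-client-0]
}
```

Persisting the counters to a file is what lets the check span two separate CLI invocations, which matches the role of the `--upgrade-test-result-path` flag added in this PR.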
@brb brb force-pushed the pr/upgrade-testing branch from 4c228f7 to 9e06e99 Compare June 16, 2023 04:37
@brb brb temporarily deployed to ci June 16, 2023 04:38 — with GitHub Actions Inactive
@tklauser tklauser merged commit 6c2d407 into main Jun 16, 2023
@tklauser tklauser deleted the pr/upgrade-testing branch June 16, 2023 08:57
cmd.Flags().BoolVar(&params.IncludeUpgradeTest, "include-upgrade-test", false, "Include upgrade test")
cmd.Flags().BoolVar(&params.UpgradeTestSetup, "upgrade-test-setup", false, "Set up upgrade test dependencies")
cmd.Flags().StringVar(&params.UpgradeTestResultPath, "upgrade-test-result-path", "/tmp/cilium-upgrade-test-restart-counts", "Upgrade test temporary result file (used internally)")
cmd.Flags().BoolVar(&params.FlushCT, "flush-ct", false, "Flush conntrack of Cilium on each node")
Member

Should we hide this flag given connectivity tests are supposed to be usable by folks in their production cluster without side-effects? The blast radius of a misuse here is quite high.

Member Author

Member

Oh I see, yes, this is a known issue: cilium/cilium#17459 🙏

Labels
area/CI Continuous Integration testing issue or flake ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
6 participants