
Test various upgrade scenarios #1580

Merged (8 commits) on May 3, 2023

Conversation

@sbodagala (Contributor) commented Apr 10, 2023

Description

Write tests that cover the following scenarios:

  • Test status JSON in the context of version-incompatible upgrades.

  • A test that restarts multiple processes (a storage process and multiple stateless processes) during the staging phase.

  • A test that checks the cluster generation number during an upgrade (a sketch of this check follows the list).
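
To make the generation check concrete, here is a minimal sketch of what such an assertion could look like in the Ginkgo/Gomega style of the e2e suite. The fixture helpers (fdbCluster.GetCluster, fdbCluster.UpgradeCluster) and the suite-level variables (targetVersion, expectedGenerationBumps) are illustrative assumptions, not this PR's actual code:

```go
// Hypothetical sketch of a cluster-generation check during an upgrade.
// fdbCluster, targetVersion, and expectedGenerationBumps are assumed
// suite-level fixtures/variables, used here for illustration only.
It("keeps the cluster generation within the expected bound", func() {
	// Record metadata.generation before triggering the upgrade.
	initialGeneration := fdbCluster.GetCluster().Generation

	// Trigger the version upgrade and wait for reconciliation.
	Expect(fdbCluster.UpgradeCluster(targetVersion, true)).NotTo(HaveOccurred())

	// The API server bumps metadata.generation on every spec change, so
	// an upgrade should produce a small, predictable number of bumps; a
	// large jump points at unintended spec churn during the upgrade.
	Expect(fdbCluster.GetCluster().Generation).To(
		BeNumerically("<=", initialGeneration+expectedGenerationBumps))
})
```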

Type of change

  • Other (adds an upgrade test; no code changes)

Discussion

Are there any design details that you would like to discuss further?
No

Testing

Ran the test manually.

Documentation

Did you update relevant documentation within this repository?
N/A

If this change is adding new functionality, do we need to describe it in our user manual?
N/A

If this change is adding or removing subreconcilers, have we updated the core technical design doc to reflect that?
N/A

If this change is adding new safety checks or new potential failure modes, have we documented how to debug potential issues?
N/A

Follow-up

Are there any follow-up issues that we should pursue in the future?
No

Does this introduce new defaults that we should re-evaluate in the future?
No

@sbodagala sbodagala requested a review from johscheuer April 10, 2023 14:51
@sbodagala (Contributor, Author)

Ran the test manually, and it failed with this error: "invariant InvariantClusterStatusAvailableWithThreshold failed".

It appears to me that the upgrade itself completed successfully. Here's the output of "kubectl-fdb analyze" on the cluster after the test failed with the above error:

kubectl-fdb analyze fdb-cluster-cz8blukk -n sre-s3xfe49r

Checking cluster: sre-s3xfe49r/fdb-cluster-cz8blukk
✔ Cluster is available
✔ Cluster is fully replicated
✔ Cluster is reconciled
✔ ProcessGroups are all in ready condition
✔ Pods are all running and available
Checking cluster: sre-s3xfe49r/fdb-cluster-cz8blukk with auto-fix: false
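
For context, InvariantClusterStatusAvailableWithThreshold presumably polls the cluster's machine-readable status for the duration of the test and only fails when the database stays unavailable for longer than an allowed window. A rough, self-contained sketch of that idea (an illustration of the concept, not the e2e suite's actual implementation):

```go
import (
	"errors"
	"time"
)

// checkAvailableWithThreshold polls cluster availability for the length
// of the test and fails only if the database stays unavailable longer
// than the allowed window. Sketch of the concept; not the suite's code.
func checkAvailableWithThreshold(isAvailable func() bool, testDuration, pollInterval, threshold time.Duration) error {
	var unavailableSince time.Time
	deadline := time.Now().Add(testDuration)

	for time.Now().Before(deadline) {
		if isAvailable() {
			// Available again: reset the unavailability window.
			unavailableSince = time.Time{}
		} else {
			if unavailableSince.IsZero() {
				unavailableSince = time.Now()
			}
			// Short blips (e.g. a recovery while processes restart during
			// the staging phase) are tolerated; only sustained
			// unavailability violates the invariant.
			if time.Since(unavailableSince) > threshold {
				return errors.New("cluster unavailable longer than threshold")
			}
		}
		time.Sleep(pollInterval)
	}

	return nil
}
```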

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: dc0ef82
  • Duration 4:10:42
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: eeef335
  • Duration 4:11:17
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 7736889
  • Duration 3:08:27
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@sbodagala sbodagala changed the title Test upgrading a cluster when no storage processes are restarted Test various upgrade scenarios Apr 21, 2023
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 2f560b6
  • Duration 3:06:47
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 8f32426
  • Duration 4:10:06
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer (Member) left a comment

Could we split the tests up into separate PRs? Otherwise a single failing test will block the whole PR from being merged.

e2e/test_operator_upgrades/operator_upgrades_test.go (4 review threads; outdated, resolved)
@sbodagala (Contributor, Author)

Uploaded the latest version, please take a look. Thanks!

@johscheuer (Member) left a comment

Changes look fine to me; let's wait for the test result 👍

@johscheuer johscheuer self-requested a review April 28, 2023 15:51
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: e379366
  • Duration 2:50:59
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@sbodagala (Contributor, Author)

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: e379366
  • Duration 2:50:59
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Reports the following failures:

operator_ha_upgrade_test.go:

Test "upgrading a cluster with operator pod chaos and without foundationdb pod chaos"" failed because "Cluster.Generation" after upgrade is 37, instead of 19.

NOTE: I modified the "Upgrading a multi-DC cluster without chaos" test to check "Cluster.Generation" after the upgrade; again, "Cluster.Generation" is 34 instead of 19.
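
For reference, the Kubernetes API server increments metadata.generation on every spec update, so landing at 37 instead of an expected 19 implies roughly 18 extra spec writes during the upgrade. A hypothetical debugging helper that logs each bump while a test runs, to see where the extra writes come from (the getCluster callback stands in for the suite's fixture; fdbv1beta2 is the operator's API package):

```go
import (
	"log"
	"time"

	fdbv1beta2 "github.com/FoundationDB/fdb-kubernetes-operator/api/v1beta2"
)

// traceGenerationBumps polls the FoundationDBCluster object and logs
// every metadata.generation bump, to show which steps rewrite the spec
// during an upgrade. Hypothetical helper, not part of this PR.
func traceGenerationBumps(getCluster func() *fdbv1beta2.FoundationDBCluster, done <-chan struct{}) {
	var last int64
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-done:
			return
		case <-ticker.C:
			if gen := getCluster().Generation; gen != last {
				log.Printf("cluster generation changed: %d -> %d", last, gen)
				last = gen
			}
		}
	}
}
```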

operator_upgrades_test.go:

Test "upgrading a cluster where a storage and multiple stateless processes get restarted during the staging phase Upgrade" failed on this error:

<*errors.errorString | 0xc0006c1220>: {
s: "invariant InvariantClusterStatusAvailableWithThreshold failed",
}
invariant InvariantClusterStatusAvailableWithThreshold failed

@sbodagala (Contributor, Author)

More on the failure in operator_ha_upgrade_test.go: I don't see any helpful information in the test output, but I do see recoveries (on Splunk) seconds apart (for example, 2023-05-01T16:01:30Z and 2023-05-01T16:01:32Z; these are the timestamps of Type "MasterRecoveryState" events with "Status: reading_coordinated_state"). I think this is the result of server processes getting bounced at different, but relatively close, timestamps.

@sbodagala (Contributor, Author)

Test "upgrading a cluster where a storage and multiple stateless processes get restarted during the staging phase Upgrade" failed on this error:

<*errors.errorString | 0xc0006c1220>: {
s: "invariant InvariantClusterStatusAvailableWithThreshold failed",
}
invariant InvariantClusterStatusAvailableWithThreshold failed

I ran this test locally multiple times, and all runs succeeded, so the failure reported by CI might not be related to this specific test.

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 2b65a07
  • Duration 4:10:22
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer johscheuer closed this May 2, 2023
@johscheuer johscheuer reopened this May 2, 2023
@johscheuer (Member)

Seems like the last test run hit some issues; I'll try another run.

@johscheuer (Member) left a comment

Once the e2e test pipeline passes, we can merge this PR 👍

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 2b65a07
  • Duration 3:56:49
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer (Member) commented May 2, 2023

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 2b65a07
  • Duration 3:56:49
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
Summarizing 1 Failure:
  [FAIL] Operator Upgrades [AfterEach] upgrading a cluster where a storage and multiple stateless processes get restarted during the staging phase Upgrade from 6.3.25 to 7.1.27 with a storage and multiple stateless processes restarted during the staging phase [e2e]
  /codebuild/output/src828282037/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/fixtures.go:62

The failure is:

2023/05/02 10:16:22 reconciled name=fdb-cluster-p5yg7xfw, namespace=test-operator-upgrades-845-1w5pkh73
  [FAILED] in [AfterEach] - /codebuild/output/src828282037/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/fixtures.go:62 @ 05/02/23 10:16:37.822
• [FAILED] [748.764 seconds]
Operator Upgrades [AfterEach] upgrading a cluster where a storage and multiple stateless processes get restarted during the staging phase Upgrade from 6.3.25 to 7.1.27 with a storage and multiple stateless processes restarted during the staging phase [e2e]
  [AfterEach] /codebuild/output/src828282037/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/test_operator_upgrades/operator_upgrades_test.go:103
  [It] /codebuild/output/src828282037/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/upgrade_test_configuration.go:104

  [FAILED] Unexpected error:
      <*errors.errorString | 0xc000098090>: {
          s: "invariant InvariantClusterStatusAvailableWithThreshold failed",
      }
      invariant InvariantClusterStatusAvailableWithThreshold failed
  occurred
  In [AfterEach] at: /codebuild/output/src828282037/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/fixtures.go:62 @ 05/02/23 10:16:37.822
------------------------------

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: e8cd310
  • Duration 2:43:26
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
