
Make sure we upgrade process groups that are marked for removal during upgrades #1603

Conversation

@johscheuer (Member) commented Apr 27, 2023

Description

Fixes: #1584
Fixes: #1586

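The title describes the fix: process groups that are marked for removal must still be upgraded to the new version, since they keep serving the cluster until they are fully excluded and removed. A minimal, hypothetical Go sketch of that rule (the type and field names are illustrative, not the operator's actual API):

```go
// Hypothetical sketch of the upgrade-selection rule this PR fixes:
// being marked for removal is NOT a reason to skip a process group
// during a version upgrade; only fully excluded groups can stay behind.
package main

import "fmt"

// ProcessGroup is a simplified stand-in for the operator's process
// group status; field names here are illustrative only.
type ProcessGroup struct {
	ID               string
	Version          string
	MarkedForRemoval bool
	IsExcluded       bool
}

// needsUpgrade reports whether a process group must be moved to the
// target version. Groups marked for removal still participate in the
// cluster until they are excluded, so they must run a compatible version.
func needsUpgrade(pg ProcessGroup, targetVersion string) bool {
	if pg.Version == targetVersion {
		return false
	}
	// Only fully excluded groups no longer serve the database and can
	// safely be left on the old version until they are removed.
	return !pg.IsExcluded
}

func main() {
	groups := []ProcessGroup{
		{ID: "storage-1", Version: "6.3.25"},
		{ID: "storage-2", Version: "6.3.25", MarkedForRemoval: true},
		{ID: "storage-3", Version: "6.3.25", MarkedForRemoval: true, IsExcluded: true},
	}
	for _, pg := range groups {
		fmt.Printf("%s needsUpgrade=%v\n", pg.ID, needsUpgrade(pg, "7.1.31"))
	}
}
```

Under this sketch, `storage-2` (marked for removal but not excluded) is still upgraded, which is the behavior the linked issues asked for.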
Type of change

Please select one of the options below.

  • Bug fix (non-breaking change which fixes an issue)

Discussion

I will create an e2e test case once #1589 is merged.

Testing

Unit test and e2e tests. Will add an additional test once #1589 is merged.

Documentation

I will update the upgrade docs in a follow up.

Follow-up

@johscheuer johscheuer added the bug Something isn't working label Apr 27, 2023
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 7de0e2118b60f75110a1df8d945022aac2c45924
  • Duration 2:40:24
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer (Member, Author) left a comment


I'm going to add e2e tests for this behaviour next week to make sure the changes actually upgrade those Pods.

@johscheuer johscheuer force-pushed the update-process-groups-marked-removal-during-upgrade branch from 7de0e21 to 8d06871 Compare May 2, 2023 11:15
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 8d06871fa3e482fb67a2f993380c3655d95babdf
  • Duration 2:45:50
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 21a3f086b98e782a2a0bcab5a770bedd7c4d793b
  • Duration 2:48:25
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: a5c081b773d18340ab480cf4301c12c99798a911
  • Duration 3:00:08
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer
Member Author

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 8d06871
  • Duration 2:45:50
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
[Container] 2023/05/02 14:00:32 Running command if $(grep -q -- "--- FAIL:" ${CODEBUILD_SRC_DIR}/logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" ${CODEBUILD_SRC_DIR}/logs/*.log; export fail_test=true; fi
TESTS FAILED SEE THESE LOGS:

/codebuild/output/src191758543/src/github.com/FoundationDB/fdb-kubernetes-operator/logs/test_operator_ha_upgrades.log

Failed test:

Summarizing 1 Failure:
  [FAIL] Operator HA Upgrades [AfterEach] Upgrading a multi-DC cluster, with a random pod deleted during the staging phase Upgrade, with a random pod deleted during the staging phase, from 6.3.25 to 7.1.31 [e2e]
  /codebuild/output/src191758543/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/fixtures.go:62

Failure:

Operator HA Upgrades [AfterEach] Upgrading a multi-DC cluster, with a random pod deleted during the staging phase Upgrade, with a random pod deleted during the staging phase, from 6.3.25 to 7.1.31 [e2e]
  [AfterEach] /codebuild/output/src191758543/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/test_operator_ha_upgrades/operator_ha_upgrade_test.go:147
  [It] /codebuild/output/src191758543/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/upgrade_test_configuration.go:104

  [FAILED] Unexpected error:
      <*errors.errorString | 0xc0004fbb90>: {
          s: "invariant InvariantClusterStatusAvailableWithThreshold failed",
      }
      invariant InvariantClusterStatusAvailableWithThreshold failed
  occurred
  In [AfterEach] at: /codebuild/output/src191758543/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/fixtures.go:62 @ 05/02/23 12:18:17.104
------------------------------
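The CI gate shown in the log keys off Ginkgo's `--- FAIL:` marker in the collected e2e logs. A minimal shell sketch of that detection logic (the log directory is a stand-in for `${CODEBUILD_SRC_DIR}/logs`):

```shell
# Sketch of the CI fail-gate: scan collected e2e logs for Ginkgo's
# "--- FAIL:" marker and fail the build step if any log contains it.
# LOG_DIR is illustrative; CodeBuild uses ${CODEBUILD_SRC_DIR}/logs.
LOG_DIR="$(mktemp -d)"

fail_test=false
if grep -q -- "--- FAIL:" "$LOG_DIR"/*.log 2>/dev/null; then
    echo "TESTS FAILED SEE THESE LOGS:"
    # List only the log files that actually contain a failure.
    grep -l -- "--- FAIL:" "$LOG_DIR"/*.log
    fail_test=true
fi

# A later build step turns the flag into a non-zero exit:
if $fail_test; then exit 1; fi
```

Note the `--` after `grep` options: it stops option parsing so the pattern `--- FAIL:` is not mistaken for flags.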

Seems unrelated. I'll create a new issue to track the failure.

@johscheuer johscheuer requested a review from sbodagala May 4, 2023 09:28
@johscheuer johscheuer force-pushed the update-process-groups-marked-removal-during-upgrade branch from a5c081b to abf01df Compare May 4, 2023 16:53
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: abf01df
  • Duration 3:03:18
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer johscheuer closed this May 5, 2023
@johscheuer johscheuer reopened this May 5, 2023
@johscheuer
Member Author

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: abf01df
  • Duration 3:03:18
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
Summarizing 1 Failure:
  [FAIL] Operator Upgrades upgrading a cluster with a crash looping sidecar process [It] Upgrade from 6.3.25 to 7.1.31 with crash-looping sidecar [e2e]
  /codebuild/output/src181577485/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/test_operator_upgrades/operator_upgrades_test.go:88

Failure:

• [FAILED] [1054.991 seconds]
Operator Upgrades upgrading a cluster with a crash looping sidecar process [It] Upgrade from 6.3.25 to 7.1.31 with crash-looping sidecar [e2e]
/codebuild/output/src181577485/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/upgrade_test_configuration.go:104

  [FAILED] Timed out after 600.000s.
  Expected
      <string>: 6.3.25
  to equal
      <string>: 7.1.31
  In [It] at: /codebuild/output/src181577485/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/test_operator_upgrades/operator_upgrades_test.go:88 @ 05/04/23 18:03:49.948
------------------------------

I'll take a look at the failure.

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: abf01df
  • Duration 3:00:17
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer
Member Author

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: abf01df
  • Duration 3:00:17
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
Summarizing 1 Failure:
  [FAIL] Operator Upgrades upgrading a cluster with a crash looping sidecar process [It] Upgrade from 6.3.25 to 7.1.31 with crash-looping sidecar [e2e]
  /codebuild/output/src786005913/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/test_operator_upgrades/operator_upgrades_test.go:88
• [FAILED] [1049.957 seconds]
Operator Upgrades upgrading a cluster with a crash looping sidecar process [It] Upgrade from 6.3.25 to 7.1.31 with crash-looping sidecar [e2e]
/codebuild/output/src786005913/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/upgrade_test_configuration.go:104

  [FAILED] Timed out after 600.001s.
  Expected
      <string>: 6.3.25
  to equal
      <string>: 7.1.31
  In [It] at: /codebuild/output/src786005913/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/test_operator_upgrades/operator_upgrades_test.go:88 @ 05/05/23 06:48:36.396
------------------------------

Taking a closer look at this failure.

@johscheuer (Member, Author) left a comment


I ran the test manually multiple times (5x) and it succeeded every time:

------------------------------
[ReportAfterSuite] Autogenerated ReportAfterSuite for --junit-report
autogenerated by Ginkgo
[ReportAfterSuite] PASSED [0.002 seconds]
------------------------------

Ran 1 of 14 Specs in 569.462 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 13 Skipped

Might be an issue with the CI EKS cluster.

@johscheuer johscheuer closed this May 5, 2023
@johscheuer johscheuer reopened this May 5, 2023
@johscheuer johscheuer merged commit 31300e4 into FoundationDB:main May 5, 2023
@johscheuer johscheuer deleted the update-process-groups-marked-removal-during-upgrade branch May 5, 2023 09:52
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: abf01df
  • Duration 3:01:52
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Labels
bug Something isn't working
3 participants