[test] Add e2e downgrade automatic cancellation test #19399

henrybear327 · 2025-02-12T12:01:05Z

Verify that the downgrade is cancelled automatically once the downgrade completes (using the `no inflight downgrade job` error as the indicator).

Please see #19365 (comment)

Reference: #17976

codecov · 2025-02-12T12:19:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.90%. Comparing base (14cf669) to head (24d1fc1).

Additional details and impacted files

see 20 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19399      +/-   ##
==========================================
- Coverage   68.95%   68.90%   -0.06%     
==========================================
  Files         420      420              
  Lines       35753    35753              
==========================================
- Hits        24655    24636      -19     
- Misses       9677     9690      +13     
- Partials     1421     1427       +6

ahrtr

LGTM

cc @fuweid @siyuanfoundation @serathius

ahrtr · 2025-02-12T15:58:02Z

@henrybear327 can you please rebase this PR? I just merged #19398

henrybear327 · 2025-02-12T15:59:33Z

> @henrybear327 can you please rebase this PR? I just merged #19398

Saw and done as you posted!

ahrtr · 2025-02-12T16:11:21Z

tests/e2e/cluster_downgrade_test.go

+		time.Sleep(etcdserver.HealthInterval)
+	}
+
+	e2e.DowngradeAutoCancelCheck(t, epc)


This might be flaky; it depends on when the downgrade job gets triggered. A simpler way is to monitor/wait for the log entry "the cluster has been downgraded"

etcd/server/etcdserver/version/monitor.go

Line 143 in f30cbaa

m.lg.Info("the cluster has been downgraded", zap.String("cluster-version", targetVersion))

So it's better to

1. monitor/wait for the log entry "the cluster has been downgraded"
2. call downgradeCancellation to verify that it's indeed cancelled

Right?

The first item is required, the second one is nice to have (optional).

Got it! Will implement in a bit.

Theoretically, when you see the log message "the cluster has been downgraded", the leader hasn't sent out the downgradeCancel request yet. So when you try to send downgradeCancel yourself in e2e.DowngradeAutoCancelCheck, it may succeed. I think it's easy to verify if you add a sleep in between.

But in practice, it's unlikely because there is a 5s (etcdserver.HealthInterval) sleep after the downgrade completion.

So either just fix the comment at https://github.com/etcd-io/etcd/pull/19399/files#r1957283593 or just remove e2e.DowngradeAutoCancelCheck (preferred)

I suggest adding a TODO item to follow up once the downgrade query API is implemented. #19439 (comment)
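The "wait for the log entry" idea discussed in this thread can be sketched without the etcd test framework. The snippet below is only an illustration using the standard library: `waitForLogLine` and `replayAndWait` are hypothetical names, the channel stands in for a member's log stream, and the one-second timeout is an arbitrary choice for the demo rather than a value taken from the PR.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"time"
)

// waitForLogLine blocks until a line containing want arrives on lines,
// the stream is closed, or ctx expires, whichever happens first.
func waitForLogLine(ctx context.Context, lines <-chan string, want string) error {
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("timed out waiting for %q: %w", want, ctx.Err())
		case line, ok := <-lines:
			if !ok {
				return fmt.Errorf("log stream ended before %q appeared", want)
			}
			if strings.Contains(line, want) {
				return nil
			}
		}
	}
}

// replayAndWait feeds canned log lines into a channel (a stand-in for a
// real process log) and waits for the downgrade-completion message.
func replayAndWait(logLines []string) error {
	lines := make(chan string, len(logLines))
	for _, l := range logLines {
		lines <- l
	}
	close(lines)

	// Assumed timeout for the demo only.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return waitForLogLine(ctx, lines, "the cluster has been downgraded")
}
```

Waiting on the log line removes the dependency on when the downgrade job happens to run, but as noted in the thread it does not by itself prove that the leader has already sent the cancellation.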

ahrtr

Please resolve the comments

k8s-ci-robot · 2025-02-12T16:27:26Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: henrybear327
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Code Review Process.

Needs approval from an approver in each of these files:

OWNERS

ahrtr · 2025-02-15T08:37:01Z

tests/framework/e2e/downgrade.go

+	if opString == "downgrading" && len(membersToChange) == len(clus.Procs) {
+		lg.Info("Waiting for downgrade completion log line")
+		leader := clus.WaitLeader(t)
+		_, err := clus.Procs[leader].Logs().ExpectWithContext(context.TODO(), expect.ExpectedResponse{Value: "the cluster has been downgraded"})


Please set a timeout, e.g. 30s.

Done as requested!

Do you see a risk with this approach of monitoring the leader's log @ahrtr? :)

The only concern is it will fail if the leader somehow changes.

So I suggest adding a retry mechanism (e.g. up to 3 times), and waiting 15s instead of 30s in each retry.
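A minimal sketch of the retry idea, assuming each attempt re-resolves the leader and gets its own deadline. `retryWithTimeout` and `simulateLeaderChanges` are hypothetical helpers written for illustration, not part of the etcd e2e framework, and the 50ms per-attempt timeout is shortened from the suggested 15s so the demo finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryWithTimeout calls fn up to attempts times, giving every call a
// fresh context that expires after perAttempt. It stops at the first success.
func retryWithTimeout(attempts int, perAttempt time.Duration, fn func(ctx context.Context) error) error {
	var lastErr error
	for i := 1; i <= attempts; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), perAttempt)
		lastErr = fn(ctx)
		cancel()
		if lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, lastErr)
}

// simulateLeaderChanges stands in for "find the current leader and watch
// its log": the first failFirst attempts fail as if the leader had changed.
// It reports how many attempts ran and whether the retry loop succeeded.
func simulateLeaderChanges(failFirst int) (calls int, ok bool) {
	err := retryWithTimeout(3, 50*time.Millisecond, func(ctx context.Context) error {
		calls++
		if calls <= failFirst {
			return errors.New("leader changed while waiting")
		}
		return nil
	})
	return calls, err == nil
}
```

In a real test the callback would look up the leader again on every attempt, so a leader change between attempts is tolerated instead of failing the test outright.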

Verify that the downgrade can be cancelled automatically when the downgrade is completed (using `no inflight downgrade job` as the indicator)

Please see: etcd-io#19365 (comment)

Reference: etcd-io#17976

Signed-off-by: Chun-Hung Tseng <[email protected]>

ahrtr · 2025-02-16T09:39:22Z

tests/e2e/cluster_downgrade_test.go

@@ -207,6 +207,77 @@ func testDowngradeUpgrade(t *testing.T, numberOfMembersToDowngrade int, clusterS
 	assert.Equal(t, beforeMembers.Members, afterMembers.Members)
 }

+func TestAutomaticDowngradeCancellationAfterCompletingDowngradingInClusterOf3(t *testing.T) {


The name is too long and wordy

Suggested change

-func TestAutomaticDowngradeCancellationAfterCompletingDowngradingInClusterOf3(t *testing.T) {
+func TestDowngradeAutoCancelAfterCompletion(t *testing.T) {

ahrtr · 2025-02-16T09:41:20Z

tests/framework/e2e/downgrade.go

+	testutils.ExecuteWithTimeout(t, 1*time.Minute, func() {
+		t.Log("etcdctl downgrade cancel")
+		err = c.DowngradeCancel(context.TODO())
+		require.Errorf(t, err, "no inflight downgrade job")


Suggested change

-require.Errorf(t, err, "no inflight downgrade job")
+require.ErrorContains(t, err, "no inflight downgrade job")
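The reason for this suggestion: in testify, `require.Errorf(t, err, msg)` only asserts that `err` is non-nil and uses `msg` as the failure message, whereas `require.ErrorContains(t, err, s)` also requires the error text to contain `s`. The stdlib-only sketch below mimics the two behaviours; `looseCheck`, `strictCheck` and `compareChecks` are hypothetical names, and the error strings are illustrative rather than copied from etcd.

```go
package main

import (
	"errors"
	"strings"
)

// looseCheck mirrors an assertion that passes for any non-nil error.
func looseCheck(err error) bool {
	return err != nil
}

// strictCheck mirrors an assertion that also requires the error text
// to contain the expected substring.
func strictCheck(err error, want string) bool {
	return err != nil && strings.Contains(err.Error(), want)
}

// compareChecks builds an error from msg and reports what each style
// of check would conclude about it.
func compareChecks(msg, want string) (loose, strict bool) {
	err := errors.New(msg)
	return looseCheck(err), strictCheck(err, want)
}
```

With an unrelated failure such as a timeout, the loose check still passes, which is why the original assertion could have hidden a wrong error.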

henrybear327 · 2025-02-18T21:07:57Z

Is this PR still relevant as we are doing #19439?

ahrtr · 2025-02-18T21:14:25Z

> Is this PR still relevant as we are doing #19439?

Yes, I think so. The auto downgrade cancel should still work.

k8s-ci-robot added area/testing size/L labels Feb 12, 2025

henrybear327 requested review from ahrtr and siyuanfoundation February 12, 2025 12:01

henrybear327 self-assigned this Feb 12, 2025

henrybear327 changed the title ~~Add e2e downgrade automatic cancellation test~~ [test] Add e2e downgrade automatic cancellation test Feb 12, 2025

henrybear327 force-pushed the e2e/downgrade_auto_cancel branch 2 times, most recently from 995bf3e to 60e8a40 Compare February 12, 2025 12:10

henrybear327 force-pushed the e2e/downgrade_auto_cancel branch from 60e8a40 to 11357be Compare February 12, 2025 13:07

ahrtr approved these changes Feb 12, 2025

k8s-ci-robot added the approved label Feb 12, 2025

henrybear327 requested review from ivanvc, serathius and fuweid February 12, 2025 15:57

henrybear327 force-pushed the e2e/downgrade_auto_cancel branch from 11357be to c0f2170 Compare February 12, 2025 15:59

ahrtr reviewed Feb 12, 2025

ahrtr reviewed Feb 12, 2025

ahrtr reviewed Feb 12, 2025

henrybear327 force-pushed the e2e/downgrade_auto_cancel branch from c0f2170 to 43e04f9 Compare February 12, 2025 16:14

ahrtr requested changes Feb 12, 2025

k8s-ci-robot removed the approved label Feb 12, 2025

ahrtr mentioned this pull request Feb 13, 2025

Add server level feature gate #18023

Open

15 tasks

henrybear327 force-pushed the e2e/downgrade_auto_cancel branch from 43e04f9 to a18de65 Compare February 13, 2025 20:35

ahrtr reviewed Feb 14, 2025

henrybear327 force-pushed the e2e/downgrade_auto_cancel branch 2 times, most recently from 7baf245 to f39e84e Compare February 15, 2025 01:36

henrybear327 commented Feb 15, 2025

henrybear327 force-pushed the e2e/downgrade_auto_cancel branch from f39e84e to 121fa72 Compare February 15, 2025 01:52

henrybear327 requested a review from ahrtr February 15, 2025 01:54

ahrtr reviewed Feb 15, 2025

henrybear327 force-pushed the e2e/downgrade_auto_cancel branch from 121fa72 to 24d1fc1 Compare February 15, 2025 10:58

ahrtr reviewed Feb 16, 2025

ahrtr mentioned this pull request Feb 20, 2025

*: support DowngradeInfo field in maintenence.Status API #19451

Open
