[test] Add e2e downgrade automatic cancellation test #19399
base: main
Conversation
Force-pushed from 995bf3e to 60e8a40 (compare)
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files: see 20 files with indirect coverage changes.

@@            Coverage Diff             @@
##             main   #19399      +/-   ##
==========================================
- Coverage   68.95%   68.90%   -0.06%
==========================================
  Files         420      420
  Lines       35753    35753
==========================================
- Hits        24655    24636      -19
- Misses       9677     9690      +13
- Partials     1421     1427       +6

Continue to review full report in Codecov by Sentry.
Force-pushed from 60e8a40 to 11357be (compare)
LGTM
@henrybear327 can you please rebase this PR? I just merged #19398.
Force-pushed from 11357be to c0f2170 (compare)
Saw and done as you posted!
	time.Sleep(etcdserver.HealthInterval)
}

e2e.DowngradeAutoCancelCheck(t, epc)
This might be flaky, it depends on when the downgrade job gets triggered. A simple way is just to monitor/wait for the log entry "the cluster has been downgraded"
etcd/server/etcdserver/version/monitor.go, line 143 (at f30cbaa):

m.lg.Info("the cluster has been downgraded", zap.String("cluster-version", targetVersion))
So it's better to

- monitor/wait for the log entry "the cluster has been downgraded"
- call `downgradeCancellation` to verify that it's indeed cancelled

Right?
The first item is required, the second one is nice to have (optional).
Got it! Will implement it in a bit.
Theoretically, when you see the log message "the cluster has been downgraded", the leader hasn't sent out the downgradeCancel request yet. So when you try to send downgradeCancel yourself in e2e.DowngradeAutoCancelCheck, it may succeed. I think it's easy to verify if you add a sleep in between.

But in practice, it's unlikely, because there is a 5s (etcdserver.HealthInterval) sleep after the downgrade completion.

So either just fix comment https://github.com/etcd-io/etcd/pull/19399/files#r1957283593 or just remove e2e.DowngradeAutoCancelCheck (preferred).
I suggest adding a TODO item to follow up once the downgrade query API is implemented. #19439 (comment)
Force-pushed from c0f2170 to 43e04f9 (compare)
Please resolve the comments
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: henrybear327. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
Force-pushed from 43e04f9 to a18de65 (compare)
Force-pushed from 7baf245 to f39e84e (compare)
Force-pushed from f39e84e to 121fa72 (compare)
tests/framework/e2e/downgrade.go (outdated diff)
if opString == "downgrading" && len(membersToChange) == len(clus.Procs) {
	lg.Info("Waiting for downgrade completion log line")
	leader := clus.WaitLeader(t)
	_, err := clus.Procs[leader].Logs().ExpectWithContext(context.TODO(), expect.ExpectedResponse{Value: "the cluster has been downgraded"})
Please set a timeout, e.g. 30s.
Done as requested!
Do you see a risk with this approach of monitoring the leader's log @ahrtr? :)
The only concern is that it will fail if the leader somehow changes.
So I suggest adding a retry mechanism (e.g. up to 3 times), and waiting 15s instead of 30s in each retry.
Verify that the downgrade can be cancelled automatically when the downgrade is completed (using `no inflight downgrade job` as the indicator).
Please see: etcd-io#19365 (comment)
Reference: etcd-io#17976
Signed-off-by: Chun-Hung Tseng <[email protected]>
Force-pushed from 121fa72 to 24d1fc1 (compare)
@@ -207,6 +207,77 @@ func testDowngradeUpgrade(t *testing.T, numberOfMembersToDowngrade int, clusterS
	assert.Equal(t, beforeMembers.Members, afterMembers.Members)
}

func TestAutomaticDowngradeCancellationAfterCompletingDowngradingInClusterOf3(t *testing.T) {
The name is too long and wordy
Suggested change:
- func TestAutomaticDowngradeCancellationAfterCompletingDowngradingInClusterOf3(t *testing.T) {
+ func TestDowngradeAutoCancelAfterCompletion(t *testing.T) {
testutils.ExecuteWithTimeout(t, 1*time.Minute, func() {
	t.Log("etcdctl downgrade cancel")
	err = c.DowngradeCancel(context.TODO())
	require.Errorf(t, err, "no inflight downgrade job")
Suggested change:
- require.Errorf(t, err, "no inflight downgrade job")
+ require.ErrorContains(t, err, "no inflight downgrade job")
Is this PR still relevant as we are doing #19439?
Yes, I think so. The auto downgrade cancel should still work.
Verify that the downgrade can be cancelled automatically when the downgrade is completed (using `no inflight downgrade job` as the indicator).
Please see #19365 (comment)
Reference: #17976
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.