TestReconcileMachinePhases is flaky #8477
/triage accepted @chrischdi - did you want to take this one on? |
/assign I'm digging into it. I think sometimes it could be the case that if:
I can reproduce it locally with the following patch:
diff --git a/internal/controllers/machine/machine_controller_phases_test.go b/internal/controllers/machine/machine_controller_phases_test.go
index fbfb18bb9..bbe2c33e4 100644
--- a/internal/controllers/machine/machine_controller_phases_test.go
+++ b/internal/controllers/machine/machine_controller_phases_test.go
@@ -507,7 +507,7 @@ func TestReconcileMachinePhases(t *testing.T) {
g.Expect(env.Create(ctx, bootstrapConfig)).To(Succeed())
g.Expect(env.Create(ctx, infraMachine)).To(Succeed())
g.Expect(env.Create(ctx, machine)).To(Succeed())
-
+ time.Sleep(time.Second)
modifiedMachine := machine.DeepCopy()
// Set NodeRef to nil.
machine.Status.NodeRef = nil
This then results in reconciliation running before the statuses are set. Different ideas to remediate / fix:
|
To add: With this patch, the test does not even finish after 30s (current timeout is 10s), so it may also be a bigger bug. |
I don't like the idea of adding
And I'm still getting a failure - does it work for you? |
I think so yes, but not yet at 100%. |
(but only with |
It looks like the logic for setting |
@chrischdi Just took another look - I think the issue here is in the test - there shouldn't be a patch for the machine at all. These lines should be removed IMO:
modifiedMachine.Status.LastUpdated = &lastUpdated
g.Expect(env.Status().Patch(ctx, modifiedMachine, client.MergeFrom(machine))).To(Succeed()) |
Yes, maybe. I think there are two cases for this field
I don't know which of the two (or whether both) are possible in the runtime scenario 🤔 Edit:
|
Does |
But to recap, the issue of the test could be the following scenario:
cluster-api/internal/controllers/machine/machine_controller_phases_test.go Lines 515 to 517 in 94f5468
|
And I don't think this patch is necessary or really useful - we just need to check that the field is set and that the LastUpdated field on the machine is later than a timestamp taken before the reconcile. There is no need to have it updated on the machine IMO. |
I agree for this test case. But in general for these envtest tests, if some tests rely on the status already being in a certain state:
|
/reopen I checked the dashboard again: there's one more occurrence happening in the test (since we merged the fix)
/ https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/machine/machine_controller_phases_test.go#L243, which we should fix too. |
@chrischdi: Reopened this issue. In response to this:
Triage link: 655deea70b0ff4b80d8b |
Which jobs are flaking?
TestReconcileMachinePhases/Should_set_Provisioned_when_there_is_a_ProviderID_and_there_is_no_Node
Which tests are flaking?
Part of the unit test run. Flaking on main and 1.4.
Since when has it been flaking?
#8044 moved these tests to be based on envtest instead of the fake client. Since then there have been a number of flakes.
Testgrid link
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-test-release-1-4/1643034564504850432
Reason for failure (if possible)
No response
Anything else we need to know?
No response
Label(s) to be applied
/kind flake
/area testing