Fix race condition in testLifecycleOperations, Serialize updateContainer() calls, Ignore irrelevant container events #1353

Merged
merged 3 commits into from
Jul 14, 2023


martinpitt
Member

@martinpitt martinpitt commented Jul 13, 2023

No description provided.

The initial container gets started and immediately stopped via the CLI.
The events propagate through the UI asynchronously, so strengthen the
initial wait to ensure that the container is actually shown as "Exited".
Otherwise it could still be "Running" in the UI, and trying to open the
action menu would not show "Start".
With a burst of events, the updateContainer() calls run in parallel.
But podman does not return their results in call order [1], which led
to stale state updates.

[1] containers/podman#19124
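
As a rough illustration of what serializing these calls can look like, here is a minimal sketch of a promise-based queue. It is not the code from src/app.jsx: `createSerialQueue`, `fakeInspect`, the delays, and this simplified `updateContainer()` are all hypothetical stand-ins chosen for the demo.

```javascript
// Hypothetical helper, not taken from src/app.jsx: runs async tasks strictly
// one at a time, in the order they were enqueued.
function createSerialQueue() {
    let tail = Promise.resolve();

    return function enqueue(task) {
        // Start this task only once the previous one has settled.
        const result = tail.then(() => task());
        // Swallow errors on the internal chain so a failed task does not
        // block later ones; the caller still sees the rejection via `result`.
        tail = result.catch(() => {});
        return result;
    };
}

// Stand-in for an API request whose reply time varies (assumption for the demo).
function fakeInspect(state, delayMs) {
    return new Promise(resolve => setTimeout(() => resolve(state), delayMs));
}

const enqueue = createSerialQueue();
const applied = [];

// Illustrative updateContainer(): fetch the state, then record it.
function updateContainer(state, delayMs) {
    return enqueue(() => fakeInspect(state, delayMs).then(s => applied.push(s)));
}

// The first request is slower than the second. Without the queue, "exited"
// would be applied first and then be overwritten by the stale "running".
const done = Promise.all([
    updateContainer("running", 30),
    updateContainer("exited", 5),
]);

done.then(() => console.log(applied.join(" -> ")));
```

With the queue in place, the second request does not start until the first has finished, so the states are applied in call order even though the second reply would otherwise arrive sooner. The trade-off is that a burst of events is processed sequentially instead of concurrently.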
@martinpitt martinpitt marked this pull request as draft July 13, 2023 10:54
@martinpitt martinpitt force-pushed the fixes branch 2 times, most recently from a8183c4 to c10a8aa Compare July 13, 2023 11:33
@martinpitt martinpitt changed the title Fix race condition in testLifecycleOperations, Serialize updateContainer() calls, Handle 'health_status' container event, Ignore irrelevant container events Fix race condition in testLifecycleOperations, Serialize updateContainer() calls, Handle 'health_status' container event Jul 13, 2023
@martinpitt martinpitt added the flake unstable test label Jul 13, 2023
@martinpitt martinpitt marked this pull request as ready for review July 13, 2023 12:07
@martinpitt martinpitt requested a review from marusak July 13, 2023 12:07
@martinpitt martinpitt changed the title Fix race condition in testLifecycleOperations, Serialize updateContainer() calls, Handle 'health_status' container event Fix race condition in testLifecycleOperations, Serialize updateContainer() calls, Ignore irrelevant container events Jul 13, 2023
@martinpitt martinpitt changed the title Fix race condition in testLifecycleOperations, Serialize updateContainer() calls, Ignore irrelevant container events Fix race condition in testLifecycleOperations, Serialize updateContainer() calls, Handle 'health_status' container event, Ignore irrelevant container events Jul 13, 2023
@martinpitt
Member Author

testHealthcheck is still very flaky, but that already happens on main; I am investigating it in #1324 (comment). So retrying.

@martinpitt martinpitt requested a review from marusak July 14, 2023 05:45
marusak
marusak previously approved these changes Jul 14, 2023
Member

@marusak marusak left a comment


Looks good! The last commit, which ignores events, is potentially dangerous; see my comment.

These are internal transient states which don't need to reflect in the
UI. They happen quickly in bursts, with a "permanent state" event
following such as "create", "died", or "remove".

This helps to reduce the API calls and thus mitigates out-of-order
results; see containers/podman#19124

We are not really interested in `podman exec` events, so
we would like to ignore `exec_died` along with `exec`. However, it is
the only thing that saves us from inconsistent `health_status` events
(see containers/podman#19237). So we cannot
rely on the latter event, but instead have to do a full update after
each `exec_died`, as some of them are the health checks.

Also fix the alphabetical sorting of the remaining events.
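
The filtering idea in this commit message can be sketched roughly as follows. This is an illustration only, not the actual event handler in src/app.jsx: `IGNORED_EVENTS` and `needsUpdate` are hypothetical names, and the set contains just `exec`, the one ignorable event named above; the real list of ignored events is not shown in this conversation.

```javascript
// Illustrative only: "exec" is the one ignorable event the commit message names.
const IGNORED_EVENTS = new Set(["exec"]);

// Returns true when an event should trigger a container state update.
// "exec_died" is deliberately NOT ignored, because some of those events
// come from health checks (see containers/podman#19237).
function needsUpdate(action) {
    return !IGNORED_EVENTS.has(action);
}

// A made-up burst of events for a single container.
const burst = ["create", "exec", "exec_died", "died", "remove"];
const updates = burst.filter(needsUpdate);
console.log(updates);
```

Every event that is dropped here is one fewer API call, which is how ignoring transient events reduces the chance of out-of-order results.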
@martinpitt
Member Author

martinpitt commented Jul 14, 2023

Argh, this makes the health check tests more flaky, especially on ubuntu-2204 (but not limited to that). I analyzed that in #1324 (comment), reported it as containers/podman#19237, and documented our accidental workaround explicitly.

Now we are back to the status quo of "the tests flake a lot", instead of "all the time on ubuntu-2204" 😢

@martinpitt martinpitt changed the title Fix race condition in testLifecycleOperations, Serialize updateContainer() calls, Handle 'health_status' container event, Ignore irrelevant container events Fix race condition in testLifecycleOperations, Serialize updateContainer() calls, Ignore irrelevant container events Jul 14, 2023
@martinpitt martinpitt requested a review from marusak July 14, 2023 09:25
Member

@marusak marusak left a comment


I am so sorry you have to deal with this :/ Thanks!

@martinpitt martinpitt merged commit 92398c3 into cockpit-project:main Jul 14, 2023
@martinpitt martinpitt deleted the fixes branch July 14, 2023 09:34