test/system/260-sdnotify.bats: fix test flake #18319
The `exec` session sometimes exits with status 137 because it races with the cleanup process of the exiting container. Fix the flake by running a detached exec session.

Fixes: containers#10825
Signed-off-by: Valentin Rothberg <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: vrothberg
LGTM but I want @edsantiago to have a look.
I keep staring at it, but I'm not sure I fully understand it. Would I be correct in saying that this is the timeline in question?
Is that accurate? If so, is it safe to say that the following, my previous mental model, is inaccurate?
...and then if so, there are a LOT of other places where I use that convention for signaling containers to terminate. This PR doesn't even address the most recently reported flake. Shouldn't all of those be updated to
I believe so, yes.
Can you elaborate? That is literally the flake I wanted to fix.
Good thinking. It sounds logical that these other tests are also subject to this race.
Waiting for a 2nd signal and exiting 0 could be an alternative approach. Then we replace the
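One reading of the signal-based alternative is that the container's main process traps a signal and exits 0, so the test stops it with a signal instead of with `podman exec`. The comment's exact protocol (a second signal) is not spelled out, so this minimal bash sketch simplifies it to a single SIGTERM. The background subshell is a hypothetical stand-in for the container process, and the ready file is an added assumption that keeps the test from signaling before the trap is installed:

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
readyfile="$tmpdir/ready"

# Hypothetical stand-in for the container's main process: exit 0 on SIGTERM.
(
    trap 'exit 0' TERM
    touch "$readyfile"   # tell the "test" that the handler is installed
    while :; do
        # Sleeping in the background and waiting lets the trap run as soon
        # as the signal arrives instead of after the sleep finishes.
        sleep 0.1 &
        wait $!
    done
) &
pid=$!

# Signaling before the trap exists would kill the process (exit 143).
while [ ! -e "$readyfile" ]; do
    sleep 0.05
done

# A real test might use something like `podman kill --signal TERM "$cname"`.
kill -TERM "$pid"
wait "$pid"
status=$?
rm -rf "$tmpdir"
echo "exit status: $status"
```

The handler turns the stop request into a clean exit 0, so no separate exec process exists that could be killed while the container shuts down.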
`exec` is never atomic; it just spawns a process, and we have no control over how the kernel scheduler runs it. I don't even think podman kills the exec session: when the main container process (PID 1 in the PID namespace) exits, the kernel will SIGKILL all processes in that namespace.
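The 137 in the PR description is consistent with this: a shell reports a process killed by signal N as exit status 128 + N, and SIGKILL is signal 9. This minimal bash sketch, with no Podman involved, reproduces the number by killing a background `sleep` with SIGKILL:

```shell
#!/usr/bin/env bash
# Start a long-running child in the background.
sleep 30 &
pid=$!

# Kill it with SIGKILL (signal 9), as the kernel does to the remaining
# processes in a PID namespace once its init process exits.
kill -KILL "$pid"

# `wait` returns the child's status; a signal death is reported as 128 + N.
wait "$pid"
status=$?

echo "exit status: $status"                   # 137
echo "signal: $(kill -l "$((status - 128))")" # KILL
```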
Gah. Sorry, I got myself turned around. The flake not fixed is the one in comment 0, the originally reported one in the "
Ah, thanks! That is good, I think, since it fits well into the
/lgtm

I'm going to take a hard look at the rest of the instances, with my new understanding in mind. Thanks to everyone for setting me straight.
Appreciate it, thanks, @edsantiago
@edsantiago One thing you could try if you really want to avoid signals is to bind-mount a directory and touch the file there from the host.
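The bind-mount suggestion can be sketched without Podman as well. Here the temporary directory stands in for a host directory bind-mounted into the container (for example via `podman run -v "$stopdir:/stopdir" ...`; hypothetical, not the test's actual code), and the background subshell stands in for the container's main process polling for the stop file:

```shell
#!/usr/bin/env bash
# Host-side directory that a real test would bind-mount into the container.
stopdir=$(mktemp -d)

# Stand-in for the container's main process: poll until the stop file appears.
(
    while [ ! -e "$stopdir/stop" ]; do
        sleep 0.1
    done
    exit 0
) &
pid=$!

# The "host" creates the file directly; no exec session is involved,
# so nothing is left running to be SIGKILLed when the container exits.
touch "$stopdir/stop"

wait "$pid"
status=$?
rm -rf "$stopdir"
echo "exit status: $status"
```

Creating the file before the loop starts is harmless: the loop simply sees it on its first check and exits 0.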
@Luap99 good suggestion, thank you