
podman play --service-container : is not stopping (flake) #14351

Closed
edsantiago opened this issue May 24, 2022 · 2 comments · Fixed by #14354
Assignees: vrothberg
Labels: flakes (Flakes from Continuous Integration), locked - please file new issue/PR (Assist humans wanting to comment on an old issue or PR with locked comments)

Comments

@edsantiago (Member)

Three flakes in my PR, finally passed on the fourth try:

# #| FAIL: Timed out waiting for container b9180307e1fd-service to enter state running=false

Unfortunately, #14338 (timeout bump) did not work. Sorry @vrothberg.

[sys] 341 podman play --service-container

edsantiago added the flakes (Flakes from Continuous Integration) label May 24, 2022
@vrothberg (Member)

Thanks, @edsantiago. I am going to take a look immediately.

@vrothberg (Member)

I smell where it's coming from. Will prepare a PR in a jiffy.

vrothberg added a commit to vrothberg/libpod that referenced this issue May 25, 2022
Simplify the work-queue implementation by using a wait group. Once all
queued work items are done, the channel can be closed.

The system tests revealed a flake (containers#14351) indicating that the
service container does not always get stopped, which suggests a race
condition when queuing items.  Those items are queued in a goroutine to
prevent potential deadlocks if the queue ever fills up too quickly.
The race condition in question: if a work item queues another item, the
goroutine doing the queuing may not be scheduled before the runtime
shuts down; this seems to happen fairly easily on the slow CI machines.
The wait group fixes this race and allows for simplifying the code.

Also increase the queue's buffer size to 10 to make things slightly
faster.

[NO NEW TESTS NEEDED] as we are fixing a flake.

Fixes: containers#14351
Signed-off-by: Valentin Rothberg <[email protected]>
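
For illustration, here is a minimal, self-contained Go sketch of the wait-group pattern the commit message describes. It is not the actual libpod code; the `WorkQueue` type, its methods, and the example items are hypothetical. The point it shows is why the wait group closes the race: every enqueue registers with the wait group before the item is handed off, so shutdown cannot complete while work queued by another work item is still pending.

```go
// Hypothetical sketch of a wait-group-backed work queue; not the libpod implementation.
package main

import (
	"fmt"
	"sync"
)

// WorkQueue runs queued functions on a single worker goroutine.
type WorkQueue struct {
	queue chan func()
	wg    sync.WaitGroup
}

// NewWorkQueue starts the worker.  The buffer size mirrors the commit's
// choice of 10 to reduce the chance of enqueuers blocking.
func NewWorkQueue() *WorkQueue {
	w := &WorkQueue{queue: make(chan func(), 10)}
	go func() {
		for item := range w.queue {
			item()
			w.wg.Done()
		}
	}()
	return w
}

// Queue registers the item with the wait group *before* handing it off, so
// even items queued from inside another item are counted.  The hand-off runs
// in a goroutine to avoid deadlocking if the buffer is full.
func (w *WorkQueue) Queue(item func()) {
	w.wg.Add(1)
	go func() { w.queue <- item }()
}

// Shutdown blocks until every queued item (including anything those items
// queued in turn) has run, then closes the channel so the worker exits.
func (w *WorkQueue) Shutdown() {
	w.wg.Wait()
	close(w.queue)
}

func main() {
	w := NewWorkQueue()
	w.Queue(func() {
		fmt.Println("stop pod containers")
		// A work item may queue follow-up work (e.g. stopping the service
		// container last); the wait group still accounts for it.
		w.Queue(func() { fmt.Println("stop service container") })
	})
	w.Shutdown()
}
```

Without the wait group, the inner hand-off goroutine might not be scheduled before shutdown proceeds, which matches the flake where the service container was never stopped.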
vrothberg self-assigned this May 25, 2022
cdoern pushed a commit to cdoern/podman that referenced this issue May 27, 2022
github-actions bot added the locked - please file new issue/PR label Sep 20, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023