work queue: simplify and use a wait group #14354
Simplify the work-queue implementation by using a wait group. Once all queued work items are done, the channel can be closed.

The system tests revealed a flake (i.e., containers#14351) which indicated that the service container does not always get stopped, which suggests a race condition when queuing items. Those items are queued in a goroutine to prevent potential deadlocks if the queue ever fills up too quickly. The race condition in question is that if a work item queues another, the goroutine for queuing may not be scheduled before the runtime shuts down; this seems to happen fairly easily on the slow CI machines. The wait group fixes this race and allows for simplifying the code.

Also increase the queue's buffer size to 10 to make things slightly faster.

[NO NEW TESTS NEEDED] as we are fixing a flake.

Fixes: containers#14351
Signed-off-by: Valentin Rothberg <[email protected]>
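For readers who want to see the idea in isolation, here is a minimal sketch of a wait-group-backed work queue. It is not the code from this PR: the `workQueue` type and the `newWorkQueue`, `queue`, `shutdown`, and `runDemo` names are hypothetical and exist only for illustration. The point it tries to show is that the wait-group counter is incremented synchronously, before the queuing goroutine is spawned, so shutdown cannot overtake an item that was queued by another item.

```go
package main

import (
	"fmt"
	"sync"
)

// workQueue is a hypothetical type for illustration only.
type workQueue struct {
	items chan func()    // buffered channel of pending work items
	wg    sync.WaitGroup // counts items that are queued but not yet finished
	done  chan struct{}  // closed once the worker goroutine has exited
}

func newWorkQueue() *workQueue {
	q := &workQueue{
		items: make(chan func(), 10), // buffer size of 10, as in the description
		done:  make(chan struct{}),
	}
	go func() {
		defer close(q.done)
		for item := range q.items {
			item()
			q.wg.Done() // the item counts as done only after it has run
		}
	}()
	return q
}

// queue registers the item with the wait group before spawning the
// goroutine that sends it, so a pending send is always accounted for.
func (q *workQueue) queue(item func()) {
	q.wg.Add(1)
	go func() { q.items <- item }() // goroutine avoids blocking on a full queue
}

// shutdown waits for all queued items, including items queued by other
// items, and only then closes the channel.
func (q *workQueue) shutdown() {
	q.wg.Wait()
	close(q.items)
	<-q.done
}

// runDemo queues an item that itself queues a second item.
func runDemo() []string {
	var order []string // written only by the single worker goroutine
	q := newWorkQueue()
	q.queue(func() {
		order = append(order, "first")
		q.queue(func() { order = append(order, "second") })
	})
	q.shutdown()
	return order
}

func main() {
	fmt.Println(runDemo()) // prints "[first second]"
}
```

Because the nested `queue` call runs `wg.Add(1)` while the outer item is still executing, the counter cannot drop to zero between the two items, and `shutdown` waits for both.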
@edsantiago FYI @flouthoc @Luap99 @mheon PTAL. No idea why I made things so complicated when adding the work queue.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: flouthoc, vrothberg
/lgtm
Ironically being bitten by (network) flakes.
/hold cancel