Allocator startup can perform many raft writes #1286
aaronlehmann added a commit to aaronlehmann/swarmkit that referenced this issue on Aug 1, 2016:

When loading a state that contained large numbers of nodes and tasks, but no ready nodes that could accept the tasks, swarmd used large amounts of CPU repeatedly trying to schedule the full set of tasks. The allocator caused many commits on startup (see moby#1286), and this produced a large backlog of commit events, each one of which caused a full scheduling pass.

To avoid this pathological behavior, debounce the commit events similarly to how the dispatcher's Tasks loop debounces events. When a commit event is received, that starts a 50 ms countdown to wait for another commit event before running the scheduling pass. If commit events keep being received and resetting this timer, the scheduler will run the scheduling pass anyway after a second.

Signed-off-by: Aaron Lehmann <[email protected]>
aaronlehmann added a commit that referenced this issue on Aug 1, 2016:

When loading a state that contained large numbers of nodes and tasks, but no ready nodes that could accept the tasks, swarmd used large amounts of CPU repeatedly trying to schedule the full set of tasks. The allocator caused many commits on startup (see #1286), and this produced a large backlog of commit events, each one of which caused a full scheduling pass.

To avoid this pathological behavior, debounce the commit events similarly to how the dispatcher's Tasks loop debounces events. When a commit event is received, that starts a 50 ms countdown to wait for another commit event before running the scheduling pass. If commit events keep being received and resetting this timer, the scheduler will run the scheduling pass anyway after a second.

Signed-off-by: Aaron Lehmann <[email protected]>
(cherry picked from commit 77c62db)
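As a rough illustration of the debounce described in that commit message (a standalone sketch, not the actual scheduler code; the `debounce` function, `commits` channel, and `runPass` callback are made-up names): events arriving within 50 ms of each other keep postponing the scheduling pass, but the pass is forced once a second has elapsed.

```go
package main

import (
	"fmt"
	"time"
)

// debounce waits 50 ms after the last commit event before running a
// scheduling pass, but never delays the pass longer than one second while
// events keep arriving. Illustrative sketch, not SwarmKit's scheduler code.
func debounce(commits <-chan struct{}, runPass func()) {
	const (
		quiet   = 50 * time.Millisecond // settle time after the last event
		maxWait = time.Second           // upper bound on total delay
	)

	var (
		timer    *time.Timer
		timerCh  <-chan time.Time
		deadline time.Time
	)

	for {
		select {
		case _, ok := <-commits:
			if !ok {
				return
			}
			if timer == nil {
				// First event in a burst: start the quiet-period timer
				// and record the one-second deadline.
				timer = time.NewTimer(quiet)
				timerCh = timer.C
				deadline = time.Now().Add(maxWait)
			} else {
				// Another event arrived: reset the quiet-period timer,
				// but never push past the overall deadline.
				if !timer.Stop() {
					<-timerCh
				}
				wait := quiet
				if remaining := time.Until(deadline); remaining < wait {
					wait = remaining
				}
				timer.Reset(wait)
			}
		case <-timerCh:
			timer = nil
			timerCh = nil
			runPass()
		}
	}
}

func main() {
	commits := make(chan struct{})
	go func() {
		for i := 0; i < 100; i++ {
			commits <- struct{}{} // events arrive faster than 50 ms apart
			time.Sleep(10 * time.Millisecond)
		}
		close(commits)
	}()
	debounce(commits, func() {
		fmt.Println("scheduling pass at", time.Now().Format("15:04:05.000"))
	})
}
```

With events arriving every 10 ms, the quiet-period timer keeps being reset, so the pass runs roughly once per second instead of once per commit.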
In `doNetworkInit`, allocations of networks, nodes, and services aren't batched. Loading a swarm state with thousands of nodes appears to result in many raft writes from `doNetworkInit`'s calls to `allocateNode`. This could block for a long time in a multi-manager setup where writes need to be acknowledged by a quorum of managers.

cc @mrjana
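As a back-of-the-envelope illustration of why batching matters here (the `raftStore`, `Commit`, and `allocateBatched` names below are invented for the sketch and are not SwarmKit's store API): grouping allocations so that many nodes share a single write bounds the number of quorum round-trips at startup.

```go
package main

import "fmt"

// Node is a stand-in for a cluster node whose allocation state must be
// persisted through raft.
type Node struct{ ID string }

// raftStore is a toy store where every Commit corresponds to one raft
// proposal that must be acknowledged by a quorum of managers.
type raftStore struct{ commits int }

func (s *raftStore) Commit(updates []Node) {
	s.commits++ // in a real cluster, this quorum round-trip is what gets expensive
}

// allocateEach mirrors the problematic pattern: one write, and therefore one
// raft round-trip, per node.
func allocateEach(s *raftStore, nodes []Node) {
	for _, n := range nodes {
		s.Commit([]Node{n})
	}
}

// allocateBatched groups allocations so that many nodes share one proposal.
func allocateBatched(s *raftStore, nodes []Node, batchSize int) {
	for start := 0; start < len(nodes); start += batchSize {
		end := start + batchSize
		if end > len(nodes) {
			end = len(nodes)
		}
		s.Commit(nodes[start:end])
	}
}

func main() {
	nodes := make([]Node, 5000)
	for i := range nodes {
		nodes[i] = Node{ID: fmt.Sprintf("node-%d", i)}
	}

	perNode, batched := &raftStore{}, &raftStore{}
	allocateEach(perNode, nodes)
	allocateBatched(batched, nodes, 500)
	fmt.Printf("per-node writes: %d, batched writes: %d\n", perNode.commits, batched.commits)
	// per-node writes: 5000, batched writes: 10
}
```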