-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Radix-sorted, lookup-free, typesafe, nonblocking, priority-boosting pending tasks #85751
Closed
DaveCTurner
wants to merge
67
commits into
elastic:main
from
DaveCTurner:2022-04-07-master-service-new-queues
Closed
Radix-sorted, lookup-free, typesafe, nonblocking, priority-boosting pending tasks #85751
DaveCTurner
wants to merge
67
commits into
elastic:main
from
DaveCTurner:2022-04-07-master-service-new-queues
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
DaveCTurner
added
WIP
>refactoring
:Distributed Coordination/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
v8.3.0
labels
Apr 7, 2022
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this pull request
Apr 8, 2022
Today when submitting a cluster state update task the caller can indicate that it is not to be included in a batch by using an executor obtained from `ClusterStateTaskExecutor#unbatched()`. This is kind of weird: the caller must not re-use the executor, and also the master service has no practical way to handle this special case any differently. This commit introduces a dedicated API for unbatched tasks, opening the door to some future simplifications. Extracted from elastic#85751
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this pull request
Apr 8, 2022
Today when submitting a cluster state update task the caller can indicate that it is not to be included in a batch by using an executor obtained from `ClusterStateTaskExecutor#unbatched()`. This is kind of weird: the caller must not re-use the executor, and also the master service has no practical way to handle this special case any differently. This commit introduces a dedicated API for unbatched tasks, opening the door to some future simplifications. Extracted from elastic#85751
DaveCTurner
force-pushed
the
2022-04-07-master-service-new-queues
branch
from
April 8, 2022 14:30
8d20e99
to
db5e3bd
Compare
Today in `ClusterStateChanges` we use mocks to capture tasks submitted to the master service in order to compute the relevant cluster state updates ourselves. In fact there's no need to do this, we can use the real master service with a customized executor. This commit does that.
Today when submitting a cluster state update task the caller can indicate that it is not to be included in a batch by using an executor obtained from `ClusterStateTaskExecutor#unbatched()`. This is kind of weird: the caller must not re-use the executor, and also the master service has no practical way to handle this special case any differently. This commit introduces a dedicated API for unbatched tasks, opening the door to some future simplifications.
DaveCTurner
force-pushed
the
2022-04-07-master-service-new-queues
branch
from
April 8, 2022 16:00
db5e3bd
to
8bc6804
Compare
…ending tasks Today the master's pending task queue is just the `PriorityBlockingQueue<Runnable>` belonging to the underlying `ThreadPoolExecutor`. The reasons for this date back a long way but it doesn't really reflect the structure of the queue as it exists today. In particular, we must keep track of batches independently of the queue itself, and must do various bits of unchecked casting to process multiple items of the same type at once. This commit introduces an new queueing mechanism, independent of the executor's queue, which better represents the conceptual structure of the master's pending tasks: * Today we use a priority queue to allow important tasks to preempt less-important ones. However there are only a small number of priority levels, so it is simpler to maintain a queue for each priority, effectively replacing the sorting within the priority queue with a radix sort. * Today when a task is submitted we perform a map lookup to see if it can be added to an existing batch or not. With this change we allow client code to create its own dedicated queue of tasks. The entries in the per-priority-level queues are themselves queues, one for each executor, representing the batches to be run. * Today each task in the queue holds a reference to its executor, but the executor used to run a task may belong to a different task in the same batch. In practice we know they're the same executor (that's how batches are defined) but we cannot express this knowledge in the type system so we have to do a bunch of unchecked casting to work around it. With this change we associate each per-executor queue directly with its executor, avoiding the need to do all this unchecked casting. * Today the master service must block its thread while waiting for each task to complete, because otherwise the executor would start to process the next task in the queue. This makes testing using a `DeterministicTaskQueue` harder (see `FakeThreadPoolMasterService`). With this change we avoid submitting a task to the `ThreadPoolExecutor` until the previous task is complete, which means we can make the implementation avoid blocking while a task is running and therefore run the whole production implementation in deterministic tests[^1]. * Today it's possible for a steady drip of high-priority tasks to starve low-priority tasks of access to the master for an extended period of time. With this change we separate the queue of tasks from the queue which determines the execution order. This allows us to implement more intelligent execution policies. For instance, if we detect that the queue has not processed any low-priority tasks for too long then we can make the decision to boost their priorities[^2]. [^1]: Not done yet but this is a step in the right direction. [^2]: Not done yet but this is a step in the right direction.
DaveCTurner
force-pushed
the
2022-04-07-master-service-new-queues
branch
from
April 8, 2022 16:22
8bc6804
to
ffc3a97
Compare
DaveCTurner
force-pushed
the
2022-04-07-master-service-new-queues
branch
from
April 9, 2022 20:27
76dc00e
to
e26a000
Compare
(conflicts not yet resolved fully)
DaveCTurner
added
>tech debt
Team:Distributed (Obsolete)
Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
labels
Jul 12, 2022
Superseded by #92021 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
>refactoring
Team:Distributed (Obsolete)
Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
>tech debt
v8.7.0
WIP
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Today the master's pending task queue is just the
PriorityBlockingQueue<Runnable>
belonging to the underlyingThreadPoolExecutor
. The reasons for this date back a long way but itdoesn't really reflect the structure of the queue as it exists today. In
particular, we must keep track of batches independently of the queue
itself, and must do various bits of unchecked casting to process
multiple items of the same type at once.
This commit introduces an new queueing mechanism, independent of the
executor's queue, which better represents the conceptual structure of
the master's pending tasks:
Today we use a priority queue to allow important tasks to preempt
less-important ones. However there are only a small number of priority
levels, so it is simpler to maintain a queue for each priority,
effectively replacing the sorting within the priority queue with a
radix sort.
Today when a task is submitted we perform a map lookup to see if it
can be added to an existing batch or not. With this change we allow
client code to create its own dedicated queue of tasks. The entries in
the per-priority-level queues are themselves queues, one for each
executor, representing the batches to be run.
Today each task in the queue holds a reference to its executor, but
the executor used to run a task may belong to a different task in the
same batch. In practice we know they're the same executor (that's how
batches are defined) but we cannot express this knowledge in the type
system so we have to do a bunch of unchecked casting to work around it.
With this change we associate each per-executor queue directly with its
executor, avoiding the need to do all this unchecked casting.
Today the master service must block its thread while waiting for each
task to complete, because otherwise the executor would start to process
the next task in the queue. This makes testing using a
DeterministicTaskQueue
harder (seeFakeThreadPoolMasterService
).With this change we avoid submitting a task to the
ThreadPoolExecutor
until the previous task is complete, which means we can make the
implementation avoid blocking while a task is running and therefore run
the whole production implementation in deterministic tests1.
Today it's possible for a steady drip of high-priority tasks to starve
low-priority tasks of access to the master for an extended period of
time. With this change we separate the queue of tasks from the queue
which determines the execution order. This allows us to implement more
intelligent execution policies. For instance, if we detect that the
queue has not processed any low-priority tasks for too long then we can
make the decision to boost their priorities2.
Relates #81626
Footnotes
Not done yet but this is a step in the right direction. ↩
Not done yet but this is a step in the right direction. ↩