Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Radix-sorted, lookup-free, typesafe, nonblocking, priority-boosting pending tasks #85751

Conversation

DaveCTurner
Copy link
Contributor

@DaveCTurner DaveCTurner commented Apr 7, 2022

Today the master's pending task queue is just the
PriorityBlockingQueue<Runnable> belonging to the underlying
ThreadPoolExecutor. The reasons for this date back a long way but it
doesn't really reflect the structure of the queue as it exists today. In
particular, we must keep track of batches independently of the queue
itself, and must do various bits of unchecked casting to process
multiple items of the same type at once.

This commit introduces an new queueing mechanism, independent of the
executor's queue, which better represents the conceptual structure of
the master's pending tasks:

  • Today we use a priority queue to allow important tasks to preempt
    less-important ones. However there are only a small number of priority
    levels, so it is simpler to maintain a queue for each priority,
    effectively replacing the sorting within the priority queue with a
    radix sort.

  • Today when a task is submitted we perform a map lookup to see if it
    can be added to an existing batch or not. With this change we allow
    client code to create its own dedicated queue of tasks. The entries in
    the per-priority-level queues are themselves queues, one for each
    executor, representing the batches to be run.

  • Today each task in the queue holds a reference to its executor, but
    the executor used to run a task may belong to a different task in the
    same batch. In practice we know they're the same executor (that's how
    batches are defined) but we cannot express this knowledge in the type
    system so we have to do a bunch of unchecked casting to work around it.
    With this change we associate each per-executor queue directly with its
    executor, avoiding the need to do all this unchecked casting.

  • Today the master service must block its thread while waiting for each
    task to complete, because otherwise the executor would start to process
    the next task in the queue. This makes testing using a
    DeterministicTaskQueue harder (see FakeThreadPoolMasterService).
    With this change we avoid submitting a task to the ThreadPoolExecutor
    until the previous task is complete, which means we can make the
    implementation avoid blocking while a task is running and therefore run
    the whole production implementation in deterministic tests1.

  • Today it's possible for a steady drip of high-priority tasks to starve
    low-priority tasks of access to the master for an extended period of
    time. With this change we separate the queue of tasks from the queue
    which determines the execution order. This allows us to implement more
    intelligent execution policies. For instance, if we detect that the
    queue has not processed any low-priority tasks for too long then we can
    make the decision to boost their priorities2.

Relates #81626

Footnotes

  1. Not done yet but this is a step in the right direction.

  2. Not done yet but this is a step in the right direction.

@DaveCTurner DaveCTurner added WIP >refactoring :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.3.0 labels Apr 7, 2022
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 8, 2022
Today when submitting a cluster state update task the caller can
indicate that it is not to be included in a batch by using an executor
obtained from `ClusterStateTaskExecutor#unbatched()`. This is kind of
weird: the caller must not re-use the executor, and also the master
service has no practical way to handle this special case any
differently.

This commit introduces a dedicated API for unbatched tasks, opening the
door to some future simplifications.

Extracted from elastic#85751
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 8, 2022
Today when submitting a cluster state update task the caller can
indicate that it is not to be included in a batch by using an executor
obtained from `ClusterStateTaskExecutor#unbatched()`. This is kind of
weird: the caller must not re-use the executor, and also the master
service has no practical way to handle this special case any
differently.

This commit introduces a dedicated API for unbatched tasks, opening the
door to some future simplifications.

Extracted from elastic#85751
@DaveCTurner DaveCTurner force-pushed the 2022-04-07-master-service-new-queues branch from 8d20e99 to db5e3bd Compare April 8, 2022 14:30
Today in `ClusterStateChanges` we use mocks to capture tasks submitted
to the master service in order to compute the relevant cluster state
updates ourselves. In fact there's no need to do this, we can use the
real master service with a customized executor. This commit does that.
Today when submitting a cluster state update task the caller can
indicate that it is not to be included in a batch by using an executor
obtained from `ClusterStateTaskExecutor#unbatched()`. This is kind of
weird: the caller must not re-use the executor, and also the master
service has no practical way to handle this special case any
differently.

This commit introduces a dedicated API for unbatched tasks, opening the
door to some future simplifications.
@DaveCTurner DaveCTurner force-pushed the 2022-04-07-master-service-new-queues branch from db5e3bd to 8bc6804 Compare April 8, 2022 16:00
…ending tasks

Today the master's pending task queue is just the
`PriorityBlockingQueue<Runnable>` belonging to the underlying
`ThreadPoolExecutor`. The reasons for this date back a long way but it
doesn't really reflect the structure of the queue as it exists today. In
particular, we must keep track of batches independently of the queue
itself, and must do various bits of unchecked casting to process
multiple items of the same type at once.

This commit introduces an new queueing mechanism, independent of the
executor's queue, which better represents the conceptual structure of
the master's pending tasks:

* Today we use a priority queue to allow important tasks to preempt
less-important ones. However there are only a small number of priority
levels, so it is simpler to maintain a queue for each priority,
effectively replacing the sorting within the priority queue with a
radix sort.

* Today when a task is submitted we perform a map lookup to see if it
can be added to an existing batch or not. With this change we allow
client code to create its own dedicated queue of tasks. The entries in
the per-priority-level queues are themselves queues, one for each
executor, representing the batches to be run.

* Today each task in the queue holds a reference to its executor, but
the executor used to run a task may belong to a different task in the
same batch. In practice we know they're the same executor (that's how
batches are defined) but we cannot express this knowledge in the type
system so we have to do a bunch of unchecked casting to work around it.
With this change we associate each per-executor queue directly with its
executor, avoiding the need to do all this unchecked casting.

* Today the master service must block its thread while waiting for each
task to complete, because otherwise the executor would start to process
the next task in the queue. This makes testing using a
`DeterministicTaskQueue` harder (see `FakeThreadPoolMasterService`).
With this change we avoid submitting a task to the `ThreadPoolExecutor`
until the previous task is complete, which means we can make the
implementation avoid blocking while a task is running and therefore run
the whole production implementation in deterministic tests[^1].

* Today it's possible for a steady drip of high-priority tasks to starve
low-priority tasks of access to the master for an extended period of
time. With this change we separate the queue of tasks from the queue
which determines the execution order. This allows us to implement more
intelligent execution policies. For instance, if we detect that the
queue has not processed any low-priority tasks for too long then we can
make the decision to boost their priorities[^2].

[^1]: Not done yet but this is a step in the right direction.

[^2]: Not done yet but this is a step in the right direction.
@DaveCTurner DaveCTurner force-pushed the 2022-04-07-master-service-new-queues branch from 8bc6804 to ffc3a97 Compare April 8, 2022 16:22
@DaveCTurner DaveCTurner force-pushed the 2022-04-07-master-service-new-queues branch from 76dc00e to e26a000 Compare April 9, 2022 20:27
@DaveCTurner DaveCTurner added >tech debt Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels Jul 12, 2022
@kingherc kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022
@DaveCTurner
Copy link
Contributor Author

Superseded by #92021

@DaveCTurner DaveCTurner deleted the 2022-04-07-master-service-new-queues branch March 6, 2023 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >refactoring Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >tech debt v8.7.0 WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants