
Allow scaling executors to reject tasks after shutdown #81856

Conversation

tlrx
Member

@tlrx tlrx commented Dec 17, 2021

Today scaling thread pools never reject tasks but always add them to the queue of tasks to execute, even when the thread pool executor is shutting down or terminated. This behaviour does not work well when a task is blocked waiting for a task on another scaling thread pool to complete an I/O operation: that other task will never be executed if it was enqueued just before its scaling thread pool started shutting down.

This situation is more likely to happen with searchable snapshots, where multiple threads can be blocked waiting for parts of Lucene files to be fetched and made available in the cache. We saw test failures in CI where the concurrent threads Lucene 9 uses (to asynchronously check indices) were blocked waiting for cache files to become available and failed because of leaked file handles (see #77017, #77178).

This pull request changes the `ForceQueuePolicy` used by scaling thread pools so that it now accepts a `rejectAfterShutdown` flag, which can be set on a per-thread-pool basis to indicate that tasks should be rejected once the thread pool is shut down. Because we rely on many scaling thread pools to be black holes that never reject tasks, this flag is set to `false` on most of them to keep the current behavior. In some cases where the rejection logic was already implemented correctly, this flag has been set to `true`.

This pull request also reimplements the `XRejectedExecutionHandler` interface as an abstract class, `EsRejectedExecutionHandler`, which allows some rejection logic to be shared.
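For readers skimming the conversation, here is a minimal sketch of the behaviour described above. The class and flag names follow the PR description (ForceQueuePolicy, rejectAfterShutdown), but the body is an illustration rather than the exact Elasticsearch implementation, and it throws the JDK RejectedExecutionException instead of EsRejectedExecutionException to stay self-contained.

import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.atomic.LongAdder;

// Sketch only: illustrates the rejectAfterShutdown behaviour described above.
class ForceQueuePolicy implements RejectedExecutionHandler {
    private final LongAdder rejected = new LongAdder();   // stands in for the shared rejection counter
    private final boolean rejectAfterShutdown;            // configured per thread pool

    ForceQueuePolicy(boolean rejectAfterShutdown) {
        this.rejectAfterShutdown = rejectAfterShutdown;
    }

    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        if (rejectAfterShutdown && executor.isShutdown()) {
            rejected.increment();
            throw new RejectedExecutionException("rejected execution of " + task + " on " + executor + " (shutdown)");
        }
        // Otherwise keep the historical "black hole" behaviour: force the task onto the queue.
        executor.getQueue().add(task);
    }
}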

@elasticmachine elasticmachine added the Team:Core/Infra (Meta label for core/infra team) label Dec 17, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Collaborator

Hi @tlrx, I've created a changelog YAML for you.

@tlrx tlrx requested a review from henningandersen December 17, 2021 20:01

@MaratCrash MaratCrash left a comment

@tlrx great PR, thank you!

@@ -210,7 +210,7 @@ private static void logAndFailTest(Exception e) {
private final ThreadPool threadPool = new TestThreadPool(
"TrackedCluster",
// a single thread for "client" activities, to limit the number of activities all starting at once
new ScalingExecutorBuilder(CLIENT, 1, 1, TimeValue.ZERO, CLIENT)
new ScalingExecutorBuilder(CLIENT, 1, 1, TimeValue.ZERO, true, CLIENT)


Please, can we statically import the constant here and use ZERO instead of TimeValue.ZERO? The code would be cleaner, but it's just my opinion.

Member Author

I prefer to keep it the way it is :)

}

protected final EsRejectedExecutionException newRejectedException(Runnable r, ThreadPoolExecutor executor, boolean isExecutorShutdown) {
rejected.inc();


For what purpose is the inc() method called here? I mean, the method name is newRejectedException, but the method implementation also has an increment. I think it's not so clear.

Member Author

I agree, I pushed 6e2d87a
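Judging by the final shape of newRejectedException shown later in this review, the refactor roughly separates the counter update from the exception creation. A sketch of that separation (a fragment of the handler class; names are illustrative and the actual commit 6e2d87a may differ):

// Sketch: keep newRejectedException() side-effect free and record the rejection separately.
protected final void incrementRejections() {
    rejected.inc();
}

protected final EsRejectedExecutionException newRejectedException(Runnable r, ThreadPoolExecutor executor, boolean isExecutorShutdown) {
    return new EsRejectedExecutionException("rejected execution of " + r + " on " + executor, isExecutorShutdown);
}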

builders.put(Names.GENERIC, new ScalingExecutorBuilder(Names.GENERIC, 4, genericThreadPoolMax, TimeValue.timeValueSeconds(30)));
builders.put(
Names.GENERIC,
new ScalingExecutorBuilder(Names.GENERIC, 4, genericThreadPoolMax, TimeValue.timeValueSeconds(30), false)


Could we move these values (4 and 30) to constants please?

Contributor

This PR does not change that, so I would prefer to leave this as is. We would have to make all the numbers here constants if we did this, and I think that would make the code here harder to read.

Member Author

I agree with Henning here.

Contributor

@henningandersen henningandersen left a comment

This looks good. I left a few comments.


@Override
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
if (rejectAfterShutdown && executor.isShutdown()) {
throw newRejectedException(r, executor, true);
}
Contributor

Is there a potential race condition here, though a very unlikely one? What if we get into rejectedExecution because all threads are active, the pool is shut down at this point, and then all threads also go inactive before we put the runnable on the queue?

Member Author

After reading the code again I think you are right. I pushed a939ea6 to add another test that executes tasks while concurrently shutting down the executor, making it more likely to hit the race condition on slow machines (it only fails about 1 run in 10,000 on my workstation though).

I reworked the logic in 0d89228 to avoid the race condition, let me know what you think.
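For context, a common way to close this window with a ThreadPoolExecutor is to enqueue first and then re-check the executor state, removing and rejecting the task if the pool was shut down in the meantime. The fragment below sketches that pattern (and matches the shape of the diff discussed later in this review), but it is not the exact content of 0d89228:

// Fragment of the rejection handler: enqueue, then re-check for shutdown.
queue.put(task);
if (rejectAfterShutdown && executor.isShutdown() && executor.remove(task)) {
    // The pool shut down after the task was enqueued and before a worker picked it up,
    // so the task would never run; reject it explicitly instead.
    incrementRejections();
    throw newRejectedException(task, executor, true);
}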


@@ -561,19 +561,25 @@ protected XPackLicenseState getLicenseState() {

public static ScalingExecutorBuilder[] executorBuilders(Settings settings) {
final int processors = EsExecutors.allocatedProcessors(settings);
// searchable snapshots cache thread pools should always reject tasks once they are shutting down, otherwise some threads might be
Contributor

Did you look into adding a test provoking this specific issue consistently?

Member Author

I reproduced the issue in a test, but it did not fail consistently and was much too complex to maintain, while the current black hole behavior is easy to reproduce consistently in a thread pool test, so I went this way.

@@ -190,6 +191,7 @@ public void testExecutionExceptionOnScalingESThreadPoolExecutor() throws Interru
1,
10,
TimeUnit.SECONDS,
false,
Contributor

I think we can check both true and false here?

Suggested change
false,
randomBoolean(),

Member Author

Sure, I changed this in ad729c9

for (int i = 0; i < queuedAfterShutdown; i++) {
execute(scalingExecutor, () -> {}, executed, rejected, failed);
}
assertThat(scalingExecutor.getQueue().size(), rejectAfterShutdown ? equalTo(queued) : equalTo(queued + queuedAfterShutdown));
Contributor

Can we also validate that rejected is 0 or queuedAfterShutdown, depending on rejectAfterShutdown, here?

Member Author

Sure, I changed this in ad729c9
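A sketch of the extra assertion being asked for, assuming rejected is an AtomicLong-style counter updated by the execute(...) helper (names are taken from the surrounding test, exact types assumed):

// Sketch only; the actual assertion added in ad729c9 may be written differently.
assertThat(rejected.get(), equalTo(rejectAfterShutdown ? (long) queuedAfterShutdown : 0L));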


block.countDown();

assertBusy(() -> assertTrue(scalingExecutor.isTerminated()));
Contributor

Should we also verify that new tasks added after termination are rejected?

Member Author

++, changed in ad729c9

@tlrx tlrx requested a review from henningandersen January 11, 2022 07:57
@tlrx
Member Author

tlrx commented Jan 11, 2022

Sorry for the delay @henningandersen, the race condition took me some time to fix. Can you please have another look? Thanks

Contributor

@henningandersen henningandersen left a comment

LGTM. I left a few smaller comments, but no need for another round.

assert executor.getQueue() instanceof ExecutorScalingQueue;
executor.getQueue().put(r);
assert queue instanceof ExecutorScalingQueue;
queue.put(task);
Contributor

I wonder if we should check if it is shutdown prior to adding to the queue (in addition to the check after adding it)? That would avoid the risk of the task being picked up during shutdown and only leave this "risk" for concurrent races, where it would be perfectly OK to run the task.

Member Author

Makes sense, I changed that in 3a4e5f4
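A sketch of the pre-check being agreed on here, combined with the post-put re-check shown earlier (fragment only; helper names follow the earlier snippets and the actual commit 3a4e5f4 may differ):

// Reject up front when the pool is already shut down, so the task never becomes
// visible to workers that are still draining the queue.
if (rejectAfterShutdown && executor.isShutdown()) {
    incrementRejections();
    throw newRejectedException(task, executor, true);
}
queue.put(task);
// The re-check after put() still handles the rare concurrent shutdown race.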

Comment on lines 343 to 344
if (rejectAfterShutdown) {
if (executor.isShutdown() && executor.remove(task)) {
Contributor

nit: I would find this slightly more logical as:

Suggested change
if (rejectAfterShutdown) {
if (executor.isShutdown() && executor.remove(task)) {
if (rejectAfterShutdown && executor.isShutdown()) {
if (executor.remove(task)) {

since that seems to be the special case. With remove being mutating, I like having it in its own condition so I do not have to think about and/or and order of evaluation, but it could also be collapsed into one if statement.

Member Author

This is more logical I agree. (I collapsed the statements)

}

protected final EsRejectedExecutionException newRejectedException(Runnable r, ThreadPoolExecutor executor, boolean isExecutorShutdown) {
return new EsRejectedExecutionException("rejected execution of " + r + " on " + executor, isExecutorShutdown);
Contributor

I wonder if we should add info to the exception message about the executor being shut down when isExecutorShutdown=true? Otherwise, I think logging the exception will not show the flag.

Member Author

I added (shutdown) in the message for that.
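Based on the two comments above, the resulting message construction looks roughly like this (a sketch, not the exact diff):

protected final EsRejectedExecutionException newRejectedException(Runnable r, ThreadPoolExecutor executor, boolean isExecutorShutdown) {
    // Append "(shutdown)" so a logged rejection shows why it happened.
    final String message = "rejected execution of " + r + " on " + executor + (isExecutorShutdown ? " (shutdown)" : "");
    return new EsRejectedExecutionException(message, isExecutorShutdown);
}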


final Matcher<Long> executionsMatcher = rejectAfterShutdown
? equalTo((long) max + queued)
: greaterThanOrEqualTo((long) max + queued);
Contributor

Can we also check that it is lessThanOrEqualTo(max + queued + queuedAfterShutdown)?

Member Author

Sure

}

assertBusy(() -> assertTrue(scalingExecutor.isTerminated()));
assertThat(scalingExecutor.getCompletedTaskCount(), greaterThanOrEqualTo((long) max));
Contributor

Also here, can we check lessThanOrEqualTo(max + barrier.getParties() - 1)?

Member Author

Sure

assertThat(scalingExecutor.getCompletedTaskCount(), greaterThanOrEqualTo((long) max));
assertThat(scalingExecutor.getQueue().size(), equalTo(0));
assertThat(scalingExecutor.getActiveCount(), equalTo(0));

Contributor

Can we check that scalingExecutor.getCompletedTaskCount() + rejected.get() == max + barrier.getParties() - 1? To ensure every request is accounted for exactly once.

Member Author

Sure
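A sketch of the accounting check requested above (same assumptions about rejected as earlier; barrier is the CyclicBarrier-style coordinator used by the test):

// Every submitted task should be accounted for exactly once: either completed or rejected.
assertThat(
    scalingExecutor.getCompletedTaskCount() + rejected.get(),
    equalTo((long) max + barrier.getParties() - 1)
);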

@tlrx tlrx added the auto-backport-and-merge and auto-merge-without-approval (Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!)) labels Jan 24, 2022
@elasticsearchmachine elasticsearchmachine merged commit 24e1888 into elastic:master Jan 24, 2022
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jan 24, 2022
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.0
7.17 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 81856`

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jan 24, 2022
elasticsearchmachine pushed a commit that referenced this pull request Jan 24, 2022
tlrx added a commit that referenced this pull request Jan 24, 2022
Backport of #81856
Labels
auto-merge-without-approval (Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!))
:Core/Infra/Core (Core issues without another label)
>enhancement
Team:Core/Infra (Meta label for core/infra team)
v7.17.0
v8.0.0
v8.1.0