
Repo test kit to use more than 1 snapshot thread #92520

Closed

Conversation

kingherc
Contributor

Fixes #90353

The original issue surfaced when we started testing with 1 processor in 8.5+. That resulted in a single SNAPSHOT thread and caused the test to time out, as explained in this comment.

At some point in 8.6 the maximum number of SNAPSHOT threads was raised to 10, so the issue stopped occurring there. For the 8.5 branch, it was solved with #91226, which made the test use 2 SNAPSHOT threads.

But #92392 made it possible to end up with a single SNAPSHOT thread in cases of low max memory, so the issue resurfaced in the 8.6 branch and main. This PR makes the test use 2 SNAPSHOT threads in 8.6 and main.

@kingherc added the labels :Distributed Coordination/Snapshot/Restore, >test-failure, Team:Distributed (Obsolete), auto-backport-and-merge, v8.6.0, and v8.7.0 on Dec 22, 2022
@kingherc kingherc self-assigned this Dec 22, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor

Hmm, this sounds like it could be a genuine bug with one thread though. At least, increasing to two threads shouldn't fix this, because we don't interrupt that sleep() anywhere; so if this does fix things, then we're not waiting for that upload to complete for some reason.

1B/s is unrealistically slow. We check for cancellation on every read from the simulated stream, but we're using Streams.copy, which has an 8 KiB buffer, so we'll only check for cancellation every ~8k seconds. Could we instead set the minimum upload speed to something that does allow the timeout to work even with a single thread?
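
To make the arithmetic concrete, here is a minimal sketch of that copy-loop shape in plain Java (hypothetical code, not the actual Streams.copy implementation): an 8 KiB buffered copy only observes cancellation between reads, and at 1B/s each read takes ~8192 seconds.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.function.BooleanSupplier;

    // Hypothetical sketch: a buffered copy that polls for cancellation once per
    // read. With an 8 KiB buffer over a stream throttled to 1 B/s, roughly 8192
    // seconds elapse between consecutive cancellation checks.
    final class BufferedCopySketch {
        private static final int BUFFER_SIZE = 8 * 1024;

        static long copy(InputStream source, OutputStream target, BooleanSupplier cancelled) throws IOException {
            final byte[] buffer = new byte[BUFFER_SIZE];
            long copied = 0;
            int read;
            while ((read = source.read(buffer)) != -1) { // at 1 B/s this read blocks for hours
                if (cancelled.getAsBoolean()) {          // only reached between reads
                    throw new IOException("copy cancelled");
                }
                target.write(buffer, 0, read);
                copied += read;
            }
            return copied;
        }
    }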

@DaveCTurner
Contributor

... or use a CancellableThreads to interrupt the sleep. That'd be nice IMO.

@kingherc
Contributor Author

Hi @DaveCTurner! Thanks for the comment. Indeed you're right. I had tried many variations (see #90749), but ultimately discussed with @original-brownbear that simply setting 2 threads would be preferable, to make sure the test really tests a very (unrealistically) slow repository.

I think tinkering with the speeds would also make the test prone to future failures if hardware characteristics change how fast things execute. At the time, I had a really hard time finding settings that made it work.

The main cause is that org.elasticsearch.repositories.blobstore.testkit.RepositoryAnalyzeAction.AsyncAction#onWorkerCompletion() handles the timeout exception via the SNAPSHOT thread pool, which in this case has a single thread that is stuck in sleep() in the RateLimiter, as you mentioned. We could use another thread pool here as an alternative solution?
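
To illustrate that shape with a hypothetical, self-contained example (plain java.util.concurrent, not the test kit's actual classes): with a 1-thread pool, the queued timeout handler can never run while the pool's only thread is parked in a sleep.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Hypothetical illustration: the handler that should deal with the timeout
    // is queued behind a worker blocked in a rate-limiter-style sleep.
    public class SingleThreadPoolStall {
        public static void main(String[] args) throws Exception {
            ExecutorService snapshotPool = Executors.newFixedThreadPool(1);

            // Worker "uploading" at 1B/s: effectively parked in sleep(), never interrupted.
            snapshotPool.execute(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // The timeout handler needs the pool's only thread, so it never runs.
            snapshotPool.execute(() -> System.out.println("handling timeout"));

            TimeUnit.SECONDS.sleep(2);
            System.out.println("handler still queued after 2s");
            snapshotPool.shutdownNow(); // interrupts the stuck worker and drains the queue
        }
    }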

I understand the scary part is the stuck thread, though. It is stuck due to the 1B/s rate limit. Your idea of CancellableThreads is interesting; I will investigate the code to see how these work and whether we can use them. But, just to mention, the test is unrealistic, and I am not sure this can happen in real life. If it could, it would be a problem for all threads using a RateLimiter, such as recoveries and snapshots, not only for the repository analysis component, right? Theoretically all those threads could get stuck in the RateLimiter's sleep(), I fear. So I am also not sure how much effort these unrealistic scenarios are worth.

@DaveCTurner
Contributor

You're right, this is something of a general problem. However recoveries already use CancellableThreads to abort IO (and rate-limiting sleeps) if necessary, and snapshots don't really have such a strong notion of prompt cancellation, so I think it's not as bad as it might first appear.
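
For reference, the interrupt-on-cancel pattern that CancellableThreads provides can be sketched generically like this (a simplified, hypothetical sketch, not the real org.elasticsearch.common.util.CancellableThreads):

    import java.util.concurrent.atomic.AtomicReference;

    // Simplified sketch: remember which thread is doing blocking work, and
    // interrupt it on cancellation so sleep() and interruptible IO throw
    // instead of waiting out the rate limiter. (The real class also handles
    // races and exception wrapping that this sketch omits.)
    final class InterruptOnCancelSketch {
        private final AtomicReference<Thread> worker = new AtomicReference<>();
        private volatile boolean cancelled;

        void execute(InterruptibleWork work) throws InterruptedException {
            worker.set(Thread.currentThread());
            try {
                if (cancelled) {
                    throw new InterruptedException("cancelled before execution");
                }
                work.run(); // e.g. a rate-limited write that may sleep()
            } finally {
                worker.set(null);
            }
        }

        void cancel() {
            cancelled = true;
            Thread thread = worker.get();
            if (thread != null) {
                thread.interrupt(); // breaks out of sleep()/interruptible IO
            }
        }

        @FunctionalInterface
        interface InterruptibleWork {
            void run() throws InterruptedException;
        }
    }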

@DaveCTurner
Contributor

Oh I see, we time out in the transport layer, so we consider this worker to have completed even though it's still running. So really, as well as the CancellableThreads change, we'd need to address #66992.

@DaveCTurner
Contributor

... and note that the default for ?concurrency in this API is 10, which is also the snapshot threadpool size, so if this is a problem in this test then it will be a problem in production too (although in practice we don't sleep for very long in prod, and IO should never get stuck forever).

@kingherc
Contributor Author

OK, @DaveCTurner, I've been reading the code around the repository analysis, cancellable tasks, and what you mentioned. I see that RepositoryAnalyzeAction basically spawns a bunch of BlobAnalyzeAction child tasks (using the transportService.sendChildRequest method). When the 1-second timeout is reached, the parent RepositoryAnalyzeAction task's handleException() method is called, which then calls cancelTaskAndDescendants() on the parent task. However, if I understand correctly, this method does NOT send any cancellation messages to the child tasks, right? It seems only to ban new tasks and wait for the ongoing tasks to complete.

@DaveCTurner
Contributor

Oh good point, I missed that in this specific case we have a workaround for #66992. This will ban new tasks, mark all child tasks as cancelled and then wait for them to complete. If each child were to register a listener with org.elasticsearch.tasks.CancellableTask#addListener then those listeners would be called on cancellation, which would be the right spot to interrupt any blocked threads.

@kingherc
Contributor Author

@DaveCTurner , I think we are getting closer :)

I did try that, by wrapping the main functionality at the beginning of the org.elasticsearch.repositories.blobstore.testkit.BlobAnalyzeAction.BlobAnalysis#run method with the following:

            final CancellableThreads cancellableThreads = new CancellableThreads();
            task.addListener(() -> {
                logger.warn("RECEIVED CANCELLATION");
                cancellableThreads.cancel(task.getReasonCancelled());
            });
            cancellableThreads.execute(
                () -> writeRandomBlob(
                    request.readEarly || request.getAbortWrite() || (request.targetLength <= MAX_ATOMIC_WRITE_SIZE && random.nextBoolean()),
                    true,
                    this::onLastReadForInitialWrite,
                    write1Step
                )
            );

However, I did not see it triggered, so I think I am missing something. I still haven't found the spot in the code where cancelTaskAndDescendants() actually sends some sort of cancellation message to remote child tasks...?

@kingherc
Contributor Author

> mark all child tasks as cancelled

I still haven't found where this is happening

@kingherc
Contributor Author

Hmm, it seems I had to register the listener on the parent task rather than the child task itself. Because this works:

            final CancellableThreads cancellableThreads = new CancellableThreads();
            CancellableTask parentTask = (CancellableTask) taskManager.getTask(task.getParentTaskId().getId());
            parentTask.addListener(() -> {
                cancellableThreads.cancel(task.getReasonCancelled());
            });
            cancellableThreads.execute(
                () -> writeRandomBlob(
                    request.readEarly || request.getAbortWrite() || (request.targetLength <= MAX_ATOMIC_WRITE_SIZE && random.nextBoolean()),
                    true,
                    this::onLastReadForInitialWrite,
                    write1Step
                )
            );

I was mistakenly thinking that cancelling the parent task would send "cancellation requests" to the child tasks.

I'll re-work the PR.

@DaveCTurner
Contributor

> I still haven't found where this is happening

It's here: when the receiving node receives the ban request, it cancels all the running children of the banned parent:

    for (CancellableTask childTask : childTasks) {
        cancelTaskAndDescendants(childTask, request.reason, request.waitForCompletion, listener);
    }

> Hmm, it seems I had to register the listener on the parent task rather than the child task itself. Because this works:

I think this will only work if the parent and child tasks are on the same node. Otherwise taskManager.getTask(task.getParentTaskId().getId()) won't find the parent task.

@kingherc
Contributor Author

OK, so my thinking was correct. Then there is some bug, or some issue I'm overlooking, because adding a listener on the child task did not work. Maybe there's some sort of optimization for local tasks? I'll try to figure out why I'm not getting the notification by working my way through the code from the method you mentioned (thanks!).

@kingherc
Contributor Author

Hmm, that code path you mentioned is not called. Specifically, I see the following order of execution in the log:

[2022-12-22T17:13:13,732][TRACE][o.e.r.b.t.RepositoryAnalyzeAction] [yamlRestTest-0] processing [VerifyBlobTask{node={yamlRestTest-0}{1FYznhT7RsebJrEr-S-3dg}{FWA_LR_SSY-BPlzEyWMJdQ}{yamlRestTest-0}{127.0.0.1}{127.0.0.1:37789}{cdfhilmrstw}{testattr=test, ml.allocated_processors_double=1.0, ml.machine_memory=66789920768, xpack.installed=true, ml.max_jvm_size=536870912, ml.allocated_processors=1}, request=BlobAnalyzeAction.Request{blob analysis [test_repo_slow:temp-analysis-C9WIbkqxSUCjZQwKZf1Uhg/test-blob-0-xLBn58ElSpOtVrBNpKZVWg, length=2147483648, seed=-1818594403331045661, readEarly=false, writeAndOverwrite=false, abortWrite=false]}}]
[2022-12-22T17:13:13,733][TRACE][o.e.t.TaskManager        ] [yamlRestTest-0] register child connection [local node connection] task [148]
[2022-12-22T17:13:13,733][TRACE][o.e.t.T.tracer           ] [yamlRestTest-0] [44][cluster:admin/repository/analyze/blob] sent to [{yamlRestTest-0}{1FYznhT7RsebJrEr-S-3dg}{FWA_LR_SSY-BPlzEyWMJdQ}{yamlRestTest-0}{127.0.0.1}{127.0.0.1:37789}{cdfhilmrstw}{testattr=test, ml.allocated_processors_double=1.0, ml.machine_memory=66789920768, xpack.installed=true, ml.max_jvm_size=536870912, ml.allocated_processors=1}] (timeout: [1s])
[2022-12-22T17:13:13,733][TRACE][o.e.t.T.tracer           ] [yamlRestTest-0] [44][cluster:admin/repository/analyze/blob] received request
[2022-12-22T17:13:13,733][TRACE][o.e.t.TaskManager        ] [yamlRestTest-0] unregister task for id: 146
[2022-12-22T17:13:13,816][TRACE][o.e.x.w.t.s.e.TickerScheduleTriggerEngine] [yamlRestTest-0] checking jobs [2022-12-22T15:13:13.816925827Z]
[2022-12-22T17:13:14,317][TRACE][o.e.x.w.t.s.e.TickerScheduleTriggerEngine] [yamlRestTest-0] checking jobs [2022-12-22T15:13:14.317303862Z]
[2022-12-22T17:13:14,551][TRACE][o.e.x.t.t.s.TransformScheduler] [yamlRestTest-0] Processing scheduled tasks started
[2022-12-22T17:13:14,551][TRACE][o.e.x.t.t.s.TransformScheduler] [yamlRestTest-0] Processing scheduled tasks finished, took 0ms
[2022-12-22T17:13:14,733][TRACE][o.e.t.TaskManager        ] [yamlRestTest-0] unregister child connection [local node connection] task [148]
[2022-12-22T17:13:14,734][DEBUG][o.e.r.b.t.RepositoryAnalyzeAction] [yamlRestTest-0] failed [VerifyBlobTask{node={yamlRestTest-0}{1FYznhT7RsebJrEr-S-3dg}{FWA_LR_SSY-BPlzEyWMJdQ}{yamlRestTest-0}{127.0.0.1}{127.0.0.1:37789}{cdfhilmrstw}{testattr=test, ml.allocated_processors_double=1.0, ml.machine_memory=66789920768, xpack.installed=true, ml.max_jvm_size=536870912, ml.allocated_processors=1}, request=BlobAnalyzeAction.Request{blob analysis [test_repo_slow:temp-analysis-C9WIbkqxSUCjZQwKZf1Uhg/test-blob-0-xLBn58ElSpOtVrBNpKZVWg, length=2147483648, seed=-1818594403331045661, readEarly=false, writeAndOverwrite=false, abortWrite=false]}}] org.elasticsearch.transport.ReceiveTimeoutTransportException: [yamlRestTest-0][127.0.0.1:37789][cluster:admin/repository/analyze/blob] request_id [44] timed out after [1001ms]
[2022-12-22T17:13:14,734][TRACE][o.e.t.TaskCancellationService] [yamlRestTest-0] cancelling task [1FYznhT7RsebJrEr-S-3dg:148] and its descendants
[2022-12-22T17:13:14,734][TRACE][o.e.t.TaskCancellationService] [yamlRestTest-0] child tasks of parent [1FYznhT7RsebJrEr-S-3dg:148] are completed
[2022-12-22T17:13:14,734][TRACE][o.e.t.TaskManager        ] [yamlRestTest-0] cancelling task with id 148

You will notice that the unregister child connection happens BEFORE the failure of the parent task that calls cancelTaskAndDescendants(). That means the parent thinks it has no children and does not send ban messages. I wonder why this asynchrony is happening.

@DaveCTurner
Contributor

Huh, yes, this looks like a bug.

@kingherc
Contributor Author

I discussed further with @DaveCTurner. We think it makes sense to build some small infrastructure to also send a new action that cancels a remote child task when org.elasticsearch.tasks.TaskManager.CancellableTaskHolder#unregisterChildConnection() is called with a failure (e.g., a timeout). I will see if I can implement this easily.

There's a more general conversation about extending such logic to generic tasks, but we thought that was too complex to handle for now.
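
A rough sketch of the proposed shape, where every name is a hypothetical stand-in rather than an actual Elasticsearch API:

    // Hypothetical types illustrating the proposal only: when a child connection
    // is unregistered with a failure (e.g. a transport timeout), actively request
    // cancellation of the remote child instead of only banning new children.
    interface ChildCancellationTransport {
        void sendCancelChildRequest(String nodeId, long childTaskId, String reason);
    }

    final class ChildConnectionTracker {
        private final ChildCancellationTransport transport;

        ChildConnectionTracker(ChildCancellationTransport transport) {
            this.transport = transport;
        }

        void unregisterChildConnection(String nodeId, long childTaskId, Exception failure) {
            if (failure != null) {
                // Proposed addition: the child may still be running remotely, so cancel it.
                transport.sendCancelChildRequest(nodeId, childTaskId, failure.getMessage());
            }
        }
    }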

@kingherc
Contributor Author

Closing in favor of #92588

@kingherc kingherc closed this Dec 28, 2022