Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TransportGetTaskAction: Wait for the task asynchronously #93375

Merged
merged 25 commits into from
Feb 6, 2023

Conversation

arteam
Copy link
Contributor

@arteam arteam commented Jan 31, 2023

Wait for the requested task asynchronously in a similar fashion to TransportListTaskAction from #90977

See #90977

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work 😄

@arteam arteam force-pushed the transport-get-task-action-async branch from 938eb31 to a3c20f3 Compare January 31, 2023 11:08
@arteam
Copy link
Contributor Author

arteam commented Jan 31, 2023

@elasticmachine update branch

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, now it just needs a test 😁

@arteam arteam added the :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. label Feb 1, 2023
@arteam
Copy link
Contributor Author

arteam commented Feb 1, 2023

@elasticmachine update branch

@@ -152,6 +152,10 @@
- set: {task: task}

- do:
# Inherits the warning from the previous task
warnings:
- The sort option in reindex is deprecated. Instead consider using query
Copy link
Contributor Author

@arteam arteam Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like with the latest change getTask now inherits warnings from the previous task (not sure why it wasn't the case with a blocking call in the generic pool?) which is technically is a breaking change :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this is because we call the RemovedTaskListener (and therefore complete the future) in the thread context of the task that's being removed. We need to fix that, we can't just leak context across responses like this.

@arteam arteam marked this pull request as ready for review February 1, 2023 18:03
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Feb 1, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @arteam, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@@ -152,6 +152,10 @@
- set: {task: task}

- do:
# Inherits the warning from the previous task
warnings:
- The sort option in reindex is deprecated. Instead consider using query
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this is because we call the RemovedTaskListener (and therefore complete the future) in the thread context of the task that's being removed. We need to fix that, we can't just leak context across responses like this.

// The task has already finished, we can run the completion listener in the same thread
waitedForCompletionListener.onResponse(null);
} else {
future.addListener(waitedForCompletionListener);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... like this

Suggested change
future.addListener(waitedForCompletionListener);
future.addListener(
ContextPreservingActionListener.wrapPreservingContext(waitedForCompletionListener, threadPool.getThreadContext())
);

We also need to do this for the list-tasks action, and ideally we'd have a test for it there too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry this was wrong, when we wrap it like this the response headers will still propagate. We have to do it like this to drop the response headers:

new ContextPreservingActionListener<>(threadPool.getThreadContext().newRestorableContext(false), waitedForCompletionListener)

arteam and others added 2 commits February 1, 2023 22:57
…de/tasks/get/TransportGetTaskAction.java

Co-authored-by: David Turner <[email protected]>
@DaveCTurner
Copy link
Contributor

#93431 (almost) adds a test for this. The only missing bit is the // TODO line. So if we merge #93431 first we can uncomment that line in this PR and then it's good to go.

@@ -128,7 +128,7 @@ public void testWaitForCompletion() throws Exception {
}));

// briefly fill up the generic pool so that (a) we know the wait has started and (b) we know it's not blocking
// flushThreadPool(threadPool, ThreadPool.Names.GENERIC); // TODO it _is_ blocking right now!!, unmute this in #93375
flushThreadPool(threadPool, ThreadPool.Names.GENERIC);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm now that I look at this again, with these changes we're not using the GENERIC pool any more (indeed there's no forking at all when waiting for completion) so this line doesn't make sense. We can just drop this (and therefore inline flushThreadPool at its only other call-site).

@DaveCTurner
Copy link
Contributor

Just a couple of outstanding comments here but I'd quite like to get this in today @arteam - let us know if you won't have capacity and I'll find some elsewhere.

@arteam
Copy link
Contributor Author

arteam commented Feb 6, 2023

@DaveCTurner I spent some time looking at it on Thursday, but becase the reindex/30_search/Sorting deprecated wait_for_completion false test was still failing due to inheriting of deprecation messages.

Using ContextPreservingActionListener.wrapPreservingContext didn't help there. Also, unmuting the flushThreadPool lined in failed the ListTasksIT test.

@DaveCTurner
Copy link
Contributor

Hmm, it seems to work for me? I pushed the fix I think we need, let's see what I'm missing in CI.

I think we don't need the flushThreadPool call either, see #93375 (comment).

@arteam
Copy link
Contributor Author

arteam commented Feb 6, 2023

Thank you! I missed the comment about using newRestorableContext(false) and committed the first suggestion. It's obvious in the retrospective why the test failed.

@arteam arteam requested a review from DaveCTurner February 6, 2023 14:55
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 😄

@arteam arteam merged commit 7bd5613 into elastic:main Feb 6, 2023
@arteam arteam deleted the transport-get-task-action-async branch February 6, 2023 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants