Deserialize responses on the handling thread-pool #91367
Conversation
This is the start of moving message deserialization off of the transport threads where possible. This PR introduces the basic facilities to ref count transport message instances and fork their deserialization, which already provides some tangible benefits to transport thread latencies. For large messages (which are mostly responses) we cannot avoid forking in scenarios where responses can grow beyond O(1M), since deserializing e.g. an O(100M) cluster state or a similarly sized search response on the transport thread introduces unmanageable latency on the transport pool.

Some experimenting with aggressively forking things like index stats response handling shows visible master response time improvements and resulting speedups in e.g. snapshotting a large number of shards, which benefits from more responsive master nodes.

relates #77466 (improves some API latencies, but follow-ups are also needed here for e.g. transport broadcast actions)
relates #90622 (though it doesn't fix it; that would need additional work to fork the response handling)
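For illustration, the overall shape is roughly the following minimal sketch, not the PR's actual code: the helper name `forkResponseHandling`, the enclosing class and its `threadPool` field, and the exception wrapping are assumptions. The idea is to take a reference to the ref-counted inbound message before forking, deserialize on the handler's executor, and release the reference once the forked task has run.

```java
import org.elasticsearch.common.util.concurrent.AbstractRunnable;
import org.elasticsearch.transport.InboundMessage;
import org.elasticsearch.transport.TransportException;
import org.elasticsearch.transport.TransportResponse;
import org.elasticsearch.transport.TransportResponseHandler;

import java.io.IOException;

// Hypothetical helper; assumes an enclosing class with a `threadPool` field.
private <T extends TransportResponse> void forkResponseHandling(
    InboundMessage message,
    TransportResponseHandler<T> handler,
    String executorName
) {
    message.incRef(); // keep the buffer alive while the task sits in the executor's queue
    threadPool.executor(executorName).execute(new AbstractRunnable() {
        @Override
        protected void doRun() throws IOException {
            // deserialization happens here, on the handler's thread pool,
            // instead of on the transport thread
            final T response = handler.read(message.openOrGetStreamInput());
            handler.handleResponse(response);
        }

        @Override
        public void onFailure(Exception e) {
            handler.handleException(new TransportException("failed to handle response", e));
        }

        @Override
        public void onAfter() {
            message.decRef(); // release the buffer whether handling succeeded or not
        }
    });
}
```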
Pinging @elastic/es-distributed (Team:Distributed)
Hi @original-brownbear, I've created a changelog YAML for you.
```java
    InetSocketAddress remoteAddress,
    final StreamInput stream,
    final Header header,
    Releasable releaseResponse
```
Can we name this like `releaseResponseBuffer` or `releaseResponseBytes` or something?
Sure, done.
```java
) {
    final T response;
    try {
        response = handler.read(stream);
        try (releaseResponse) {
```
Can't this go in the `try` block above? Is the second `try` necessary?
Right, sorry about that; it's obviously unnecessary.
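For reference, the merged shape looks roughly like this (a sketch of the idea only; the actual exception handling in the PR may differ):

```java
final T response;
try (releaseResponseBuffer) {
    // the buffer is released as soon as the response has been read,
    // whether or not deserialization throws
    response = handler.read(stream);
} catch (Exception e) {
    handler.handleException(new TransportException("failed to deserialize response", e));
    return;
}
handler.handleResponse(response);
```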
```java
threadPool.executor(executor).execute(new ForkingResponseHandlerRunnable(handler, null) {
    @Override
    protected void doRun() {
        doHandleResponse(handler, remoteAddress, stream, inboundMessage.getHeader(), releaseBuffer);
```
I guess I don't totally understand why we need to pass the `releaseBuffer` mechanism into the method here. `onAfter` already handles the release. I'm not totally clear why it matters if the `doHandleResponse` method clearly releases the thing. It's already being released in `onAfter` no matter what.
The motivation here was to release the buffer asap and not needlessly hold on to it until the handler is done with the deserialized message. The `onAfter` was just put in place as a final fail-safe.
In fact we should always release it in `doHandleResponse`; a response handler should never be rejected (see assertions in `ForkingResponseHandlerRunnable`) and there's no chance an exception could prevent it either. That said, I'm 👍 on paranoid leak prevention.
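One way to keep both the eager release in `doHandleResponse` and a fail-safe is to make the release idempotent, roughly like the sketch below. This assumes `org.elasticsearch.core.Releasables#releaseOnce` is available and that the anonymous runnable may override `onAfter`; it is not necessarily what the PR does.

```java
// wrap the decRef so that a second release is a no-op
final Releasable releaseBuffer = Releasables.releaseOnce(inboundMessage::decRef);
threadPool.executor(executor).execute(new ForkingResponseHandlerRunnable(handler, null) {
    @Override
    protected void doRun() {
        // releases the buffer right after deserialization, before the handler runs
        doHandleResponse(handler, remoteAddress, stream, inboundMessage.getHeader(), releaseBuffer);
    }

    @Override
    public void onAfter() {
        releaseBuffer.close(); // paranoid fail-safe; harmless if doRun already released it
    }
});
```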
```diff
 import org.elasticsearch.core.IOUtils;
 import org.elasticsearch.core.Releasable;

 import java.io.IOException;
 import java.util.Objects;

-public class InboundMessage implements Releasable {
+public class InboundMessage extends AbstractRefCounted {
```
Can we assert that ref count is greater than 0 when `openOrGetStreamInput` is called?
++ done
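The added assertion presumably looks something like the following sketch; the field names `streamInput` and `content` are illustrative only, and the real method body in the PR will differ.

```java
public StreamInput openOrGetStreamInput() throws IOException {
    assert refCount() > 0 : "reading from an InboundMessage whose buffer has already been released";
    if (streamInput == null) {
        streamInput = content.streamInput();
    }
    return streamInput;
}
```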
Thanks Tim, all points addressed now I think!
I left some suggestions/comments
LGTM
Thanks Tim and David!
Just a follow-up comment/question here about memory usage and backpressure. Previously, a single thread would read bytes from the network and turn them into objects before passing them off to the executor's queue. Now we effectively put the raw bytes on the executor's queue, which I guess can happen substantially faster. I think this is ok: we don't really track the size of the deserialised objects anywhere, and they're typically going to be larger than the raw bytes. But we can enqueue more items in the queue now, and maybe the fuller queue of bytes takes more memory than the previous, bottlenecked queue of objects. What backpressure mechanisms kick in to avoid this becoming a problem? These queues are (mostly) bounded, so eventually we hit a limit there. And I think this doesn't apply to indexing, which IIUC manages the forking itself, so it looks like it runs on …
Maybe, but on the other hand there are effects that go in the other direction: for example, outbound messages go out quicker, and they always take buffer + object memory (and have higher overhead from our 16k page size than inbound buffers, which are a little more optimally sized).
As in the above, the real memory and transport circuit breakers should indirectly deal with this in practice for most cases, I believe. See above though; I don't think this is a real concern in practice.
I don't think so, this is about responses. Instinctively I'm not worried here, and this definitely fixes observed bugs ... but I agree that it would be nice to go further and do better on the auto-read situation.