Server-side gRPC flow control results in small window sizes and client message backlog #11723

Closed
jammyMarse opened this issue Dec 3, 2024 · 5 comments

@jammyMarse

We are experiencing an issue with gRPC where the server side stalls, leading to very small flow-control window sizes. This causes client messages to accumulate in Netty's DefaultHttp2RemoteFlowController.pendingWriteQueue.

In high-throughput scenarios with many concurrent streams, we have noticed that even when using stream.isReady() for flow control, off-heap memory usage grows significantly, ultimately leading to an out-of-memory (OOM) situation.
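
For context, by "flow control" we mean the standard onReady/isReady backpressure pattern, roughly like this minimal sketch (the helper name and the Iterator-based message source are illustrative, not our actual code):

```java
import io.grpc.stub.CallStreamObserver;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

final class ReadyAwareWriter {
  /**
   * Installs an onReady handler that drains {@code messages} only while the transport
   * reports it can accept more data, so each stream buffers at most roughly one
   * onReady threshold's worth of bytes (32 KiB by default) below the application.
   */
  static <T> void writeWhenReady(CallStreamObserver<T> stream, Iterator<T> messages) {
    AtomicBoolean completed = new AtomicBoolean();
    stream.setOnReadyHandler(() -> {
      // Write only while isReady() is true; stop as soon as the stream reports backpressure.
      while (stream.isReady() && messages.hasNext()) {
        stream.onNext(messages.next());
      }
      if (!messages.hasNext() && completed.compareAndSet(false, true)) {
        stream.onCompleted();
      }
    });
  }
}
```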

Currently, flow control operates primarily at the HTTP/2 layer. Would it be possible to expose some TCP-level signals, such as a TCP equivalent of isReady()? This could enable finer-grained resource management, especially in high-concurrency situations.

More granular control over traffic at the TCP level might help alleviate flow-control-related performance problems while also reducing the risk of OOM from excessive memory pressure.

Is there any plan from the gRPC team to consider such TCP-level improvements in future releases? Or, are there other recommended approaches to address such issues?

Thank you!

@kannanjgithub
Contributor

kannanjgithub commented Dec 6, 2024

Is it possible to scale up your server side so it does not stall?
As for the client side, I see a discussion Eric (@ejona86) had with the Netty team regarding flow control, and some fixes came out of it.

@ejona86
Member

ejona86 commented Dec 6, 2024

TCP is sorta neither here nor there, as we have per-connection flow control already. It is also very hard to use such signals while also avoiding stream starvation/unfairness.

Do you have a limit to the number of concurrent RPCs you're performing? Unary RPCs don't have flow control either, and if you create an unbounded number of them you'd experience the same problem.
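
One common way to create such a limit is to gate calls with a semaphore and release the permit when the call finishes; a sketch only, not something from this thread (the wrapper class, the Supplier-based call site, and the limit of 100 are illustrative):

```java
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Bounds the number of in-flight RPCs started through it. */
final class BoundedCaller {
  private final Semaphore permits = new Semaphore(100); // illustrative limit

  <T> ListenableFuture<T> call(Supplier<ListenableFuture<T>> startRpc) throws InterruptedException {
    permits.acquire(); // blocks once 100 RPCs are already outstanding
    ListenableFuture<T> future;
    try {
      future = startRpc.get();
    } catch (RuntimeException e) {
      permits.release(); // don't leak a permit if the call fails to start
      throw e;
    }
    // Release the permit whenever the RPC completes: success, error, or cancellation.
    future.addListener(permits::release, MoreExecutors.directExecutor());
    return future;
  }
}
```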

@hlx502
Contributor

hlx502 commented Dec 10, 2024

> TCP is sorta neither here nor there, as we have per-connection flow control already. It is also very hard to use such signals while also avoiding stream starvation/unfairness.
>
> Do you have a limit to the number of concurrent RPCs you're performing? Unary RPCs don't have flow control either, and if you create an unbounded number of them you'd experience the same problem.

Currently we check onReady before calling onNext, but memory still overflows because there are too many streams. In our real business scenarios, however, we genuinely need streams at this magnitude. Is there a plan to expose the mapping between a Netty channel and its streams to the application layer?
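
For reference, the onReady-before-onNext wiring on a client streaming call looks roughly like this, reusing the ReadyAwareWriter helper from the earlier sketch (class and field names are illustrative):

```java
import io.grpc.stub.ClientCallStreamObserver;
import io.grpc.stub.ClientResponseObserver;
import java.util.Iterator;

/** Response observer that installs a ready-aware writer on the request stream. */
final class BackpressuredClientObserver<ReqT, RespT>
    implements ClientResponseObserver<ReqT, RespT> {
  private final Iterator<ReqT> requests; // illustrative request source

  BackpressuredClientObserver(Iterator<ReqT> requests) {
    this.requests = requests;
  }

  @Override
  public void beforeStart(ClientCallStreamObserver<ReqT> requestStream) {
    // Write only from inside the onReady handler; unconditional onNext() calls are
    // what pile up in DefaultHttp2RemoteFlowController.pendingWriteQueue.
    ReadyAwareWriter.writeWhenReady(requestStream, requests);
  }

  @Override public void onNext(RespT response) { /* handle response */ }
  @Override public void onError(Throwable t) { /* handle error */ }
  @Override public void onCompleted() { /* stream finished */ }
}
```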

@ejona86
Member

ejona86 commented Dec 10, 2024

We aren't going to expose TCP details to the streams. It wouldn't work as it is inherently unfair.

This issue doesn't really describe your problem; I would hope to see numbers. It asserts the problem and expects a particular solution, but there are other options. Is the memory use actually expected, or is the problem simply #11719? Or should the per-stream buffer be reduced in this case from its default of 32 KiB using CallOptions.withOnReadyThreshold()? Or is the memory use expected, and you should increase the JVM's -XX:MaxDirectMemorySize? Or are you simply trying to do too many RPCs concurrently for the machine size? To answer those sorts of questions we'd need to understand the memory impact you are seeing, the number of concurrent RPCs, and the approximate message size.
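
If lowering the threshold helps, one way to apply it to every call on a channel is a ClientInterceptor that rewrites the CallOptions; a sketch, assuming the CallOptions.withOnReadyThreshold(int) method mentioned above (the interceptor name and the 8 KiB value are illustrative):

```java
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.MethodDescriptor;

/** Lowers the per-stream onReady threshold for every call made through the channel. */
final class OnReadyThresholdInterceptor implements ClientInterceptor {
  private final int thresholdBytes;

  OnReadyThresholdInterceptor(int thresholdBytes) {
    this.thresholdBytes = thresholdBytes;
  }

  @Override
  public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
      MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
    // A lower threshold makes isReady() report false sooner, shrinking how much
    // each stream buffers before application-level backpressure kicks in.
    return next.newCall(method, callOptions.withOnReadyThreshold(thresholdBytes));
  }
}
```

Registering it with, for example, ManagedChannelBuilder.forTarget(...).intercept(new OnReadyThresholdInterceptor(8 * 1024)) would then apply the lower threshold channel-wide.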

@ejona86
Member

ejona86 commented Dec 26, 2024

Without knowing more, I don't see anything more that can be done here. Closing, but comment with more info and it can be reopened.

Note also that if direct memory is the main problem (the amount of memory is fine, just the type of memory is the problem), it is possible to make your own ByteBufAllocator instance that prefers heap memory and pass it to gRPC's builders with Netty's ChannelOption.ALLOCATOR. For an application, you can maybe use the system property -Dio.netty.noPreferDirect=true instead (it is easy to use, but you may not want it to apply process-wide).
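
A sketch of that allocator approach for a client channel (the class name and target are illustrative; a pooled allocator constructed with preferDirect=false hands out heap buffers from buffer()):

```java
import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.ChannelOption;

final class HeapPreferringChannelFactory {
  static ManagedChannel create(String target) {
    // preferDirect=false means buffer() returns pooled heap buffers, so the memory
    // that queues up behind a slow peer is heap memory rather than direct memory.
    PooledByteBufAllocator heapPreferring = new PooledByteBufAllocator(/* preferDirect= */ false);
    return NettyChannelBuilder.forTarget(target)
        .withOption(ChannelOption.ALLOCATOR, heapPreferring)
        .build();
  }
}
```

The server-side Netty builder accepts the same ChannelOption through its own withOption/withChildOption methods.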

ejona86 closed this as not planned Dec 26, 2024