-
Notifications
You must be signed in to change notification settings - Fork 654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IllegalReferenceCountException when using sendObject #2760
Labels
for/netty
This belongs to the Netty project
Comments
pderop
added
type/bug
A general bug
status/need-triage
A new issue that still need to be evaluated as a whole
labels
Apr 3, 2023
pderop
removed
the
status/need-triage
A new issue that still need to be evaluated as a whole
label
Apr 3, 2023
normanmaurer
pushed a commit
to netty/netty
that referenced
this issue
May 22, 2023
…ss (#13388) Motivation: In the context of Reactor-Netty, we experience an issue during HTTP2 (or H2C) load tests. We have an HTTP2 (or H2C, it does not matter) server which is using in the pipeline an Http2FrameCodec, followed by a Http2MultiplexHandler, and when the client sends a new request, then the Http2MultiplexHandler will create a child `Http2MultiplexHandlerStreamChannel` that will contain in its pipeline an Http2StreamFrameToHttpObjectCodec followed by some other specific reactor-netty handlers. Now, the issue is the following: when the client sends the last request frame (with EOS flag=true), and when the server sends the last response frame (also with EOS flag = true), then in the server the stream will be closed, and the stream channel handlers will be called in `channelInactive` (that is ok). But sometimes, under high load, when the last server response can't be flushed when the parent channel is non-writtable, then we see that the server stream channel handlers may be called in`channelInactive` while the server last response frame is still writing and is not yet flushed. In other words, the `ChannelFuture` of the last server response sent to the client is not "success", but is "incomplete" at the time the child stream channel handlers are called in `channelInactive`. Normally, if I'm correct, when a channel handler is called in `channelInactive`, it means that it is now inactive and has reached its end of lifetime. So when our handlers are called in `channelInactive` while the response is not yet flushed (because parent channel was non-writable), then then we are getting into troubles, because we are then trying to cleanup resources, like the buffer of the last server response, but a bit later, when the last response is now flushed, then the buffer will be freed again, and we end up with many IllegalReferenceCountExceptions. So, we think that `channelInactive` should only be invoked after the last response is fully written and released. To reproduce the issue and investigate it with the debugger, you can first run the `Http2MultiplexTransportTest.streamHandlerInactivatedResponseFlushed` from this PR, but without applying the patch. Only pay attention to the "serverloop" thread. I tend to think that the problem may start from `DefaultHttp2RemoteFlowController.FlowState.writeAllocatedBytes(int allocated)`, line 368, where `frame.writeComplete()` is called: this method will indirectly trigger `AbstractHttp2StreamChannel.fireChannelInactiveAndDeregister(), line 742`, but without waiting for the frame promise to complete (the promise is in `DefaultHttp2ConnectionEncoder.FlowControlledBase`). Modification: Added a reproducer test in`Http2MultiplexTransportTest.streamHandlerInactivatedResponseFlushed`test. This test simulates non-writable socket by configuring the SO_SNDBUF of the server side connection to `1`. This will trigger the issue, because when the server will respond, it will get many writability events and the response won't be flushed immediately. I have tried to apply the following simple patch which seems to resolve the problem: In. `Http2ConnectionHandler.closeStream` method, ensures that the stream is closed once the future is completed: instead of doing: ``` @OverRide public void closeStream(final Http2Stream stream, ChannelFuture future) { stream.close(); if (future.isDone()) { checkCloseConnection(future); } else { future.addListener(new ChannelFutureListener() { @OverRide public void operationComplete(ChannelFuture future) throws Exception { checkCloseConnection(future); } }); } } ``` the closing is then done in the future listener, like this: ``` Override public void closeStream(final Http2Stream stream, ChannelFuture future) { if (future.isDone()) { doCloseStream(stream, future); } else { future.addListener(f -> doCloseStream(stream, future)); } } private void doCloseStream(final Http2Stream stream, ChannelFuture future) { stream.close(); checkCloseConnection(future); } ``` This seems to resolve the issue, because the stream will be closed only once the last response frame has been fully flushed. The Http2ConnectionHandlerTest.canCloseStreamWithVoidPromise has also been modified in order to set the promise to success before doing the test: ``` @test public void canCloseStreamWithVoidPromise() throws Exception { handler = newHandler(); handler.closeStream(stream, ctx.voidPromise().setSuccess()); verify(stream, times(1)).close(); verifyNoMoreInteractions(stream); } ``` Result: Fixes reactor/reactor-netty#2760
This issue is resolved by netty/netty#13388, but I keep this issue opened because I will add some tests in reactor-netty for this issue. |
violetagg
added
for/netty
This belongs to the Netty project
and removed
type/bug
A general bug
labels
May 23, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
When an HTTP2 reactor-netty server is using
sendObject
when returning the response, sometimes, we see this exception on the server:Expected Behavior
Actual Behavior
Steps to Reproduce
See attached reproducer.tgz:
reproducer.tgz
To reproduce:
Use java 19
./gradlew build
from one console, start the frontend server (using java19):
java -DPROTOCOL=H2 -jar frontend-rn-11x/build/libs/frontend-rn-11x-1.0.0.jar
from another console, start gatling (using java19):
java -DPROTOCOL=H2 -DDURATION=20 -jar gatling/build/libs/gatling-1.0.0-all.jar application “H2 simulation / JsonGet” JsonGet
You should then see the IllegalReferenceCountException from the frontend console.
Analysis:
sometimes, it seems that we are facing this situation:
In fact, the above stack trace is triggered by this one (the stream is being closed):
So, the buffer is released at HttpOperations.java:491, but the buffer has already been written to the channel (either here or here:), but is not yet flushed and still present in the SSLHandler.pendingUnencryptedWrites data structure
Finally, when SSLHandler.flush is called after channelReadComplete, then it gets the IllegalReferenceCountException when accessing to the buffer from the pendingUnencryptedWrites data structure, because the buffer has already been released at HttpOperations.java:491
Possible Solution
Your Environment
The text was updated successfully, but these errors were encountered: