
IllegalReferenceCountException when using sendObject #2760

Closed
pderop opened this issue Apr 3, 2023 · 1 comment · Fixed by netty/netty#13388
Assignees
Labels
for/netty This belongs to the Netty project

Comments

pderop (Contributor) commented Apr 3, 2023

When an HTTP/2 reactor-netty server uses sendObject to return the response, we sometimes see this exception on the server:

12:53:50.905 [reactor-http-nio-1] WARN  i.n.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.util.IllegalReferenceCountException: refCnt: 0
	at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1454)
	at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1383)
	at io.netty.buffer.PooledByteBuf.internalNioBuffer(PooledByteBuf.java:203)
	at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:1026)
	at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:829)
	at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:800)
	at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:781)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:925)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:907)
	at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:893)
	at io.netty.handler.logging.LoggingHandler.flush(LoggingHandler.java:304)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:923)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:907)
	at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:893)
	at io.netty.handler.codec.http2.Http2ConnectionHandler.flush(Http2ConnectionHandler.java:197)
	at io.netty.handler.codec.http2.Http2ConnectionHandler.channelReadComplete(Http2ConnectionHandler.java:555)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:486)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:463)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelReadComplete(AbstractChannelHandlerContext.java:456)
	at reactor.netty.tcp.SslProvider$SslReadHandler.channelReadComplete(SslProvider.java:842)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:486)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:463)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelReadComplete(AbstractChannelHandlerContext.java:456)
	at io.netty.handler.logging.LoggingHandler.channelReadComplete(LoggingHandler.java:272)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:484)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:463)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelReadComplete(AbstractChannelHandlerContext.java:456)
	at io.netty.handler.ssl.SslHandler.channelReadComplete0(SslHandler.java:1313)
	at io.netty.handler.ssl.SslHandler.channelReadComplete(SslHandler.java:1302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:486)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:463)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelReadComplete(AbstractChannelHandlerContext.java:456)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelReadComplete(DefaultChannelPipeline.java:1415)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:482)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:463)
	at io.netty.channel.DefaultChannelPipeline.fireChannelReadComplete(DefaultChannelPipeline.java:925)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:171)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:1589)

Expected Behavior

Actual Behavior

Steps to Reproduce

```
static Consumer<? super HttpServerRoutes> routesBuilder() {
    return r -> r.get("/user/{id}", findUser());
}

static BiFunction<HttpServerRequest, HttpServerResponse, Publisher<Void>> findUser() {
    return (req, res) ->
            res.header("Content-Type", "application/json")
                    .sendObject(toByteBuf(new User(req.param("id"), "Ben Chmark")));
}
```
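The `User` type and the `toByteBuf` helper are not shown above (they live in the attached reproducer). A minimal, dependency-free sketch of what they might look like — with the Netty `ByteBuf` wrapping step only noted in a comment, since these names and the hand-rolled JSON are assumptions, not the reproducer's actual code:

```java
// Hypothetical sketch of the reproducer's User/toByteBuf helpers.
record User(String id, String name) {
    // Hand-rolled JSON for the sketch; a real server would use Jackson or similar.
    String toJson() {
        return String.format("{\"id\":\"%s\",\"name\":\"%s\"}", id, name);
    }
}

final class Json {
    private Json() {}

    // In the reproducer this would presumably return a Netty ByteBuf, e.g.
    //   Unpooled.wrappedBuffer(user.toJson().getBytes(StandardCharsets.UTF_8));
    // here we stop at the raw bytes to stay dependency-free.
    static byte[] toBytes(User user) {
        return user.toJson().getBytes(java.nio.charset.StandardCharsets.UTF_8);
    }
}
```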

See the attached reproducer.tgz.

To reproduce:

  • Use Java 19

  • ./gradlew build

  • From one console, start the frontend server:
    java -DPROTOCOL=H2 -jar frontend-rn-11x/build/libs/frontend-rn-11x-1.0.0.jar

  • From another console, start Gatling:
    java -DPROTOCOL=H2 -DDURATION=20 -jar gatling/build/libs/gatling-1.0.0-all.jar application "H2 simulation / JsonGet" JsonGet

You should then see the IllegalReferenceCountException from the frontend console.

Analysis:

Sometimes it seems that we are facing this situation:

  • A buffer is currently present in the SslHandler.pendingUnencryptedWrites data structure.
  • Then the buffer is released by HttpOperations at HttpOperations.java:491, because the stream is being cancelled/closed from this stack trace:
 "reactor.netty.http.HttpOperations$PostHeadersNettyOutbound.run(HttpOperations.java:491)"
 "reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.cancel(FluxPeekFuseable.java:152)"
 "reactor.core.publisher.Operators.terminate(Operators.java:1277)"
 "reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:481)"
 "reactor.netty.http.server.HttpServerOperations.onInboundClose(HttpServerOperations.java:677)"
 "reactor.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:73)"
 "io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305)"
 "io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)"
 "io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)"
 "io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)"
 "io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301)"
 "io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)"
 "io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)"
 "io.netty.handler.codec.http2.AbstractHttp2StreamChannel$Http2ChannelUnsafe$2.run(AbstractHttp2StreamChannel.java:737)"
 "io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)"
 "io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)"
 "io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)"
 "io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566)"
 "io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)"
 "io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)"
 "io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)"
 "java.base/java.lang.Thread.run(Thread.java:1589)"

In fact, the above stack trace is triggered by this one (the stream is being closed):

fireChannelInactiveAndDeregister:733, AbstractHttp2StreamChannel$Http2ChannelUnsafe (io.netty.handler.codec.http2)
close:702, AbstractHttp2StreamChannel$Http2ChannelUnsafe (io.netty.handler.codec.http2)
closeForcibly:707, AbstractHttp2StreamChannel$Http2ChannelUnsafe (io.netty.handler.codec.http2)
doBeginRead:805, AbstractHttp2StreamChannel$Http2ChannelUnsafe (io.netty.handler.codec.http2)
streamClosed:299, AbstractHttp2StreamChannel (io.netty.handler.codec.http2)
userEventTriggered:249, Http2MultiplexHandler (io.netty.handler.codec.http2)
invokeUserEventTriggered:398, AbstractChannelHandlerContext (io.netty.channel)
invokeUserEventTriggered:376, AbstractChannelHandlerContext (io.netty.channel)
fireUserEventTriggered:368, AbstractChannelHandlerContext (io.netty.channel)
onHttp2StreamStateChanged:703, Http2FrameCodec (io.netty.handler.codec.http2)
onHttp2StreamStateChanged0:524, Http2FrameCodec$ConnectionListener (io.netty.handler.codec.http2)
onStreamClosed:513, Http2FrameCodec$ConnectionListener (io.netty.handler.codec.http2)
notifyClosed:357, DefaultHttp2Connection (io.netty.handler.codec.http2)
removeFromActiveStreams:1007, DefaultHttp2Connection$ActiveStreams (io.netty.handler.codec.http2)
deactivate:963, DefaultHttp2Connection$ActiveStreams (io.netty.handler.codec.http2)
close:515, DefaultHttp2Connection$DefaultStream (io.netty.handler.codec.http2)
close:521, DefaultHttp2Connection$DefaultStream (io.netty.handler.codec.http2)
closeStream:628, Http2ConnectionHandler (io.netty.handler.codec.http2)
closeStreamLocal:601, Http2ConnectionHandler (io.netty.handler.codec.http2)
writeComplete:623, DefaultHttp2ConnectionEncoder$FlowControlledBase (io.netty.handler.codec.http2)
writeAllocatedBytes:368, DefaultHttp2RemoteFlowController$FlowState (io.netty.handler.codec.http2)
write:547, DefaultHttp2RemoteFlowController$WritabilityMonitor (io.netty.handler.codec.http2)
write:612, WeightedFairQueueByteDistributor$State (io.netty.handler.codec.http2)
distribute:292, WeightedFairQueueByteDistributor (io.netty.handler.codec.http2)
distributeToChildren:325, WeightedFairQueueByteDistributor (io.netty.handler.codec.http2)
distribute:273, WeightedFairQueueByteDistributor (io.netty.handler.codec.http2)
writePendingBytes:627, DefaultHttp2RemoteFlowController$WritabilityMonitor (io.netty.handler.codec.http2)
writePendingBytes:267, DefaultHttp2RemoteFlowController (io.netty.handler.codec.http2)
flush:196, Http2ConnectionHandler (io.netty.handler.codec.http2)
invokeFlush0:925, AbstractChannelHandlerContext (io.netty.channel)
invokeFlush:907, AbstractChannelHandlerContext (io.netty.channel)
flush:893, AbstractChannelHandlerContext (io.netty.channel)
processPendingReadCompleteQueue:349, Http2MultiplexHandler (io.netty.handler.codec.http2)
channelReadComplete:330, Http2MultiplexHandler (io.netty.handler.codec.http2)
invokeChannelReadComplete:484, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelReadComplete:463, AbstractChannelHandlerContext (io.netty.channel)
fireChannelReadComplete:456, AbstractChannelHandlerContext (io.netty.channel)
channelReadComplete0:570, Http2ConnectionHandler (io.netty.handler.codec.http2)
channelReadComplete:553, Http2ConnectionHandler (io.netty.handler.codec.http2)
invokeChannelReadComplete:486, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelReadComplete:463, AbstractChannelHandlerContext (io.netty.channel)
fireChannelReadComplete:456, AbstractChannelHandlerContext (io.netty.channel)
channelReadComplete:272, LoggingHandler (io.netty.handler.logging)
invokeChannelReadComplete:484, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelReadComplete:463, AbstractChannelHandlerContext (io.netty.channel)
fireChannelReadComplete:456, AbstractChannelHandlerContext (io.netty.channel)
channelReadComplete0:1329, SslHandler (io.netty.handler.ssl)
channelReadComplete:1318, SslHandler (io.netty.handler.ssl)
invokeChannelReadComplete:486, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelReadComplete:463, AbstractChannelHandlerContext (io.netty.channel)
fireChannelReadComplete:456, AbstractChannelHandlerContext (io.netty.channel)
channelReadComplete:1415, DefaultChannelPipeline$HeadContext (io.netty.channel)
invokeChannelReadComplete:482, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelReadComplete:463, AbstractChannelHandlerContext (io.netty.channel)
fireChannelReadComplete:925, DefaultChannelPipeline (io.netty.channel)
read:171, AbstractNioByteChannel$NioByteUnsafe (io.netty.channel.nio)
processSelectedKey:788, NioEventLoop (io.netty.channel.nio)
processSelectedKeysOptimized:724, NioEventLoop (io.netty.channel.nio)
processSelectedKeys:650, NioEventLoop (io.netty.channel.nio)
run:562, NioEventLoop (io.netty.channel.nio)
run:997, SingleThreadEventExecutor$4 (io.netty.util.concurrent)
run:74, ThreadExecutorMap$2 (io.netty.util.internal)
run:30, FastThreadLocalRunnable (io.netty.util.concurrent)
run:1589, Thread (java.lang)
  • So the buffer is released at HttpOperations.java:491, even though it has already been written to the channel; it is not yet flushed and is still present in the SslHandler.pendingUnencryptedWrites data structure.

  • Finally, when SslHandler.flush is called after channelReadComplete, it gets the IllegalReferenceCountException when accessing the buffer in the pendingUnencryptedWrites data structure, because the buffer was already released at HttpOperations.java:491.
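The failure mode above can be replayed without Netty: a reference-counted buffer released once by the cleanup path and then touched by the delayed flush path fails in exactly this way. A simplified model of the refCnt contract (hypothetical `SimpleRefCounted` class — not Netty code; Netty throws IllegalReferenceCountException where this sketch throws IllegalStateException):

```java
// Minimal model of Netty's reference-counting contract.
final class SimpleRefCounted {
    private int refCnt = 1;

    int refCnt() { return refCnt; }

    // Models ByteBuf.release(): drop one reference; returns true on deallocation.
    boolean release() {
        if (refCnt == 0) {
            throw new IllegalStateException("refCnt: 0");
        }
        return --refCnt == 0;
    }

    // Models AbstractByteBuf.ensureAccessible(): any access after release fails.
    void ensureAccessible() {
        if (refCnt == 0) {
            throw new IllegalStateException("refCnt: 0");
        }
    }
}

// The race in this issue, replayed sequentially:
// 1. the buffer is written but not flushed (still held by pendingUnencryptedWrites);
// 2. channelInactive fires early and the cleanup path releases the buffer;
// 3. the delayed SslHandler.flush then touches a buffer whose refCnt is already 0.
```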

Possible Solution

Your Environment

  • Reactor version(s) used: latest 1.0.x branch
  • JVM version: 19
@pderop pderop added type/bug A general bug status/need-triage A new issue that still need to be evaluated as a whole labels Apr 3, 2023
@pderop pderop removed the status/need-triage A new issue that still need to be evaluated as a whole label Apr 3, 2023
@violetagg violetagg assigned pderop and unassigned violetagg Apr 3, 2023
@pderop pderop added this to the 1.0.x Backlog milestone Apr 7, 2023
normanmaurer pushed a commit to netty/netty that referenced this issue May 22, 2023
…ss (#13388)

Motivation:

In the context of Reactor Netty, we experience an issue during HTTP2 (or
H2C) load tests.
We have an HTTP2 (or H2C, it does not matter) server whose pipeline
contains an Http2FrameCodec followed by an Http2MultiplexHandler. When
the client sends a new request, the Http2MultiplexHandler creates a
child `Http2MultiplexHandlerStreamChannel` whose pipeline contains an
Http2StreamFrameToHttpObjectCodec followed by some other
reactor-netty-specific handlers.

Now, the issue is the following: when the client sends the last request
frame (with the EOS flag set) and the server sends the last response
frame (also with the EOS flag set), the stream is closed on the server
and the stream channel handlers are called in `channelInactive` (that is
OK).
But sometimes, under high load, when the last server response cannot be
flushed because the parent channel is non-writable, the server stream
channel handlers may be called in `channelInactive` while the last
server response frame is still being written and not yet flushed.
In other words, the `ChannelFuture` of the last server response sent to
the client is not "success" but "incomplete" at the time the child
stream channel handlers are called in `channelInactive`.

Normally, if I'm correct, when a channel handler is called in
`channelInactive`, the channel is inactive and has reached the end of
its lifetime. So when our handlers are called in `channelInactive`
while the response is not yet flushed (because the parent channel was
non-writable), we get into trouble: we then try to clean up resources,
such as the buffer of the last server response, but a bit later, when
the last response is finally flushed, the buffer is freed again, and we
end up with many IllegalReferenceCountExceptions. So we think that
`channelInactive` should only be invoked after the last response is
fully written and released.

To reproduce the issue and investigate it with the debugger, you can
first run the
`Http2MultiplexTransportTest.streamHandlerInactivatedResponseFlushed`
from this PR, but without applying the patch. Only pay attention to the
"serverloop" thread.

I tend to think that the problem may start from
`DefaultHttp2RemoteFlowController.FlowState.writeAllocatedBytes(int
allocated)`, line 368, where `frame.writeComplete()` is called: this
method will indirectly trigger
`AbstractHttp2StreamChannel.fireChannelInactiveAndDeregister(), line
742`, but without waiting for the frame promise to complete (the promise
is in `DefaultHttp2ConnectionEncoder.FlowControlledBase`).

Modification:

Added a reproducer test,
`Http2MultiplexTransportTest.streamHandlerInactivatedResponseFlushed`.
This test simulates a non-writable socket by configuring the SO_SNDBUF
of the server-side connection to `1`. This triggers the issue, because
when the server responds, it gets many writability events and the
response is not flushed immediately.

I have tried to apply the following simple patch, which seems to resolve
the problem: in the `Http2ConnectionHandler.closeStream` method, ensure
that the stream is closed only once the future is completed.

instead of doing:

```
    @Override
    public void closeStream(final Http2Stream stream, ChannelFuture future) {
        stream.close();

        if (future.isDone()) {
            checkCloseConnection(future);
        } else {
            future.addListener(new ChannelFutureListener() {
                @Override
                public void operationComplete(ChannelFuture future) throws Exception {
                    checkCloseConnection(future);
                }
            });
        }
    }
```

the closing is then done in the future listener, like this:
```
    @Override
    public void closeStream(final Http2Stream stream, ChannelFuture future) {
        if (future.isDone()) {
            doCloseStream(stream, future);
        } else {
            future.addListener(f -> doCloseStream(stream, future));
        }
    }

    private void doCloseStream(final Http2Stream stream, ChannelFuture future) {
        stream.close();
        checkCloseConnection(future);
    }
```
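The pattern the patch relies on — run the close immediately if the write already completed, otherwise defer it to a completion listener — can be illustrated with plain `CompletableFuture` standing in for Netty's `ChannelFuture` (the names here are illustrative, not Netty API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

final class DeferredClose {
    // Close the "stream" only once the write future has completed,
    // mirroring the shape of the patched Http2ConnectionHandler.closeStream logic.
    static void closeWhenWritten(CompletableFuture<Void> writeFuture, Runnable closeStream) {
        if (writeFuture.isDone()) {
            closeStream.run();
        } else {
            writeFuture.whenComplete((v, t) -> closeStream.run());
        }
    }
}
```

With this shape, a stream whose last frame is still pending stays open until the write actually completes, which is exactly the ordering the original code violated by calling `stream.close()` unconditionally.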

This seems to resolve the issue, because the stream is closed only once
the last response frame has been fully flushed.
The `Http2ConnectionHandlerTest.canCloseStreamWithVoidPromise` test has
also been modified to set the promise to success before running the
test:

```
    @Test
    public void canCloseStreamWithVoidPromise() throws Exception {
        handler = newHandler();
        handler.closeStream(stream, ctx.voidPromise().setSuccess());
        verify(stream, times(1)).close();
        verifyNoMoreInteractions(stream);
    }
```

Result:

Fixes reactor/reactor-netty#2760
pderop (Contributor, Author) commented May 22, 2023

This issue is resolved by netty/netty#13388, but I keep it open because I will add some tests in reactor-netty for this issue.

@violetagg violetagg added for/netty This belongs to the Netty project and removed type/bug A general bug labels May 23, 2023
@violetagg violetagg removed this from the 1.0.x Backlog milestone May 23, 2023
@violetagg violetagg reopened this Jun 2, 2023
@pderop pderop closed this as completed in c71390e Jun 7, 2023