Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP always above zero in SimpleDequePool #118

Open
machao23 opened this issue Jan 17, 2021 · 24 comments
Open

WIP always above zero in SimpleDequePool #118

machao23 opened this issue Jan 17, 2021 · 24 comments
Labels
for/user-attention This issue needs user attention (feedback, rework, etc...) status/need-investigation This needs more in-depth investigation type/bug A general bug

Comments

@machao23
Copy link

machao23 commented Jan 17, 2021

I'm using reactor-netty to make API calls and facing an issue wherein reactor-netty is not able to create new connections for the incoming requests. And I debuged codes and found the result of WIP.getAndIncrement(this) always above zero.

Expected Behavior

Even if the drainLoop is broken by some requests, it shouldn't affect further requests.

Actual Behavior

The darainLoop never called anymore after some exceptions occurred.

Steps to Reproduce

Unfortunately, I don't know how to reproduce this because I don't know the root cause.
If this helps in any manner, I checked the logs on prod and there're some exceptions occurred during drainLoop:

Failed to mark a promise as failure because it has failed already: DefaultChannelPromise@5a854314(failure: io.netty.handler.codec.EncoderException: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1), unnotified cause: io.netty.handler.codec.EncoderException: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
	at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:104)
	at io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
	at io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:304)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
	at io.netty.handler.timeout.WriteTimeoutHandler.write(WriteTimeoutHandler.java:112)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808)
	at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
	at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294)
	at reactor.netty.http.HttpOperations.lambda$send$0(HttpOperations.java:123)
	at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:151)
	at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4198)
	at reactor.netty.NettyOutbound.subscribe(NettyOutbound.java:336)
	at reactor.core.publisher.MonoSource.subscribe(MonoSource.java:65)
	at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
	at reactor.netty.http.client.HttpClientConnect$HttpIOHandlerObserver.onStateChange(HttpClientConnect.java:446)
	at reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:518)
	at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.run(PooledConnectionProvider.java:633)
	at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.onNext(PooledConnectionProvider.java:506)
	at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.onNext(PooledConnectionProvider.java:457)
	at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.deliver(AbstractPool.java:419)
	at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.lambda$drainLoop$15(SimpleDequePool.java:391)
	at reactor.core.scheduler.ImmediateScheduler.schedule(ImmediateScheduler.java:47)
	at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.drainLoop(SimpleDequePool.java:391)
	at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.drain(SimpleDequePool.java:261)
	at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.doAcquire(SimpleDequePool.java:256)
	at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.request(AbstractPool.java:382)
	at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.onSubscribe(PooledConnectionProvider.java:532)
	at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool$QueueBorrowerMono.subscribe(SimpleDequePool.java:578)
	at reactor.netty.resources.PooledConnectionProvider.disposableAcquire(PooledConnectionProvider.java:218)
	at reactor.netty.resources.PooledConnectionProvider.lambda$acquire$3(PooledConnectionProvider.java:182)
	at reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:57)
	at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.lambda$subscribe$0(HttpClientConnect.java:327)
	at reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:57)
	at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.resubscribe(FluxRetryWhen.java:204)
	at reactor.core.publisher.FluxRetryWhen$RetryWhenOtherSubscriber.onNext(FluxRetryWhen.java:250)
	at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerNext(FluxConcatMap.java:274)
	at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onNext(FluxConcatMap.java:851)
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)
	at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241)
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:147)
	at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:150)
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:147)
	at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4213)
	at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:441)
	at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onNext(FluxConcatMap.java:243)
	at reactor.core.publisher.DirectProcessor$DirectInner.onNext(DirectProcessor.java:333)
	at reactor.core.publisher.DirectProcessor.onNext(DirectProcessor.java:142)
	at reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
	at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onError(FluxRetryWhen.java:180)
	at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:185)
	at reactor.netty.http.client.HttpClientConnect$HttpObserver.onUncaughtException(HttpClientConnect.java:407)
	at reactor.netty.ReactorNetty$CompositeConnectionObserver.onUncaughtException(ReactorNetty.java:511)
	at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.onUncaughtException(PooledConnectionProvider.java:549)
	at reactor.netty.resources.PooledConnectionProvider$PooledConnection.onUncaughtException(PooledConnectionProvider.java:385)
	at reactor.netty.channel.FluxReceive.drainReceiver(FluxReceive.java:230)
	at reactor.netty.channel.FluxReceive.onInboundError(FluxReceive.java:435)
	at reactor.netty.channel.ChannelOperations.onInboundError(ChannelOperations.java:442)
	at reactor.netty.channel.ChannelOperationsHandler.exceptionCaught(ChannelOperationsHandler.java:129)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireExceptionCaught(CombinedChannelDuplexHandler.java:424)
	at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:92)
	at io.netty.channel.CombinedChannelDuplexHandler$1.fireExceptionCaught(CombinedChannelDuplexHandler.java:145)
	at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:143)
	at io.netty.channel.CombinedChannelDuplexHandler.exceptionCaught(CombinedChannelDuplexHandler.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.netty.handler.ssl.SslHandler.exceptionCaught(SslHandler.java:1144)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.netty.handler.proxy.ProxyHandler.exceptionCaught(ProxyHandler.java:241)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireExceptionCaught(CombinedChannelDuplexHandler.java:424)
	at io.netty.channel.CombinedChannelDuplexHandler$1.fireExceptionCaught(CombinedChannelDuplexHandler.java:161)
	at io.netty.channel.CombinedChannelDuplexHandler.exceptionCaught(CombinedChannelDuplexHandler.java:233)
	at io.netty.handler.proxy.HttpProxyHandler$HttpClientCodecWrapper.exceptionCaught(HttpProxyHandler.java:247)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.handleReadException(AbstractEpollStreamChannel.java:728)
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:818)
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
	at io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:74)
	at io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:138)
	at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:100)
	at io.netty.handler.codec.http.DefaultFullHttpRequest.release(DefaultFullHttpRequest.java:102)
	at io.netty.util.ReferenceCountUtil.release(ReferenceCountUtil.java:88)
	at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:91)
	... 111 more

io.netty.channel.unix.Errors$NativeIoException: writevAddresses(..) failed: Connection reset by peer

Possible Solution

From what I understand from the source code for SimpleDequePool.java, the method drainLoop() just incremented WIP while checking the (WIP.getAndIncrement(this) == 0) condition and then decrementing WIP for itself.

If some exceptions occured during drainLoop(), it would fail to handle decrementing WIP, the further requests to drain() will never pass the below if condition.

void drain() {
    if (WIP.getAndIncrement(this) == 0) {
        drainLoop();
    }
}

So I think it'd be better to implement fallback to decrement the WIP in case something wrong happen during drainLoop.

Your Environment

reactor-netty: 0.9.14.RELEASE
netty: 4.1.51.Final
JVM version: openjdk version "11.0.4"
OS and version: 4.19.118-1.el7.centos.x86_64

@reactorbot reactorbot added the ❓need-triage This issue needs triage, hasn't been looked at by a team member yet label Jan 17, 2021
@simonbasle
Copy link
Member

hi @machao23 and thanks for the report. There has been a few bugs fixed in the later versions of reactor-netty and reactor-pool, could you confirm you still see the issue with the latest versions ? (ie 0.9.16.RELEASE).

I'll look into what pieces of the drainLoop can throw an Exception and see about protecting the loop from that.

@simonbasle simonbasle added for/user-attention This issue needs user attention (feedback, rework, etc...) status/need-investigation This needs more in-depth investigation type/bug A general bug and removed ❓need-triage This issue needs triage, hasn't been looked at by a team member yet labels Jan 18, 2021
@machao23
Copy link
Author

machao23 commented Jan 18, 2021

@simonbasle Thank you for replying. This issue happened once for some months, so it probably needs a log of time to check if the latest version works well.

@simonbasle
Copy link
Member

can you let us know more about your netty/reactor-netty usage? for instance, are you setting up netty handlers, codecs, etc... ? What's the configuration like? The stacktrace you provided seems to indicate something goes wrong during the encoding phase.

@simonbasle
Copy link
Member

cc @violetagg

@machao23
Copy link
Author

machao23 commented Jan 20, 2021

@simonbasle our codes related to reactor-netty are below:

 return () -> httpClient.headers(h -> header.forEach(h::add))
                        .tcpConfiguration(tcpClient -> {
                            if (!Strings.isNullOrEmpty(searchContext.getProxyHost())) {
                                tcpClient = tcpClient.proxy((spec) ->
                                        spec.type(ProxyProvider.Proxy.HTTP)
                                            .host(searchContext.getProxyHost())
                                            .port(searchContext.getProxyPort()));
                                NettyWebClient.LOGGER.info("tcp proxy ProxyHost is {}, ProxyPort is {}",
                                        searchContext.getProxyHost(), searchContext.getProxyPort());
                            }

                            return tcpClient.option(ChannelOption.CONNECT_TIMEOUT_MILLIS,
                                    (int) searchContext.getTimeout().getSeconds() * 1000)
                                    .doOnConnected(connection -> connection.addHandlerLast(
                                            new ReadTimeoutHandler(searchContext.getTimeout().toMillis(),
                                                    TimeUnit.MILLISECONDS))
                                            .addHandlerLast(new WriteTimeoutHandler(
                                                    searchContext.getTimeout().toMillis(), TimeUnit.MILLISECONDS)));
                                })
                        .post()
                        .uri(searchContext.getSearchUrl())
                        .send(Mono.just(Unpooled.wrappedBuffer(bytes)))
                        .responseSingle((resp, buf) -> {
                            HttpResponseStatus httpResponseStatus = resp.status();
                            HttpHeaders httpHeaders = resp.responseHeaders();
                            response.setStatusCode(httpResponseStatus.code());
                            response.setContentLength(getContentLength(httpHeaders.get("Content-Length")));
                            return buf;
                        })
                        .map(ByteBuf::retain)
                        .map(byteBuf -> {
                            response.setResponseBodyStream(new ByteBufInputStream(byteBuf, true));
                            return response;
                        });

Yes, it throwed IllegalReferenceCountException during sending request. I'm really confused where the ByteBuf is released early because we don't relese it manually in our code.
Is there any way to trace the ByteBuf reference count?

@violetagg
Copy link
Member

@machao23 I tried to reproduce the issue based on the example above but with no success.
However the WARNING that you observed Failed to mark a promise as failure because it has failed already... cannot lead to a connection not properly returned to the pool or an exception not properly propagated.
Did you have any other logs/exceptions - for example WARN r.n.http.client.HttpClientConnect - ... The connection observed an error, so that we can better understand what the cause is.
You wrote that this issue happens very rare, on the other hand you were able to verify that a fix in the drain can solve it. Can you provide some steps that we can use in order to verify a fix in the drain.

Now some Netty code:

It seems that the encoder here
https://github.com/netty/netty/blob/16f2a5e49f7bc018d3a0c3d3a8f090749229f301/codec/src/main/java/io/netty/handler/codec/MessageToMessageEncoder.java#L89

fails and releases the byte buffer (unfortunately I cannot say why)
then in the finally block the buffer is released again which causes IllegalReferenceCountException
https://github.com/netty/netty/blob/16f2a5e49f7bc018d3a0c3d3a8f090749229f301/codec/src/main/java/io/netty/handler/codec/MessageToMessageEncoder.java#L91

exceptions in Netty are not just thrown but propagated via promises
https://github.com/netty/netty/blob/16f2a5e49f7bc018d3a0c3d3a8f090749229f301/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L719

If the promise was already marked as failed (remember encoder failed for some reason), Netty will just log that the new exception cannot be propagated because the promise was already marked as failed and actually that's the WARNING that you observed.
https://github.com/netty/netty/blob/16f2a5e49f7bc018d3a0c3d3a8f090749229f301/common/src/main/java/io/netty/util/internal/PromiseNotificationUtil.java#L63

@machao23
Copy link
Author

@violetagg Thank you for your clarification.
I guess the new exception that you mentioned I've already appended after the exception stack :
io.netty.channel.unix.Errors$NativeIoException: writevAddresses(..) failed: Connection reset by peer

Because I don't find any other new exceptions in our logs. Please confirm it, thanks.

@violetagg
Copy link
Member

violetagg commented Jan 26, 2021

io.netty.channel.unix.Errors$NativeIoException: writevAddresses(..) failed: Connection reset by peer

How many of those do you observe?

What's your connection pool configurations? (especially max connections)

I'm using reactor-netty to make API calls and facing an issue wherein reactor-netty is not able to create new connections for the incoming requests.

If you run with the default configurations you should have many exceptions, in order to exhaust the connection pool, although the one above should not lead to a connection not returned to the pool.

@machao23
Copy link
Author

machao23 commented Jan 26, 2021

@violetagg Just a few, it's only occurred with IllegalReferenceCountException. I don't understand why you'd think there were many exceptions. Because from my understanding, our service has only a few connection when it's running.

If you run with the default configurations you should have many exceptions, in order to exhaust the connection pool, although the one above should not lead to a connection not returned to the pool.run the service with the default configurations.

Yes, we do run with default connection.
Just confusing why the connection pool would be exhausted, because I personally think this WIP issue would keep request from calling drainLoop() to get connection from the pool?

@violetagg
Copy link
Member

@machao23 I had a wrong impression that you see connections always stay active and not returned to the pool.
So I don't think the WARNING above and the exception are related to Reactor Pool not able to acquire/create a resource.

@machao23
Copy link
Author

machao23 commented Jan 26, 2021

@violetagg Maybe I didn't express my meaning before. Let me explain again:
A few IllegalReferenceCountException occured with Connection reset by peer first. You could see the exception stack from my first comment.
Then SimpleDequePool.drainLoop() doesn't decrement the WIP to zero because it doesn't catch and handle those exceptions. So the further requests don't pass the WIP.getAndIncrement(this) == 0 and failed to call the drainLoop(). It led the service can't send any more request scuccessfully.
I just want to know the root cause of IllegalReferenceCountException . I think the timeline probably is :

io.netty.channel.unix.Errors$NativeIoException: writevAddresses(..) failed: Connection reset by peer -> IllegalReferenceCountException -> WIP.getAndIncrem.kMMMMMMent(this) != 0 -> failed to call drainLoop()

@violetagg
Copy link
Member

violetagg commented Jan 26, 2021

@violetagg Maybe I didn't express my meaning before. Let me explain again:
A few IllegalReferenceCountException occured with Connection reset by peer first. You could see the exception stack from my first comment.
Then SimpleDequePool.drainLoop() doesn't decrement the WIP to zero because it doesn't catch and handle those exceptions. So the further requests don't pass the WIP.getAndIncrement(this) == 0 and failed to call the drainLoop(). It led the service can't send any more request scuccessfully.
I just want to know the root cause of IllegalReferenceCountException . I think the timeline probably is :

io.netty.channel.unix.Errors$NativeIoException: writevAddresses(..) failed: Connection reset by peer -> IllegalReferenceCountException -> WIP.getAndIncrem.kMMMMMMent(this) != 0 -> failed to call drainLoop()

Please see my comment above - the exceptions in Netty are propagated via promises i.e. the promise is marked as failed and you can obtain the exception from the promise, it is not just thrown.

@machao23
Copy link
Author

machao23 commented Jan 27, 2021

Please see my comment above - the exceptions in Netty are propagated via promises i.e. the promise is marked as failed and you can obtain the exception from the promise, it is not just thrown.

@violetagg Oh, I see.
I'm confused why the io.netty.channel.unix.Errors$NativeIoException: writevAddresses(..) failed: Connection reset by peer happened during MessageToMessageEncoder.encode(). I just want to express if the remote server did close the connection, it'd interrupt the encode?

@violetagg
Copy link
Member

@simonbasle @machao23 I was able to reproduce the warning with this test case reactor/reactor-netty@7b08d3e
However I do not see problems with the pool.

@machao23
Copy link
Author

machao23 commented Feb 2, 2021

@violetagg Thanks your code, I would debug this case later. I noticed you mentioned retry in that case, does reactor-netty retry when the Connection reset by peer occurred ?

@violetagg
Copy link
Member

@violetagg Thanks your code, I would debug this case later. I noticed you mentioned retry in that case, does reactor-netty retry when the Connection reset by peer occurred ?

yes

@machao23
Copy link
Author

machao23 commented Feb 2, 2021

@violetagg Could you please tell me where the retry happens?

@machao23
Copy link
Author

machao23 commented Feb 3, 2021

@violetagg The test failed on expectation "expectError(Class)" failed (expected error of type: EncoderException; actual type: java.io.IOException: Connection reset by peer
And I debugged that MessageToMessageEncoder.encode() is completed without exception.

The exception stack is

java.lang.AssertionError: expectation "expectError(Class)" failed (expected error of type: EncoderException; actual type: java.io.IOException: 远程主机强迫关闭了一个现有的连接。)

	at reactor.test.MessageFormatter.assertionError(MessageFormatter.java:115)
	at reactor.test.MessageFormatter.failPrefix(MessageFormatter.java:104)
	at reactor.test.MessageFormatter.fail(MessageFormatter.java:73)
	at reactor.test.MessageFormatter.failOptional(MessageFormatter.java:88)
	at reactor.test.DefaultStepVerifierBuilder.lambda$expectError$6(DefaultStepVerifierBuilder.java:370)
	at reactor.test.DefaultStepVerifierBuilder$SignalEvent.test(DefaultStepVerifierBuilder.java:2213)
	at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.onSignal(DefaultStepVerifierBuilder.java:1485)
	at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.onExpectation(DefaultStepVerifierBuilder.java:1433)
	at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.onError(DefaultStepVerifierBuilder.java:1092)
	at reactor.core.publisher.FluxMap$MapSubscriber.onError(FluxMap.java:126)
	at reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onError(MonoFlatMapMany.java:197)
	at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
	at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.whenError(FluxRetryWhen.java:213)
	at reactor.core.publisher.FluxRetryWhen$RetryWhenOtherSubscriber.onError(FluxRetryWhen.java:255)
	at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:406)
	at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onNext(FluxConcatMap.java:243)
	at reactor.core.publisher.DirectProcessor$DirectInner.onNext(DirectProcessor.java:333)
	at reactor.core.publisher.DirectProcessor.onNext(DirectProcessor.java:142)
	at reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
	at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onError(FluxRetryWhen.java:180)
	at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:185)
	at reactor.netty.http.client.HttpClientConnect$HttpObserver.onUncaughtException(HttpClientConnect.java:407)
	at reactor.netty.ReactorNetty$CompositeConnectionObserver.onUncaughtException(ReactorNetty.java:511)
	at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.onUncaughtException(PooledConnectionProvider.java:549)
	at reactor.netty.resources.PooledConnectionProvider$PooledConnection.onUncaughtException(PooledConnectionProvider.java:385)
	at reactor.netty.channel.FluxReceive.drainReceiver(FluxReceive.java:230)
	at reactor.netty.channel.FluxReceive.onInboundError(FluxReceive.java:435)
	at reactor.netty.channel.ChannelOperations.onInboundError(ChannelOperations.java:442)
	at reactor.netty.channel.ChannelOperationsHandler.exceptionCaught(ChannelOperationsHandler.java:129)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireExceptionCaught(CombinedChannelDuplexHandler.java:424)
	at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:92)
	at io.netty.channel.CombinedChannelDuplexHandler$1.fireExceptionCaught(CombinedChannelDuplexHandler.java:145)
	at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:143)
	at io.netty.channel.CombinedChannelDuplexHandler.exceptionCaught(CombinedChannelDuplexHandler.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.netty.handler.logging.LoggingHandler.exceptionCaught(LoggingHandler.java:205)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:125)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:174)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)
	Suppressed: java.io.IOException: 远程主机强迫关闭了一个现有的连接at java.base/sun.nio.ch.SocketDispatcher.read0(Native Method)
		at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
		at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
		at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:233)
		at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223)
		at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:358)
		at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253)
		at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133)
		at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350)
		at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
		... 8 more

@violetagg
Copy link
Member

@machao23 What OS and JDK do you use?

@machao23
Copy link
Author

machao23 commented Feb 4, 2021

@violetagg windows10 + openjdk 11+28

@violetagg
Copy link
Member

@machao23 This issue might be related to this one reactor/reactor-netty#1622

@tenghaoxiang
Copy link

@violetagg
I meet the same problem, WIP always above zero in SimpleDequePool,I use reactor-netty in Spring Cloud Gateway, version is 1.0.39. When WIP always above zero in SimpleDequePool,Gateway doesn't forwarding requests to the specialed domain t-hintap, which use k8s service. Here's the state that arthas observed:

@Poolkey[
fqdn=null,
holder=@InetSocketAddress[t-hintap:8080],
pipelineKey=@integer[-657641164],
]:@SimpleDequePool[
TERMINATED=@ConcurrentLinkedDeque[isEmpty=true;size=0],
idleResourceLeastRecentlyUsed=@boolean[false],
pendingBorrowerFirstInFirstServed=@boolean[true],
idleResources=@ConcurrentLinkedDeque[isEmpty=false;size=3],
IDLE_RESOURCES=@AtomicReferenceFieldUpdaterImpl[java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl@45c55417],
acquired=@integer[0],
ACQUIRED=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@6a1c58e],
wip=@integer[289151],
WIP=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@1b634602],
pending=@ConcurrentLinkedDeque[isEmpty=false;size=142831],
PENDING=@AtomicReferenceFieldUpdaterImpl[java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl@7319dcf8],
pendingSize=@integer[142831],
PENDING_SIZE=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@1c134b60],
idleSize=@integer[3],
IDLE_SIZE=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@141660d9],
evictionTask=@SchedulerTask[reactor.core.scheduler.SchedulerTask@5bf2c579],
$assertionsDisabled=@boolean[true],
logger=@Slf4JLogger[reactor.util.Loggers$Slf4JLogger@6d11263],
poolConfig=@DefaultPoolConfig[reactor.netty.internal.shaded.reactor.pool.DefaultPoolConfig@4c87da4b],
metricsRecorder=@NoOpPoolMetricsRecorder[reactor.netty.internal.shaded.reactor.pool.NoOpPoolMetricsRecorder@97e3cd0],
clock=@Systemclock[SystemClock[Z]],
lastInteractionTimestamp=@long[1715583173969],
],

@violetagg
Copy link
Member

@tenghaoxiang Are you able to provide some reproducible example or at least steps how to try to reproduce this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for/user-attention This issue needs user attention (feedback, rework, etc...) status/need-investigation This needs more in-depth investigation type/bug A general bug
Projects
None yet
Development

No branches or pull requests

5 participants