Ongoing connection reset by peer #3175
Comments
@AbhiramDwivedi Have you checked https://projectreactor.io/docs/netty/release/reference/index.html#faq.connection-closed, especially the part about a network component silently dropping a connection?
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Hi @violetagg, those links were quite useful for understanding the TCP settings. We tried pretty aggressive settings, and that did not help. The problem is almost solved by TCP changes on the target cluster, namely the following:
A project worth hundreds of millions was delayed because of this, and it is now live with known intermittent issues. All network and development teams have exhausted their capacity. Sometimes it's OK to move on rather than stay stuck trying to solve a problem. For a case like this, or for other future cases, I would expect the project developers to provide an option to disable the pool and behave like RestTemplate. We are probably going to make that code change on our end anyway and use two different ways of invoking endpoints. This issue is not about "my" problem, but about a permanent solution.
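Independently of any future option, reactor-netty can already be configured not to pool connections, which is close to the RestTemplate-like behaviour requested above: `ConnectionProvider.newConnection()` returns a provider that creates a new connection for every request. A minimal sketch, assuming Spring WebFlux and reactor-netty are on the classpath; `NonPooledClientFactory` and `nonPooledWebClient` are hypothetical names, and `http://localhost:8080` is a placeholder base URL.

```java
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

// Hypothetical helper: builds a WebClient whose underlying reactor-netty
// HttpClient does not use a connection pool.
public final class NonPooledClientFactory {

    private NonPooledClientFactory() {
    }

    public static WebClient nonPooledWebClient(String baseUrl) {
        // ConnectionProvider.newConnection() disables pooling: every request
        // gets a freshly opened connection that is never returned to a pool.
        HttpClient httpClient = HttpClient.create(ConnectionProvider.newConnection());

        return WebClient.builder()
                .baseUrl(baseUrl)
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .build();
    }

    public static void main(String[] args) {
        // Building the client does not open any connection yet.
        WebClient client = nonPooledWebClient("http://localhost:8080"); // placeholder URL
        System.out.println(client != null);
    }
}
```

The trade-off is that no connection is ever reused, so a stale pooled connection cannot be picked up, but every request pays for a new TCP (and, for HTTPS, TLS) handshake.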
@AbhiramDwivedi You changed the timeouts on the target, but did you add any configuration on your client, e.g.
Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open.
Hi @violetagg
For SCG, the springCloudVersion is "2023.0.0" and the Spring Boot version is 3.2.4, so the reactor-netty-core version is 1.1.17. I see some similar problems reported for WebClient, but I couldn't find your approval of the solutions suggested there. Do you have any suggestions?
@AbhiramDwivedi Did you check the FAQ that I provided in the comment above? Do you have a TCP dump? If yes, please open a new issue.
Two of our Spring Boot microservices deployed on AWS EKS keep running into intermittent "Connection reset by peer" errors.
We have already applied the suggestion in #1774 (comment), and even used shorter timeouts and eviction intervals, but it does not help.
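For reference, the kind of client-side settings referred to above (shorter idle and life times, background eviction, and TCP keepalive) look roughly like this in reactor-netty. This is a sketch with illustrative values, not the configuration used in this report; the pool name `cross-cluster` and the class name `PooledClientConfig` are hypothetical. The `EpollChannelOption` keepalive options assume the native epoll transport, which the `recvAddress(..)` / `NativeIoException` error in the trace suggests is in use, and require `netty-transport-native-epoll` on the classpath.

```java
import io.netty.channel.ChannelOption;
import io.netty.channel.epoll.EpollChannelOption;
import java.time.Duration;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

// Sketch of a client-side pool configuration. All durations are illustrative
// assumptions and must be tuned against the idle timeouts of the real network path.
public final class PooledClientConfig {

    private PooledClientConfig() {
    }

    public static ConnectionProvider provider() {
        return ConnectionProvider.builder("cross-cluster") // hypothetical pool name
                .maxConnections(50)
                // Close connections that have been idle longer than this.
                .maxIdleTime(Duration.ofSeconds(20))
                // Close connections older than this, regardless of activity.
                .maxLifeTime(Duration.ofSeconds(60))
                .pendingAcquireTimeout(Duration.ofSeconds(60))
                // Also evict idle/expired connections periodically in the background,
                // not only when a connection is next acquired from the pool.
                .evictInBackground(Duration.ofSeconds(30))
                .build();
    }

    public static HttpClient httpClient(ConnectionProvider provider) {
        return HttpClient.create(provider)
                // Enable TCP keepalive probes on idle connections.
                .option(ChannelOption.SO_KEEPALIVE, true)
                // Epoll-specific keepalive tuning (Linux native transport only):
                // first probe after 60 s idle, then every 30 s, give up after 4 probes.
                .option(EpollChannelOption.TCP_KEEPIDLE, 60)
                .option(EpollChannelOption.TCP_KEEPINTVL, 30)
                .option(EpollChannelOption.TCP_KEEPCNT, 4);
    }

    public static void main(String[] args) {
        ConnectionProvider provider = provider();
        HttpClient client = httpClient(provider);
        System.out.println(client != null);
        // Release the pool's resources when the application shuts down.
        provider.disposeLater().block();
    }
}
```

The general idea is to keep `maxIdleTime` below the idle timeout of any network component on the path that may silently drop connections; the values above would have to be adjusted to whatever that timeout actually is between the two EKS clusters.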
The problem does not happen when invocations are made from React applications to the Spring Boot server, or from Spring Boot clients to non-Reactor-based microservices. It is possible that the problem lies in the infrastructure, but AWS does not accept that. In essence, this is hard to replicate outside of "our" environment, or outside of the individual environments in which others have faced it.
Expected Behavior
The subscriber should validate a connection before using it. If this is not the default, it should at least be an option. Stack Overflow and reactor-netty have been flooded with issues like this for years, so it only makes sense to provide a code-level option that works across scenarios.
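Independently of whether such a validation option is added, a common client-side mitigation (a different technique from validating a connection before use) is to retry requests that fail with an I/O error such as a connection reset. A minimal sketch using Reactor's `Retry.backoff`; `ResetRetry`, `isIoFailure` and `withResetRetry` are hypothetical names. It should only be applied to idempotent requests such as the GET in the trace below, because a reset can also happen after the server has already processed the request.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

// Hypothetical helper: retries a request when the failure, or any of its
// causes, is an IOException such as "Connection reset by peer".
public final class ResetRetry {

    private ResetRetry() {
    }

    // Walks the cause chain, because WebClient wraps the low-level
    // IOException in a WebClientRequestException.
    public static boolean isIoFailure(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof IOException) {
                return true;
            }
        }
        return false;
    }

    // Retries at most 2 more times with exponential backoff starting at
    // minBackoff, but only for I/O failures; other errors propagate immediately.
    public static <T> Mono<T> withResetRetry(Mono<T> request, Duration minBackoff) {
        return request.retryWhen(
                Retry.backoff(2, minBackoff).filter(ResetRetry::isIoFailure));
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        // Simulated request: the first attempt fails with a reset, the second succeeds.
        Mono<String> flaky = Mono.defer(() -> attempts.incrementAndGet() == 1
                ? Mono.<String>error(new IOException("Connection reset by peer"))
                : Mono.just("ok"));

        String result = withResetRetry(flaky, Duration.ofMillis(10)).block();
        System.out.println(result + " after " + attempts.get() + " attempts");
    }
}
```

With a real `WebClient`, the `Mono` returned by `retrieve().bodyToMono(...)` would take the place of the simulated `flaky` request.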
Actual Behavior
Intermittent error:
Caused by: org.springframework.web.reactive.function.client.WebClientRequestException: recvAddress(..) failed: Connection reset by peer; nested exception is io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed: Connection reset by peer
at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:141)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ Request to GET http://application-URL [DefaultWebClient]
Original Stack Trace:
at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:141)
at reactor.core.publisher.MonoErrorSupplied.subscribe(MonoErrorSupplied.java:55)
Independently of this, the server has the following logs, which may or may not be related:
Steps to Reproduce
We are unable to reproduce this outside of our environment. Even in our environment, it happens only when calls are made between Spring Boot applications running in two different EKS clusters; it does not happen when the applications are running in the same EKS cluster.
Possible Solution
Your Environment
Spring Boot applications running in two different EKS clusters.
Other relevant library versions (netty, ...): reactor-netty-core:jar:1.0.38
JVM version (java -version): openjdk version "11.0.22" 2024-01-16 LTS, OpenJDK Runtime Environment (Red_Hat-11.0.22.0.7-1) (build 11.0.22+7-LTS)
OS and version (uname -a): rhel 8