
Memory leak in netty4.1 io.netty.channel.socket.nio.NioSocketChannel #11942

Closed
starsliao opened this issue Aug 4, 2024 · 7 comments · Fixed by #12003

@starsliao

I encountered a memory leak with Netty. I am using version 2.5.0 of autoinstrumentation-java and am seeing the same problem.

In long-running Java microservices (running for more than 20 days with a high volume of requests), the Java heap runs out of memory. Many microservices are experiencing this issue, and some of them are not even using Netty.

I previously had the same issue with version 2.3.0 of autoinstrumentation-java.

This is the latest Java heap dump file.

I am an operations engineer, and this is the phenomenon I observed. Below is the screenshot information provided by my development colleagues.

[Three WeCom screenshots attached by the development team]

Originally posted by @starsliao in #11399 (comment)

@starsliao

[Two screenshots attached]

laurit commented Aug 5, 2024

Is this a custom HTTP server implemented on top of Netty, or are you using some framework? As far as I can tell, there are a couple of long-running connections that have processed a lot of requests. Connections that don't serve many requests shouldn't cause this issue, since the stale data gets cleaned up when the connection is closed. The issue is probably in the code where server contexts are removed and spans are ended only for certain inputs to the write method. It would help to know what the server code is sending to the write method.
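To illustrate the failure mode, here is a minimal sketch with made-up names (not the actual opentelemetry-java-instrumentation code): if per-channel request contexts are stored on the channel and only removed when a matching response is written, any request that never gets a response stays queued for as long as the connection lives.

```java
import io.netty.channel.Channel;
import io.netty.util.AttributeKey;

import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bookkeeping, roughly analogous to what the instrumentation does:
// contexts are pushed when a request is read and only popped when a response is written.
final class ServerContexts {
  private static final AttributeKey<Deque<Object>> CONTEXTS =
      AttributeKey.valueOf("server-contexts");

  // Called when a request is read from the channel: remember its context.
  static void onRequest(Channel channel, Object context) {
    Deque<Object> deque = channel.attr(CONTEXTS).get();
    if (deque == null) {
      deque = new ArrayDeque<>();
      channel.attr(CONTEXTS).set(deque);
    }
    deque.addLast(context);
  }

  // Called when a response is written: pop and end the matching context.
  // If the server never writes a response (e.g. for heartbeat requests),
  // this is never invoked and the context lingers until the channel closes.
  // On a long-lived connection that effectively becomes a memory leak.
  static Object onResponse(Channel channel) {
    Deque<Object> deque = channel.attr(CONTEXTS).get();
    return deque == null ? null : deque.pollFirst();
  }
}
```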

@starsliao

@laurit
Thank you for your answer. After communicating with the development team, it was confirmed that this microservice uses Spring Boot with Tomcat as the web container and does not use Netty.

However, most of our microservices communicate with xxl-job over long-lived connections that use heartbeat detection to keep the connection alive, and xxl-job uses Netty. So I suspect this scenario is preventing the microservices' memory from being released.

Could opentelemetry-java-instrumentation be optimized for such long-lived connections? Or is there any other way to avoid this problem?

laurit commented Aug 6, 2024

I think it is actually xxl-remoting, not xxl-job, that triggers the issue. What version of xxl-remoting are you using?

> Could opentelemetry-java-instrumentation be optimized for such long-lived connections? Or is there any other way to avoid this problem?

Sure, we gladly accept pull requests that fix issues.

laurit commented Aug 7, 2024

It is actually called xxl-rpc, not xxl-remoting.

laurit commented Aug 8, 2024

I think this happens because of https://github.com/xuxueli/xxl-rpc/blob/eeaa1bd7fc8f2249de13f971dda4f6689d66f318/xxl-rpc-core/src/main/java/com/xxl/rpc/core/remoting/net/impl/netty_http/server/NettyHttpServerHandler.java#L85-L88: there is no response for heartbeat requests. Our assumption is that every request has a matching response; when there is a request without a response, we miss the cleanup.
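For illustration, a simplified sketch of that pattern (the class, header, and marker names below are hypothetical and differ from the actual xxl-rpc source): the handler returns early for heartbeat requests without writing anything back, so an instrumentation that pairs each read request with a later write never gets to clean up the context it created.

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.DefaultFullHttpResponse;
import io.netty.handler.codec.http.FullHttpRequest;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpVersion;

class HeartbeatAwareServerHandler extends SimpleChannelInboundHandler<FullHttpRequest> {
  private static final String BEAT_MARKER = "BEAT_PING_PONG"; // hypothetical heartbeat marker

  @Override
  protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest request) {
    if (isHeartbeat(request)) {
      // No ctx.write()/ctx.writeAndFlush() here: nothing is sent back for a
      // heartbeat, so the per-request state created when this request was read
      // is never released while the connection stays open.
      return;
    }
    handleBusinessRequest(ctx, request);
  }

  private boolean isHeartbeat(FullHttpRequest request) {
    return BEAT_MARKER.equals(request.headers().get("X-Beat")); // hypothetical check
  }

  private void handleBusinessRequest(ChannelHandlerContext ctx, FullHttpRequest request) {
    // Decode and invoke the service, then write a real response; writing the
    // response is what lets the instrumentation finish and clean up its context.
    FullHttpResponse response =
        new DefaultFullHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK);
    ctx.writeAndFlush(response);
  }
}
```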

@starsliao

> I think this happens because of https://github.com/xuxueli/xxl-rpc/blob/eeaa1bd7fc8f2249de13f971dda4f6689d66f318/xxl-rpc-core/src/main/java/com/xxl/rpc/core/remoting/net/impl/netty_http/server/NettyHttpServerHandler.java#L85-L88: there is no response for heartbeat requests. Our assumption is that every request has a matching response; when there is a request without a response, we miss the cleanup.

Thank you for your analysis. I will relay your description to our development team shortly.

We tried restarting the XXL-Job service. After the restart, the heap memory of the microservices that were experiencing leaks has been released.

Weekly memory-usage trend chart for the microservices with memory leaks:
[screenshot]

Before the restart:
[screenshot]

After the restart:
[screenshot]
