TLSWrap leaking huge amounts of memory #3047
Comments
Same issue here.
Hello! Thanks for submitting this. Is there any way to reproduce it locally?
Nope, no idea. So far I've made several attempts at reproducing this locally, and none of them worked. I pulled heapdumps from many of the other workers we have running (we're using the Node.js cluster module), and I can see similar behaviour on other nodes as well. It looks like some of these TLSWrap instances are simply left behind and never freed. I know that a locally reproducible test case would be awesome, but I don't have one. 😞 Any ideas on how I could provide more info on this issue?
If you don't mind, can you upload the heap dump? I can analyse it further. @arthurschreiber
@yjhjstz Can I send you the dump via email? I don't want to upload it to a public place, as I'm not sure how much sensitive data is in there. I can compress the dumps down to ~30MB. Is that ok?
Ok, this is fixed by #3059. Our application has an uncaughtException handler which catches exceptions and forces the process to continue. (We know this is bad, and are actively working on changing it to gracefully restart the process instead.) The issue fixed by #3059, combined with this uncaughtException handler, caused requests and their associated TLSWrap objects to leak and never be collected. Thanks everyone for the help!
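For reference, the kind of handler being described looks roughly like the following. This is a minimal sketch of the anti-pattern, not the actual application code; the logging call is purely illustrative:

```js
// Anti-pattern: swallow the exception and keep the process running.
// Any request that was in flight when the exception fired may be left
// half-finished, and its socket/TLSWrap may never be released.
process.on('uncaughtException', (err) => {
  console.error('Uncaught exception, continuing anyway:', err);
  // No cleanup, no restart: the process keeps serving traffic.
});
```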
Everything below is offtopic and represents my personal opinion (well, except the «don't continue after an uncaughtException» part).
Continuing after an uncaught exception is very bad and leaves the process in a corrupted state, and restarting your process from within the uncaughtException handler isn't a good idea either. IMO, an ideal approach in production would be a system-level supervisor that launches your process and automatically restarts it when it dies for some reason. By sending periodic «I'm alive» pings from your process to the supervisor, you can also protect your process from hangs: if the process doesn't send a ping within the given time, the supervisor restarts it. See http://0pointer.de/blog/projects/watchdog.html for an example.
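A generic version of that ping scheme can be sketched as below. The supervisor endpoint and interval are hypothetical; the systemd watchdog from the linked article works over a different (sd_notify) channel and is not shown here:

```js
// Minimal sketch of a liveness ping, assuming a hypothetical supervisor
// that listens for heartbeats on a local HTTP endpoint and restarts the
// worker if no ping arrives within its timeout window.
const http = require('http');

const SUPERVISOR_URL = 'http://127.0.0.1:9999/heartbeat'; // hypothetical
const PING_INTERVAL_MS = 5000;

setInterval(() => {
  // If the event loop is blocked (the process "hangs"), this timer never
  // fires, no ping is sent, and the supervisor restarts the process.
  http.get(SUPERVISOR_URL, (res) => res.resume()).on('error', () => {
    // Ignore errors; an unreachable supervisor shouldn't crash the worker.
  });
}, PING_INTERVAL_MS);
```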
@ChALkeR Thanks for your insight. The problem is that just restarting the application while a lot of requests are in flight is really uncool. We already have a process supervisor and a watchdog + ping in place, so that's covered. The approach we want to take is to use the uncaught exception handler to try and shut everything down gracefully, but force a shutdown after some time in case the exception has left the process in a state where a graceful shutdown is not possible.
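A sketch of that approach, assuming a plain HTTP server and an illustrative 30-second deadline (the server setup here is a stand-in for the real application):

```js
const http = require('http');

// Hypothetical server; stands in for the application's real HTTP(S) server.
const server = http.createServer((req, res) => res.end('ok'));
server.listen(0);

process.on('uncaughtException', (err) => {
  console.error('Uncaught exception, shutting down:', err);

  // Stop accepting new connections; exit once in-flight requests drain.
  server.close(() => process.exit(1));

  // Hard deadline: if the process is too corrupted to shut down cleanly,
  // force the exit so the supervisor can start a fresh worker.
  setTimeout(() => process.exit(1), 30 * 1000).unref();
});
```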
LET THE UNCAUGHT EXCEPTION CONTROVERSY BEGIN
@ChALkeR The issue with uncaughtException is that the cause can range from a typo in JavaScript code to some underlying bad thing with sockets (as in this case), and just die-and-restart is indeed destructive for in-flight requests. In the case of a typo that realistically might affect only a single user or call, it means taking the whole system down, and there may be a non-trivial cost to booting back up. Not arguing one way or the other, but it is important to understand why uncaughtException leads folks to try and continue: sometimes you just miss a silly thing and there are no consequences to continuing.
Refs: #3068
Hey guys, I have the same issue with these TLSWrap objects. After a few hours of running, there are more than 10 huge objects per worker.
Any ideas?
We recently upgraded our application to Node 4.0 (and 4.1). During load and performance testing, we noticed that memory usage was going through the roof.
A graph of memory usage during the test shows this: with the start of the load test, usage quickly climbed from ~150MB up to ~1GB and then stabilized at around 800MB.
I went ahead and took a heap dump and opened it in the Chrome Developer Tools.
Here, you can see that ~400MB are used by 2 TLSWrap instances.
(EDIT: This is on 4.1, but we've also seen it on 4.0.)
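The report doesn't say how the snapshots were captured; one common way to do it from a running worker is the third-party `heapdump` module, sketched below. The module name and file path are assumptions, not something stated in the issue:

```js
// Sketch: capture a V8 heap snapshot from a running process using the
// third-party `heapdump` module (npm install heapdump). The resulting
// .heapsnapshot file can be opened in Chrome DevTools' Memory tab.
const heapdump = require('heapdump');

// Write a snapshot on demand, e.g. from a debug endpoint or an interval.
heapdump.writeSnapshot(`/tmp/heap-${process.pid}-${Date.now()}.heapsnapshot`,
  (err, filename) => {
    if (err) console.error('heap snapshot failed:', err);
    else console.log('heap snapshot written to', filename);
  });
```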
//cc @indutny @ChALkeR