High number of log streaming go-routines: Panic: out of memory #2840
@mlehner616 Thanks for the great bug reports!
@mlehner616 Do you by chance have the full panic?
@dadgar The entire log for this crash is ~327MB, do you have a preferred method for me to get this to you?
@dadgar This is the only panic block I see (same as the gist). The rest of the log (>4 million lines) contains repeats of goroutine stack traces. I saved these logs off the server and zipped them to an impressive 10MB, so if you want the whole thing I'd be happy to get it over to you somehow.
@mlehner616 Yeah, if you can upload it here that would be great; if not, send it to [email protected]
@mlehner616 The panic contains ~600k log streaming related goroutines, so I'm guessing we're leaking them somewhere. Are you using the log API (either via HTTP or the CLI)? Thanks!
@dadgar @schmichael Yes, I may have been experimenting with the nomad logs CLI command at least a few minutes before it crashed. I was definitely looking at logs through the hashi-ui, and I imagine that uses the API. I was deploying job updates with changes mostly to the template stanza, trying to figure out how to get my particular desired Consul integration right. While I was doing this, I was checking both logs and the /config directory to verify my template rendered correctly. Is it odd, though, that only one node spun out of control? I suppose I could have gotten lucky and was checking logs for a number of allocs that were all on the same server, although I'm not familiar with how Nomad bin packs.
Going to change title as it is (very likely) unrelated to the release candidate changes.
@mlehner616 Any chance you've tried to reproduce this on Nomad 0.7.1? It contained what seems like a relevant fix, except that it really should have only affected Windows client nodes.
I've been unable to reproduce this and log streaming has changed substantially since 0.6, so I'm going to close this. Sorry we couldn't find a direct resolution, but please don't hesitate to reopen if you run into this issue again!
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
A Nomad client node crashed with an out-of-memory error despite having reserved resources configured. The instances have just shy of 4GB of memory available, and memory utilization at the time of the crash (based on relocated allocations) is estimated at about 650MB. Based on how few allocations there were at the time, the client nodes were barely utilized, which leads me to think the Nomad client on this particular node hit some kind of memory leak until it crashed. I suppose it's also possible my ulimit settings are misconfigured; I will look into this.
Nomad version
Nomad v0.6.0-rc1 (5187fcf+CHANGES)
Operating system and Environment details
Amazon Linux kernel: 4.9.32-15.41.amzn1.x86_64
Issue
nomad client crashed with the following error:
fatal error: runtime: out of memory
Reproduction steps
I was deploying updates to a job that included additions of invalid template stanzas.
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)
https://gist.github.com/mlehner616/20d82b2df37054e62fc9041a96498ce7
First example allocation attempt (Invalid Template stanza)
Second example allocation attempt (Invalid App config)
Job file (the consul keys were not escaped correctly)
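The job file itself isn't reproduced here, but the escaping mistake described is typically around the inner quotes of a consul-template `key` call inside an HCL string. A hypothetical before/after (the key path is made up for illustration):

```hcl
# Broken: the unescaped inner quotes terminate the HCL string early.
# data = "{{ key "myapp/config/db_host" }}"

# Fixed: escape the inner quotes (or switch to a heredoc for the data).
template {
  data        = "{{ key \"myapp/config/db_host\" }}"
  destination = "local/app.conf"
}
```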