Nomad 0.9.0 Client deadlocks when starting after reboot #5566
Comments
Killing the docker_logger unblocks the client and causes it to start a new alloc. Task events from original alloc:
Maybe this is related to, or a duplicate of, #5561?
@MorphBonehunter Oh interesting! Yes! The only difference is that you kill all Nomad processes but the container is still running. I actually like your repro much better. Going to close #5561 just because we've already started referencing this issue in PRs and internal comms. Sorry for missing yours and thanks for commenting. Fix is landing soon.
Fixes #5566. Fix a case where the docker logging process may lock up a Nomad agent restart. It looks like we have a case where the docker logger is started even though logmon isn't. In that case, the fifo writer blocks indefinitely, and because the open operation happens in the main goroutine, the Nomad agent blocks indefinitely as well. This change fixes the issue by performing the fifo open operation in a separate goroutine instead of the main goroutine. We should follow up independently to ensure logmon <-> dockerlogger ordering and consider having task recovery happen in a non-main goroutine with some sensible timeouts.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Originally reported by @vkiranananda here: #2560 (comment)
Nomad version
Nomad v0.9.0 (18dd59056ee1d7b2df51256fe900a98460d3d6b9)
Operating system and Environment details
dev vagrant box
Issue
When a node (the actual VM or server) is rebooted, Nomad deadlocks on startup while trying to restore Docker tasks.
Reproduction steps
On the dev vagrant box with Nomad 0.9.0 installed with this config file.
Terminal 1:
Terminal 2:
Job file (if appropriate)
Example job (w/o service stanza)
Nomad Client logs (if appropriate)
Full logs + goroutine dump here: https://gist.github.com/schmichael/b0f663b293c9f2c47e2790c4f9f8fb70