Nomad should provide a way to know if a task failed due to OOM killer #2203
Comments
@dadgar yes please. We spent several days trying to launch an owasp-zap container via Nomad while it worked totally fine via Docker directly, before we fixed it by assigning 1 GB of memory to the Nomad job.
I started to put a fix here: https://github.com/multani/nomad/commits/fix-oom-notification. I was happy with this initial patch and was going to submit a PR, but it only covers the Docker driver, so I'm not sure whether it can be proposed as is. I need to look into how to properly support the other drivers as well.
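For context, here is a minimal sketch of the kind of check a Docker-based driver could make after a container exits, using the fsouza/go-dockerclient library; the container ID is a placeholder and this is not taken from the linked branch:

```go
package main

import (
	"fmt"
	"log"

	docker "github.com/fsouza/go-dockerclient"
)

// checkOOMKilled inspects a stopped container and reports whether the
// kernel OOM killer terminated it. Docker exposes this via State.OOMKilled.
func checkOOMKilled(client *docker.Client, containerID string) (bool, error) {
	container, err := client.InspectContainer(containerID)
	if err != nil {
		return false, err
	}
	return container.State.OOMKilled, nil
}

func main() {
	client, err := docker.NewClientFromEnv()
	if err != nil {
		log.Fatal(err)
	}

	// "my-task-container" is a placeholder; a driver would use the ID of the
	// container it launched for the task.
	oom, err := checkOOMKilled(client, "my-task-container")
	if err != nil {
		log.Fatal(err)
	}
	if oom {
		// A driver could surface this as a task event instead of a bare failure.
		fmt.Println("task was killed by the OOM killer")
	}
}
```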
This would be a great thing to have.
@dadgar What needs to happen for this to be releasable? We sometimes run into this issue when a task's memory requirements are not completely understood or managed in advance. Setting a restart policy of …
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Reference: https://groups.google.com/forum/#!msg/nomad-tool/h7qNXEsavFw/s5HEnyPWEQAJ
Nomad version
0.5.2
Operating system and Environment details
Debian Stable, running Docker 1.12
Issue
When a task uses more memory than what has been declared in the job, the task gets killed and Nomad marks it as failed, but Nomad gives no further information about what the underlying problem was. Looking at the kernel logs clearly shows the OOM killer in action.
This makes it difficult to track down the error by looking at Nomad alone (I thought there was a problem in our application) and to give feedback to the Nomad operators on how to properly fix their jobs.
Reproduction steps
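One way to reproduce (a sketch, not the exact workload from this report): declare a low memory limit in the job and run a task that allocates well past it, for example a small memory-hog program like the following, which the kernel OOM killer will eventually terminate:

```go
package main

import (
	"fmt"
	"time"
)

// A deliberately memory-hungry program: it keeps appending 10 MB chunks and
// touches every byte so the pages are actually committed. Run it as a task
// with a declared memory limit well below what it allocates and the kernel
// OOM killer terminates it, while Nomad only reports a generic failure.
func main() {
	var chunks [][]byte
	for i := 0; ; i++ {
		chunk := make([]byte, 10*1024*1024)
		for j := range chunk {
			chunk[j] = byte(j) // touch each byte to force real allocation
		}
		chunks = append(chunks, chunk)
		fmt.Printf("allocated %d MB\n", (i+1)*10)
		time.Sleep(100 * time.Millisecond)
	}
}
```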
Related dmesg message:
Nomad Client logs (if appropriate)
Job file (if appropriate)