drivers/executor: set oom_score_adj for raw_exec #19515
This might not hold for every Nomad configuration, since I don't know them all, but in our use cases we run some of our tasks as `raw_exec` for reasons. We observed that these tasks were running with `oom_score_adj = -1000`, which exempts them from the OOM killer. The value is inherited from the Nomad agent parent process, as configured by systemd. As in hashicorp#10698, we were surprised to see this value inherited by every child process, and we believe it should be set to 0 explicitly here as well. I don't know whether other code paths rely on this or whether `raw_exec` can be launched in other ways, but this is how I was able to observe and fix the problem in one of our configurations. In production we have been working around it by wrapping our tasks in a script that runs `echo 0 > /proc/self/oom_score_adj`.
To add: we applied this patch on top of the 1.7.1 tag on a production host, and it behaves as expected.
Thanks @mattrobenolt! Hope you don't mind I've cleaned up the code just a bit and added an e2e test. Spot checking to make sure it works: before:
after:
That's great! Thank you. :) Do you know whether this can go into a 1.7.x release, or will it need to wait for 1.8?
It will go into the next 1.7.x bugfix release (1.7.3, assuming we don't get a CVE release before then).