-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce OOM watcher to allow graceful shutdown #628
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
hiddeco
force-pushed
the
oom-watcher
branch
4 times, most recently
from
March 3, 2023 23:01
20ba683
to
304cbed
Compare
hiddeco
force-pushed
the
oom-watcher
branch
5 times, most recently
from
March 4, 2023 10:43
d7f919a
to
3dbb013
Compare
pjbgf
reviewed
Mar 5, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits, but otherwise really good stuff. 👏
hiddeco
force-pushed
the
oom-watcher
branch
2 times, most recently
from
March 6, 2023 17:45
718daea
to
27cd21e
Compare
pjbgf
approved these changes
Mar 6, 2023
This commit introduces an OOM watcher, which can be enabled using `--feature-gates=OOMWatch=true`. The OOM watcher watches the current memory usage as reported by cgroups via `memory.current` and cancels the context when it reaches a certain threshold compared to `memory.max` (default `95`%, configurable using `--oom-watch-memory-threshold`). This allows ongoing Helm processes to gracefully exit with a failure before the controller is forcefully OOM killed, preventing a deadlock of releases in a pending state. The OOM watcher polls the `memory.current` file on an interval (default `500ms`, configurable using `--oom-watch-interval`), as subscribing to file updates using inotify is not possible for cgroups (v2) except for `*.events` files. Which does provide signals using `memory.events`, but these will generally be too late for our use case. As for example `high` equals `max` in most containers, buying us little time to gracefully stop our processes. In addition, because we simply watch current usage compared to max usage in bytes. This approach should work for cgroups v1 as well, given this has (most of the time) files for these values available, albeit at times at different locations. For which this commit does not introduce a flag yet, but the library takes into account that it could be configured at some point. Signed-off-by: Hidde Beydals <[email protected]>
Signed-off-by: Hidde Beydals <[email protected]>
- Change memory usage percent threshold to `uint8` to no longer allow fractions. - Validate interval to prevent configurations `<50ms`. Signed-off-by: Hidde Beydals <[email protected]>
stefanprodan
approved these changes
Mar 7, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @hiddeco 🏅
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR introduces an OOM watcher, which can be enabled using
--feature-gates=OOMWatch=true
. The OOM watcher watches the currentmemory usage as reported by cgroups via
memory.current
and cancelsthe context when it reaches a certain threshold compared to
memory.max
(default95
%, configurable using--oom-watch-memory-threshold
).This allows ongoing Helm processes to gracefully exit with a failure
before the controller is forcefully OOM killed, preventing a deadlock
of releases in a pending state.
The OOM watcher polls the
memory.current
file on an interval (default500ms
, configurable using--oom-watch-interval
), as subscribing tofile updates using inotify is not possible for cgroups (v2) except for
*.events
files. Which does provide signals usingmemory.events
, butthese will generally be too late for our use case as for example
high
equals
max
in most containers, buying us little time to gracefullystop our processes.
In addition, because we simply watch current usage compared to max
usage in bytes. This approach should work for cgroups v1 as well, given
this has (most of the time) files for these values available as well,
albeit at times at different locations (for which this commit does not
introduce a flag yet, but the library takes into account that it could
be configured at some point).
Should help aid #149 when enabled.