[VPA] Incorrect OOM handling #3078
/cc @bskiba @jbartosik |
Based on extensive debugging and deep code analysis, I found the following logic assumptions:
|
The problematic scenario looks like:
The recommender will process metrics first, setting lastMemorySampleStart to 00:00:45, and will process the OOM after the metrics. The OOM sample will be skipped because its measurement time is before lastMemorySampleStart. This results in a lower memory value being stored. Moreover, this unwanted behavior can occur every time, so there can be a scenario where memory is never correctly bumped up. |
Quick update on the impact of the change:
Test duration consistently stays below 5 minutes. |
@krzysied would it make sense to make a release for this fix? VPA has not been released since January |
@QuentinBisson The fix was picked up in the newer versions of VPA. It's definitely available in VPA 0.8.1 and VPA 0.9.x. |
Oh I'm sorry you are right. I did not notice this was from 2020 🤦🏻 |
It often happens that OOM memory values are not recorded.
In logs there are many entries like:
W0421 16:47:20.131118 1 cluster_feeder.go:406] Failed to record OOM {Timestamp:2020-04-21 16:46:24 +0000 UTC Memory:476450463 ContainerID:{PodID:{Namespace:vertical-pod-autoscaling-4228 PodName:hamster-6ddf9fb54d-wzzzd} ContainerName:hamster}}. Reason: error while recording OOM for {{vertical-pod-autoscaling-4228 hamster-6ddf9fb54d-wzzzd} hamster}, Reason: adding OOM sample failed