FrequentDockerRestart rule fine tuning #603
I'm not sure whether NPD behaves properly with this, but we could try creating two or more rules, each with a different time period and count. That way, we could catch problems that a single rule might miss. If the config does not support two or more rules for the same condition, I agree we should lower the count, perhaps with an initial delay for booting nodes.
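The multiple-rule idea, together with the boot-delay fallback, can be sketched as a small standalone Python model. This is not NPD code: the `WindowRule` type, the helper names, and the half-open window semantics are assumptions for illustration. Several (window, count) rules are evaluated for one condition, and the condition is reported if any of them fires; restarts during an initial grace period after boot are ignored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence


@dataclass(frozen=True)
class WindowRule:
    """Hypothetical rule: fire if at least `count` restarts fall in the last `window`."""
    window: timedelta
    count: int


def restarts_in_window(restarts: Sequence[datetime], now: datetime, window: timedelta) -> int:
    # Count restart timestamps in the half-open interval (now - window, now].
    start = now - window
    return sum(1 for t in restarts if start < t <= now)


def after_boot_grace(restarts: Sequence[datetime], boot_time: datetime, grace: timedelta) -> List[datetime]:
    # Drop restarts that happen during the initial grace period after boot,
    # when restarts of the container runtime may be expected.
    return [t for t in restarts if t >= boot_time + grace]


def any_rule_fires(restarts: Sequence[datetime], now: datetime, rules: Sequence[WindowRule]) -> bool:
    # The condition is reported if ANY rule's threshold is met, so a short,
    # strict window and a long, lenient window can catch different patterns.
    return any(restarts_in_window(restarts, now, r.window) >= r.count for r in rules)


if __name__ == "__main__":
    boot = datetime(2021, 1, 1, 0, 0)
    # Hypothetical data: a restart every 5 minutes, starting 1 minute after boot.
    restarts = [boot + timedelta(minutes=1 + 5 * i) for i in range(12)]
    now = restarts[-1]
    rules = [WindowRule(timedelta(minutes=20), 3), WindowRule(timedelta(minutes=40), 5)]
    usable = after_boot_grace(restarts, boot, timedelta(minutes=10))
    print(any_rule_fires(usable, now, rules))
```

With a 10-minute grace period the first two restarts are discarded, yet the 20-minute/3-restart rule still fires on the remaining pattern.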
I think it would be best to have multiple rules.
Can you check on this?
Sure. I'll dig into it and report back with my findings.
Good idea! Thanks, Mike.
After some research, I've found that nothing prevents a plugin from having more than one rule for a particular condition. However, only the first rule is executed; any other rules with the same condition are ignored. Knowing this, these are the options we have:
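The observed "first rule wins" behavior can be modeled with a hypothetical Python sketch. The `Rule` tuple and both helpers are illustrative names, not NPD's config schema or code. A check like `shadowed_rules` could flag rules in a config that would be silently ignored.

```python
from typing import Dict, List, NamedTuple


class Rule(NamedTuple):
    # Hypothetical, simplified rule: not NPD's actual config schema.
    condition: str
    window_minutes: int
    count: int


def effective_rules(rules: List[Rule]) -> Dict[str, Rule]:
    # Keep only the first rule for each condition; later rules with the same
    # condition are dropped, mirroring the behavior reported in this comment.
    effective: Dict[str, Rule] = {}
    for rule in rules:
        effective.setdefault(rule.condition, rule)
    return effective


def shadowed_rules(rules: List[Rule]) -> List[Rule]:
    # Rules that would never run because an earlier rule claimed their condition.
    seen = set()
    shadowed = []
    for rule in rules:
        if rule.condition in seen:
            shadowed.append(rule)
        else:
            seen.add(rule.condition)
    return shadowed


if __name__ == "__main__":
    rules = [
        Rule("FrequentDockerRestart", 20, 3),
        Rule("FrequentDockerRestart", 40, 5),
    ]
    print(effective_rules(rules))
    print(shadowed_rules(rules))
```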
I would like to open the floor for discussion. What would be our best option here?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Looking at the rule that detects frequent restarts, it seems the detection parameters were chosen initially and have not changed since #223.
I observed a case where Docker was consistently restarting every ~5 minutes, and NPD did not flag this behavior as a problem. So I wanted to discuss whether these parameters need to be tuned.
One suggestion would be to change detection to count=3 in 20 minutes. I don't think 3 restarts in 20 minutes is expected. Another suggestion, which would affect performance, is to use a longer period: for example, 40 minutes, expecting no more than 5 restarts. So I wonder whether there are valid scenarios where 3 restarts in 20 minutes is expected behavior.
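One way to check the proposed thresholds against the ~5-minute restart pattern is to compute the maximum number of restarts that ever fall inside a window of a given length. This Python sketch assumes a half-open window (t - window, t] and a hypothetical two-hour run; NPD's actual boundary handling may differ.

```python
from typing import Sequence


def max_in_window(times: Sequence[float], window: float) -> int:
    """Largest number of events in any half-open window (t - window, t].

    Times and window share one unit (minutes here); window must be > 0.
    """
    ordered = sorted(times)
    best = 0
    left = 0
    for right, t in enumerate(ordered):
        # Slide the left edge forward until ordered[left] lies inside the window.
        while ordered[left] <= t - window:
            left += 1
        best = max(best, right - left + 1)
    return best


def would_fire(times: Sequence[float], window: float, count: int) -> bool:
    # A count-in-window rule fires if some window ever reaches `count` events.
    return max_in_window(times, window) >= count


if __name__ == "__main__":
    # Hypothetical data: a restart every 5 minutes over two hours.
    restarts = [5 * i for i in range(25)]
    print(max_in_window(restarts, 20))   # 4
    print(max_in_window(restarts, 40))   # 8
    print(would_fire(restarts, 20, 3))   # True
    print(would_fire(restarts, 40, 5))   # True
```

Under these assumptions, any 20-minute window holds at most 4 restarts and any 40-minute window at most 8, so both proposed rules would fire. The window edges matter: a threshold of 5 in 20 minutes would never fire for this pattern with a half-open window, but would with a closed one.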