Un-inhibiting alarms #1255
This is a fix for #1153. I'm not sure, though, how changes in
Thanks for looking into this! We subscribe to incoming alerts, but that does not cover alerts that are never updated and simply resolve via timeout. This is what the comment indicates: alertmanager/inhibit/inhibit.go Lines 85 to 86 in f4c226c
So adding an alertmanager/inhibit/inhibit.go Lines 228 to 231 in f4c226c
The memory gets cleaned up periodically here: alertmanager/inhibit/inhibit.go Lines 242 to 252 in f4c226c
Because we skip over resolved alerts on reads, by my reasoning the source of #1153 cannot be that we are not tracking resolved alerts correctly. Instead, my thinking is that the reason must be that an alert does not get resolved. I might be missing something crucial, of course.
Hello
Correct me if I am wrong, please. After debugging the running app I found that alerts which are not "firing" anymore are resolved. E.g. condition For example: if there is an alarm and the condition for its activation is met, it will be "fired". Once the condition is no longer met, the alert becomes inactive and As far as I understand, this is the moment when we need to invalidate the cache for dependent alarms. When alarm X becomes inactive and it inhibits alarm Y, we need to "un-inhibit" Y.
Could you please mention an example of a case (condition) which will not be solved?
Whenever an alert becomes inactive after being active, it is successfully resolved. At least, I see it as resolved in a debugger.
@fabxc do you have time to finish reviewing this or would you like me to take it?
Hello guys, any update on this?
#1309 also tries to fix this issue, and according to the test it seems to fix it, but I haven't gotten a clear description of why it does. I'll take a look at your PR and comments shortly; sorry for the long delay.
@@ -138,7 +141,7 @@ func (ih *Inhibitor) Stop() {
 	}
 }

-// Mutes returns true iff the given label set is muted.
+// Mutes returns true if the given label set is muted.
"iff" is not a typo; it means "if and only if".
The code path that had Looking through the code, I came to the same conclusion as @fabxc: As part of the inhibit stage, the provided Checking if an alert's labels are muted relies on reading the internal cache of alerts that match user-defined inhibition rules.
If an alert in this internal cache is
Have you run a debugger in the hey ... https://github.com/prometheus/alertmanager/blob/master/inhibit/inhibit.go#L84-L88 an alert comes in, if it's resolved, we skip updating the internal cache with that alert. So even though it's the "same alert", with the same fingerprint, maybe it's not being merged correctly in This could explain why in the other PR, removing the EDIT: |
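The hypothesis in the comment above (an incoming resolved alert is skipped, so the cached copy with the same fingerprint is never updated) can be sketched as follows. This is an illustration of that hypothesis only, not the Alertmanager implementation: `alert`, `ingest`, `muted`, `scenario`, and the `skipResolved` switch are all hypothetical names, and the one-hour end time on the firing update is an assumed value.

```go
package main

import (
	"fmt"
	"time"
)

// alert is a hypothetical stand-in keyed by fingerprint.
type alert struct {
	fingerprint string
	endsAt      time.Time
}

// resolvedAt reports whether the alert has ended as of ts.
func (a alert) resolvedAt(ts time.Time) bool {
	return !a.endsAt.IsZero() && !a.endsAt.After(ts)
}

// ingest applies an incoming update to the cache. With skipResolved set,
// resolved updates are dropped, as described in the comment above.
func ingest(cache map[string]alert, in alert, now time.Time, skipResolved bool) {
	if skipResolved && in.resolvedAt(now) {
		return
	}
	cache[in.fingerprint] = in
}

// muted reports whether any cached source alert is still unresolved at now.
func muted(cache map[string]alert, now time.Time) bool {
	for _, a := range cache {
		if !a.resolvedAt(now) {
			return true
		}
	}
	return false
}

// scenario sends a firing update followed by a resolved update for the same
// fingerprint and reports whether the target would still be muted afterwards.
func scenario(skipResolved bool) bool {
	t0 := time.Date(2018, 1, 1, 0, 0, 0, 0, time.UTC)
	cache := map[string]alert{}

	// Firing update: the end time lies in the future (assumed one hour).
	ingest(cache, alert{fingerprint: "X", endsAt: t0.Add(time.Hour)}, t0, skipResolved)

	// One minute later the same alert arrives as resolved.
	t1 := t0.Add(time.Minute)
	ingest(cache, alert{fingerprint: "X", endsAt: t1}, t1, skipResolved)

	return muted(cache, t1)
}

func main() {
	fmt.Println(scenario(true))  // true: stale firing copy keeps inhibiting
	fmt.Println(scenario(false)) // false: resolved update overwrote the entry
}
```

Under these assumptions, skipping resolved updates leaves the older copy in the cache with its future end time, so the read path still sees it as firing until that end time passes.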
fixed in #1331 |