Alertmanager routing tree doesn't respect the active time interval (?) #3249
Comments
Have you checked the final configuration generated by the operator? I've tried a similar example on my local machine and it works. I'd enable |
You've defined the times for the hours you expect, but I believe the config uses UTC by default, so I would make sure the times you have are correct in UTC. The v0.25.0 release added support for time zones, which has made my life easier. |
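To make the UTC point concrete, here is a minimal sketch of how a local working-hours window shifts when written in UTC. It assumes a fixed UTC offset (UTC+1 for British Summer Time, UTC+0 in winter); the helper name local_window_in_utc and the 08:00 to 18:00 window are only illustrative and are not part of Alertmanager.

```python
from datetime import datetime, time, timedelta, timezone


def local_window_in_utc(start, end, utc_offset_hours):
    """Convert a local start/end time to UTC "HH:MM" strings for a fixed offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    day = datetime(2023, 7, 3).date()  # arbitrary date, only used to build datetimes

    def to_utc(t):
        local = datetime.combine(day, t, tzinfo=tz)
        return local.astimezone(timezone.utc).strftime("%H:%M")

    return to_utc(start), to_utc(end)


# Assumed offsets: UTC+1 in summer (BST), UTC+0 in winter (GMT).
summer = local_window_in_utc(time(8, 0), time(18, 0), 1)
winter = local_window_in_utc(time(8, 0), time(18, 0), 0)
print("summer window in UTC:", summer)
print("winter window in UTC:", winter)
```

Without a location on the time interval, the start_time and end_time values would need to be the UTC ones, and they would drift by an hour across daylight saving changes.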
I'm experiencing a similar issue using the following config
I've tried copying and pasting into https://prometheus.io/webtools/alerting/routing-tree-editor/ and using I only want the dev and uat namespace alerts to be notified during working hours, not out of hours and weekends |
It looks to me like the matching is working fine. Matching is done top-down, so the DevOutOfHours route is matching first. Perhaps give the configuration doc a re-read to make sure you are familiar with the routing and configuration options. It seems like you may be confused about what active_time_intervals and mute_time_intervals do (I believe you have them inverted). To me, it seems like you would want to set up sub-routes for the team routes. My general structure for each team's routing looks like this: I have the default route that matches only the team name, then I have sub-routes that match more specific alerts, etc. In your case, this would look like:
This matches any label_team="TeamA", then if namespace=~".+dev|.+uat" Alertmanager will match the sub-route and notify only during the times specified. Under this model, you can completely remove the DevOutOfHours route. Let me know if that all makes sense and solves your problem! |
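For anyone following along, the top-down, first-match behaviour described here can be sketched in a few lines of Python. This is a simplified model, not Alertmanager's code: the Route class, the resolve function, and the receiver name TeamA-dev-uat are hypothetical, and matchers are modelled as anchored regexes on label values.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    matchers: dict = field(default_factory=dict)  # label name -> anchored regex
    routes: list = field(default_factory=list)    # child routes, in config order
    continue_: bool = False                        # models the `continue` option

    def matches(self, labels):
        return all(
            re.fullmatch(rx, labels.get(name, ""))
            for name, rx in self.matchers.items()
        )


def resolve(route, labels):
    """Return the receivers that handle an alert: depth-first, first matching child wins."""
    if not route.matches(labels):
        return []
    found = []
    for child in route.routes:
        hit = resolve(child, labels)
        found.extend(hit)
        if hit and not child.continue_:
            break  # stop at the first matching sibling unless it sets continue
    return found or [route.receiver]


# Flat layout: the namespace route comes first and captures the alert.
flat = Route("default", routes=[
    Route("DevOutOfHours", {"namespace": r".+dev|.+uat"}),
    Route("TeamA", {"label_team": "TeamA"}),
])

# Nested layout: team route first, namespace handled by a sub-route.
nested = Route("default", routes=[
    Route("TeamA", {"label_team": "TeamA"}, routes=[
        Route("TeamA-dev-uat", {"namespace": r".+dev|.+uat"}),
    ]),
])

dev_alert = {"label_team": "TeamA", "namespace": "shop-dev"}
prod_alert = {"label_team": "TeamA", "namespace": "shop-prod"}
print(resolve(flat, dev_alert))
print(resolve(nested, dev_alert))
print(resolve(nested, prod_alert))
```

Note that the time of day plays no part in resolve: in this model, as in the behaviour discussed in this thread, time intervals do not influence which route matches.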
thanks for the reply, I'll give it a go. I was using the example found here https://prometheus.io/docs/alerting/latest/configuration/#example
I would expect my DevOutOfHours route only to be active out of hours and move on to the next route during office hours. Maybe I've understood it wrong. |
just saw this bit let me try it with different subroutes |
not having much luck. Shouldn't the below just route to the default receiver? It seems like the time_intervals aren't having any impact. Running it using
|
this doesn't work either. This should route to TeamC on weekends and TeamD in office hours, but it always routes to TeamC. Are my time_intervals set correctly??
|
It's possible there is a formatting issue with the time_intervals. Yours do look different than my own. I believe the routes look correct. I would check using Amtool and the online routing tree editor to confirm. I find this tool extremely useful: https://www.prometheus.io/webtools/alerting/routing-tree-editor/ Here's an example of one of my own time intervals:
|
I think the subtle difference between the config of @cbryant42 and @dtwilliamsWork is a '-'. Take for example (A):

    time_intervals:
    - name: officehours
      time_intervals:
      - weekdays: ['monday:friday']
      - times:
        - start_time: "08:00"
          end_time: "20:00"
        location: Europe/London

And (B):

    time_intervals:
    - name: officehours
      time_intervals:
      - weekdays: ['monday:friday']
        times:
        - start_time: "08:00"
          end_time: "20:00"
        location: Europe/London

In (A) there are two time interval definitions.
In (B) there is one time interval defined, the weekdays Monday through Friday and the time range between 8 o'clock and 20 o'clock. I think the alert manager uses multiple time intervals as a logical OR rather than an AND. I had the same problem and noticed this subtle difference. I still need to verify that my hunch is correct though ;) |
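The OR-versus-AND hunch about (A) and (B) can be illustrated with a small Python model. This is only a sketch under that assumption: list elements are combined with OR, fields inside one element with AND, and a missing field matches anything. The function names are made up, time zones are ignored, and the end time is treated as exclusive.

```python
from datetime import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def element_contains(element, when):
    """One list element: every field that is present must match (AND)."""
    if "weekdays" in element:
        first, last = element["weekdays"]
        if not WEEKDAYS.index(first) <= when.weekday() <= WEEKDAYS.index(last):
            return False
    if "times" in element:
        start, end = element["times"]
        if not start <= when.strftime("%H:%M") < end:
            return False
    return True


def interval_contains(elements, when):
    """A named interval: any matching element is enough (OR)."""
    return any(element_contains(e, when) for e in elements)


# (A): two separate elements, one with weekdays only, one with times only.
config_a = [{"weekdays": ("monday", "friday")}, {"times": ("08:00", "20:00")}]
# (B): a single element carrying both weekdays and times.
config_b = [{"weekdays": ("monday", "friday"), "times": ("08:00", "20:00")}]

saturday_morning = datetime(2023, 7, 8, 10, 0)
monday_night = datetime(2023, 7, 3, 22, 0)
monday_morning = datetime(2023, 7, 3, 10, 0)

for label, when in [("Sat 10:00", saturday_morning),
                    ("Mon 22:00", monday_night),
                    ("Mon 10:00", monday_morning)]:
    print(label, "A:", interval_contains(config_a, when),
          "B:", interval_contains(config_b, when))
```

Under this model (A) is active all day on weekdays and also from 08:00 to 20:00 on weekends, while (B) is active only on weekdays between 08:00 and 20:00.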
I struggled with this as well, something that was unclear to me, but the doc states something important for active_time_intervals (and similar for mute_time_intervals):
I was thinking that using active_time_intervals would make Alertmanager ignore the route entirely outside the interval, but it only mutes the notifications. The route is still active at all times and will show up in the Alertmanager UI when the alert triggers. Hope that clarifies something for people like me. |
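A tiny Python sketch of that distinction, assuming the behaviour described above (the route still matches outside its active interval and the notification is simply muted). The dispatch function, the tuple layout, and the interval names are hypothetical and only model the idea.

```python
def dispatch(routes, labels, active_now):
    """Pick the first matching route, then decide whether its notification is muted.

    routes: ordered list of (receiver, match_fn, active_interval_name or None)
    active_now: set of interval names that contain the current time
    """
    for receiver, match, interval in routes:
        if match(labels):
            muted = interval is not None and interval not in active_now
            # The alert stays with this route even when muted; it does not
            # fall through to the next sibling.
            return receiver, ("muted" if muted else "notified")
    return "default", "notified"


routes = [
    ("DevOutOfHours", lambda l: l.get("namespace", "").endswith("dev"), "outofhours"),
    ("TeamA", lambda l: l.get("label_team") == "TeamA", None),
]
alert = {"label_team": "TeamA", "namespace": "shop-dev"}

during_office_hours = dispatch(routes, alert, {"officehours"})
out_of_hours = dispatch(routes, alert, {"outofhours"})
print(during_office_hours)
print(out_of_hours)
```

During office hours the alert is still claimed by DevOutOfHours and muted there, so TeamA never receives it, which matches the symptom reported earlier in the thread.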
Hi devs,
I'm trying to use the recently added time_intervals and active_time_intervals features to enable an alert to go to our pager only during working hours. I'm defining my time interval and corresponding route as follows:
Furthermore, I have a prometheus rule with label severity="workinghours" defined. With this configuration, I would expect my prometheus rule to only be active during working hours, i.e., Monday to Friday from 8am until 6pm. However, for some reason my prometheus rule also fires and moreover gets routed to our pager outside of the hours 8am until 6pm. That is, I get paged even when I shouldn't get paged according to the above-defined time interval workinghours.
Did I do something wrong here in the configuration? Or might this be a problem with the K8s prometheus-operator (or the kube-prometheus-stack helm chart) through which I'm using this new alertmanager feature?
At the moment, I'm using the following versions:
alertmanager: v0.24.0
prometheus-operator: v0.62.0
kube-prometheus-stack helm chart: 44.4.1
Originally posted by @bollmann in #2779 (comment)