Notify for alerts failed #1683
Comments
Can you check if an alert would look like this?
Thank you @geekodour, we will start monitoring that metric (alertmanager_notifications_failed_total). But questions still remain: we need to find out the root cause (why this happened out of the blue on configuration reloads), and we do not know the name of the alert which didn't get dispatched. Any ideas?
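For reference, a minimal Prometheus alerting rule on that metric could look something like the sketch below. This is an illustration only, not taken from this thread; the group name, alert name, threshold, and window are placeholders:

```yaml
# Hypothetical rule file; names and thresholds are illustrative only.
groups:
  - name: alertmanager-self-monitoring
    rules:
      - alert: AlertmanagerNotificationsFailing
        # Fires when any integration reported failed notifications
        # over the last five minutes.
        expr: rate(alertmanager_notifications_failed_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'Alertmanager failed to deliver notifications via {{ $labels.integration }}'
```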
I get the same errors on reload.
My explanation is that when the configuration gets reloaded, Alertmanager stops the notification dispatcher, which in turn cancels its context to stop the running aggregation groups. So the logs are "expected", but we need to do a better job of not logging this as an error (though real errors still need to be logged as such).
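A rough Go sketch of that shutdown path (a simplification for illustration, not the actual Alertmanager source): Stop cancels the dispatcher's context, an in-flight notification sees context.Canceled, and the question is whether to log that as a real error:

```go
// Simplified sketch of the reload behavior described above; not the
// real Alertmanager code, just an illustration of why the errors appear.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type Dispatcher struct {
	ctx    context.Context
	cancel context.CancelFunc
}

func NewDispatcher() *Dispatcher {
	ctx, cancel := context.WithCancel(context.Background())
	return &Dispatcher{ctx: ctx, cancel: cancel}
}

// Stop is called on configuration reload: it cancels the context,
// which aborts every running aggregation group mid-notify.
func (d *Dispatcher) Stop() { d.cancel() }

// notify stands in for a notification attempt in flight.
func (d *Dispatcher) notify() error {
	select {
	case <-time.After(10 * time.Second): // pretend SMTP/webhook round trip
		return nil
	case <-d.ctx.Done():
		return d.ctx.Err() // context.Canceled on reload
	}
}

func main() {
	d := NewDispatcher()
	go func() {
		time.Sleep(100 * time.Millisecond)
		d.Stop() // simulate the SIGHUP reload
	}()
	if err := d.notify(); err != nil {
		// The improvement suggested above: treat cancellation from a
		// reload differently from a genuine delivery failure.
		if errors.Is(err, context.Canceled) {
			fmt.Println("debug: notify aborted by reload:", err)
		} else {
			fmt.Println("error: Notify for alerts failed:", err)
		}
	}
}
```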
Maybe related to #1307? You mention "routes changed"; did you also make some change to the DNS?
caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="gmail-notifications/email[0]: notify retry canceled after 2 attempts: create SMTP client: EOF"
Getting the above error when triggering alerts to a Gmail address from Prometheus via Alertmanager. Any help?
@XI1062-abhisheksinghal I got the same error and don't know how to proceed. Let me know if it gets resolved!
No, it is still not resolved. Did you find a solution, @pratnaparkhi2000?
No, I couldn't find a solution, so I switched to Slack notifications instead.
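For anyone who lands here with the Gmail EOF above: "create SMTP client: EOF" frequently indicates a smarthost/TLS mismatch, e.g. pointing at the implicit-TLS port 465 when Alertmanager's email integration speaks STARTTLS. A receiver along the following lines (port 587 plus a Gmail app password) is a reasonable thing to try; all addresses and credentials are placeholders, not taken from this issue:

```yaml
# Illustrative receiver only; every value here is a placeholder.
receivers:
  - name: gmail-notifications
    email_configs:
      - to: oncall@example.com
        from: alerts@example.com
        smarthost: smtp.gmail.com:587  # STARTTLS port; 465 often yields "create SMTP client: EOF"
        auth_username: alerts@example.com
        auth_identity: alerts@example.com
        auth_password: <gmail-app-password>
        require_tls: true
```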
Do you want to request a feature or report a bug?
Bug
What did you do?
We are running a 2-node Alertmanager cluster. It had been running just fine for the past few months, until today. We noticed that after adding a few new receivers and route configurations and reloading the configuration (kill -s SIGHUP), Alertmanager immediately spat out some messages like this:
What did you expect to see?
As mentioned, we have a 2-node cluster, and we didn't notice any such error messages on the second server after reloading its configuration. We didn't see these errors prior to this, and the error messages make no mention of the alert or receiver that is failing. This happened twice, right after reloading Alertmanager.
I have already taken a look at #282 and #346, but firstly I think this is different because it happened only right after reloads, and secondly the error messages are not of much help because there is no mention of the alert which wasn't dispatched.
Environment
System information:
Linux 4.4.110-1.el7.elrepo.x86_64 x86_64
Alertmanager version:
alertmanager, version 0.15.2 (branch: HEAD, revision: d19fae3)
build user: root@3101e5b68a55
build date: 20180814-10:53:39
go version: go1.10.3
Prometheus version:
prometheus, version 2.4.3 (branch: HEAD, revision: 167a4b4e73a8eca8df648d2d2043e21bdb9a7449)
build user: root@1e42b46043e9
build date: 20181004-08:42:02
go version: go1.11.1
Alertmanager configuration file: