
Notify for alerts failed #1683

Closed
fshabir opened this issue Dec 25, 2018 · 9 comments

fshabir commented Dec 25, 2018

Do you want to request a feature or report a bug?

Bug

What did you do?

We are running a 2-node Alertmanager cluster. It had been running fine for the past few months, until recently. We noticed that after adding a few new receivers and route configurations and reloading the configuration (kill -s SIGHUP), Alertmanager immediately emitted messages like this:

2018-12-23T10:14:50.812179+00:00 prometheus1 alertmanager: level=info ts=2018-12-23T10:14:50.811699345Z caller=main.go:322 msg="Loading configuration file" file=/opt/alertmanager/alertmanager.yml
2018-12-23T10:14:50.817745+00:00 prometheus1 alertmanager: level=error ts=2018-12-23T10:14:50.817560618Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="context canceled"
2018-12-23T10:14:50.817893+00:00 prometheus1 alertmanager: level=error ts=2018-12-23T10:14:50.817580974Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="context canceled"
2018-12-23T10:14:50.817981+00:00 prometheus1 alertmanager: level=error ts=2018-12-23T10:14:50.817600149Z caller=dispatch.go:280 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="context canceled"
2018-12-23T10:14:50.818057+00:00 prometheus1 alertmanager: level=error ts=2018-12-23T10:14:50.817597885Z caller=dispatch.go:280 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="context canceled"

What did you expect to see?

As mentioned, we have a 2-node cluster, and we did not see any such error messages on the second server after reloading its configuration. We had not seen these errors before, and the messages do not mention the alert or receiver that is failing. This happened twice, each time right after reloading Alertmanager.

I have already taken a look at #282 and #346, but I think this is different: firstly, it happened only right after reloads, and secondly, the error messages are not of much help because they do not mention which alert failed to be dispatched.

Environment

  • System information:

    Linux 4.4.110-1.el7.elrepo.x86_64 x86_64

  • Alertmanager version:

    alertmanager, version 0.15.2 (branch: HEAD, revision: d19fae3)
    build user: root@3101e5b68a55
    build date: 20180814-10:53:39
    go version: go1.10.3

  • Prometheus version:

    prometheus, version 2.4.3 (branch: HEAD, revision: 167a4b4e73a8eca8df648d2d2043e21bdb9a7449)
    build user: root@1e42b46043e9
    build date: 20181004-08:42:02
    go version: go1.11.1

  • Alertmanager configuration file:

    Standard configuration, working well otherwise.

  • Prometheus configuration file:

    Standard configuration, working well otherwise.

  • Logs:
2018-12-23T10:14:50.812179+00:00 prometheus1 alertmanager: level=info ts=2018-12-23T10:14:50.811699345Z caller=main.go:322 msg="Loading configuration file" file=/opt/alertmanager/alertmanager.yml
2018-12-23T10:14:50.817745+00:00 prometheus1 alertmanager: level=error ts=2018-12-23T10:14:50.817560618Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="context canceled"
2018-12-23T10:14:50.817893+00:00 prometheus1 alertmanager: level=error ts=2018-12-23T10:14:50.817580974Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="context canceled"
2018-12-23T10:14:50.817981+00:00 prometheus1 alertmanager: level=error ts=2018-12-23T10:14:50.817600149Z caller=dispatch.go:280 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="context canceled"
2018-12-23T10:14:50.818057+00:00 prometheus1 alertmanager: level=error ts=2018-12-23T10:14:50.817597885Z caller=dispatch.go:280 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="context canceled"
2018-12-24T06:13:27.917149+00:00 prometheus1 alertmanager: level=info ts=2018-12-24T06:13:27.916675987Z caller=main.go:322 msg="Loading configuration file" file=/opt/alertmanager/alertmanager.yml
2018-12-24T06:13:27.921093+00:00 prometheus1 alertmanager: level=error ts=2018-12-24T06:13:27.920944281Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="context canceled"
2018-12-24T06:13:27.921225+00:00 prometheus1 alertmanager: level=error ts=2018-12-24T06:13:27.92097041Z caller=dispatch.go:280 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="context canceled"
2018-12-25T01:17:26.809024+00:00 prometheus1 alertmanager: level=info ts=2018-12-25T01:17:26.808452365Z caller=main.go:322 msg="Loading configuration file" file=/opt/alertmanager/alertmanager.yml
@geekodour (Contributor)

Can you check whether the alertmanager_notifications_failed_total metric changes when this happens?

An alert would look like rate(alertmanager_notifications_failed_total[1m]) > 0; you need to scrape Alertmanager's /metrics endpoint to get this metric.
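For reference, a minimal Prometheus alerting rule built around that expression might look like the sketch below; the group name, alert name, and the 5m hold period are illustrative choices, not something from this thread.

groups:
  - name: alertmanager-meta            # hypothetical rule group name
    rules:
      - alert: AlertmanagerNotificationsFailing
        expr: rate(alertmanager_notifications_failed_total[1m]) > 0
        for: 5m                        # require the failure rate to persist before firing
        labels:
          severity: warning
        annotations:
          summary: "Alertmanager notifications are failing on {{ $labels.instance }}"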


fshabir commented Dec 25, 2018

Thank you @geekodour, we will start monitoring that metric (alertmanager_notifications_failed_total).

But the question still remains: we need to find the root cause (why this happened out of the blue on configuration reloads), and we do not know the name of the alert that didn't get dispatched. Any ideas?

tzz (Contributor) commented Dec 28, 2018

I get the same errors on reload, and alertmanager_notifications_failed_total has been 0 for the past 12 hours. It would be very helpful to have more detail in these messages.

@simonpasquier (Member)

My explanation is that when the configuration is reloaded, Alertmanager stops the notification dispatcher, which in turn cancels its context to stop the running aggregation groups. So the logs are "expected", but we need to do a better job of not logging this as an error (though real errors still need to be logged as such).
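As a rough illustration (a generic Go sketch, not Alertmanager's actual code), this is the pattern being described: stopping the dispatcher cancels a context, and any notification still in flight returns context.Canceled, which then shows up in the logs as an error.

package main

import (
	"context"
	"fmt"
	"time"
)

// notify stands in for a pipeline stage delivering to a receiver.
func notify(ctx context.Context) error {
	select {
	case <-time.After(2 * time.Second): // pretend the receiver is slow
		return nil
	case <-ctx.Done():
		return ctx.Err() // context.Canceled -> "Error on notify" / "Notify for alerts failed"
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())

	done := make(chan error, 1)
	go func() { done <- notify(ctx) }()

	// A configuration reload stops the old dispatcher, which cancels its context.
	time.Sleep(100 * time.Millisecond)
	cancel()

	fmt.Println("notify result:", <-done) // prints: notify result: context canceled
}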


julienlau commented Nov 10, 2020

Maybe related to #1307?

You mention that routes changed; did you also make any changes to DNS?

@XI1062-abhisheksinghal

caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="gmail-notifications/email[0]: notify retry canceled after 2 attempts: create SMTP client: EOF"

I am getting the above error when triggering Gmail email alerts from Prometheus via Alertmanager. Any help?
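Not the poster's actual configuration, but in practice "create SMTP client: EOF" is often caused by a port/TLS mismatch on the smarthost (for example, pointing it at the implicit-TLS port 465 when the client expects STARTTLS). A typical Gmail receiver sketch, with the addresses and app password as placeholders, looks roughly like this:

receivers:
  - name: gmail-notifications
    email_configs:
      - to: you@example.com                  # placeholder address
        from: you@example.com                # placeholder address
        smarthost: smtp.gmail.com:587        # STARTTLS port
        auth_username: you@example.com
        auth_identity: you@example.com
        auth_password: <gmail-app-password>  # an app password, not the account password
        require_tls: true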

@pratnaparkhi2000

@XI1062-abhisheksinghal I got the same error and don't know how to proceed. Let me know if it gets resolved!

@XI1062-abhisheksinghal

No, it is still not resolved. Did you find a solution, @pratnaparkhi2000?

@pratnaparkhi2000

No, I couldn't find a solution, so I switched to Slack notifications instead.
