
[Part 2] How to setup alertmanager and send alerts ? · ashish.one #3

Closed
utterances-bot opened this issue Oct 31, 2019 · 27 comments


@utterances-bot

[Part 2] How to setup alertmanager and send alerts ? · ashish.one


https://ashish.one/blogs/setup-alertmanager/


Hi Ashish!

Thanks for sharing your knowledge! This 2nd part is very complete and the links to the example pages are so useful!

Thanks again!
defabiouy


Thanks for a precise post; it helped me configure alerts properly. However, I cannot find Part 3 of this. Would you be able to point me towards that?

Owner

Hi @akanshadureja, thanks for your words. I am working on Part 3. Until then, I can try to resolve your doubts :)


Thanks a lot for the response, Ashish :) I am able to connect the Prometheus data source with Grafana. I am trying to figure out if there is a way to connect Grafana alerts with Alertmanager to configure threshold-based alerts.

Owner

Hey Akansha, as far as I have researched this, I haven't found such an integration. In the past I was also looking for the same thing, where I could simply set my alert rules in the Grafana UI.

So there are two alerting options we have:

  1. Prometheus Alertmanager
  2. Grafana's own alerting

I'd recommend going with Prometheus Alertmanager. It provides more flexible features like grouping, batching, etc.

Keep Grafana only for data visualization.
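For reference, grouping and batching in Prometheus Alertmanager are configured on the route tree in alertmanager.yml. A minimal sketch (the receiver name, labels, and timings below are only placeholders, not values from the post):

    route:
      receiver: 'default-notifications'    # placeholder receiver defined under receivers:
      group_by: ['alertname', 'instance']  # alerts sharing these labels are batched into one notification
      group_wait: 30s                      # wait this long before the first notification for a new group
      group_interval: 5m                   # wait this long before sending updates for an existing group
      repeat_interval: 3h                  # re-send a notification for an alert that is still firing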

And I just noticed I already released Part 3, where I show how you can create your own custom exporters, but it is not about Grafana <---> Prometheus.

Owner

@akankshadureja I am live with Part 4 (Setup Grafana with Prometheus). You can check it here: https://ashish.one/blogs/setup-grafana-with-prometheus/


Hi Ashish, this really simplified the process for me. I am just starting my switch to cloud admin.

So just to be clear: we have to install all the exporters, like the JMX exporter and node exporter, plus Alertmanager, on the machine our application is running on / the machine we want to monitor,

and I have my Prometheus and Grafana running on a separate machine where I can define the alerting rules and just point them at the private IP of my application machine. Is that correct? Thank you.

Owner

Hey @Cryptopanda07, sorry for the delay in replying.

  1. Your alert file alert.rules.yml should be present on the same server where your Prometheus service is running, because you need to specify the alert rule file path in prometheus.yml. You have to specify all rules in the alert.rules.yml file only. You can check the Setup Alerts heading above.

  2. In prometheus.yml, you can specify the private IP of the machine where your Alertmanager is running. A rough sketch of both points follows below this list.
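Assuming Alertmanager listens on its default port 9093, the relevant parts of prometheus.yml would look roughly like this (the rule file path and the private IP below are placeholders):

    # on the Prometheus server
    rule_files:
      - /etc/prometheus/alert.rules.yml   # alert rules stay on the Prometheus machine

    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - 10.0.0.5:9093           # private IP of the machine running Alertmanager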

Let me know if I have understood your doubts correctly.

Thanks


I might have confused you; my question was:

I have 3 Kafka brokers, 1 ZooKeeper node, and 1 admin instance (5 instances total).

On my ZooKeeper and Kafka instances I have the node exporter and the JMX exporter running, which expose metrics for my Prometheus to scrape.

My Prometheus and Grafana are running on my admin instance (a different AZ).

Should my Alertmanager be running on all instances (Kafka & ZooKeeper), OR, since Alertmanager is only used to fire alerts, should it run on the admin machine ONLY?

So Prometheus will be catching, e.g., "instance down" from the node exporter, alert the Alertmanager, and then Alertmanager fires the alert.

Owner

You do not need to set up Alertmanager on all instances (Kafka & ZooKeeper). Like Prometheus, Alertmanager is also a standalone service. In your case, you should run it on the admin instance only (though you can run it on any instance, just not on all of them). Just specify your Alertmanager's URL and port in the prometheus.yml file, as I explained in the blog.

The flow is:

Prometheus runs at a specific interval -> it pulls the metrics -> evaluates the alert rules -> if an alert condition is true, it forwards the alert to Alertmanager -> Alertmanager pushes the alert to the various channels.

So if Prometheus catches "instance down" from the node exporter, Prometheus will forward it to Alertmanager (youralertmanagerurl:9093) and then Alertmanager will broadcast the alert on the various channels.
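To check the last hop of that flow on its own, you can push a hand-made test alert straight into Alertmanager with the amtool binary that ships with it (the label values are made up; point the URL at wherever your Alertmanager is listening):

    # fire a synthetic alert named test_alert with two example labels
    amtool alert add test_alert severity=warning instance=demo:9100 \
      --alertmanager.url=http://localhost:9093

If the notification shows up on your channel, the Alertmanager side is fine and the problem is on the Prometheus rule-evaluation side.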


Thank you so much, this is what I wanted to clarify.
I am on the final step of firing my alerts right now :D


Hey Ashish, a quick one: although all services are up and running, Alertmanager is not firing any alerts to my Slack, or even showing anything when I go to localhost:9093.

Following is my rules.yml:

"/prometheus/rules.yml":
content: |
groups:
- name: AllInstances
rules:
- alert: InstanceDown
# Condition for alerting
expr: up == 0
for: 1m
# Annotation - additional informational labels to store more information
annotations:
title: 'Instance {{ $labels.instance }} down'
description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
# Labels - additional labels to be attached to the alert
labels:
severity: 'critical'
owner: ec2-user
group: ec2-user
mode: '000644'

And following is my prometheus.yml:

"/prometheus/prometheus.yml":
content: !Sub
- |
global:
scrape_interval: 10s
evaluation_interval: 10s

                              rule_files:
                               - /prometheus/rules.yml

                              alerting:
                                alertmanagers:
                                - static_configs:
                                   - targets:
                                      - localhost:9093

                              scrape_configs:
                               - job_name: 'kafka'
                                 static_configs:
                                  - targets:
                                    - ${kafka_1}:8080
                                    - ${kafka_2}:8080
                                    - ${kafka_3}:8080
                               - job_name: 'kafka machine node'
                                 static_configs:
                                  - targets:
                                    - ${kafka_1}:9100
                                    - ${kafka_2}:9100
                                    - ${kafka_3}:9100

Following is my alertmanager.yml:

"/prometheus/alertmanager/alertmanager.yml":
content: |
global:
resolve_timeout: 1m
slack_api_url: 'my hook api here

                            route:
                             receiver: 'slack-notifications'

                            receivers:
                            - name: 'slack-notifications'
                              slack_configs:
                                   - channel: '#sysops-test'
                                     send_resolved: true
                        owner: ec2-user
                        group: ec2-user
                        mode: '000644'

Owner

@Cryptopanda07 Here are some resources which will help you write unit tests for your alert rules and also test your Alertmanager configuration.

prometheus/alertmanager#437

https://gist.github.com/cherti/61ec48deaaab7d288c9fcf17e700853a

https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/
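As an illustration of the unit-testing docs above, a minimal promtool test for the InstanceDown rule from earlier in this thread could look roughly like this (the file names and the demo instance label are assumptions; run it with promtool test rules alerts_test.yml):

    # alerts_test.yml -- assumes the alert rules are saved as rules.yml next to this file
    rule_files:
      - rules.yml

    evaluation_interval: 1m

    tests:
      - interval: 1m
        # simulate a target that stays down
        input_series:
          - series: 'up{job="kafka machine node", instance="demo:9100"}'
            values: '0 0 0 0'
        alert_rule_test:
          - eval_time: 3m
            alertname: InstanceDown
            exp_alerts:
              - exp_labels:
                  severity: critical
                  job: kafka machine node
                  instance: demo:9100
                exp_annotations:
                  title: 'Instance demo:9100 down'
                  description: 'demo:9100 of job kafka machine node has been down for more than 1 minute.'

The Alertmanager configuration file itself can be validated with amtool check-config /prometheus/alertmanager/alertmanager.yml.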

If you still haven't found anything, please go through the Alertmanager and Prometheus logs; you will get some leads there.


Thank you so much for your help! I'll go through everything.


Hi Ashish, I am using Prometheus Alertmanager to send email notifications. I am now able to get email notifications to the admin, but I need to send email notifications to the customers when their pod memory limit or CPU usage limit is reached. Can you please help me?

@Cryptopanda07

Hello Sridhar, I believe you need to specify the Gmail/email config in the alertmanager.yml file separately and change rules.yml to specify what you want the alert for, something like the sketch below.
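For reference, per-receiver email settings live under email_configs in alertmanager.yml. A rough sketch with placeholder addresses and SMTP host (not values from this thread):

    receivers:
      - name: 'admin-email'
        email_configs:
          - to: 'admin@example.com'            # placeholder recipient
            from: 'alertmanager@example.com'   # placeholder sender
            smarthost: 'smtp.example.com:587'  # placeholder SMTP relay
            auth_username: 'alertmanager@example.com'
            auth_password: 'app-password-here' # placeholder credential
            send_resolved: true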

Owner

@sridhar551 If you want to send alerts to your customers directly, then you would have to rewrite the alertmanager.yml file too frequently. For example, if alerts are raised for 10 servers (let's say high disk usage), then you need to send 10 alerts to 10 different users, for which you would need to rewrite the alertmanager.yml file and restart the service each time, which is not a good thing.

Alertmanager will send the alert event only to the admin (or a single user). You need to accept that event, and then you can route the alert to your users.

In your use case, I would suggest configuring a webhook with Alertmanager. Whenever an alert is generated, Alertmanager will send the payload to your HTTP endpoint. From there you can add your business logic to send the email to your users.

For webhook_config you can refer to the link below:
https://prometheus.io/docs/alerting/latest/configuration/#webhook_config
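A rough sketch of such a webhook receiver; the receiver name and endpoint URL are placeholders for your own notification service:

    route:
      receiver: 'customer-email-webhook'

    receivers:
      - name: 'customer-email-webhook'
        webhook_configs:
          - url: 'http://my-notifier.internal:8080/alerts'  # your HTTP endpoint (placeholder)
            send_resolved: true

Alertmanager will POST a JSON payload containing the firing (and resolved) alerts with their labels and annotations to that URL, and your service can decide which customer to email.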

@Cryptopanda07


I agree! It's easy to set up a webhook.

You can also route the alerts to Slack, add your customers to the Slack workspace, and give them access to a separate alerts channel there. They will be able to see the alerts.

However, sending alerts to customers makes no sense. Why would you want your customers to know your setup is on fire? :P

Owner

@Cryptopanda07 Yes, Slack is also a good option.

And there can be use cases where you need to send alerts to your customers. Let's say you are a hosting provider offering droplets/servers, like DigitalOcean, and you want to offer alerting services on hardware usage.

In that case you need to send an alert to your client.


Hi Ashish,

I have set up Alertmanager with the configuration below. But Alertmanager is matching only the first match_re entry and sending alerts to the TX team; it does not match the second match_re entry and routes those alerts to the default route, i.e. the UX team. I thought "continue: true" would do it, but it is not working. Is there any issue with the configuration? The Alertmanager version I am using is 0.21.

global:
  smtp_smarthost: 'localhost:25'
  smtp_from: '[email protected]'
  smtp_require_tls: false

route:
  group_by: ['instance', 'alert']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'UX team'
  routes:
    - match_re:
        job: ^(Windows Servers 1|ECS Windows Set 1|ECS Windows Set 2)$
      receiver: 'TX team'
      repeat_interval: '5h'
      continue: true
    - match_re:
        job: ^(windows_1|CDD Windows Servers|Win Servers)$
      receiver: 'Windows Team'
      repeat_interval: '5h'
      continue: true

receivers:


Hi Team,
It's good to hear that the new Loki release supports alert configuration through Alertmanager.
I am working with my team so that Digivalet can deploy Grafana-Loki-Promtail as a centralized logging system, but our team is facing a few challenges. I am not sure whether it's a bug or our team's fault.
My scenario is that I am running Grafana Loki on 192.168.126.167 and a Promtail client on 192.168.126.168.
1> The Promtail client is sending my HTTPD logs to Loki.
2> I have installed Alertmanager on 192.168.126.167:9093.
3> I have defined a rule file to trigger an alert whenever the incoming log rate is more than 5 lines per second.
4> When Loki invokes the rule file, it gives output as follows:

5> Feb 09 06:09:06 centos 7.linux vm images.local loki[3394]: level=info ts=2021-02-09T11:09:06.883855921Z caller=metrics.go:83 org_id=1 traceID=5a9b9e046985fa05 latency=fast query="sum(count_over_time({filename="/var/log/httpd/access_log"}[1s])) > 5" query_type=metric range_type=instant length=0s step=0s duration=28.679653ms status=200 throughput=0B total_bytes=0B
6> Here the range type is instant, and I believe that when the query type is instant it doesn't return anything.
7> Help us find a way to change the query type from instant to range.
Please find below the config files for Loki, Alertmanager, Promtail, and rules1.yaml.
######################### Promtail.yml #########################

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://192.168.126.167:3100/loki/api/v1/push
    tenant_id: 1

scrape_configs:
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'

  - job_name: httpd
    entry_parser: raw
    static_configs:
      - targets:
          - localhost
        labels:
          job: httpd
          path: /var/log/httpd/*log
    pipeline_stages:
      - match:
          selector: '{job="httpd"}'
          stages:
            - regex:
                expression: '^(?P<remote_addr>[\w.]+) - (?P<remote_user>[^ ]*) \[(?P<time_local>.*)\] "(?P<method>[^ ]*) (?P<request>[^ ]*) (?P<protocol>[^ ]*)" (?P<status>[\d]+) (?P<body_bytes_sent>[\d]+) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"?'
            - labels:
                remote_addr:
                remote_user:
                time_local:
                method:
                request:
                protocol:
                status:
                body_bytes_sent:
                http_referer:
                http_user_agent:
############################### LOKI.YML ###############################

auth_enabled: true

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 0

schema_config:
  configs:
    - from: 2018-04-15
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 168h

ruler:
  storage:
    type: local
    local:
      directory: /tmp/loki/rules
  rule_path: /tmp/scratch
  alertmanager_url: http://192.168.126.167:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true

storage_config:
  boltdb:
    directory: /tmp/loki/index
  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
############################ RULES1.YAML ############################

groups:
  - name: rate-alerting
    rules:
      - alert: HighLogRate
        expr: sum(count_over_time({filename="/var/log/httpd/access_log"}[1s])) > 5
        for: 1m
        labels:
          severity: warning
        annotations:
          title: "High LogRate Alert"
          description: "something is logging a lot"
############################ Alertmanager.yml ############################

global:
  resolve_timeout: 1m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: Slack-Notifications

receivers:
  - name: 'Slack-Notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/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'
        channel: '#loki-alert-test'
        send_resolved: true


AbhinJames commented Aug 16, 2021

Hey Ashish,

I have a small doubt. I am working on Alertmanager to send alerts via email, but apparently I had to include an extra pair of "{{" to finally be able to substitute values.

For example:
summary: "{{ "{{ $labels.instance}}" }}'s computer {{ "{{ $labels.instance_hostname }}" }} / {{ "{{ $labels.instance }}" }} has used {{ "{{ $value }}" }}% of space in Volume C "
I guess this might be because of the differences between the Alertmanager and Prometheus template interpreters.

I now want to apply printf "%.2f" to {{ "{{ $value }}" }} to allow only 2 digits after the decimal point. Any idea how to change the syntax?

I appreciate anyone's help.

Owner

Hi @AbhinJames,

You can try the {{ $value | printf "%.2f" }} expression; it will round the value to two decimal places.
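In a plain Prometheus rule annotation that would look like the line below (the wording is only illustrative); with the double-brace escaping you described, only the inner $value expression gains the printf pipe:

    annotations:
      summary: 'Volume C on {{ $labels.instance }} is {{ $value | printf "%.2f" }}% full'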


Hi,

While configuring this I was getting mails, but for the last few days I have not been getting Alertmanager emails.

The Alertmanager status is showing: level=error ts=2021-11-25T07:43:00.086Z caller=dispatch.go:310 component=dispatcher msg="Notify for alerts failed" num_alerts=2 err="email/email[0]: notify retry canceled after 7"
What does this error mean?

Owner

@maheshkapil Can you confirm your SMTP relay is working from the same server?
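One quick way to check the relay by hand from the Alertmanager host, assuming the smarthost is a local MTA on port 25 (the addresses below are placeholders):

    # write a minimal test message and hand it to the relay via curl's SMTP support
    printf 'Subject: alertmanager smtp test\n\nhello from the alertmanager host\n' > /tmp/msg.txt
    curl --url 'smtp://localhost:25' \
         --mail-from 'alertmanager@example.com' \
         --mail-rcpt 'you@example.com' \
         --upload-file /tmp/msg.txt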

@maheshkapil

maheshkapil commented Nov 28, 2021 via email

Owner

@maheshkapil
Can you try to find more details by setting the log level to debug for Alertmanager?
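For example, something along these lines when starting the service (the config path is a placeholder):

    alertmanager --config.file=/etc/alertmanager/alertmanager.yml --log.level=debug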

You can also refer to these similar issues:
prometheus-operator/prometheus-operator#1660
prometheus/alertmanager#1683

Repository owner locked and limited conversation to collaborators Jun 23, 2022
ashishtiwari1993 converted this issue into discussion #15 on Jun 23, 2022

