[FEATURE REQUEST] RabbitMQ monitoring #1487

Closed
toszo opened this issue Jul 24, 2020 · 4 comments

Comments

@toszo
Contributor

toszo commented Jul 24, 2020

Is your feature request related to a problem? Please describe.
We are unable to collect metrics or create alerts for RabbitMQ. This means that if a queue is backing up and causing a customer-impacting issue, we cannot send an alert.

Describe the solution you'd like
We would like RabbitMQ Exporter to expose metrics such as those used below, so that we can display them in Grafana for monitoring and define alerts in Prometheus such as:

RabbitMQ : kbudde/rabbitmq-exporter (11 rules)
6.1. Rabbitmq down
RabbitMQ node down

  - alert: RabbitmqDown
    expr: rabbitmq_up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Rabbitmq down (instance {{ $labels.instance }})"
      description: "RabbitMQ node down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

6.2. Rabbitmq cluster down
Less than 3 nodes running in RabbitMQ cluster

  - alert: RabbitmqClusterDown
    expr: sum(rabbitmq_running) < 3
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Rabbitmq cluster down (instance {{ $labels.instance }})"
      description: "Less than 3 nodes running in RabbitMQ cluster\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

6.3. Rabbitmq cluster partition
Cluster partition

  - alert: RabbitmqClusterPartition
    expr: rabbitmq_partitions > 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Rabbitmq cluster partition (instance {{ $labels.instance }})"
      description: "Cluster partition\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

6.4. Rabbitmq out of memory
Memory available for RabbitMQ is low (< 10%)

  - alert: RabbitmqOutOfMemory
    expr: rabbitmq_node_mem_used / rabbitmq_node_mem_limit * 100 > 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Rabbitmq out of memory (instance {{ $labels.instance }})"
      description: "Memory available for RabbitMQ is low (< 10%)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

6.5. Rabbitmq too many connections
RabbitMQ instance has too many connections (> 1000)

  - alert: RabbitmqTooManyConnections
    expr: rabbitmq_connectionsTotal > 1000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Rabbitmq too many connections (instance {{ $labels.instance }})"
      description: "RabbitMQ instance has too many connections (> 1000)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

6.6. Rabbitmq dead letter queue filling up
Dead letter queue is filling up (> 10 msgs)

  - alert: RabbitmqDeadLetterQueueFillingUp
    expr: rabbitmq_queue_messages{queue="my-dead-letter-queue"} > 10
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Rabbitmq dead letter queue filling up (instance {{ $labels.instance }})"
      description: "Dead letter queue is filling up (> 10 msgs)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

6.7. Rabbitmq too many messages in queue
Queue is filling up (> 1000 msgs)

  - alert: RabbitmqTooManyMessagesInQueue
    expr: rabbitmq_queue_messages_ready{queue="my-queue"} > 1000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Rabbitmq too many messages in queue (instance {{ $labels.instance }})"
      description: "Queue is filling up (> 1000 msgs)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

6.8. Rabbitmq slow queue consuming
Queue messages are consumed slowly (> 60s)

  - alert: RabbitmqSlowQueueConsuming
    expr: time() - rabbitmq_queue_head_message_timestamp{queue="my-queue"} > 60
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Rabbitmq slow queue consuming (instance {{ $labels.instance }})"
      description: "Queue messages are consumed slowly (> 60s)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

6.9. Rabbitmq no consumer
Queue has no consumer

  - alert: RabbitmqNoConsumer
    expr: rabbitmq_queue_consumers == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Rabbitmq no consumer (instance {{ $labels.instance }})"
      description: "Queue has no consumer\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

6.10. Rabbitmq too many consumers
Queue should have only 1 consumer

  - alert: RabbitmqTooManyConsumers
    expr: rabbitmq_queue_consumers > 1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Rabbitmq too many consumers (instance {{ $labels.instance }})"
      description: "Queue should have only 1 consumer\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

6.11. Rabbitmq unactive exchange
Exchange receives fewer than 5 msgs per second

  - alert: RabbitmqUnactiveExchange
    expr: rate(rabbitmq_exchange_messages_published_in_total{exchange="my-exchange"}[1m]) < 5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Rabbitmq unactive exchange (instance {{ $labels.instance }})"
      description: "Exchange receives fewer than 5 msgs per second\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
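
For context, Prometheus loads alerting rules from a rules file in which each alert is a list item under a named group. A minimal sketch wrapping the first rule above; the group name `rabbitmq` is an assumption chosen for illustration, not something defined in this project:

```yaml
groups:
  - name: rabbitmq  # hypothetical group name
    rules:
      # First rule from the list above, unchanged apart from indentation
      - alert: RabbitmqDown
        expr: rabbitmq_up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Rabbitmq down (instance {{ $labels.instance }})"
          description: "RabbitMQ node down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
```

A file like this can be validated with `promtool check rules <file>` before Prometheus is pointed at it.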

Describe alternatives you've considered
We could upgrade RabbitMQ to a newer version, which should include a built-in metrics exporter.

Additional context
No.

@ar3ndt
Contributor

ar3ndt commented Jul 28, 2020

The newer 3.8.x version supports monitoring:

RabbitMQ 3.8 comes with new support for exposing its metrics via a Prometheus endpoint. Additionally, many more metrics are now available, vastly improving the overall observability of RabbitMQ. Visualizing these metrics is now as simple as importing pre-built dashboards into Grafana.

There is a PR that bumps the version, but it is still on hold because the upgrade would need some form of protection against data loss. More details are in the comment on the PR:

#1464

@rafzei rafzei changed the title RabbitMQ monitoring [FEATURE REQUEST] RabbitMQ monitoring Jul 29, 2020
@rafzei rafzei added this to the S20201119 milestone Oct 27, 2020
@mkyc mkyc removed this from the S20201119 milestone Nov 5, 2020
@mkyc
Contributor

mkyc commented Aug 27, 2021

After the last upgrade this is probably solved, because the new version includes an exporter.

@mkyc mkyc added the priority/low Task with low priority label Aug 27, 2021
@sbbroot sbbroot self-assigned this Sep 3, 2021
sbbroot added a commit to sbbroot/epiphany that referenced this issue Sep 22, 2021
* Add Grafana dashboards for RabbitMQ monitoring
seriva pushed a commit that referenced this issue Sep 30, 2021
* RabbitMQ monitoring (#1487)

* Add Grafana dashboards for RabbitMQ monitoring

* * add dashboard license

* remove useless comment
@przemyslavic przemyslavic self-assigned this Oct 7, 2021
@przemyslavic
Collaborator

@sbbroot

  1. Missing documentation.
  2. In my opinion, this PR does basically nothing: it just downloads a single JSON file, and it certainly does not let us monitor RabbitMQ.
    I provided the following configuration for Grafana dashboards:
---
kind: configuration/grafana
title: "Grafana"
name: default
provider: aws
specification:
  grafana_external_dashboards:
      # RabbitMQ cluster monitoring (via Prometheus)
    - dashboard_id: '10991'
      datasource: 'Prometheus'

and all I could see was an empty dashboard with no data.

To enable monitoring I had to manually:

  • enable the plugin responsible for exposing RabbitMQ metrics
  • add a Prometheus target so that metrics can be scraped from the RabbitMQ nodes
  • edit the network security groups to allow traffic on a specific port
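
As a rough sketch, the second step could look like the Prometheus scrape job below. It assumes the `rabbitmq_prometheus` plugin that ships with RabbitMQ 3.8 is enabled and listening on its default port 15692; the job name and host names are placeholders, not values taken from this PR:

```yaml
# prometheus.yml fragment (illustrative only)
scrape_configs:
  - job_name: rabbitmq              # hypothetical job name
    static_configs:
      - targets:
          - rabbitmq-node-1:15692   # placeholder host; 15692 is the plugin's default port
          - rabbitmq-node-2:15692   # placeholder host
```

Under the same assumption, the plugin is enabled on each node with `rabbitmq-plugins enable rabbitmq_prometheus`, and 15692 is the port that would have to be opened in the security groups.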

In my opinion, all these changes should be included in this PR and additionally documented.

Moving back to TODO.

sbbroot added a commit to sbbroot/epiphany that referenced this issue Oct 26, 2021
* Enable RabbitMQ's plugin for Prometheus metrics exposure

* Add target for Prometheus to be able to scrape metrics from rabbitmq nodes

* Download Grafana dashboard for displaying scraped metrics from RabbitMQ
sbbroot added a commit to sbbroot/epiphany that referenced this issue Nov 16, 2021
* Enable RabbitMQ's plugin for Prometheus metrics exposure

* Add target for Prometheus to be able to scrape metrics from rabbitmq nodes

* Download Grafana dashboard for displaying scraped metrics from RabbitMQ
sbbroot added a commit that referenced this issue Nov 16, 2021
* RabbitMQ monitoring automation (#1487)

* Enable RabbitMQ's plugin for Prometheus metrics exposure

* Add target for Prometheus to be able to scrape metrics from rabbitmq nodes

* Download Grafana dashboard for displaying scraped metrics from RabbitMQ
@przemyslavic
Collaborator

✔️ Fixed. To enable RabbitMQ monitoring, it is enough to set rabbitmq_monitoring_enabled: true in the RabbitMQ configuration. This allows Prometheus to scrape the metrics from all nodes so that they can be visualized in Grafana.
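
A sketch of what that setting might look like, following the shape of the Grafana configuration quoted earlier in this thread. The `kind`, `title` and `name` values and the placement of the flag under `specification` are assumptions made by analogy; the project documentation should be checked for the exact location:

```yaml
---
kind: configuration/rabbitmq   # assumed, by analogy with configuration/grafana
title: "RabbitMQ"              # assumed
name: default
specification:
  rabbitmq_monitoring_enabled: true   # flag named in the comment above
```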

@mkyc mkyc closed this as completed Nov 18, 2021