HAProxy

Overview

How to install:

  1. Add the following directives to haproxy.cfg (or confirm they are already present) to enable statistics on the socket:

vi /etc/haproxy/haproxy.cfg
stats socket /var/lib/haproxy/stats mode 666 level admin
stats timeout 30s
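
These directives belong in the global section of haproxy.cfg; in context, the relevant part looks roughly like this (the log line is shown only for orientation):

global
    log /dev/log local0
    # mode 666 makes the socket world-writable; mode 660 plus matching
    # user/group ownership is safer if it fits your setup
    stats socket /var/lib/haproxy/stats mode 666 level admin
    stats timeout 30s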

  2. Install socat and nc: yum install -y nc socat

  3. Make sure that the haproxy user can read from the socket: echo "show info;show stat" | sudo -u haproxy socat stdio unix-connect:/var/lib/haproxy/stats

  4. Copy the files (a consolidated sketch of steps 2-4 follows this list):

a) userparameter_haproxy.conf to /etc/zabbix/zabbix_agentd.d/

b) haproxy_discovery.sh to /etc/zabbix/scripts/

c) haproxy_stats.sh to /etc/zabbix/scripts/

Make scripts b and c executable with chmod +x script_name.

Note: Make sure that /etc/zabbix/scripts/ exists; if it does not, create it: mkdir -p /etc/zabbix/scripts/

  5. Add a host for HAProxy in Zabbix, link the template, and wait a while for data to come in.

(You can shorten the LLD discovery interval to get data faster, but set it back to the initial value afterwards.)
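
Taken together, steps 2-4 amount to something like this (a sketch assuming the CentOS/RHEL paths used above and a systemd-managed agent):

yum install -y nc socat
mkdir -p /etc/zabbix/scripts
cp userparameter_haproxy.conf /etc/zabbix/zabbix_agentd.d/
cp haproxy_discovery.sh haproxy_stats.sh /etc/zabbix/scripts/
chmod +x /etc/zabbix/scripts/haproxy_discovery.sh /etc/zabbix/scripts/haproxy_stats.sh
systemctl restart zabbix-agent   # reload the agent so it picks up the new user parameters

userparameter_haproxy.conf presumably maps the item keys used below onto the two scripts, along these lines (an assumption; check the actual file):

UserParameter=haproxy.list.discovery[*],/etc/zabbix/scripts/haproxy_discovery.sh $1 $2
UserParameter=haproxy.stats[*],/etc/zabbix/scripts/haproxy_stats.sh $1 $2 $3 $4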

This template is based on:

a) Solution by Anastas Dancha - https://github.com/anapsix/zabbix-haproxy

b) The official Zabbix template for Zabbix 4.4 and newer - https://www.zabbix.com/integrations/haproxy

I created this template to bring the official Zabbix template logic to Zabbix versions below 4.4.

The files are available here:

a) https://cloud.mail.ru/public/D2M5%2F7ZEamjnVF

b) https://drive.google.com/open?id=16xoJyWut9R_EudcRyAf2Ui8WuPyTxw6D

Write to [email protected] if something is unclear.

Have a nice day

Author

Tudor Ticau

Macros used

Name               Description  Default                   Type
{$HAPROXY_CONFIG}  -            /etc/haproxy/haproxy.cfg  Text macro
{$HAPROXY_SOCK}    -            /var/lib/haproxy/stats    Text macro

Template links

There are no template links in this template.

Discovery rules

Name                        Description  Type          Key and additional info                         Update
HAProxy server discovery    -            Zabbix agent  haproxy.list.discovery[{$HAPROXY_SOCK},SERVER]  1h
HAProxy backend discovery   -            Zabbix agent  haproxy.list.discovery[{$HAPROXY_SOCK},BACK]    1h
HAProxy frontend discovery  -            Zabbix agent  haproxy.list.discovery[{$HAPROXY_SOCK},FRONT]   1d
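
For discovery to work, haproxy_discovery.sh has to print Zabbix low-level discovery JSON; for agents older than 4.2 that means the data wrapper, roughly like this (a sketch with invented proxy and server names):

{
  "data": [
    { "{#BACKEND_NAME}": "www_back", "{#SERVER_NAME}": "web1" },
    { "{#BACKEND_NAME}": "www_back", "{#SERVER_NAME}": "web2" }
  ]
}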

Items collected

Name Description Type Key and additional info
HAProxy memory used

Zabbix agent proc.mem[haproxy]

Update: 300

HAProxy config file checksum ($1)

Zabbix agent vfs.file.cksum[{$HAPROXY_CONFIG}]

Update: 600

HAProxy number of running processes

Zabbix agent proc.num[haproxy]

Update: 60

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Responses denied per second

Responses denied due to security concerns (ACL-restricted). In most cases denials will originate in the frontend (e.g., a user is attempting to access an unauthorized URL). However, sometimes a request may be benign, yet the corresponding response contains sensitive information. In that case, you would want to set up an ACL to deny the offending response. Backend responses that are denied due to ACL restrictions will emit a 502 error code. With properly configured access controls on the frontend, this metric should stay at or near zero. Denied responses and an increase in 5xx responses go hand in hand. If you are seeing a large number of 5xx responses, you should check your denied responses to shed some light on the increase in error codes.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},dresp]

Update: 60

LLD
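
All haproxy.stats[...] items in this template ultimately read the CSV that show stat emits on the socket. A manual equivalent of one such item, for illustration (the proxy, server, and field names are hypothetical; this sketch resolves the requested column from the CSV header rather than hard-coding its position):

echo "show stat" | socat stdio unix-connect:/var/lib/haproxy/stats | awk -F, \
  -v px="my_back" -v sv="my_server" -v f="dresp" '
  NR==1 { sub(/^# /, ""); for (i = 1; i <= NF; i++) if ($i == f) col = i }   # find the column index from the header row
  $1 == px && $2 == sv { print $col }'                                       # print that field for the matching proxy/server row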

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Errors connection per second

Number of requests that encountered an error attempting to connect to a backend server. Backend connection failures should be acted upon immediately. Unfortunately, the econ metric not only includes failed backend requests but additionally includes general backend errors, like a backend without an active frontend. Thankfully, correlating this metric with eresp and response codes from both frontend and backend servers will give a better idea of the causes of an increase in backend connection errors.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},econ]

Update: 60

LLD

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Response errors per second

Number of requests whose responses yielded an error. This represents the number of response errors generated by your backends. This includes errors caused by data transfers aborted by the servers, as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the backend error response rate helps diagnose the root cause of response errors. For example, an increase in both the backend error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},eresp]

Update: 60

LLD

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Number of responses with codes 4xx per second

Number of HTTP client errors per second.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},hrsp_4xx]

Update: 60

LLD

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Number of responses with codes 5xx per second

Number of HTTP server errors per second.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},hrsp_5xx]

Update: 60

LLD

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Unassigned requests

Current number of requests unassigned in queue. The qcur metric tracks the current number of connections awaiting assignment to a backend server. If you have enabled cookies and the listed server is unavailable, connections will be queued until the queue timeout is reached.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},qcur]

Update: 60

LLD

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Time in queue

Average time spent in queue (in ms) for the last 1,024 requests. Minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},qtime]

Update: 60

LLD

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Responses time

Average backend response time (in ms) for the last 1,024 requests. Tracking average response times is an effective way to measure the latency of your load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP (see #60).

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},rtime]

Update: 60

LLD

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Status

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}] status: UP = 1, DOWN = 0.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},status]

Update: 60

LLD

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Redispatched requests per second

Number of times a request was redispatched to a different backend. The redispatch rate metric tracks the number of times a client connection was unable to reach its original target and was subsequently sent to a different server. If a client holds a cookie referencing a backend server that is down, the default action is to respond to the client with a 502 status code. However, if option redispatch is enabled in haproxy.cfg, the request will be sent to any available backend server and the cookie will be ignored.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},wredis]

Update: 60

LLD
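
The behaviour described above is configured per backend in haproxy.cfg; an illustrative fragment (names and addresses invented):

backend www_back
    option redispatch                 # on connection failure, ignore persistence and pick another server
    retries 3                         # per-connection retry budget; feeds the wretr counter
    cookie SRV insert indirect nocache
    server web1 10.0.0.11:80 check cookie web1
    server web2 10.0.0.12:80 check cookie web2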

HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Retried connections per second

Number of times a connection was retried. Some dropped or timed-out connections are to be expected when connecting to a backend server. The retry rate represents the number of times a connection to a backend server was retried. This metric is usually non-zero under normal operating conditions. Should you begin to see more retries than usual, it is likely that other metrics will also change, including econ and eresp. Tracking the retry rate in addition to the above two error metrics can shine some light on the true cause of an increase in errors.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},wretr]

Update: 60

LLD

HAProxy Backend [{#BACKEND_NAME}] bytes in

HAProxy Backend bytes in

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,bin]

Update: 60

LLD

HAProxy Backend [{#BACKEND_NAME}] bytes out

HAProxy Backend bytes out

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,bout]

Update: 60

LLD

HAProxy Backend [{#BACKEND_NAME}]: Responses denied per second

Responses denied due to security concerns (ACL-restricted). In most cases denials will originate in the frontend (e.g., a user is attempting to access an unauthorized URL). However, sometimes a request may be benign, yet the corresponding response contains sensitive information. In that case, you would want to set up an ACL to deny the offending response. Backend responses that are denied due to ACL restrictions will emit a 502 error code. With properly configured access controls on the frontend, this metric should stay at or near zero. Denied responses and an increase in 5xx responses go hand in hand. If you are seeing a large number of 5xx responses, you should check your denied responses to shed some light on the increase in error codes.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,dresp]

Update: 60

LLD

HAProxy Backend [{#BACKEND_NAME}]: Errors connection per second

Number of requests that encountered an error attempting to connect to a backend server. Backend connection failures should be acted upon immediately. Unfortunately, the econ metric not only includes failed backend requests but additionally includes general backend errors, like a backend without an active frontend. Thankfully, correlating this metric with eresp and response codes from both frontend and backend servers will give a better idea of the causes of an increase in backend connection errors.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,econ]

Update: 60

LLD

HAProxy Backend [{#BACKEND_NAME}]: Response errors per second

Number of requests whose responses yielded an error. This represents the number of response errors generated by your backends. This includes errors caused by data transfers aborted by the servers, as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the backend error response rate helps diagnose the root cause of response errors. For example, an increase in both the backend error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,eresp]

Update: 60

LLD

HAProxy Backend [{#BACKEND_NAME}]: Unassigned requests

Current number of requests unassigned in queue. The qcur metric tracks the current number of connections awaiting assignment to a backend server. If you have enabled cookies and the listed server is unavailable, connections will be queued until the queue timeout is reached.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,qcur]

Update: 60

LLD

HAProxy Backend [{#BACKEND_NAME}]: Time in queue

Average time spent in queue (in ms) for the last 1,024 requests. Minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,qtime]

Update: 60

LLD

HAProxy Backend [{#BACKEND_NAME}]: Responses time

Average backend response time (in ms) for the last 1,024 requests. Tracking average response times is an effective way to measure the latency of your load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP (see #60).

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,rtime]

Update: 60

LLD

HAProxy Backend [{#BACKEND_NAME}]: Status

HAProxy Backend [{#BACKEND_NAME}] status: UP = 1, DOWN = 0.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,status]

Update: 60

LLD

HAProxy Backend [{#BACKEND_NAME}]: Redispatched requests per second

Number of times a request was redispatched to a different backend. The redispatch rate metric tracks the number of times a client connection was unable to reach its original target and was subsequently sent to a different server. If a client holds a cookie referencing a backend server that is down, the default action is to respond to the client with a 502 status code. However, if option redispatch is enabled in haproxy.cfg, the request will be sent to any available backend server and the cookie will be ignored.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,wredis]

Update: 60

LLD

HAProxy Backend [{#BACKEND_NAME}]: Retried connections per second

Number of times a connection was retried. Some dropped or timed-out connections are to be expected when connecting to a backend server. The retry rate represents the number of times a connection to a backend server was retried. This metric is usually non-zero under normal operating conditions. Should you begin to see more retries than usual, it is likely that other metrics will also change, including econ and eresp. Tracking the retry rate in addition to the above two error metrics can shine some light on the true cause of an increase in errors.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,wretr]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Incoming traffic

Number of bits received by the frontend.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,bin]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Outgoing traffic

Number of bits sent by the frontend.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,bout]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Denied requests per second

Requests denied due to security concerns (ACL-restricted) per second. An increase in denied requests will subsequently cause an increase in 403 Forbidden codes. For TCP this is because of a matched tcp-request content rule; for HTTP, because of a matched http-request or tarpit rule. Correlating the two can help to discern the root cause of an increase in 4xx responses.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,dreq]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Request errors per second

HTTP request errors per second. Some of the possible causes are: early termination from the client, before the request has been sent; a read error from the client; a client timeout; the client closing the connection; various bad requests from the client; the request being tarpitted.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,ereq]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with codes 1xx per second

Number of informational (1xx) HTTP responses per second.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_1xx]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with codes 2xx per second

Number of successful (2xx) HTTP responses per second.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_2xx]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with codes 3xx per second

Number of HTTP redirections (3xx) per second.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_3xx]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with codes 4xx per second

Number of HTTP client errors (4xx) per second. Ideally, all responses forwarded by HAProxy would be class 2xx codes, so an unexpected surge in the number of other code classes could be a sign of trouble. Correlating the denial metrics with the response code data can shed light on the cause of an increase in error codes. No change in denials coupled with an increase in the number of 404 responses could point to a misconfigured application or an unruly client.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_4xx]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with codes 5xx per second

Number of HTTP server errors (5xx) per second. Ideally, all responses forwarded by HAProxy would be class 2xx codes, so an unexpected surge in the number of other code classes could be a sign of trouble. Correlating the denial metrics with the response code data can shed light on the cause of an increase in error codes.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_5xx]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with other codes per second

Number of HTTP responses with any other code per second (anything outside the 1xx-5xx classes).

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_other]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Sessions rate

Number of sessions created per second. A significant spike in the number of sessions over a short period could cripple server operations and bring servers down.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,rate]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Requests rate

HTTP requests per second. The frontend request rate measures the number of requests received over the last second. Keeping an eye on peaks and drops is essential to ensure continuous service availability. In the event of a traffic spike, clients could see increases in latency or even denied connections.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,req_rate]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Established sessions

The current number of established sessions.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,scur]

Update: 60

LLD

HAProxy Frontend [{#FRONTEND_NAME}]: Session limits

The maximum number of simultaneous sessions allowed, as defined by the maxconn setting in the frontend.

Zabbix agent (active) haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,slim]

Update: 60

LLD
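
The limit comes straight from the frontend definition; an illustrative haproxy.cfg fragment (hypothetical names):

frontend www_front
    bind *:80
    maxconn 2000                      # slim for this frontend will report 2000
    default_backend www_back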

HAProxy Frontend [{#FRONTEND_NAME}]: Session utilization

Percentage of sessions used (scur / slim * 100). For every HAProxy session, two connections are consumed—one for the client to HAProxy, and the other for HAProxy to your backend. Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops.

Calculated haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,sutil]

Update: 60

LLD
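
This is the template's only calculated item; its formula is presumably along these lines (a sketch in pre-5.4 Zabbix calculated-item syntax):

100*last("haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,scur]")/last("haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,slim]")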

Triggers

Name Description Expression Priority
HAProxy Backend [{#BACKEND_NAME}]: Average response time is more than 10 sec for 5m

Average backend response time (in ms) for the last 1,024 requests is more than 10 seconds. Tracking average response times is an effective way to measure the latency of your HAProxy load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,rtime].min(5m)}>10s

Recovery expression:

average
HAProxy Backend [{#BACKEND_NAME}]: Average time spent in queue is more than 10 sec for 5m

Average time spent in queue (in ms) for the last 1,024 requests is more than 10 s. Obviously, minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,qtime].min(5m)}>10s

Recovery expression:

average
HAProxy Backend [{#BACKEND_NAME}]: Current number of requests unassigned in queue is more than 10 for 5m

Current number of requests on the backend unassigned in queue is more than 10. If your backend is bombarded with connections to the point that you have reached your global maxconn limit, HAProxy will seamlessly queue new connections in the system kernel's socket queue until a backend server becomes available. Keeping connections out of the queue is ideal, resulting in less latency and a better user experience. You should alert if the size of your queue exceeds the threshold. If you find that connections are consistently enqueueing, configuration changes may be in order, such as increasing the global maxconn limit or changing the connection limits on individual backend servers. Keep in mind: empty queue = happy client.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,qcur].min(5m)}>10

Recovery expression:

average
HAProxy Backend [{#BACKEND_NAME}]: Number of responses with error is more than 10 for 5m

Number of requests on the backend whose responses yielded an error is more than 10. The backend error response rate represents the number of response errors generated by your backends. This includes errors caused by data transfers aborted by the servers, as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the backend error response rate helps diagnose the root cause of response errors. For example, an increase in both the backend error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,eresp].min(5m)}>10

Recovery expression:

average
HAProxy Backend [{#BACKEND_NAME}]: Server is DOWN

HAProxy Backend [{#BACKEND_NAME}] is not available.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,status].max(#5)}=0

Recovery expression:

disaster
HAProxy Frontend [{#FRONTEND_NAME}]: Number of request errors is more than 10 for 5m

Number of request errors in the last 5 minutes is more than 10. Client-side request errors can have a number of causes: the client terminates before sending the request; a read error from the client; a client timeout; the client terminated the connection; the request was tarpitted or blocked by an ACL. Under normal conditions, it is acceptable to (infrequently) receive invalid requests from clients. However, a significant increase in the number of invalid requests received could be a sign of larger, looming issues. For example, an abnormal number of terminations or timeouts by numerous clients could mean that your application is experiencing excessive latency, causing clients to manually close their connections.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,ereq].min(5m)}>10

Recovery expression:

average
HAProxy Frontend [{#FRONTEND_NAME}]: Number of requests denied is more than 10 for 5m

Number of requests denied due to security concerns (ACL-restricted) is more than 10 in the last 5 minutes. In the event of a significant increase in denials, a malicious attacker or a misconfigured application could be to blame. An increase in denied requests will subsequently cause an increase in 403 Forbidden codes. Correlating the two can help you discern the root cause of an increase in 4xx responses.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,dreq].min(5m)}>10

Recovery expression:

average
HAProxy Frontend [{#FRONTEND_NAME}]: Session utilization is more than 80% for 5m

For every HAProxy session, two connections are consumed—one for the client to HAProxy, and the other for HAProxy to your backend. Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy’s configuration to allow more sessions, or migrate your HAProxy server to a bigger box.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,sutil].min(5m)}>80

Recovery expression:

average
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Average response time is more than 10s for 5m

Average server response time (in ms) for the last 1,024 requests is more than 10s. Tracking average response times is an effective way to measure the latency of your HAProxy load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},rtime].min(5m)}>10s

Recovery expression:

average
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Average time spent in queue is more than 10s for 5m

Average time spent in queue (in ms) for the last 1,024 requests is more than 10s. Obviously, minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},qtime].min(5m)}>10s

Recovery expression:

average
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Current number of requests unassigned in queue is more than 10 for 5m

Current number of requests unassigned in queue is more than 10. If your server is bombarded with connections to the point that you have reached your global maxconn limit, HAProxy will seamlessly queue new connections in the system kernel's socket queue until the server becomes available. Keeping connections out of the queue is ideal, resulting in less latency and a better user experience. You should alert if the size of your queue exceeds the threshold. If you find that connections are consistently enqueueing, configuration changes may be in order, such as increasing the global maxconn limit or changing the connection limits on individual backend servers. Keep in mind: empty queue = happy client.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},qcur].min(5m)}>10

Recovery expression:

average
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Number of responses with error is more than 10 for 5m

Number of requests on the server whose responses yielded an error is more than 10. The server error response rate represents the number of response errors generated by your servers. This includes errors caused by data transfers aborted by the servers, as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the server error response rate helps diagnose the root cause of response errors. For example, an increase in both the server error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},eresp].min(5m)}>10

Recovery expression:

average
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Server is DOWN

Server is not available. The check directive must be enabled in the HAProxy server configuration.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},status].max(#5)}=0

Recovery expression:

disaster
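
For the status items and the DOWN triggers to mean anything, health checking has to be enabled on each server line; an illustrative fragment (hypothetical names and addresses):

backend www_back
    server web1 10.0.0.11:80 check inter 2s fall 3 rise 2   # mark DOWN after 3 failed checks, UP after 2 good ones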
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Average response time is more than 10s for 5m (LLD)

Average server response time (in ms) for the last 1,024 requests is more than 10s. Tracking average response times is an effective way to measure the latency of your HAProxy load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},rtime].min(5m)}>10s

Recovery expression:

average
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Average time spent in queue is more than 10s for 5m (LLD)

Average time spent in queue (in ms) for the last 1,024 requests is more than 10s. Obviously, minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},qtime].min(5m)}>10s

Recovery expression:

average
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Current number of requests unassigned in queue is more than 10 for 5m (LLD)

Current number of requests unassigned in queue is more than 10. If your server is bombarded with connections to the point that you have reached your global maxconn limit, HAProxy will seamlessly queue new connections in the system kernel's socket queue until the server becomes available. Keeping connections out of the queue is ideal, resulting in less latency and a better user experience. You should alert if the size of your queue exceeds the threshold. If you find that connections are consistently enqueueing, configuration changes may be in order, such as increasing the global maxconn limit or changing the connection limits on individual backend servers. Keep in mind: empty queue = happy client.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},qcur].min(5m)}>10

Recovery expression:

average
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Number of responses with error is more than 10 for 5m (LLD)

Number of requests on the server whose responses yielded an error is more than 10. The server error response rate represents the number of response errors generated by your servers. This includes errors caused by data transfers aborted by the servers, as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the server error response rate helps diagnose the root cause of response errors. For example, an increase in both the server error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},eresp].min(5m)}>10

Recovery expression:

average
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Server is DOWN (LLD)

Server is not available. The check directive must be enabled in the HAProxy server configuration.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},status].max(#5)}=0

Recovery expression:

disaster
HAProxy Backend [{#BACKEND_NAME}]: Average response time is more than 10 sec for 5m (LLD)

Average backend response time (in ms) for the last 1,024 requests is more than 10 seconds. Tracking average response times is an effective way to measure the latency of your HAProxy load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,rtime].min(5m)}>10s

Recovery expression:

average
HAProxy Backend [{#BACKEND_NAME}]: Average time spent in queue is more than 10 sec for 5m (LLD)

Average time spent in queue (in ms) for the last 1,024 requests is more than 10 s. Obviously, minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,qtime].min(5m)}>10s

Recovery expression:

average
HAProxy Backend [{#BACKEND_NAME}]: Current number of requests unassigned in queue is more than 10 for 5m (LLD)

Current number of requests on the backend unassigned in queue is more than 10. If your backend is bombarded with connections to the point that you have reached your global maxconn limit, HAProxy will seamlessly queue new connections in the system kernel's socket queue until a backend server becomes available. Keeping connections out of the queue is ideal, resulting in less latency and a better user experience. You should alert if the size of your queue exceeds the threshold. If you find that connections are consistently enqueueing, configuration changes may be in order, such as increasing the global maxconn limit or changing the connection limits on individual backend servers. Keep in mind: empty queue = happy client.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,qcur].min(5m)}>10

Recovery expression:

average
HAProxy Backend [{#BACKEND_NAME}]: Number of responses with error is more than 10 for 5m (LLD)

Number of requests on the backend whose responses yielded an error is more than 10. The backend error response rate represents the number of response errors generated by your backends. This includes errors caused by data transfers aborted by the servers, as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the backend error response rate helps diagnose the root cause of response errors. For example, an increase in both the backend error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,eresp].min(5m)}>10

Recovery expression:

average
HAProxy Backend [{#BACKEND_NAME}]: Server is DOWN (LLD)

HAProxy Backend [{#BACKEND_NAME}] is not available.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,status].max(#5)}=0

Recovery expression:

disaster
HAProxy Frontend [{#FRONTEND_NAME}]: Number of request errors is more than 10 for 5m (LLD)

Number of request errors in the last 5 minutes is more than 10. Client-side request errors can have a number of causes: the client terminates before sending the request; a read error from the client; a client timeout; the client terminated the connection; the request was tarpitted or blocked by an ACL. Under normal conditions, it is acceptable to (infrequently) receive invalid requests from clients. However, a significant increase in the number of invalid requests received could be a sign of larger, looming issues. For example, an abnormal number of terminations or timeouts by numerous clients could mean that your application is experiencing excessive latency, causing clients to manually close their connections.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,ereq].min(5m)}>10

Recovery expression:

average
HAProxy Frontend [{#FRONTEND_NAME}]: Number of requests denied is more than 10 for 5m (LLD)

Number of requests denied due to security concerns (ACL-restricted) is more than 10 in the last 5 minutes. In the event of a significant increase in denials, a malicious attacker or a misconfigured application could be to blame. An increase in denied requests will subsequently cause an increase in 403 Forbidden codes. Correlating the two can help you discern the root cause of an increase in 4xx responses.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,dreq].min(5m)}>10

Recovery expression:

average
HAProxy Frontend [{#FRONTEND_NAME}]: Session utilization is more than 80% for 5m (LLD)

For every HAProxy session, two connections are consumed—one for the client to HAProxy, and the other for HAProxy to your backend. Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy’s configuration to allow more sessions, or migrate your HAProxy server to a bigger box.

Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,sutil].min(5m)}>80

Recovery expression:

average