How to install:
- Add the following lines to HAProxy (or make sure they are already present) to enable statistics on the socket:
vi /etc/haproxy/haproxy.cfg
stats socket /var/lib/haproxy/stats mode 666 level admin
stats timeout 30s
Then reload HAProxy so the socket is created (see the sketch below).
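A minimal sketch of that reload step, assuming a systemd-managed service (adjust to your init system):

```
# Apply the new stats socket configuration (assumes systemd;
# use "service haproxy reload" or similar on non-systemd hosts)
sudo systemctl reload haproxy

# The socket defined above should now exist
ls -l /var/lib/haproxy/stats
```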
- Install socat and nc: yum install -y nc socat
- Make sure that the haproxy user can read from the socket:
echo "show info;show stat" | sudo -u haproxy socat stdio unix-connect:/var/lib/haproxy/stats
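The `show stat` command returns CSV, and individual fields can be pulled out of it with awk. This is only an illustration of the principle (not the shipped haproxy_stats.sh); the backend and server names are placeholders, and field 5 (scur, current sessions) is taken from the CSV header that `show stat` prints:

```
# Print the current session count (field 5 = scur) for one backend server.
# "my_backend" and "web01" are placeholders -- use names from your haproxy.cfg.
echo "show stat" | sudo -u haproxy socat stdio unix-connect:/var/lib/haproxy/stats \
  | awk -F, -v px="my_backend" -v sv="web01" '$1 == px && $2 == sv { print $5 }'
```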
- Copy the files:
a) userparameter_haproxy.conf to /etc/zabbix/zabbix_agentd.d/ (a sketch of what this file ties together follows below)
b) haproxy_discovery.sh to /etc/zabbix/scripts/
c) haproxy_stats.sh to /etc/zabbix/scripts/
Make the b) and c) scripts executable with chmod +x script_name
Note: make sure that /etc/zabbix/scripts/ exists; if not, create it: mkdir -p /etc/zabbix/scripts/
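The userparameter file is what maps the template's item keys to the two scripts. Use the shipped file; the sketch below is only a plausible layout, and the argument order passed to the scripts is an assumption:

```
# Hypothetical sketch of /etc/zabbix/zabbix_agentd.d/userparameter_haproxy.conf --
# check the file shipped with the template for the real argument order.

# $1 = stats socket, $2 = FRONT|BACK|SERVER selector for discovery
UserParameter=haproxy.list.discovery[*],/etc/zabbix/scripts/haproxy_discovery.sh $1 $2

# $1 = stats socket, $2 = proxy name, $3 = server name, $4 = stat field (e.g. scur, rtime)
UserParameter=haproxy.stats[*],/etc/zabbix/scripts/haproxy_stats.sh $1 $2 $3 $4
```

After copying the files, restart the Zabbix agent (e.g. systemctl restart zabbix-agent) so it picks up the new user parameters.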
- Add a host for HAProxy in Zabbix, link the template, and wait a while for data to come in (a quick manual check is sketched below).
(You can shorten the LLD discovery interval to get data faster, but change it back to the initial value afterwards.)
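Before waiting on discovery, you can test the keys by hand. A quick check, with placeholder host/backend/server names:

```
# On the HAProxy box, ask the local agent to evaluate a key directly
zabbix_agentd -t 'haproxy.list.discovery[/var/lib/haproxy/stats,FRONT]'
zabbix_agentd -t 'haproxy.stats[/var/lib/haproxy/stats,my_backend,web01,status]'

# From the Zabbix server/proxy (covers the passive discovery items)
zabbix_get -s haproxy-host.example.com -k 'haproxy.list.discovery[/var/lib/haproxy/stats,BACK]'
```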
This template is based on:
a) Solution by Anastas Dancha - https://github.com/anapsix/zabbix-haproxy
b) Official HAProxy template from Zabbix for Zabbix >= 4.4 - https://www.zabbix.com/integrations/haproxy
The reason I created this template was to have the official Zabbix template logic available on Zabbix versions below 4.4.
Files are available here:
a) https://cloud.mail.ru/public/D2M5%2F7ZEamjnVF
b) https://drive.google.com/open?id=16xoJyWut9R_EudcRyAf2Ui8WuPyTxw6D
Write to [email protected] if something is not clear
Have a nice day
Tudor Ticau
Name | Description | Default | Type |
---|---|---|---|
{$HAPROXY_CONFIG} | - | /etc/haproxy/haproxy.cfg | Text macro |
{$HAPROXY_SOCK} | - | /var/lib/haproxy/stats | Text macro |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy server discovery | - | Zabbix agent | haproxy.list.discovery[{$HAPROXY_SOCK},SERVER] Update: 1h |
HAProxy backend discovery | - | Zabbix agent | haproxy.list.discovery[{$HAPROXY_SOCK},BACK] Update: 1h |
HAProxy frontend discovery | - | Zabbix agent | haproxy.list.discovery[{$HAPROXY_SOCK},FRONT] Update: 1d |
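The SERVER/BACK/FRONT parameter selects what the discovery script enumerates. If you want to see what Zabbix will receive, you can run the script by hand; the invocation below assumes the socket and the selector are passed as positional arguments (check the script itself), and the JSON shape in the comment is the standard pre-4.4 LLD format using the macro names this template relies on:

```
# Run the discovery script manually (argument order is an assumption -- see the script).
# It should print Zabbix LLD JSON roughly like:
#   {"data":[{"{#FRONTEND_NAME}":"http-in"},{"{#FRONTEND_NAME}":"stats"}]}
/etc/zabbix/scripts/haproxy_discovery.sh /var/lib/haproxy/stats FRONT
```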
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy memory used | - | Zabbix agent | proc.mem[haproxy] Update: 300 |
HAProxy config file checksum ($1) | - | Zabbix agent | vfs.file.cksum[{$HAPROXY_CONFIG}] Update: 600 |
HAProxy number of running processes | - | Zabbix agent | proc.num[haproxy] Update: 60 |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Responses denied per second | Responses denied due to security concerns (ACL-restricted). In most cases denials will originate in the frontend (e.g., a user is attempting to access an unauthorized URL). However, sometimes a request may be benign, yet the corresponding response contains sensitive information. In that case, you would want to set up an ACL to deny the offending response. Backend responses that are denied due to ACL restrictions will emit a 502 error code. With properly configured access controls on the frontend, this metric should stay at or near zero. Denied responses and an increase in 5xx responses go hand in hand. If you are seeing a large number of 5xx responses, you should check your denied responses to shed some light on the increase in error codes. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},dresp] Update: 60 LLD |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Connection errors per second | Number of requests that encountered an error attempting to connect to a backend server. Backend connection failures should be acted upon immediately. Unfortunately, the econ metric not only includes failed backend requests but additionally includes general backend errors, like a backend without an active frontend. Thankfully, correlating this metric with eresp and response codes from both frontend and backend servers will give a better idea of the causes of an increase in backend connection errors. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},econ] Update: 60 LLD |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Response errors per second | Number of requests whose responses yielded an error. This represents the number of response errors generated by your backends. This includes errors caused by data transfers aborted by the servers as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the backend error response rate helps diagnose the root cause of response errors. For example, an increase in both the backend error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},eresp] Update: 60 LLD |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Number of responses with codes 4xx per second | Number of HTTP client errors per second. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},hrsp_4xx] Update: 60 LLD |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Number of responses with codes 5xx per second | Number of HTTP server errors per second. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},hrsp_5xx] Update: 60 LLD |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Unassigned requests | Current number of requests unassigned in queue. The qcur metric tracks the current number of connections awaiting assignment to a backend server. If you have enabled cookies and the listed server is unavailable, connections will be queued until the queue timeout is reached. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},qcur] Update: 60 LLD |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. Minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},qtime] Update: 60 LLD |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Response time | Average backend response time (in ms) for the last 1,024 requests. Tracking average response times is an effective way to measure the latency of your load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP (see #60). | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},rtime] Update: 60 LLD |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Status | HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}] status: UP = 1, DOWN = 0 | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},status] Update: 60 LLD |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Redispatched requests per second | Number of times a request was redispatched to a different backend. The redispatch rate metric tracks the number of times a client connection was unable to reach its original target and was subsequently sent to a different server. If a client holds a cookie referencing a backend server that is down, the default action is to respond to the client with a 502 status code. However, if option redispatch is enabled in haproxy.cfg, the request will be sent to any available backend server and the cookie will be ignored. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},wredis] Update: 60 LLD |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Retried connections per second | Number of times a connection was retried. Some dropped or timed-out connections are to be expected when connecting to a backend server. The retry rate represents the number of times a connection to a backend server was retried. This metric is usually non-zero under normal operating conditions. Should you begin to see more retries than usual, it is likely that other metrics will also change, including econ and eresp. Tracking the retry rate in addition to the above two error metrics can shine some light on the true cause of an increase in errors. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},{#SERVER_NAME},wretr] Update: 60 LLD |
HAProxy Backend [{#BACKEND_NAME}] bytes in | HAProxy Backend bytes in | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,bin] Update: 60 LLD |
HAProxy Backend [{#BACKEND_NAME}] bytes out | HAProxy Backend bytes out | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,bout] Update: 60 LLD |
HAProxy Backend [{#BACKEND_NAME}]: Responses denied per second | Responses denied due to security concerns (ACL-restricted). In most cases denials will originate in the frontend (e.g., a user is attempting to access an unauthorized URL). However, sometimes a request may be benign, yet the corresponding response contains sensitive information. In that case, you would want to set up an ACL to deny the offending response. Backend responses that are denied due to ACL restrictions will emit a 502 error code. With properly configured access controls on the frontend, this metric should stay at or near zero. Denied responses and an increase in 5xx responses go hand in hand. If you are seeing a large number of 5xx responses, you should check your denied responses to shed some light on the increase in error codes. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,dresp] Update: 60 LLD |
HAProxy Backend [{#BACKEND_NAME}]: Connection errors per second | Number of requests that encountered an error attempting to connect to a backend server. Backend connection failures should be acted upon immediately. Unfortunately, the econ metric not only includes failed backend requests but additionally includes general backend errors, like a backend without an active frontend. Thankfully, correlating this metric with eresp and response codes from both frontend and backend servers will give a better idea of the causes of an increase in backend connection errors. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,econ] Update: 60 LLD |
HAProxy Backend [{#BACKEND_NAME}]: Response errors per second | Number of requests whose responses yielded an error. This represents the number of response errors generated by your backends. This includes errors caused by data transfers aborted by the servers as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the backend error response rate helps diagnose the root cause of response errors. For example, an increase in both the backend error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,eresp] Update: 60 LLD |
HAProxy Backend [{#BACKEND_NAME}]: Unassigned requests | Current number of requests unassigned in queue. The qcur metric tracks the current number of connections awaiting assignment to a backend server. If you have enabled cookies and the listed server is unavailable, connections will be queued until the queue timeout is reached. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,qcur] Update: 60 LLD |
HAProxy Backend [{#BACKEND_NAME}]: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. Minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,qtime] Update: 60 LLD |
HAProxy Backend [{#BACKEND_NAME}]: Response time | Average backend response time (in ms) for the last 1,024 requests. Tracking average response times is an effective way to measure the latency of your load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP (see #60). | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,rtime] Update: 60 LLD |
HAProxy Backend [{#BACKEND_NAME}]: Status | HAProxy Backend [{#BACKEND_NAME}] status: UP = 1, DOWN = 0 | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,status] Update: 60 LLD |
HAProxy Backend [{#BACKEND_NAME}]: Redispatched requests per second | Number of times a request was redispatched to a different backend. The redispatch rate metric tracks the number of times a client connection was unable to reach its original target and was subsequently sent to a different server. If a client holds a cookie referencing a backend server that is down, the default action is to respond to the client with a 502 status code. However, if option redispatch is enabled in haproxy.cfg, the request will be sent to any available backend server and the cookie will be ignored. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,wredis] Update: 60 LLD |
HAProxy Backend [{#BACKEND_NAME}]: Retried connections per second | Number of times a connection was retried. Some dropped or timed-out connections are to be expected when connecting to a backend server. The retry rate represents the number of times a connection to a backend server was retried. This metric is usually non-zero under normal operating conditions. Should you begin to see more retries than usual, it is likely that other metrics will also change, including econ and eresp. Tracking the retry rate in addition to the above two error metrics can shine some light on the true cause of an increase in errors. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#BACKEND_NAME},BACKEND,wretr] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Incoming traffic | Number of bits received by the frontend. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,bin] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Outgoing traffic | Number of bits sent by the frontend. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,bout] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Denied requests per second | Requests denied due to security concerns (ACL-restricted) per second. An increase in denied requests will subsequently cause an increase in 403 Forbidden codes. For tcp this is because of a matched tcp-request content rule; for http it is because of a matched http-request or tarpit rule. Correlating the two can help to discern the root cause of an increase in 4xx responses. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,dreq] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Request errors per second | HTTP request errors per second. Keeping an eye on peaks and drops is essential to ensure continuous service availability. In the event of a traffic spike, clients could see increases in latency or even denied connections. Some of the possible causes are: early termination from the client, before the request has been sent; read error from the client; client timeout; client closed connection; various bad requests from the client; request was tarpitted. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,ereq] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with codes 1xx per second | Number of informational (1xx) HTTP responses per second. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_1xx] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with codes 2xx per second | Number of successful HTTP responses (with 2xx code) per second. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_2xx] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with codes 3xx per second | Number of HTTP redirections (with 3xx code) per second. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_3xx] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with codes 4xx per second | Number of HTTP client errors (with 4xx code) per second. Ideally, all responses forwarded by HAProxy would be class 2xx codes, so an unexpected surge in the number of other code classes could be a sign of trouble. Correlating the denial metrics with the response code data can shed light on the cause of an increase in error codes. No change in denials coupled with an increase in the number of 404 responses could point to a misconfigured application or unruly client. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_4xx] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with codes 5xx per second | Number of HTTP server errors (with 5xx code) per second. Ideally, all responses forwarded by HAProxy would be class 2xx codes, so an unexpected surge in the number of other code classes could be a sign of trouble. Correlating the denial metrics with the response code data can shed light on the cause of an increase in error codes. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_5xx] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Number of responses with other codes per second | Number of HTTP responses with other codes per second (all codes outside 1xx-5xx). | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,hrsp_other] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Sessions rate | Number of sessions created per second. A significant spike in the number of sessions over a short period could cripple server operations and bring servers down. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,rate] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Requests rate | HTTP requests per second. The frontend request rate measures the number of requests received over the last second. Keeping an eye on peaks and drops is essential to ensure continuous service availability. In the event of a traffic spike, clients could see increases in latency or even denied connections. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,req_rate] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Established sessions | The current number of established sessions. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,scur] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Session limits | The most simultaneous sessions that are allowed, as defined by the maxconn setting in the frontend. | Zabbix agent (active) | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,slim] Update: 60 LLD |
HAProxy Frontend [{#FRONTEND_NAME}]: Session utilization | Percentage of sessions used (scur / slim * 100). For every HAProxy session, two connections are consumed: one for the client to HAProxy, and the other for HAProxy to your backend. Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. | Calculated | haproxy.stats[{$HAPROXY_SOCK},{#FRONTEND_NAME},FRONTEND,sutil] Update: 60 LLD |
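The Session utilization item is the only calculated one; it simply divides scur by slim. As a sanity check you can reproduce the same number straight from the stats socket. A rough sketch, with "http-in" standing in for a real frontend name (fields 5 and 7 of the `show stat` CSV are scur and slim):

```
# Session utilization (%) for one frontend, computed from the raw CSV
echo "show stat" | sudo -u haproxy socat stdio unix-connect:/var/lib/haproxy/stats \
  | awk -F, '$1 == "http-in" && $2 == "FRONTEND" && $7 > 0 { printf "%.1f\n", $5 / $7 * 100 }'
```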
Name | Description | Expression | Priority |
---|---|---|---|
HAProxy Backend [{#BACKEND_NAME}]: Average response time is more than 10 sec for 5m | Average backend response time (in ms) for the last 1,024 requests is more than 10 seconds. Tracking average response times is an effective way to measure the latency of your HAProxy load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,rtime].min(5m)}>10s Recovery expression: | average |
HAProxy Backend [{#BACKEND_NAME}]: Average time spent in queue is more than 10 sec for 5m | Average time spent in queue (in ms) for the last 1,024 requests is more than 10 s. Obviously, minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,qtime].min(5m)}>10s Recovery expression: | average |
HAProxy Backend [{#BACKEND_NAME}]: Current number of requests unassigned in queue is more than 10 for 5m | Current number of requests on backend unassigned in queue is more than 10. If your backend is bombarded with connections to the point you have reached your global maxconn limit, HAProxy will seamlessly queue new connections in the system kernel's socket queue until a backend server becomes available. Keeping connections out of the queue is ideal, resulting in less latency and a better user experience. You should alert if the size of your queue exceeds the threshold. If you find that connections are consistently enqueueing, configuration changes may be in order, such as increasing the global maxconn limit or changing the connection limits on individual backend servers. Keep in mind: empty queue = happy client. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,qcur].min(5m)}>10 Recovery expression: | average |
HAProxy Backend [{#BACKEND_NAME}]: Number of responses with error is more than 10 for 5m | Number of requests on backend, whose responses yielded an error, is more than 10. The backend error response rate represents the number of response errors generated by your backends. This includes errors caused by data transfers aborted by the servers as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the backend error response rate helps diagnose the root cause of response errors. For example, an increase in both the backend error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,eresp].min(5m)}>10 Recovery expression: | average |
HAProxy Backend [{#BACKEND_NAME}]: Server is DOWN | HAProxy Backend [{#BACKEND_NAME}] is not available. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,status].max(#5)}=0 Recovery expression: | disaster |
HAProxy Frontend [{#FRONTEND_NAME}]: Number of request errors is more than 10 for 5m | Number of request errors in the last 5 minutes is more than 10. Client-side request errors could have a number of causes: the client terminates before sending the request; read error from the client; client timeout; client terminated the connection; request was tarpitted/subject to an ACL. Under normal conditions, it is acceptable to (infrequently) receive invalid requests from clients. However, a significant increase in the number of invalid requests received could be a sign of larger, looming issues. For example, an abnormal number of terminations or timeouts by numerous clients could mean that your application is experiencing excessive latency, causing clients to manually close their connections. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,ereq].min(5m)}>10 Recovery expression: | average |
HAProxy Frontend [{#FRONTEND_NAME}]: Number of requests denied is more than 10 for 5m | Number of requests denied due to security concerns (ACL-restricted) is more than 10 in the last 5 minutes. In the event of a significant increase in denials, a malicious attacker or a misconfigured application could be to blame. An increase in denied requests will subsequently cause an increase in 403 Forbidden codes. Correlating the two can help you discern the root cause of an increase in 4xx responses. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,dreq].min(5m)}>10 Recovery expression: | average |
HAProxy Frontend [{#FRONTEND_NAME}]: Session utilization is more than 80% for 5m | For every HAProxy session, two connections are consumed: one for the client to HAProxy, and the other for HAProxy to your backend. Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,sutil].min(5m)}>80 Recovery expression: | average |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Average response time is more than 10s for 5m | Average server response time (in ms) for the last 1,024 requests is more than 10s. Tracking average response times is an effective way to measure the latency of your HAProxy load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},rtime].min(5m)}>10s Recovery expression: | average |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Average time spent in queue is more than 10s for 5m | Average time spent in queue (in ms) for the last 1,024 requests is more than 10s. Obviously, minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},qtime].min(5m)}>10s Recovery expression: | average |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Current number of requests unassigned in queue is more than 10 for 5m | Current number of requests unassigned in queue is more than 10. If your server is bombarded with connections to the point you have reached your global maxconn limit, HAProxy will seamlessly queue new connections in the system kernel's socket queue until the server becomes available. Keeping connections out of the queue is ideal, resulting in less latency and a better user experience. You should alert if the size of your queue exceeds the threshold. If you find that connections are consistently enqueueing, configuration changes may be in order, such as increasing the global maxconn limit or changing the connection limits on individual backend servers. Keep in mind: empty queue = happy client. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},qcur].min(5m)}>10 Recovery expression: | average |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Number of responses with error is more than 10 for 5m | Number of requests on server, whose responses yielded an error, is more than 10. The server error response rate represents the number of response errors generated by your servers. This includes errors caused by data transfers aborted by the servers as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the server error response rate helps diagnose the root cause of response errors. For example, an increase in both the server error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},eresp].min(5m)}>10 Recovery expression: | average |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Server is DOWN | Server is not available. The check directive must be enabled in the HAProxy server configuration. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},status].max(#5)}=0 Recovery expression: | disaster |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Average response time is more than 10s for 5m (LLD) | Average server response time (in ms) for the last 1,024 requests is more than 10s. Tracking average response times is an effective way to measure the latency of your HAProxy load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},rtime].min(5m)}>10s Recovery expression: | average |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Average time spent in queue is more than 10s for 5m (LLD) | Average time spent in queue (in ms) for the last 1,024 requests is more than 10s. Obviously, minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},qtime].min(5m)}>10s Recovery expression: | average |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Current number of requests unassigned in queue is more than 10 for 5m (LLD) | Current number of requests unassigned in queue is more than 10. If your server is bombarded with connections to the point you have reached your global maxconn limit, HAProxy will seamlessly queue new connections in the system kernel's socket queue until the server becomes available. Keeping connections out of the queue is ideal, resulting in less latency and a better user experience. You should alert if the size of your queue exceeds the threshold. If you find that connections are consistently enqueueing, configuration changes may be in order, such as increasing the global maxconn limit or changing the connection limits on individual backend servers. Keep in mind: empty queue = happy client. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},qcur].min(5m)}>10 Recovery expression: | average |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Number of responses with error is more than 10 for 5m (LLD) | Number of requests on server, whose responses yielded an error, is more than 10. The server error response rate represents the number of response errors generated by your servers. This includes errors caused by data transfers aborted by the servers as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the server error response rate helps diagnose the root cause of response errors. For example, an increase in both the server error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},eresp].min(5m)}>10 Recovery expression: | average |
HAProxy Server [{#BACKEND_NAME}/{#SERVER_NAME}]: Server is DOWN (LLD) | Server is not available. The check directive must be enabled in the HAProxy server configuration. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},{#SERVER_NAME},status].max(#5)}=0 Recovery expression: | disaster |
HAProxy Backend [{#BACKEND_NAME}]: Average response time is more than 10 sec for 5m (LLD) | Average backend response time (in ms) for the last 1,024 requests is more than 10 seconds. Tracking average response times is an effective way to measure the latency of your HAProxy load-balancing setup. Generally speaking, response times in excess of 500 ms will lead to degradation of application performance and customer experience. Monitoring the average response time can give you the upper hand to respond to latency issues before your customers are substantially impacted. Keep in mind that this metric will be zero if you are not using HTTP. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,rtime].min(5m)}>10s Recovery expression: | average |
HAProxy Backend [{#BACKEND_NAME}]: Average time spent in queue is more than 10 sec for 5m (LLD) | Average time spent in queue (in ms) for the last 1,024 requests is more than 10 s. Obviously, minimizing time spent in the queue results in lower latency and an overall better client experience. Each use case can tolerate a certain amount of queue time, but in general you should aim to keep this value as low as possible. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,qtime].min(5m)}>10s Recovery expression: | average |
HAProxy Backend [{#BACKEND_NAME}]: Current number of requests unassigned in queue is more than 10 for 5m (LLD) | Current number of requests on backend unassigned in queue is more than 10. If your backend is bombarded with connections to the point you have reached your global maxconn limit, HAProxy will seamlessly queue new connections in the system kernel's socket queue until a backend server becomes available. Keeping connections out of the queue is ideal, resulting in less latency and a better user experience. You should alert if the size of your queue exceeds the threshold. If you find that connections are consistently enqueueing, configuration changes may be in order, such as increasing the global maxconn limit or changing the connection limits on individual backend servers. Keep in mind: empty queue = happy client. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,qcur].min(5m)}>10 Recovery expression: | average |
HAProxy Backend [{#BACKEND_NAME}]: Number of responses with error is more than 10 for 5m (LLD) | Number of requests on backend, whose responses yielded an error, is more than 10. The backend error response rate represents the number of response errors generated by your backends. This includes errors caused by data transfers aborted by the servers as well as write errors on the client socket and failures due to ACLs. Combined with other error metrics, the backend error response rate helps diagnose the root cause of response errors. For example, an increase in both the backend error response rate and denied responses could indicate that clients are repeatedly attempting to access ACL-ed resources. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,eresp].min(5m)}>10 Recovery expression: | average |
HAProxy Backend [{#BACKEND_NAME}]: Server is DOWN (LLD) | HAProxy Backend [{#BACKEND_NAME}] is not available. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#BACKEND_NAME},BACKEND,status].max(#5)}=0 Recovery expression: | disaster |
HAProxy Frontend [{#FRONTEND_NAME}]: Number of request errors is more than 10 for 5m (LLD) | Number of request errors in the last 5 minutes is more than 10. Client-side request errors could have a number of causes: the client terminates before sending the request; read error from the client; client timeout; client terminated the connection; request was tarpitted/subject to an ACL. Under normal conditions, it is acceptable to (infrequently) receive invalid requests from clients. However, a significant increase in the number of invalid requests received could be a sign of larger, looming issues. For example, an abnormal number of terminations or timeouts by numerous clients could mean that your application is experiencing excessive latency, causing clients to manually close their connections. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,ereq].min(5m)}>10 Recovery expression: | average |
HAProxy Frontend [{#FRONTEND_NAME}]: Number of requests denied is more than 10 for 5m (LLD) | Number of requests denied due to security concerns (ACL-restricted) is more than 10 in the last 5 minutes. In the event of a significant increase in denials, a malicious attacker or a misconfigured application could be to blame. An increase in denied requests will subsequently cause an increase in 403 Forbidden codes. Correlating the two can help you discern the root cause of an increase in 4xx responses. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,dreq].min(5m)}>10 Recovery expression: | average |
HAProxy Frontend [{#FRONTEND_NAME}]: Session utilization is more than 80% for 5m (LLD) | For every HAProxy session, two connections are consumed: one for the client to HAProxy, and the other for HAProxy to your backend. Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. | Expression: {HAProxy:haproxy.stats[/var/lib/haproxy/stats,{#FRONTEND_NAME},FRONTEND,sutil].min(5m)}>80 Recovery expression: | average |