diff --git a/docs/Manual.md b/docs/Manual.md
index 09403484..9aa81bde 100644
--- a/docs/Manual.md
+++ b/docs/Manual.md
@@ -4,71 +4,601 @@ EaseProbe is a simple, standalone, and lightweight tool that can do health/statu
![](./overview.png)
+EaseProbe has the following major modules:
+
+- **Probe**: It is used to check the health of the service.
+- **Notification**: It is used to send the Probe notification.
+- **Channel**: It is used to connect the probe and the notification.
+- **Report**: It is used to generate the SLA report for all probes.
+- **Metrics**: It is used to export the metrics data to Prometheus.
+
+
Outline
-- [1. Probe](#1-probe)
- - [1.1 Overview](#11-overview)
- - [1.2 Initial Fire Up](#12-initial-fire-up)
-- [2. Notification](#2-notification)
- - [2.1 Slack](#21-slack)
- - [2.2 Discord](#22-discord)
- - [2.3 Telegram](#23-telegram)
- - [2.4 Teams](#24-teams)
- - [2.5 Email](#25-email)
- - [2.6 AWS SNS](#26-aws-sns)
- - [2.7 WeChat Work](#27-wechat-work)
- - [2.8 DingTalk](#28-dingtalk)
- - [2.9 Lark](#29-lark)
- - [2.10 SMS](#210-sms)
- - [2.11 Log](#211-log)
- - [2.12 Shell](#212-shell)
- - [2.13 RingCentral](#213-ringcentral)
-- [3. Report](#3-report)
- - [3.1 SLA Report Notification](#31-sla-report-notification)
- - [3.2 SLA Live Report](#32-sla-live-report)
- - [3.3 SLA Data Persistence](#33-sla-data-persistence)
-- [4. Channel](#4-channel)
- - [4.1 Overview](#41-overview)
- - [4.2 Examples](#42-examples)
-- [5. Administration](#5-administration)
- - [5.1 PID file](#51-pid-file)
- - [5.2 Log file Rotation](#52-log-file-rotation)
-- [6. Prometheus Metrics Exporter](#6-prometheus-metrics-exporter)
-- [7. Configuration](#7-configuration)
- - [7.1 HTTP Probe Configuration](#71-http-probe-configuration)
- - [7.1.1 Basic HTTP Configuration](#711-basic-http-configuration)
- - [7.1.2 Complete HTTP Configuration](#712-complete-http-configuration)
- - [7.1.3 Expression Evaluation](#713-expression-evaluation)
- - [7.2 TCP Probe Configuration](#72-tcp-probe-configuration)
- - [7.3 Ping Probe Configuration](#73-ping-probe-configuration)
- - [7.4 Shell Command Probe Configuration](#74-shell-command-probe-configuration)
- - [7.5 SSH Command Probe Configuration](#75-ssh-command-probe-configuration)
- - [7.6 TLS Probe Configuration](#76-tls-probe-configuration)
- - [7.7 Host Resource Usage Probe Configuration](#77-host-resource-usage-probe-configuration)
- - [7.8 Native Client Probe Configuration](#78-native-client-probe-configuration)
- - [7.9 Notification Configuration](#79-notification-configuration)
- - [7.10 Global Setting Configuration](#710-global-setting-configuration)
-- [8. Tools](#8-tools)
- - [8.1 EaseProbe JSON Schema](#81-easeprobe-json-schema)
- - [8.2 EaseProbe Deployment](#82-easeprobe-deployment)
+- [1. Probe](#1-probe)
+  - [1.1 Overview](#11-overview)
+    - [1.1.1 General Settings](#111-general-settings)
+    - [1.1.2 Initial Fire Up](#112-initial-fire-up)
+  - [1.2 HTTP](#12-http)
+    - [1.2.1 Basic Configuration](#121-basic-configuration)
+    - [1.2.2 Complete Configuration](#122-complete-configuration)
+    - [1.2.3 Expression Evaluation](#123-expression-evaluation)
+  - [1.3 TCP](#13-tcp)
+  - [1.4 Ping](#14-ping)
+  - [1.5 Shell](#15-shell)
+  - [1.6 SSH](#16-ssh)
+  - [1.7 TLS](#17-tls)
+  - [1.8 Host](#18-host)
+  - [1.9 Native Client](#19-native-client)
+    - [1.9.1 Redis](#191-redis)
+    - [1.9.2 MySQL](#192-mysql)
+    - [1.9.3 MongoDB](#193-mongodb)
+    - [1.9.4 Memcache](#194-memcache)
+    - [1.9.5 Kafka](#195-kafka)
+    - [1.9.6 PostgreSQL](#196-postgresql)
+    - [1.9.7 Zookeeper](#197-zookeeper)
+- [2. Notification](#2-notification)
+  - [2.1 Slack](#21-slack)
+  - [2.2 Discord](#22-discord)
+  - [2.3 Telegram](#23-telegram)
+  - [2.4 Teams](#24-teams)
+  - [2.5 Email](#25-email)
+  - [2.6 AWS SNS](#26-aws-sns)
+  - [2.7 WeChat Work](#27-wechat-work)
+  - [2.8 DingTalk](#28-dingtalk)
+  - [2.9 Lark](#29-lark)
+  - [2.10 SMS](#210-sms)
+  - [2.11 Log](#211-log)
+  - [2.12 Shell](#212-shell)
+  - [2.13 RingCentral](#213-ringcentral)
+- [3. Report](#3-report)
+  - [3.1 SLA Report Notification](#31-sla-report-notification)
+  - [3.2 SLA Live Report](#32-sla-live-report)
+  - [3.3 SLA Data Persistence](#33-sla-data-persistence)
+- [4. Channel](#4-channel)
+  - [4.1 Overview](#41-overview)
+  - [4.2 Examples](#42-examples)
+- [5. Administration](#5-administration)
+  - [5.1 PID file](#51-pid-file)
+  - [5.2 Log file Rotation](#52-log-file-rotation)
+- [6. Prometheus Metrics Exporter](#6-prometheus-metrics-exporter)
+  - [6.1 General Metrics](#61-general-metrics)
+  - [6.2 HTTP Probe](#62-http-probe)
+  - [6.3 Ping Probe](#63-ping-probe)
+  - [6.4 TLS Probe](#64-tls-probe)
+  - [6.5 Shell \& SSH Probe](#65-shell--ssh-probe)
+  - [6.6 Host Probe](#66-host-probe)
+- [7. Configuration](#7-configuration)
+  - [7.1 Probe Configuration](#71-probe-configuration)
+  - [7.2 Notification Configuration](#72-notification-configuration)
+  - [7.3 Global Setting Configuration](#73-global-setting-configuration)
+- [8. Tools](#8-tools)
+  - [8.1 EaseProbe JSON Schema](#81-easeprobe-json-schema)
+  - [8.2 EaseProbe Deployment](#82-easeprobe-deployment)
+
+
+
+
+# 1. Probe
+
+EaseProbe supports these probing methods: **HTTP**, **TCP**, **Ping**, **Shell Command**, **SSH Command**, **TLS**, **Host Resource Usage**, and **Native Client**.
+
+## 1.1 Overview
+
+Each probe is identified by the method it supports (e.g. `http`), a unique name (across all probes in the configuration file), and the method-specific parameters. For example:
+
+```yaml
+http:
+  - name: Web Service
+    url: http://example.com:1080
+```
+> **Note**:
+>
+> **Probe names must be unique.** If multiple probes use the same name, the metrics data may be corrupted and the application may behave non-deterministically.
+
+### 1.1.1 General Settings
+
+All probes support the `timeout`, `interval`, `failure`, and `success` optional configuration parameters. For example:
+
+```yaml
+tcp:
+  - name: Web Service
+    host: example.com:1080
+    timeout: 10s # the timeout for this probe, default is 30 seconds
+    interval: 2m # run this probe every 2 minutes, default is 60 seconds
+    failure: 2 # number of consecutive failed probes needed to determine the status down, default: 1
+    success: 1 # number of consecutive successful probes needed to determine the status up, default: 1
+```
+
+The general probe settings can also be configured globally for all probes.
+
+The following configuration applies to every probe unless the probe defines its own values.
+
+```yaml
+settings:
+  probe:
+    timeout: 30s # the time out for all probes, default is 30 seconds
+    interval: 1m # probe every minute for all probes, default is 60 seconds
+    failure: 2 # number of consecutive failed probes needed to determine the status down, default: 1
+    success: 1 # number of consecutive successful probes needed to determine the status up, default: 1
+```
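The `failure` and `success` settings describe a consecutive-result threshold. A minimal Python sketch of that idea follows; the class name `StatusCounter` and the assumption that a probe starts in the up state are illustrative and are not taken from the EaseProbe source.

```python
class StatusCounter:
    """Track probe status using consecutive success/failure thresholds (illustrative)."""

    def __init__(self, failure=1, success=1, up=True):
        self.failure = failure  # consecutive failures needed to mark the probe down
        self.success = success  # consecutive successes needed to mark the probe up
        self.up = up            # assumed initial status
        self._streak = 0        # current run of results that contradict the status

    def record(self, ok):
        """Record one probe result and return the resulting status (True = up)."""
        if ok == self.up:
            self._streak = 0    # the result agrees with the current status
            return self.up
        self._streak += 1
        threshold = self.success if ok else self.failure
        if self._streak >= threshold:
            self.up = ok        # enough consecutive contrary results: flip status
            self._streak = 0
        return self.up
```

With `failure: 2`, a single failed probe followed by a successful one leaves the status unchanged; only two failures in a row flip it to down.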
+
+### 1.1.2 Initial Fire Up
+
+On application startup, the configured probes are scheduled for their initial fire up based on the following criteria:
+
+- Less than or equal to 60 total probers exist: the delay between initial prober fire-up is `1 second`
+- More than 60 total probers exist: the startup is scheduled based on the following equation `timeGap = DefaultProbeInterval / numProbes`
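The two rules above can be sketched in Python as follows. The value of `DEFAULT_PROBE_INTERVAL` is an assumption (taken to equal the 60-second default `interval`), and the function name is hypothetical.

```python
from datetime import timedelta

# Assumption: DefaultProbeInterval equals the documented default interval of 60 seconds.
DEFAULT_PROBE_INTERVAL = timedelta(seconds=60)


def initial_time_gap(num_probes, default_interval=DEFAULT_PROBE_INTERVAL):
    """Return the delay between the initial fire-up of consecutive probers."""
    if num_probes <= 60:
        return timedelta(seconds=1)
    # timeGap = DefaultProbeInterval / numProbes
    return default_interval / num_probes
```

Under these assumptions, 120 probers would be started 500 milliseconds apart, so all of them fire within one default interval.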
+
+
+## 1.2 HTTP
+
+The HTTP probe uses the `http` identifier and has the following features:
+
+- Checks the HTTP status code range
+- Supports response body checks (JSON/XML/HTML path checking)
+- Supports mTLS and HTTP Basic Auth
+- Supports customized request headers and body
+- Supports HTTP proxies
+
+Setting the environment variables `$HTTP_PROXY` and `$HTTPS_PROXY` configures the proxy for all HTTP-related probes. Alternatively, set the `proxy` field in the configuration of a specific probe.
+
+```shell
+export HTTPS_PROXY=socks5://127.0.0.1:1080
+```
+
+### 1.2.1 Basic Configuration
+
+The following example is a basic HTTP probe configuration that only has `name` and `url`: `name` is the probe name and `url` is the HTTP URL to be checked. If the HTTP status code is in `[0, 499]`, the probe is considered `UP`.
+
+```YAML
+# HTTP Probe Configuration
+
+http:
+  # A Website
+  - name: MegaEase Website (Global)
+    url: https://megaease.com
+
+  # Some software supports HTTP queries
+  - name: ElasticSearch
+    url: http://elasticsearch.server:9200
+  - name: Eureka
+    url: http://eureka.server:8761
+  - name: Prometheus
+    url: http://prometheus:9090/graph
+
+  # Spring Boot Application with Actuator Health API
+  - name: EaseService-Governance
+    url: http://easeservice-mgmt-governance:38012/actuator/health
+  - name: EaseService-Control
+    url: http://easeservice-mgmt-control:38013/actuator/health
+  - name: EaseService-Mesh
+    url: http://easeservice-mgmt-mesh:38013/actuator/health
+```
+
+### 1.2.2 Complete Configuration
+
+The following example configuration is a complete HTTP probe configuration, which has all the supported configuration parameters.
+
+```yaml
+http:
+  # A complete HTTP Probe configuration
+  - name: Special Website
+    url: https://megaease.cn
+    # Proxy setting, supports socks5, http, https, for example:
+    #   proxy: http://proxy.server:8080
+    #   proxy: socks5://localhost:1085
+    #   proxy: https://user:password@proxy.example.com:443
+    # Also supports the `HTTP_PROXY` & `HTTPS_PROXY` environment variables
+    proxy: http://proxy.server:8080
+    # Request Method
+    method: GET
+    # Request Header
+    headers:
+      User-Agent: Customized User-Agent # default: "MegaEase EaseProbe / v1.6.0"
+      X-head-one: xxxxxx
+      X-head-two: yyyyyy
+      X-head-THREE: zzzzzzX-
+    content_encoding: text/json
+    # Request Body
+    body: '{ "FirstName": "Mega", "LastName" : "Ease", "UserName" : "megaease", "Email" : "user@example.com"}'
+    # HTTP Basic Auth
+    username: username
+    password: password
+    # mTLS
+    ca: /path/to/file.ca
+    cert: /path/to/file.crt
+    key: /path/to/file.key
+    # TLS
+    insecure: true # skip any security checks, useful for self-signed and expired certs. default: false
+    # HTTP successful response code range, default is [0, 499].
+    success_code:
+      - [200,206] # the code >=200 and <= 206
+      - [300,308] # the code >=300 and <= 308
+    # Response Checking
+    contain: "success" # response body must contain this string, if not the probe is considered failed.
+    not_contain: "failure" # response body must NOT contain this string, if it does the probe is considered failed.
+    regex: false # if true, the contain and not_contain will be treated as regular expression. default: false
+    eval: # eval is an expression evaluation for HTTP response message
+      doc: XML # support XML, JSON, HTML, TEXT.
+      expression: "x_time('//feed/updated') > '2022-07-01'" # the expression to evaluate.
+    # configuration
+    timeout: 10s # default is 30 seconds
+```
+
+> **Note**:
+>
+> - For the supported regular expression syntax, refer to https://github.com/google/re2/wiki/Syntax
+> - Only XPath 1.0/2.0 syntax is supported. Refer to https://www.w3.org/TR/xpath/ for the syntax; the library used is https://github.com/antchfx/xpath
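The `success_code` setting is a list of inclusive ranges. The check it implies can be sketched in Python as below; the function name is hypothetical and the snippet only mirrors the documented semantics, not the actual EaseProbe code.

```python
DEFAULT_SUCCESS_CODE = [(0, 499)]  # documented default range


def is_success(status_code, success_code=None):
    """Return True when status_code falls inside any inclusive [low, high] range."""
    ranges = success_code or DEFAULT_SUCCESS_CODE
    return any(low <= status_code <= high for low, high in ranges)
```

For example, with the ranges `[200,206]` and `[300,308]` from the configuration above, a `404` response would count as a failure even though it would pass the default `[0, 499]` range.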
+
+### 1.2.3 Expression Evaluation
+
+The HTTP probe supports two types of expression evaluation:
+
+- `XML`, `JSON`, `HTML`: uses **XPath (1.0/2.0)** to extract the value
+- `TEXT`: uses a **Regular Expression** to extract the value
+
+The expression can be configured in two ways:
+
+**1) Variable Definition**
+
+```yaml
+eval:
+  doc: XML # support XML, JSON, HTML, TEXT
+  expression: "updated > '2022-07-01'"
+  variables: # variables definition
+    - name: updated # variable name
+      type: time # variable type, support `int`, `float`, `bool`, `time` and `duration`.
+      query: "//feed/updated" # the XPath query to get the variable value.
+```
+
+**2) Built-in XPath Function Expression Evaluation**
+
+You can use the built-in XPath functions directly in the expression to simplify the configuration.
+
+```yaml
+eval:
+  doc: XML # support XML, JSON, HTML, TEXT.
+  expression: "x_time('//feed/updated') > '2022-07-01'" # the expression to evaluate.
+```
+
+Currently, EaseProbe supports the following XPath functions:
+- `x_str` - get the string value from the XPath/RegExp query result.
+- `x_int` - get the integer value from the XPath/RegExp query result.
+- `x_float` - get the float value from the XPath/RegExp query result.
+- `x_time` - get the time value from the XPath/RegExp query result.
+- `x_duration` - get the duration value from the XPath/RegExp query result.
+
+**3) Built-in Functions**
+
+Currently, EaseProbe supports the following built-in functions:
+
+- `strlen` - get the string length.
+- `now` - get the current time.
+- `duration` - get the duration value.
+
+
+For example:
+
+Check that the `time` in the response is more than 5 seconds earlier than the current time.
+
+```yaml
+eval:
+  doc: HTML
+  expression: "now() - x_time('//div[@id=\\'time\\']') > 5"
+```
+
+
+Check the duration from response is less than 1 second.
+
+```yaml
+eval:
+  doc: HTML
+  expression: "duration(rt) < duration('1s')"
+  variables:
+    - name: rt # variable name `rt` will be used in expression.
+      type: duration # variable type is `duration`
+      query: "//div[@id=\\'time\\']" # the XPath query to get the value.
+```
+Or
+
+```yaml
+eval:
+  doc: HTML
+  expression: "x_duration('//div[@id=\\'resp_time\\']') < duration('1s')"
+```
+
+
+**4) XPath Syntax Example**
+
+Considering we have the following response:
+
+```json
+{
+ "company": {
+ "name": "MegaEase",
+ "person": [{
+ "name": "Bob",
+ "email": "bob@example.com",
+ "age": 35,
+ "salary": 35000.12,
+ "birth": "1984-10-12",
+ "work": "40h",
+ "fulltime": true
+ },
+ {
+ "name": "Alice",
+ "email": "alice@example.com",
+ "age": 25,
+ "salary": 25000.12,
+ "birth": "1985-10-12",
+ "work": "30h",
+ "fulltime": false
+ }
+ ]
+ }
+}
+```
+Then the values can be extracted as follows:
+
+```
+"//name" ==> "MegaEase"
+"//company/name" ==> "MegaEase"
+"//email" ==> "bob@example.com"
+"//company/person/*[1]/name" ==> "Bob"
+"//company/person/*[2]/email" ==> "alice@example.com"
+"//company/person/*[last()]/name" ==> "Alice"
+"//company/person/*[last()]/age" ==> "25"
+"//company/person/*[salary=25000.12]/salary" ==> "25000.12"
+"//company/person/*[name='Bob']/birth" ==> "1984-10-12"
+"//company/person/*[name='Alice']/work" ==> "30h"
+"//*/email[contains(.,'bob')]" ==> "bob@example.com"
+"//work" ==> "40h"
+"//person/*[2]/fulltime" ==> "false"
+```
+
+**5) Regular Expression Syntax Examples**
+
+Considering we have the following response:
+
+`name: Bob, email: bob@example.com, age: 35, salary: 35000.12, birth: 1984-10-12, work: 40h, fulltime: true`
+
+Then the values can be extracted as follows:
+
+```
+"name: (?P<name>[a-zA-Z0-9 ]*)" ==> "Bob"
+"email: (?P<email>[a-zA-Z0-9@.]*)" ==> "bob@example.com"
+"age: (?P<age>[0-9]*)" ==> "35"
+"age: (?P<age>\\d+)" ==> "35"
+"salary: (?P<salary>[0-9.]*)" ==> "35000.12"
+"salary: (?P<salary>\\d+\\.\\d+)" ==> "35000.12"
+"birth: (?P<birth>[0-9-]*)" ==> "1984-10-12"
+"birth: (?P<birth>\\d{4}-\\d{2}-\\d{2})" ==> "1984-10-12"
+"work: (?P<work>\\d+[hms])" ==> "40h"
+"fulltime: (?P<fulltime>true|false)" ==> "true"
+```
+> **Note**:
+>
+> You can find more examples in the unit test cases of the [`eval`](./eval/) package.
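Named capture groups of the form `(?P<name>...)` are also understood by Python's `re` module, so the extraction idea can be tried locally. The sketch below is only an approximation: EaseProbe uses RE2 syntax, the helper `extract` is hypothetical, and the group names are illustrative.

```python
import re

RESPONSE = (
    "name: Bob, email: bob@example.com, age: 35, salary: 35000.12, "
    "birth: 1984-10-12, work: 40h, fulltime: true"
)


def extract(pattern, text):
    """Return the value of the first named group of the first match, or None."""
    match = re.search(pattern, text)
    if match is None:
        return None
    groups = match.groupdict()
    # Fall back to the whole match when the pattern has no named group.
    return next(iter(groups.values())) if groups else match.group(0)
```

For instance, `extract(r"work: (?P<work>\d+[hms])", RESPONSE)` picks the duration string out of the response text.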
+
+## 1.3 TCP
+
+The TCP probe uses the `tcp` identifier and simply checks whether a TCP connection can be established.
+
+The following configuration example defines two TCP probes:
+- **SSH Service** checks the TCP connection to `example.com:22` every 2 minutes with a 10-second timeout via the proxy `socks5://proxy.server:1080`.
+- **Kafka** checks the TCP connection to `kafka.server:9093` using the default interval and timeout settings.
+
+```YAML
+# TCP Probe Configuration
+tcp:
+  - name: SSH Service
+    host: example.com:22
+    timeout: 10s # default is 30 seconds
+    interval: 2m # default is 60 seconds
+    proxy: socks5://proxy.server:1080 # Optional. Only support socks5.
+                                      # Also support the `ALL_PROXY` environment.
+  - name: Kafka
+    host: kafka.server:9093
+```
+
+## 1.4 Ping
+
+The Ping probe uses the `ping` identifier and simply checks whether the host can be pinged.
+
+The following is a configuration example.
+
+```YAML
+ping:
+  - name: Localhost
+    host: 127.0.0.1
+    count: 5 # number of packets to send, default: 3
+    lost: 0.2 # 20% lost percentage threshold, mark it down if the loss is greater than this, default: 0
+    privileged: true # if true, the ping will be executed with icmp, otherwise use udp, default: false (Note: On Windows platform, this must be set to True)
+    timeout: 10s # default is 30 seconds
+    interval: 2m # default is 60 seconds
+```
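The `lost` threshold compares the packet loss ratio of one probe run against a limit. A small Python sketch of that comparison follows; the function name is hypothetical, and the "greater than" rule is taken from the comment in the configuration above.

```python
def ping_status(sent, received, lost_threshold=0.0):
    """Return (loss_ratio, is_up) for one ping probe run."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    loss = (sent - received) / sent
    # The probe is marked down only if the loss is greater than the threshold.
    return loss, loss <= lost_threshold
```

With `count: 5` and `lost: 0.2`, losing one packet keeps the probe up, while losing two marks it down.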
+
+## 1.5 Shell
+
+The shell command probe uses the `shell` identifier. It executes a shell command and checks the output.
+
+The following example shows how to configure the shell command probe.
+
+```YAML
+# Shell Probe Configuration
+shell:
+  # A proxy curl shell script
+  - name: Google Service
+    cmd: "./resources/probe/scripts/proxy.curl.sh"
+    args:
+      - "socks5://127.0.0.1:1085"
+      - "www.google.com"
+
+  # run redis-cli ping and check the "PONG"
+  - name: Redis (Local)
+    cmd: "redis-cli"
+    args:
+      - "-h"
+      - "127.0.0.1"
+      - "ping"
+    clean_env: true # Do not pass the OS environment variables to the command
+                    # default: false
+    env:
+      # set the `REDISCLI_AUTH` environment variable for redis password
+      - "REDISCLI_AUTH=abc123"
+    # check the command output, if it does not contain PONG, mark the status down
+    contain: "PONG"
+    not_contain: "failure" # command output must NOT contain this string, if it does the probe is considered failed.
+    regex: false # if true, the `contain` and `not_contain` will be treated as regular expression. default: false
+
+  # Run Zookeeper command `stat` to check the zookeeper status
+  - name: Zookeeper (Local)
+    cmd: "/bin/sh"
+    args:
+      - "-c"
+      - "echo stat | nc 127.0.0.1 2181"
+    contain: "Mode:"
+```
+
+> **Note**:
+>
+> For the supported regular expression syntax, refer to https://github.com/google/re2/wiki/Syntax
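The `contain`, `not_contain`, and `regex` options describe a simple output check. A Python sketch of how such a check could behave is shown below; the function name is hypothetical, and Python's `re` module stands in for RE2, so some patterns may behave differently.

```python
import re


def check_output(output, contain="", not_contain="", regex=False):
    """Return True when the command output passes the contain/not_contain checks."""

    def found(needle):
        # Treat the needle as a pattern only when `regex` is enabled.
        return bool(re.search(needle, output)) if regex else needle in output

    if contain and not found(contain):
        return False
    if not_contain and found(not_contain):
        return False
    return True
```

An empty `contain` or `not_contain` is treated as "no check" in this sketch, which is an assumption about how unset options behave.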
+
+## 1.6 SSH
+
+The SSH probe uses the `ssh` identifier. It is similar to the Shell probe and has the following features:
+- Supports password and private key authentication.
+- Supports the bastion host tunnel.
+
+The `host` field accepts the following formats:
+- `example.com`
+- `example.com:22`
+- `user@example.com:22`
+
+The following are examples of SSH probe configuration.
+
+```YAML
+# SSH Probe Configuration
+ssh:
+  # SSH bastion host configuration
+  bastion:
+    aws: # bastion host ID ◄──────────────────────────────────┐
+      host: aws.bastion.com:22 #                              │
+      username: ubuntu # login user                           │
+      key: /path/to/aws/bastion/key.pem # private key file    │
+    gcp: # bastion host ID                                    │
+      host: ubuntu@gcp.bastion.com:22 # bastion host          │
+      key: /path/to/gcp/bastion/key.pem # private key file    │
+  # SSH Probe configuration                                   │
+  servers: #                                                  │
+    # run redis-cli ping and check the "PONG"                 │
+    - name: Redis (AWS) # Name                                │
+      bastion: aws # bastion host id ─────────────────────────┘
+      host: 172.20.2.202:22
+      username: ubuntu # SSH Login username
+      password: xxxxx # SSH Login password
+      key: /path/to/private.key # SSH login private file
+      passphrase: xxxxxxx # PrivateKey password
+      cmd: "redis-cli"
+      args:
+        - "-h"
+        - "127.0.0.1"
+        - "ping"
+      env:
+        # set the `REDISCLI_AUTH` environment variable for redis password
+        - "REDISCLI_AUTH=abc123"
+      # check the command output, if it does not contain PONG, mark the status down
+      contain: "PONG"
+      not_contain: "failure" # command output must NOT contain this string, if it does the probe is considered failed.
+      regex: false # if true, the contain and not_contain will be treated as regular expression. default: false
+
+    # Check the process status of `Kafka`
+    - name: Kafka (GCP)
+      bastion: gcp # ◄------ bastion host id
+      host: 172.10.1.100:22
+      username: ubuntu
+      key: /path/to/private.key
+      cmd: "ps -ef | grep kafka"
+```
+> **Note**:
+>
+> For the supported regular expression syntax, refer to https://github.com/google/re2/wiki/Syntax
+
+## 1.7 TLS
+
+The TLS probe uses the `tls` identifier. It connects to the remote endpoint and can detect revoked or expired certificates.
+
-# 1. Probe
+```YAML
+tls:
+  - name: expired test
+    host: expired.badssl.com:443
+    proxy: socks5://proxy.server:1080 # Optional. Only support socks5.
+                                      # Also support the `ALL_PROXY` environment.
+    insecure_skip_verify: true # don't check cert validity
+    expire_skip_verify: true # don't check cert expire date
+    alert_expire_before: 168h # alert if cert expire date is before X, the value is a Duration,
+                              # see https://pkg.go.dev/time#ParseDuration. example: 1h, 1m, 1s.
+                              # expire_skip_verify must be false to use this feature.
+    # root_ca_pem_path: /path/to/root/ca.pem # ignore if root_ca_pem is present
+    # root_ca_pem: |
+    #   -----BEGIN CERTIFICATE-----
+  - name: untrust test
+    host: untrusted-root.badssl.com:443
+    # insecure_skip_verify: true # don't check cert validity
+    # expire_skip_verify: true # don't check cert expire date
+    # root_ca_pem_path: /path/to/root/ca.pem # ignore if root_ca_pem is present
+    # root_ca_pem: |
+    #   -----BEGIN CERTIFICATE-----
+```
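The `alert_expire_before` option can be read as "alert when the certificate expires within this window". A Python sketch of that reading follows; the function name is hypothetical, and the exact boundary behavior in EaseProbe is not stated in this manual, so the strict comparison is an assumption.

```python
from datetime import datetime, timedelta, timezone


def expiry_alert(not_after, alert_expire_before, now=None):
    """Return True when the certificate expires within the alert window."""
    now = now or datetime.now(timezone.utc)
    # Remaining lifetime shorter than the window triggers the alert;
    # an already expired certificate has a negative remaining lifetime.
    return not_after - now < alert_expire_before
```

With `alert_expire_before: 168h`, a certificate that expires in three days would raise an alert, while one that expires in thirty days would not.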
-## 1.1 Overview
+## 1.8 Host
+
+The host probe uses the `host` identifier. It collects resource usage information and alerts when certain resource utilization thresholds are exceeded.
+
+The resources currently monitored include CPU, memory and disk utilization. The probe status is considered as `down` when any value exceeds its defined threshold.
+
+> **Note**:
+> - The remote system to be monitored needs to have the following commands installed and available: `top`, `df`, `free`, `awk`, `grep`, `tr`, `cat` and `hostname`.
+> - The disk usage check is limited to the root filesystem only with the following command `df -h /`.
+> - The actual load is divided by the number of CPU cores before it is compared with the threshold, so the threshold itself does not need to account for the number of cores (requires proc filesystem support).
+
+```yaml
+host:
+  bastion: # bastion server configuration
+    aws: # bastion host ID ◄────────────────────────┐
+      host: ubuntu@example.com # bastion host       │
+      key: /path/to/bastion.pem # private key file  │
+  # Servers List                                    │
+  servers: #                                        │
+    - name: aws server #                            │
+      bastion: aws # <-- bastion server id ─────────┘
+      host: ubuntu@172.20.2.202:22
+      key: /path/to/server.pem
+      disks: # [optional] Check multiple disks. if not present, only check `/` by default
+        - /
+        - /data
+      threshold:
+        cpu: 0.80 # cpu usage 80%
+        mem: 0.70 # memory usage 70%
+        disk: 0.90 # disk usage 90%
+        load: # load average - Note: the actual load would be divided by cpu core number, the threshold won't consider the cpu core number.
+          m1: 0.5 # 1 minute load average 0.5 (default: 0.8)
+          m5: 0.9 # 5 minute load average 0.9 (default: 0.8)
+          m15: 0.9 # 15 minute load average 0.9 (default: 0.8)
+
+    # Using the default threshold
+    # cpu 80%, mem 80%, disk 95% and 0.8 load average
+    - name: My VPS
+      host: user@example.com:22
+      key: /Users/user/.ssh/id_rsa
+```
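The load thresholds are compared against a per-core load value. A Python sketch of that normalization follows; the function name and the dictionary layout are hypothetical, and the `0.8` fallback reflects the documented default threshold.

```python
DEFAULT_LOAD_THRESHOLD = 0.8  # documented default for m1, m5 and m15


def load_exceeded(load_avg, cpu_cores, thresholds):
    """Return which load averages exceed their threshold after per-core scaling."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    result = {}
    for window, value in load_avg.items():
        per_core = value / cpu_cores  # the actual load divided by the core count
        limit = thresholds.get(window, DEFAULT_LOAD_THRESHOLD)
        result[window] = per_core > limit
    return result
```

On a 4-core host with a 1-minute load average of 2.4, the per-core load is 0.6, which exceeds an `m1` threshold of 0.5 but not the default of 0.8.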
-EaseProbe supports the following probing methods: **HTTP**, **TCP**, **TLS**, **Shell Command**, **SSH Command**, **Host Resource Usage**, and **Native Client**.
-
-Each probe is identified by the method it supports (eg `http`), a unique name (across all probes in the configuration file) and the method specific parameters.
-- **HTTP**. Checking the HTTP status code, Support mTLS, HTTP Basic Auth, and can set the Request Header/Body. ( [HTTP Probe Configuration](#71-http-probe-configuration) )
-- **TCP**. Just simply check whether the TCP connection can be established or not. ( [TCP Probe Configuration](#72-tcp-probe-configuration) )
-- **Ping**. Just simply check whether can be pinged or not. ( [Ping Probe Configuration](#73-ping-probe-configuration) )
-- **Shell**. Run a Shell command and check the result. ( [Shell Command Probe Configuration](#74-shell-command-probe-configuration) )
-- **SSH**. Run a remote command via SSH and check the result. Support the bastion/jump server ([SSH Command Probe Configuration](#75-ssh-command-probe-configuration))
-- **TLS**. Ping the remote endpoint, can probe for revoked or expired certificates ( [TLS Probe Configuration](#76-tls-probe-configuration) )
-- **Host**. Run an SSH command on a remote host and check the CPU, Memory, and Disk usage. ( [Host Load Probe](#77-host-resource-usage-probe-configuration) )
-- **Client**. Currently, support the following native client. Support the mTLS. ( refer to: [Native Client Probe Configuration](#78-native-client-probe-configuration) )
+## 1.9 Native Client
+
+The Native Client probe uses the `client` identifier and communicates with remote endpoints through native Go SDKs. In addition to simple connectivity checks, you can define key and data validity checks: EaseProbe queries the given keys and verifies the data stored on each service.
+
+Currently, the following native clients are supported:
- **MySQL**. Connect to the MySQL server and run the `SHOW STATUS` SQL.
- **Redis**. Connect to the Redis server and run the `PING` command.
- **Memcache**. Connect to a Memcache server and run the `version` command or check based on key/value checks.
@@ -77,21 +607,130 @@ Each probe is identified by the method it supports (eg `http`), a unique name (a
- **PostgreSQL**. Connect to PostgreSQL server and run `SELECT 1` SQL.
- **Zookeeper**. Connect to Zookeeper server and run `get /` command.
- Most of the clients support the additional validity check of data pulled from the service (such as checking a redis or memcache key for specific values). Check the documentation of the corresponding client for details on how to enable.
+The following sections show an example configuration for each native client probe.
-## 1.2 Initial Fire Up
+### 1.9.1 Redis
-On application startup, the configured probes are scheduled for their initial fire up based on the following criteria:
+```YAML
+# Native Client Probe
+client:
+  - name: Redis Native Client (local)
+    driver: "redis" # driver is redis
+    host: "localhost:6379" # server and port
+    password: "abc123" # password
+    data: # Optional
+      key: val # Check that `key` exists and its value is `val`
+    # mTLS - Optional
+    ca: /path/to/file.ca
+    cert: /path/to/file.crt
+    key: /path/to/file.key
-- Less than or equal to 60 total probers exist: the delay between initial prober fire-up is `1 second`
-- More than 60 total probers exist: the startup is scheduled based on the following equation `timeGap = DefaultProbeInterval / numProbes`
+```
+
+### 1.9.2 MySQL
+
+```YAML
+client:
+  - name: MySQL Native Client (local)
+    driver: "mysql"
+    host: "localhost:3306"
+    username: "root"
+    password: "pass"
+    data: # Optional, check the specific column value in the table
+      # Usage: "database:table:column:primary_key:value" : "expected_value"
+      # transfer to : "SELECT column FROM database.table WHERE primary_key = value"
+      # the `value` for `primary_key` must be int
+      "test:product:name:id:1" : "EaseProbe" # select name from test.product where id = 1
+      "test:employee:age:id:2" : 45 # select age from test.employee where id = 2
+    # mTLS - Optional
+    ca: /path/to/file.ca
+    cert: /path/to/file.crt
+    key: /path/to/file.key
+```
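The `data` key format `database:table:column:primary_key:value` maps to a `SELECT` statement as described in the comments above. A Python sketch of that mapping follows; the function name is hypothetical and the string building is for illustration only (real code should validate identifiers and use parameterized queries).

```python
def data_key_to_sql(key):
    """Translate "database:table:column:primary_key:value" into a SELECT statement."""
    parts = key.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    database, table, column, primary_key, value = parts
    row_id = int(value)  # the value for the primary key must be an int
    return f"SELECT {column} FROM {database}.{table} WHERE {primary_key} = {row_id}"
```

The probe would then compare the returned column with the expected value, for example `"EaseProbe"` for the key `test:product:name:id:1`.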
+
+### 1.9.3 MongoDB
+
+```YAML
+client:
+  - name: MongoDB Native Client (local)
+    driver: "mongo"
+    host: "localhost:27017"
+    username: "admin"
+    password: "abc123"
+    timeout: 5s
+    data: # Optional, find the specific value in the table
+      # Usage: "database:collection" : "{JSON}"
+      "test:employee" : '{"name":"Hao Chen"}' # find the employee with name "Hao Chen"
+      "test:product" : '{"name":"EaseProbe"}' # find the product with name "EaseProbe"
+```
+
+### 1.9.4 Memcache
+
+```YAML
+client:
+  - name: Memcache Native Client (local)
+    driver: "memcache"
+    host: "localhost:11211"
+    timeout: 5s
+    data: # Optional
+      key: val # Check that key exists and its value is val
+      "namespace:key": val # Namespaced keys enclosed in "
+```
+
+### 1.9.5 Kafka
+
+```YAML
+client:
+  - name: Kafka Native Client (local)
+    driver: "kafka"
+    host: "localhost:9093"
+    # mTLS - Optional
+    ca: /path/to/file.ca
+    cert: /path/to/file.crt
+    key: /path/to/file.key
+```
+
+### 1.9.6 PostgreSQL
+
+```YAML
+client:
+  - name: PostgreSQL Native Client (local)
+    driver: "postgres"
+    host: "localhost:5432"
+    username: "postgres"
+    password: "pass"
+    data: # Optional, check the specific column value in the table
+      # Usage: "database:table:column:primary_key:value" : "expected_value"
+      # transfer to : "SELECT column FROM table WHERE primary_key = value"
+      # the `value` for `primary_key` must be int
+      "test:product:name:id:1" : "EaseProbe" # select name from product where id = 1
+      "test:employee:age:id:2" : 45 # select age from employee where id = 2
+    # mTLS - Optional
+    ca: /path/to/file.ca
+    cert: /path/to/file.crt
+    key: /path/to/file.key
+```
+
+### 1.9.7 Zookeeper
+
+```YAML
+client:
+  - name: Zookeeper Native Client (local)
+    driver: "zookeeper"
+    host: "localhost:2181"
+    timeout: 5s
+    data: # Optional, check the specific value in the path
+      "/path/to/key": "value" # Check that the value of the `/path/to/key` is "value"
+    # mTLS - Optional
+    ca: /path/to/file.ca
+    cert: /path/to/file.crt
+    key: /path/to/file.key
+```
-> **Note**:
->
-> **If multiple probes using the same name then this could lead to corruption of the metrics data and/or the behavior of the application in non-deterministic way.**
# 2. Notification
+
EaseProbe supports a variety of notifications. The notifications are **Edge-Triggered**, this means that these notifications are triggered when the status changes.
Each notification is identified by the delivery it supports (eg `slack`), a unique name (across all notifies in the configuration file) and (optionally) the notify specific parameters.
@@ -561,6 +1200,14 @@ EaseProbe accepts the `HUP` signal to rotate the log.
EaseProbe supports Prometheus metrics exporter. The Prometheus endpoint is `http://localhost:8181/metrics` by default.
+The following snapshot is the Grafana panel for host CPU metrics
+
+![](./grafana.demo.png)
+
+Refer to the [Global Setting Configuration](#73-global-setting-configuration) for further details on how to configure the HTTP server.
+
+## 6.1 General Metrics
+
Currently, All of the Probers support the following metrics:
- `total`: the total number of probes
@@ -569,9 +1216,12 @@ Currently, All of the Probers support the following metrics:
- `status`: Probe status
- `SLA`: Probe SLA percentage
-And for the different Probers, the following metrics are available:
+Each type of prober also has its own metrics.
+
+## 6.2 HTTP Probe
+
+The HTTP probe supports the following metrics:
-- HTTP Probe
- `status_code`: HTTP status code
- `content_len`: HTTP content length
- `dns_duration`: DNS duration in milliseconds
@@ -582,7 +1232,10 @@ And for the different Probers, the following metrics are available:
- `transfer_duration`: HTTP transfer duration in milliseconds
- `total_duration`: HTTP total duration in milliseconds
-- Ping Probe
+## 6.3 Ping Probe
+
+The Ping probe supports the following metrics:
+
- `sent`: Number of sent packets
- `recv`: Number of received packets
- `loss`: Packet loss percentage
@@ -594,25 +1247,29 @@ And for the different Probers, the following metrics are available:
Please note that `privileged: true` requires administrative privileges such as `root` (for more details see https://github.com/prometheus-community/pro-bing#supported-operating-systems)
-- TLS Probe
+## 6.4 TLS Probe
+
+The TLS probe supports the following metrics:
+
- `earliest_cert_expiry`: last TLS chain expiry in timestamp seconds
- `last_chain_expiry_timestamp_seconds`: earliest TLS cert expiry in Unix time
-- Shell & SSH Probe
- - `exit_code`: exit code of the command
- - `output_len`: length of the output
+## 6.5 Shell & SSH Probe
-- Host Probe
- - `cpu`: CPU usage in percentage
- - `memory`: memory usage in percentage
- - `disk`: disk usage in percentage
+The Shell and SSH probes support the following metrics:
+
+ - `exit_code`: exit code of the command
+ - `output_len`: length of the output
-The following snapshot is the Grafana panel for host CPU metrics
+## 6.6 Host Probe
-![](./grafana.demo.png)
+The Host probe supports the following metrics:
-Refer to the [Global Setting Configuration](#710-global-setting-configuration) for further details on how to configure the HTTP server.
+ - `cpu`: CPU usage in percentage
+ - `memory`: memory usage in percentage
+ - `disk`: disk usage in percentage
+ - `load`: load average for `m1`, `m5`, and `m15`
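+
+To collect the metrics listed above, Prometheus needs a scrape job that points at the EaseProbe HTTP server. The fragment below is a minimal sketch of such a job in `prometheus.yml`. The target address, port, and metrics path are assumptions for illustration and should be replaced with the values of your own deployment.
+
+```yaml
+# prometheus.yml (sketch, not an official EaseProbe sample)
+scrape_configs:
+  - job_name: easeprobe            # hypothetical job name
+    metrics_path: /metrics         # assumed path of the metrics endpoint
+    scrape_interval: 1m
+    static_configs:
+      - targets: ["easeprobe.example.com:8181"]  # hypothetical host and port
+```
+
+Keeping `scrape_interval` close to the probe `interval` avoids scraping the same sample several times.
+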
# 7. Configuration
@@ -631,53 +1288,25 @@ The following environment variables can be used to fine-tune the request to the
* `HTTP_AUTHORIZATION`
* `HTTP_TIMEOUT`
-And the configuration file should be versioned, the version should be aligned with the EaseProbe binary version.
+EaseProbe supports multiple configuration files. Use the command line flag `-f` to specify the directory that contains them.
-```yaml
-version: v1.5.0
+```shell
+easeprobe -f /path/to/conf
```
-The following example configurations illustrate the EaseProbe supported features.
-**Note**: All probes have the following options:
+The configuration file should be versioned, and the version should match the EaseProbe binary version.
-- `timeout` - the maximum time to wait for the probe to complete. default: `30s`.
-- `interval` - the interval time to run the probe. default: `1m`.
-- `failure` - number of consecutive failed probes needed to determine the status down, default: 1
-- `success` - number of consecutive successful probes needed to determine the status up, default: 1
+```yaml
+version: v1.5.0
+```
-## 7.1 HTTP Probe Configuration
+You can find the full configuration template [here](../resources/config.yaml).
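+
+As a starting point, a minimal versioned configuration with a single HTTP probe might look like the sketch below. The probe name and URL are placeholders, and the option values are examples rather than recommendations.
+
+```yaml
+version: v1.5.0                 # should match the EaseProbe binary version
+
+http:
+  - name: Example Website       # hypothetical probe name
+    url: https://example.com    # placeholder URL
+    timeout: 10s                # default is 30 seconds
+    interval: 2m                # default is 60 seconds
+    failure: 2                  # consecutive failures before the status is down, default: 1
+    success: 1                  # consecutive successes before the status is up, default: 1
+```
+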
-### 7.1.1 Basic HTTP Configuration
+## 7.1 Probe Configuration
```YAML
-# HTTP Probe Configuration
-
-http:
- # A Website
- - name: MegaEase Website (Global)
- url: https://megaease.com
-
- # Some of the Software support the HTTP Query
- - name: ElasticSearch
- url: http://elasticsearch.server:9200
- - name: Eureka
- url: http://eureka.server:8761
- - name: Prometheus
- url: http://prometheus:9090/graph
-
- # Spring Boot Application with Actuator Heath API
- - name: EaseService-Governance
- url: http://easeservice-mgmt-governance:38012/actuator/health
- - name: EaseService-Control
- url: http://easeservice-mgmt-control:38013/actuator/health
- - name: EaseService-Mesh
- url: http://easeservice-mgmt-mesh:38013/actuator/health
-```
-
-### 7.1.2 Complete HTTP Configuration
-
-```yaml
+# --------------------- HTTP Probe Configuration ---------------------
http:
# A complete HTTP Probe configuration
- name: Special Website
@@ -721,167 +1350,12 @@ http:
expression: "x_time('//feed/updated') > '2022-07-01'" # the expression to evaluate.
# configuration
timeout: 10s # default is 30 seconds
-```
-
-> **Note**:
->
-> The Regular Expression supported refer to https://github.com/google/re2/wiki/Syntax
-> The XPath only supported 1.0/2.0) syntax. refer to https://www.w3.org/TR/xpath/, and the library is https://github.com/antchfx/xpath
-
-### 7.1.3 Expression Evaluation
-
-HTTP Probe supports two type of expression evaluation.
-
-- `XML`, `JSON`, `HTML `: using the **XPath(1.0/2.0)** to extract the value
-- `TEXT` : using the **Regression Expression** to extract the value
-
-And the configuration can be two types as below:
-
-**1) Variable Definition**
-
-```yaml
- eval:
- doc: XML # support XML, JSON, HTML, TEXT
- expression: "updated > '2022-07-01'"
- variables: # variables definition
- - name: updated # variable name
- type: time # variable type, support `int`, `float`, `bool`, `time` and `duration`.
- query: "//feed/updated" # the XPath query to get the variable value.
-```
-
-**2) Build-in XPath function Expression Evaluation**
-
-you can just use the XPath build-in function in expression so simplify the configuration.
-
-```yaml
- eval:
- doc: XML # support XML, JSON, HTML, TEXT.
- expression: "x_time('//feed/updated') > '2022-07-01'" # the expression to evaluate.
-```
-
-Currently, EaseProbe supports the following XPath functions:
-- `x_str` - get the string value from the XPath/RegExp query result.
-- `x_int` - get the integer value from the XPath/RegExp query result.
-- `x_float` - get the float value from the XPath/RegExp query result.
-- `x_time` - get the time value from the XPath/RegExp query result.
-- `x_duration` - get the duration value from the XPath/RegExp query result.
-
-**3) Build-in Functions**
-
-Currently, EaseProbe supports the following build-in functions:
-
-- `strlen` - get the string length.
-- `now` - get the current time.
-- `duration` - get the duration value.
-
-
-For examples:
-
-check the `time` from response is 5 seconds later than the current time.
-
-```yaml
-eval:
- doc: HTML
- expression: "now() - x_time('//div[@id=\\'time\\']') > 5"
-```
-
-
-Check the duration from response is less than 1 second.
-
-```yaml
-eval:
- doc: HTML
- expression: "duration(rt) < duration('1s')"
- variables:
- - name: rt # variable name `rt` will be used in expression.
- type: duration # variable type is `duration`
- query: "//div[@id=\\'time\\']" # the XPath query the value.
-```
-Or
-
-```yaml
-eval:
- doc: HTML
- expression: "x_duration('//div[@id=\\'resp_time\\']') < duration('1s')"
-```
-
-
-**4) XPath Syntax Example**
-
-Considering we have the following response:
-
-```json
-{
- "company": {
- "name": "MegaEase",
- "person": [{
- "name": "Bob",
- "email": "bob@example.com",
- "age": 35,
- "salary": 35000.12,
- "birth": "1984-10-12",
- "work": "40h",
- "fulltime": true
- },
- {
- "name": "Alice",
- "email": "alice@example.com",
- "age": 25,
- "salary": 25000.12,
- "birth": "1985-10-12",
- "work": "30h",
- "fulltime": false
- }
- ]
- }
-}
-```
-Then, the extraction syntax as below:
-
-```
-"//name" ==> "MegaEase"
-"//company/name" ==> "MegaEase"
-"//email" ==> "bob@example.com"
-"//company/person/*[1]/name" ==> "Bob"
-"//company/person/*[2]/emai ==> "alice@example.com"
-"//company/person/*[last()]/name" ==> "Alice"
-"//company/person/*[last()]/age" ==> "25"
-"//company/person/*[salary=25000.12]/salary" ==> "25000.12"
-"//company/person/*[name='Bob']/birth" ==> "1984-10-12"
-"//company/person/*[name='Alice']/work" ==> "30h"
-"//*/email[contains(.,'bob')]" ==> "bob@example.com"
-"//work", ==> "40h"
-"//person/*[2]/fulltime" ==> "false"
-```
-
-**5) Regression Expression Syntax Examples**
-
-Considering we have the following response:
-
-`name: Bob, email: bob@example.com, age: 35, salary: 35000.12, birth: 1984-10-12, work: 40h, fulltime: true`
-
-Then, the extraction syntax as below:
+ failure: 2 # number of consecutive failed probes needed to determine the status down, default: 1
+ success: 1 # number of consecutive successful probes needed to determine the status up, default: 1
-```
-"name: (?P[a-zA-Z0-9 ]*)" ==> "Bob"
-"email: (?P[a-zA-Z0-9@.]*)" ==> "bob@example.com"
-"age: (?P[0-9]*)" ==> "35"
-"age: (?P\\d+)" ==> "35"
-"salary: (?P[0-9.]*)" ==> "35000.12"
-"salary: (?P\\d+\\.\\d+)" ==> "35000.12"
-"birth: (?P[0-9-]*)" ==> "1984-10-12"
-"birth: (?P\\d{4}-\\d{2}-\\d{2})" ==> "1984-10-12"
-"work: (?P\\d+[hms])" ==> "40h"
-"fulltime: (?Ptrue|false)" ==> "true"
-```
-> Notes
->
-> Checking the unit test case in [`eval`](./eval/) package you can find more examples.
-## 7.2 TCP Probe Configuration
+# --------------------- TCP Probe Configuration ---------------------
-```YAML
-# TCP Probe Configuration
tcp:
- name: SSH Service
host: example.com:22
@@ -889,13 +1363,11 @@ tcp:
interval: 2m # default is 60 seconds
proxy: socks5://proxy.server:1080 # Optional. Only support socks5.
# Also support the `ALL_PROXY` environment.
- - name: Kafka
- host: kafka.server:9093
-```
+ failure: 2 # number of consecutive failed probes needed to determine the status down, default: 1
+ success: 1 # number of consecutive successful probes needed to determine the status up, default: 1
-## 7.3 Ping Probe Configuration
+# --------------------- Ping Probe Configuration ---------------------
-```YAML
ping:
- name: Localhost
host: 127.0.0.1
@@ -904,24 +1376,10 @@ ping:
privileged: true # if true, the ping will be executed with icmp, otherwise use udp, default: false (Note: On Windows platform, this must be set to True)
timeout: 10s # default is 30 seconds
interval: 2m # default is 60 seconds
-```
-
-## 7.4 Shell Command Probe Configuration
-The shell command probe is used to execute a shell command and check the output.
+# --------------------- Shell Probe Configuration ---------------------
-The following example shows how to configure the shell command probe.
-
-```YAML
-# Shell Probe Configuration
shell:
- # A proxy curl shell script
- - name: Google Service
- cmd: "./resources/probe/scripts/proxy.curl.sh"
- args:
- - "socks5://127.0.0.1:1085"
- - "www.google.com"
-
# run redis-cli ping and check the "PONG"
- name: Redis (Local)
cmd: "redis-cli"
@@ -939,33 +1397,9 @@ shell:
not_contain: "failure" # response body must NOT contain this string, if it does the probe is considered failed.
regex: false # if true, the `contain` and `not_contain` will be treated as regular expression. default: false
- # Run Zookeeper command `stat` to check the zookeeper status
- - name: Zookeeper (Local)
- cmd: "/bin/sh"
- args:
- - "-c"
- - "echo stat | nc 127.0.0.1 2181"
- contain: "Mode:"
-```
-
-> **Note**:
->
-> The Regular Expression supported refer to https://github.com/google/re2/wiki/Syntax
-
-## 7.5 SSH Command Probe Configuration
-
-SSH probe is similar to Shell probe.
-- Support Password and Private key authentication.
-- Support the Bastion host tunnel.
-
-The `host` supports the following configuration
-- `example.com`
-- `example.com:22`
-- `user@example.com:22`
-The following are examples of SSH probe configuration.
+# --------------------- SSH Probe Configuration ---------------------
-```YAML
# SSH Probe Configuration
ssh:
# SSH bastion host configuration
@@ -1007,16 +1441,10 @@ ssh:
username: ubuntu
key: /path/to/private.key
cmd: "ps -ef | grep kafka"
-```
-> **Note**:
->
-> The Regular Expression supported refer to https://github.com/google/re2/wiki/Syntax
-## 7.6 TLS Probe Configuration
-TLS ping to remote endpoint, can probe for revoked or expired certificates
+# --------------------- TLS Probe Configuration ---------------------
-```YAML
tls:
- name: expired test
host: expired.badssl.com:443
@@ -1037,20 +1465,10 @@ tls:
# root_ca_pem_path: /path/to/root/ca.pem # ignore if root_ca_pem is present
# root_ca_pem: |
# -----BEGIN CERTIFICATE-----
-```
-## 7.7 Host Resource Usage Probe Configuration
-The host resource usage probe allows for collecting information and alerting when certain resource utilization thresholds are exceeded.
-
-The resources currently monitored include CPU, memory and disk utilization. The probe status is considered as `down` when any value exceeds its defined threshold.
-
-> **Note**:
-> - The remote system to be monitored needs to have the following commands installed and available: `top`, `df`, `free`, `awk`, `grep`, `tr`, `cat` and `hostname`.
-> - The disk usage check is limited to the root filesystem only with the following command `df -h /`.
-> - The actual load would be divided by cpu core number, the threshold won't consider the cpu core number (requires proc filesystem support).
+# --------------------- Host Probe Configuration ---------------------
-```yaml
host:
bastion: # bastion server configuration
aws: # bastion host ID ◄──────────────────┐
@@ -1079,14 +1497,10 @@ host:
- name : My VPS
host: user@example.com:22
key: /Users/user/.ssh/id_rsa
-```
-## 7.8 Native Client Probe Configuration
-Native Client probe uses the native GO SDK to communicate with the remote endpoints. Additionally to simple connectivity checks, you can also define key and data validity checks for EaseProbe, it will query for the given keys and verify the data stored on each service.
+# --------------------- Native Client Probe Configuration ---------------------
-```YAML
-# Native Client Probe
client:
- name: Redis Native Client (local)
driver: "redis" # driver is redis
@@ -1170,8 +1584,7 @@ client:
key: /path/to/file.key
```
-
-## 7.9 Notification Configuration
+## 7.2 Notification Configuration
```YAML
# Notification Configuration
@@ -1288,7 +1701,7 @@ notify:
> ```
-## 7.10 Global Setting Configuration
+## 7.3 Global Setting Configuration
```YAML
# Global settings for all probes and notifiers.