Skip to content

Commit

Permalink
Attempt to improve explanation of the current metric so Operators don…
Browse files Browse the repository at this point in the history
…'t think things are failing when they aren't (#27955)
  • Loading branch information
Peter Wilson authored Aug 2, 2024
1 parent 2dbb3d4 commit 6b9261e
Show file tree
Hide file tree
Showing 2 changed files with 14 additions and 10 deletions.
Original file line number Diff line number Diff line change
@@ -1,15 +1,17 @@
### vault.audit.log_request_failure ((#vault-audit-log_request_failure))

| Metric type | Value | Description |
|-------------|--------|---------------------------------------------------------|
| counter | number | Number of audit log request failures across all devices |
| Metric type | Value | Description |
|-------------|--------|-------------------------------------------------------------------------------------------|
| gauge | number | Average (mean) number of audit log request failures across all devices during time period |

The number of request failures is a **crucial metric**.

A non-zero value for `vault.audit.log_request_failure` indicates that all your
configured audit devices failed to log a request (or response). If Vault cannot
A non-zero value for `vault.audit.log_request_failure` indicates that all
the configured audit devices failed to log a request (or response). If Vault cannot
properly audit a request, or the response to a request, the original request
will fail.

The `mean` value for this metric should be monitored, not the `count` which could be misleading.

Refer to the Vault logs and any device-specific metrics to troubleshoot the
failing audit log device.
Original file line number Diff line number Diff line change
@@ -1,15 +1,17 @@
### vault.audit.log_response_failure ((#vault-audit-log_response_failure))

| Metric type | Value | Description |
|-------------|--------|---------------------------------------------------------|
| counter | number | Number of audit log response failures across all devices |
| Metric type | Value | Description |
|-------------|--------|--------------------------------------------------------------------------------------------|
| gauge | number | Average (mean) number of audit log response failures across all devices during time period |

The number of request failures is a **crucial metric**.

A non-zero value for `vault.audit.log_response_failure` indicates that all of
the configured audit log devices failed to log a response to a request to Vault. If Vault cannot
A non-zero value for `vault.audit.log_response_failure` indicates that all
the configured audit log devices failed to log a response to a request. If Vault cannot
properly audit a request, or the response to a request, the original request
will fail.

The `mean` value for this metric should be monitored, not the `count` which could be misleading.

Refer to the device-specific metrics and logs to troubleshoot the failing audit
log device.

0 comments on commit 6b9261e

Please sign in to comment.