Describe the bug
Since upgrading to Vault 1.15, we are seeing occasional errors in the logs. The first error is likely due to network latency/timeouts to the database when using dynamic credentials. The second error is new and is caused by the new audit logging behaviour in Vault 1.15. It causes the vault_audit_log_request_failure telemetry metric to be incremented and could therefore trigger alerts unnecessarily.
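If you scrape Vault's telemetry, the counter can be checked directly. A minimal sketch, assuming a default dev server (whose telemetry defaults expose the Prometheus-format endpoint) and the dev root token in the current shell:

```shell
# Query Vault's telemetry endpoint in Prometheus format and look for
# the audit failure counter mentioned above.
curl -s -H "X-Vault-Token: $(vault print token)" \
  "http://127.0.0.1:8200/v1/sys/metrics?format=prometheus" \
  | grep vault_audit_log_request_failure
```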
To Reproduce
I've managed to reproduce this with a vault server -dev instance locally:
Follow the getting started guide using a local Vault instance and a local Postgres instance running in Docker.
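For reference, the relevant setup from that guide looks roughly like this (a sketch, assuming the standard database secrets engine quick start; the connection URL and credentials are illustrative placeholders):

```shell
# Enable the database secrets engine and point it at local Postgres (5432).
vault secrets enable database

vault write database/config/postgresql \
    plugin_name=postgresql-database-plugin \
    connection_url="postgresql://{{username}}:{{password}}@localhost:5432/postgres?sslmode=disable" \
    allowed_roles=readonly \
    username="root" \
    password="rootpassword"

# Create the role that issues short-lived read-only credentials.
vault write database/roles/readonly \
    db_name=postgresql \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';" \
    default_ttl=1h \
    max_ttl=24h
```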
Enable audit logs with vault audit enable file file_path=/tmp/vault_audit.log
Install toxiproxy locally, which we will use to simulate some network latency/jitter. Start the server with toxiproxy-server.
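On macOS, for example (assuming the Homebrew formula name; prebuilt release binaries work just as well):

```shell
# Install toxiproxy (assumption: Homebrew formula) and start the proxy server.
brew install toxiproxy
toxiproxy-server
```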
Set up a backend in toxiproxy: toxiproxy-cli create -l localhost:5433 -u localhost:5432 postgres
Add a "toxic" to simulate some network latency/jitter toxiproxy-cli toxic add -t latency -a latency=1000 -a jitter=2000 postgres
Tell Vault to use toxiproxy instead of Postgres directly.
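A sketch of that reconfiguration, assuming the configuration shown earlier; only the port changes, to toxiproxy's listener:

```shell
# Re-point the database connection at the toxiproxy listener (5433)
# rather than Postgres directly (5432).
vault write database/config/postgresql \
    plugin_name=postgresql-database-plugin \
    connection_url="postgresql://{{username}}:{{password}}@localhost:5433/postgres?sslmode=disable" \
    allowed_roles=readonly \
    username="root" \
    password="rootpassword"
```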
Create some dynamic credentials: vault read database/creds/readonly
Generate a curl command to renew the lease: vault lease renew --output-curl-string database/creds/readonly/weDQJArsWMkCUwVBf7zzOVwg (update the lease ID as needed)
Update the curl command to have a timeout and run the command: curl --max-time 1.2 -X PUT -H "X-Vault-Request: true" -H "X-Vault-Token: $(vault print token)" -d '{"increment":0,"lease_id":"database/creds/readonly/weDQJArsWMkCUwVBf7zzOVwg"}' http://127.0.0.1:8200/v1/sys/leases/renew
Review the Vault error log
You should see an event not processed by enough 'sink' nodes error here. This error can happen in a dynamic environment where lease renewal requests, which can take a little time, are cancelled while they're being processed, for example in k8s when pods are spinning up/down frequently.
Expected behavior
Although the lease renewal failed, I would not expect this to cause an error when writing to the audit file. I don't believe it caused a problem in earlier versions of Vault.
Environment:
Vault Server Version (retrieve with vault status): 1.15.1
Vault CLI Version (retrieve with vault version): 1.15.1
Server Operating System/Architecture: macOS arm64 / Linux amd64
Vault server configuration file(s): Not applicable as this is using a vault server -dev instance with no special configuration.
Additional context
Setting VAULT_AUDIT_DISABLE_EVENTLOGGER=true as an env var to revert back to the previous audit log behaviour stops the error appearing in the logs: only the "context canceled" error is displayed when the lease renewal request is cancelled.
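For example, with the same dev server and the flag set before start-up:

```shell
# Start the dev server with the new audit event pipeline disabled,
# reverting to the pre-1.15 audit logging behaviour.
VAULT_AUDIT_DISABLE_EVENTLOGGER=true vault server -dev
```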