Add vault.agent.authenticated metric #26570

Merged

markafarrell
Contributor

This adds an additional metric to vault agent telemetry that allows you to see if vault agent is currently authenticated and has a valid token.

When the metric is set to 1 it means that the agent has successfully authenticated with the vault server and has a valid token.

When it is set to 0 it means that the agent does not have a valid token.
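
A minimal sketch of how a consumer might interpret the gauge, assuming the value has already been read from the agent's metrics endpoint. The function name `agent_auth_state` is hypothetical and not part of Vault.

```shell
# Hypothetical helper: map the gauge value described above to a readable state.
# 1 means the agent holds a valid token; 0 means it does not.
agent_auth_state() {
    case "$1" in
        1) echo "authenticated" ;;
        0) echo "unauthenticated" ;;
        *) echo "unknown" ;;   # metric missing or an unexpected value
    esac
}
```

For example, `agent_auth_state 0` prints `unauthenticated`.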

fixes #26569

@markafarrell markafarrell requested a review from a team as a code owner April 22, 2024 03:06
@markafarrell
Contributor Author

The steps below can be used to demonstrate the new metric.

Generate TLS certificates

mkdir -p container-data/vault/tls
openssl req -x509 -nodes -days 9999 -newkey rsa:2048 \
-keyout  container-data/vault/tls/vault_server.key -out  container-data/vault/tls/vault_server.crt \
-subj "/C=AU/ST=Some-State/L=Some-City/O=Internet Widgits Pty Ltd/OU=Something/\
CN=vault-server" \
-addext "subjectAltName = DNS:vault-server"
chmod g+r container-data/vault/tls/*

Generate configuration

mkdir -p container-data/vault/config

cat <<EOF > container-data/vault/config/vault_main.hcl
ui = true

listener "tcp" {
  address = "[::]:8200"
  cluster_address = "[::]:8201"
  tls_cert_file = "/vault/tls/vault_server.crt"
  tls_key_file  = "/vault/tls/vault_server.key"
}

storage "file" {
  path = "/vault/data"
}
EOF

Start vault

mkdir -p logs

mkdir -p logs/vault

chmod g+w logs/vault

docker network create vault-agent-test

docker run --rm -d -p 8200:8200 -e VAULT_LOG_LEVEL=debug -e NO_PROXY="vault-server" -v $PWD/container-data/vault/tls/:/vault/tls/ -v $PWD/container-data/vault/config/vault_main.hcl:/vault/config/vault_main.hcl -v $PWD/logs/vault:/var/log/vault --cap-add IPC_LOCK --network=vault-agent-test --name=vault-server hashicorp/vault:1.16.1 server

docker run --rm -it --cap-add IPC_LOCK -e VAULT_CLI_NO_COLOR=1 -e VAULT_ADDR=https://vault-server:8200 -e NO_PROXY=vault-server -e VAULT_SKIP_VERIFY=TRUE --network=vault-agent-test hashicorp/vault:1.16.1 operator init | tee logs/init.log

Extract unseal keys and root token

mkdir -p secrets
grep "Unseal Key" logs/init.log | awk -F':' '{ print $2}' | tr -d ' ' | tee secrets/unseal_keys
grep "Initial Root Token" logs/init.log | awk -F':' '{ print $2}' | tr -d ' ' | tr -d '\n' | tr -d '\r' | tee secrets/root_token

Unseal Vault

for k in $(cat secrets/unseal_keys)
do
    docker run --rm -it --cap-add IPC_LOCK -e VAULT_ADDR=https://vault-server:8200 -e NO_PROXY=vault-server -e VAULT_SKIP_VERIFY=TRUE --network=vault-agent-test hashicorp/vault:1.16.1 operator unseal $k
done

Enable approle auth

docker run --rm -it --cap-add IPC_LOCK -e VAULT_ADDR=https://vault-server:8200 -e VAULT_TOKEN=$(cat $PWD/secrets/root_token) -e NO_PROXY=vault-server -e VAULT_SKIP_VERIFY=TRUE --network=vault-agent-test hashicorp/vault:1.16.1 auth enable approle

Create approle

docker run --rm -it --cap-add IPC_LOCK -e VAULT_ADDR=https://vault-server:8200 -e VAULT_TOKEN=$(cat $PWD/secrets/root_token) -e NO_PROXY=vault-server -e VAULT_SKIP_VERIFY=TRUE --network=vault-agent-test hashicorp/vault:1.16.1 \
vault write auth/approle/role/my-role \
    secret_id_ttl=10m \
    token_num_uses=10 \
    token_ttl=20m \
    token_max_ttl=30m \
    secret_id_num_uses=40

mkdir -p secrets/approle

docker run --rm -it --cap-add IPC_LOCK -e VAULT_CLI_NO_COLOR=1 -e VAULT_ADDR=https://vault-server:8200 -e VAULT_TOKEN=$(cat $PWD/secrets/root_token) -e NO_PROXY=vault-server -e VAULT_SKIP_VERIFY=TRUE --network=vault-agent-test hashicorp/vault:1.16.1 vault read -field=role_id auth/approle/role/my-role/role-id | tr -d '\n' | tr -d '\r' | tee secrets/approle/role-id; echo

docker run --rm -it --cap-add IPC_LOCK -e VAULT_CLI_NO_COLOR=1 -e VAULT_ADDR=https://vault-server:8200 -e VAULT_TOKEN=$(cat $PWD/secrets/root_token) -e NO_PROXY=vault-server -e VAULT_SKIP_VERIFY=TRUE --network=vault-agent-test hashicorp/vault:1.16.1 vault write -field secret_id -f auth/approle/role/my-role/secret-id | tr -d '\n' | tr -d '\r' | tee secrets/approle/secret-id; echo

Generate Vault Agent configuration

mkdir -p container-data/vault-agent/config

cat <<EOF > container-data/vault-agent/config/vault-agent-conf.hcl
auto_auth {
  method {
    type = "approle"

    config = {
      role_id_file_path = "/etc/vault/approle/role-id"
      secret_id_file_path = "/etc/vault/approle/secret-id"
    }
  }

  sinks {
    sink {
      type = "file"

      config = {
        path = "/tmp/file-foo"
      }
    }
  }
}
listener "tcp" {
  address = "0.0.0.0:8100"
  tls_disable = true
  unauthenticated_metrics_access = true
}

telemetry {
  disable_hostname = true
}

cache {}
EOF

Start Vault Agent

docker run --rm -d -p 8100:8100 -e VAULT_LOG_LEVEL=debug -e VAULT_ADDR=https://vault-server:8200 -e VAULT_SKIP_VERIFY=TRUE -e NO_PROXY="vault-server" -v $PWD/container-data/vault/tls/:/vault/tls/ -v $PWD/container-data/vault-agent/config/vault-agent-conf.hcl:/vault/config/vault-agent-conf.hcl -v $PWD/secrets/approle:/etc/vault/approle:rw --cap-add IPC_LOCK --network=vault-agent-test --name=vault-agent hashicorp/vault:1.16.1 agent -config /vault/config/vault-agent-conf.hcl
docker logs vault-agent

Get Vault Agent Metrics

curl -s -x '' 'http://127.0.0.1:8100/agent/v1/metrics?format=prometheus' | grep vault_agent_auth
# HELP vault_agent_auth_success vault_agent_auth_success
# TYPE vault_agent_auth_success counter
vault_agent_auth_success 2

Start Modified Vault Agent (run docker stop vault-agent first, since this reuses the container name and port 8100)

docker run --rm -d -p 8100:8100 -e VAULT_LOG_LEVEL=debug -e VAULT_ADDR=https://vault-server:8200 -e VAULT_SKIP_VERIFY=TRUE -e NO_PROXY="vault-server" -v $PWD/container-data/vault/tls/:/vault/tls/ -v $PWD/container-data/vault-agent/config/vault-agent-conf.hcl:/vault/config/vault-agent-conf.hcl -v $PWD/secrets/approle:/etc/vault/approle:rw --cap-add IPC_LOCK --network=vault-agent-test --name=vault-agent vault:dev agent -config /vault/config/vault-agent-conf.hcl
docker logs vault-agent

Get Vault Agent Metrics

curl -s -x '' 'http://127.0.0.1:8100/agent/v1/metrics?format=prometheus' | grep vault_agent_auth
# HELP vault_agent_auth_authenticated vault_agent_auth_authenticated
# TYPE vault_agent_auth_authenticated gauge
vault_agent_auth_authenticated 1
# HELP vault_agent_auth_failure vault_agent_auth_failure
# TYPE vault_agent_auth_failure counter
vault_agent_auth_failure 4
# HELP vault_agent_auth_success vault_agent_auth_success
# TYPE vault_agent_auth_success counter
vault_agent_auth_success 2
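
To pull a single value out of output like this, a small filter is enough. This is a sketch, not part of the PR: `gauge_value` is a hypothetical helper, and the metric name is passed as an argument because it was renamed during review.

```shell
# Hypothetical helper: print the value of one metric from Prometheus
# text-format output on stdin. Exits non-zero if the metric is absent.
gauge_value() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }   # first matching sample line
        END { if (!found) exit 1 }                 # signal a missing metric
    '
}
```

Piping the curl output from the step above into `gauge_value vault_agent_auth_authenticated` would then print only the gauge value.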

@markafarrell markafarrell force-pushed the feature/add-agent-authenticated-metric branch 2 times, most recently from b88be6e to aafebf7 Compare April 22, 2024 03:12
@divyaac divyaac added the agent label Apr 22, 2024
@@ -0,0 +1,3 @@
```release-note:feature
Contributor

This would probably qualify as an improvement!

Contributor Author

fixed

ah.logger.Info("renewed auth token")

case <-credCh:
ah.logger.Info("auth method found new credentials, re-authenticating")
ah.logger.Info("autreh method found new credentials, -authenticating")
Contributor

We should probably revert this change

Contributor Author

fixed

@divyaac
Contributor

divyaac commented Apr 22, 2024

Hi @markafarrell, thank you so much for your PR! Before we proceed - I'd love to understand your use case a little bit better. Currently, we can see when authentication has succeeded and the agent has a valid token in the server logs (i.e. https://github.com/hashicorp/vault/blob/main/command/agentproxyshared/auth/auth.go#L480 ). Is there a reason that telemetry might better serve your needs than the server logs?

@markafarrell
Contributor Author

@divyaac Having a metric makes it much easier to integrate with alerting tools like Prometheus Alertmanager.

You can then get an alert when that metric goes to zero and act promptly, instead of having to look at logs to see that the agent is not authenticated.
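
As a rough illustration of the alerting idea, here is a sketch of a check that a cron job or monitoring plugin could run. It is not part of the PR: `check_agent_authenticated` is a hypothetical name, the default metric name `vault_agent_authenticated` is assumed from the PR title, and the exit codes follow the common Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
# Hypothetical check: read Prometheus text-format metrics on stdin and
# report whether the agent currently holds a valid token.
check_agent_authenticated() {
    metric="${1:-vault_agent_authenticated}"   # assumed name, from the PR title
    value=$(awk -v name="$metric" '$1 == name { print $2; exit }')
    case "$value" in
        1) echo "OK: agent is authenticated"; return 0 ;;
        0) echo "CRITICAL: agent has no valid token"; return 2 ;;
        *) echo "UNKNOWN: $metric not found"; return 3 ;;
    esac
}
```

With Prometheus itself, the same condition would more likely be written as an alerting rule on the gauge being 0; the shell version only shows the logic.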

@divyaac
Contributor

divyaac commented Apr 23, 2024

Thanks for your response @markafarrell. I think adding this metric would make sense. After addressing the comments we should be able to move this PR along!

@@ -276,6 +288,8 @@ func (ah *AuthHandler) Run(ctx context.Context, am AuthMethod) error {
if err != nil {
ah.logger.Error("error creating client for wrapped call", "error", err, "backoff", backoffCfg)
metrics.IncrCounter([]string{ah.metricsSignifier, "auth", "failure"}, 1)
// Set unauthenticated when authentication fails
Contributor

@divyaac divyaac Apr 23, 2024

This comment can be deleted.

Contributor Author

fixed

@@ -478,6 +511,8 @@ func (ah *AuthHandler) Run(ctx context.Context, am AuthMethod) error {
}

metrics.IncrCounter([]string{ah.metricsSignifier, "auth", "success"}, 1)
// Set authenticated when authentication succeeds
Contributor

@divyaac divyaac Apr 23, 2024

This comment can be deleted.

Contributor Author

fixed

@@ -144,12 +144,18 @@ func (ah *AuthHandler) Run(ctx context.Context, am AuthMethod) error {
backoffCfg := newAutoAuthBackoff(ah.minBackoff, ah.maxBackoff, ah.exitOnError)

ah.logger.Info("starting auth handler")

// Set unauthenticated when starting up
metrics.SetGauge([]string{ah.metricsSignifier, "auth", "authenticated"}, 0)
Contributor

  1. The comment can be deleted (likewise for the rest of the new additions)
  2. The name of the metric can be
    metrics.SetGauge([]string{ah.metricsSignifier, "authenticated"}, 0)
    that is, we can remove the "auth" prefix from the string array

Contributor Author

fixed
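
The effect of dropping the "auth" element can be sketched by joining the key parts the way the demo output lines up with the Go key slices (for example, `{ah.metricsSignifier, "auth", "success"}` appears as `vault_agent_auth_success`). The helper `prom_metric_name` is hypothetical, and the `vault` prefix and `agent` signifier are assumptions taken from that output rather than from the metrics library.

```shell
# Hypothetical helper: join metric key parts with underscores.
# The body runs in a subshell so the IFS change does not leak to the caller.
prom_metric_name() (
    IFS=_
    echo "$*"   # "$*" joins the arguments with the first character of IFS
)
```

Under those assumptions, `prom_metric_name vault agent auth authenticated` gives the name seen in the earlier demo output, and `prom_metric_name vault agent authenticated` gives the name after this review change.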

Contributor

@divyaac divyaac left a comment

Once the changes are addressed we can move forward!

@markafarrell markafarrell force-pushed the feature/add-agent-authenticated-metric branch from e336c63 to 54ebad9 Compare April 29, 2024 01:54
@markafarrell markafarrell requested review from a team as code owners April 29, 2024 01:54
@markafarrell markafarrell force-pushed the feature/add-agent-authenticated-metric branch 3 times, most recently from f15336a to f60907f Compare April 29, 2024 02:07
@markafarrell markafarrell changed the title Add vault.agent.auth.authenticated metric Add vault.agent.authenticated metric Apr 29, 2024
@markafarrell markafarrell force-pushed the feature/add-agent-authenticated-metric branch from f60907f to a6b2e25 Compare April 30, 2024 01:33
Contributor

@schavis schavis left a comment

Doc updates lgtm

@markafarrell markafarrell requested a review from divyaac May 3, 2024 05:05
@markafarrell markafarrell force-pushed the feature/add-agent-authenticated-metric branch 2 times, most recently from 3bf2951 to 1eb2322 Compare May 6, 2024 01:21
@markafarrell markafarrell force-pushed the feature/add-agent-authenticated-metric branch from 1eb2322 to d1e0696 Compare May 9, 2024 01:26
@schavis schavis added the content-lgtm Content changes approved. Merge depends on code review label May 9, 2024
Contributor

@schavis schavis left a comment

Content update lgtm. Feel free to merge once ENG approves

@schavis schavis added the needs-eng-review Community PR waiting for ENG review label May 9, 2024
@VioletHynes
Contributor

Thanks for this! I chatted with @divyaac and she's approved it, I resolved the merge conflicts (I think they were mostly my fault!) and I'll try and get this merged if everything passes. Great work :D

@VioletHynes VioletHynes merged commit 476b0d5 into hashicorp:main May 28, 2024
67 of 68 checks passed
@pieter-lautus

This is awesomesauce! This resolves a query I raised on discuss.hashicorp.com about how to use the pre-existing telemetry to effectively monitor a fleet of vault agents.

@pieter-lautus

More correctly: this resolves an issue that has made me reluctant to roll out a fleet of vault agents in the first place.

Labels: agent, content-lgtm (Content changes approved. Merge depends on code review), needs-eng-review (Community PR waiting for ENG review)
Linked issue: Unable to ascertain vault agent authentication status from metrics
5 participants