Vault token renewal fails after some time #4372
Comments
See Vault's documentation for renewing tokens: https://www.vaultproject.io/docs/commands/token/renew.html. The Nomad process must be sent a Let us know if this solves this issue. |
@chelseakomlo so it's expected of the Nomad operator to |
Thanks for the replies. I would assume the nomad server is already renewing the tokens given the log statements before, i.e.
Only having glanced at the code this seems to be doing a renewal. The vault integration guide also says that the "Nomad servers will renew the token automatically". https://www.nomadproject.io/docs/vault-integration/index.html Given the log statements, it looks like for a few days it did successfully renew the token, but then yesterday it failed for some reason. |
I think I tried manually creating a new token the same way I provision it and sending the |
@jippi and @Tethik - Nomad servers will renew the token you provided close to when it reaches half of its remaining TTL. However, it's possible that the token was revoked entirely in Vault, or that the operator wants to replace it with another one, which is why we pointed out the SIGHUP docs above in case you want to change or update the token given to Nomad. Sorry for any confusion. @Tethik - From your logs above, it's not clear to me what changed upstream in Vault; unfortunately the error message on Nomad's side |
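To make the "half of the remaining TTL" rule concrete, here is a minimal sketch of how such a renewal time could be computed. It is not Nomad's actual implementation; the function name `next_renewal_time` and the example timestamp (including the year) are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta


def next_renewal_time(now: datetime, remaining_ttl: timedelta) -> datetime:
    """Hypothetical helper: schedule renewal halfway through the remaining TTL."""
    return now + remaining_ttl / 2


# Example values are assumptions: a token with 7 days left at noon on 30 May.
now = datetime(2018, 5, 30, 12, 0, 0)
remaining = timedelta(days=7)
print(next_renewal_time(now, remaining))  # 2018-06-03 00:00:00
```

Under this rule a periodic token should be renewed well before it expires, so a renewal that starts failing suggests the token itself became invalid rather than that the renewal was scheduled too late.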
Apologies for the confusion- as Preetha mentioned above, Nomad servers will maintain tokens on the fly, but if a token has been revoked in Vault, tokens can be updated for the Nomad agent via More information would be helpful for us to diagnose this issue. Can you provide the following:
|
Here are some more verbose logs w/ The first failure shows at
Our vault version: |
Thanks for including this information. Can you also include the Nomad agent's Vault policy? https://www.nomadproject.io/docs/vault-integration/index.html#required-vault-policies |
Here's the
# Allow creating tokens under "nomad-cluster" token role. The token role name
# should be updated if "nomad-cluster" is not used.
path "auth/token/create/nomad-cluster" {
  capabilities = ["update"]
}
# Allow looking up "nomad-cluster" token role. The token role name should be
# updated if "nomad-cluster" is not used.
path "auth/token/roles/nomad-cluster" {
  capabilities = ["read"]
}
# Allow creating orphan tokens
path "auth/token/create-orphan" {
  capabilities = ["create", "update"]
}
# Allow looking up the token passed to Nomad to validate the token has the
# proper capabilities. This is provided by the "default" policy.
path "auth/token/lookup-self" {
  capabilities = ["read"]
}
# Allow looking up incoming tokens to validate they have permissions to access
# the tokens they are requesting. This is only required if
# `allow_unauthenticated` is set to false.
path "auth/token/lookup" {
  capabilities = ["update"]
}
# Allow revoking tokens that should no longer exist. This allows revoking
# tokens for dead tasks.
path "auth/token/revoke-accessor" {
  capabilities = ["update"]
}
# Allow checking the capabilities of our own token. This is used to validate the
# token upon startup.
path "sys/capabilities-self" {
  capabilities = ["update"]
}
# Allow our own token to be renewed.
path "auth/token/renew-self" {
  capabilities = ["update"]
}
I noticed this difference from the documentation, not sure why I did this. Although this change to v0.8.1? might be related. #3992
The
|
@chelseakomlo @preetapan FWIW, I noticed Nomad 0.8.3 does not update its Vault token from HCL configuration upon receiving a SIGHUP signal when reloaded from systemd. I had to restart the service to get it to pick up a newly set Vault token. I can create a new issue if you think that behavior is not expected and differs from what is being reported here. |
@c4milo thanks for notifying us about this issue- if you could open a new ticket with a description of steps to reproduce and the Nomad agent configuration/relevant logs, that would be helpful. |
@Tethik I tried reproducing this issue with a token with a period set to 1 minute, but was unable to do so. There isn't a code path for Nomad agents to revoke their own tokens, so this token must be revoked out of band. If you turn on Vault audit logs, this should give a better idea of the token's lifecycle. I'm going to close this issue for now, but feel free to reopen with further Vault logs/audit logs that seem abnormal. |
Thanks @chelseakomlo for taking the time and debugging. I appreciate it. I'll try out the suggestion for using audit logs. |
@chelseakomlo, my specific issue is unlikely to occur in a real environment, since Nomad servers renew the token just fine when needed, as you already know. I'm able to reproduce it in a local Vagrant environment by closing my laptop and opening it again some time later 😬. I then see Nomad complaining about being unable to access Vault to renew the token. Next, I run an Ansible playbook to get a new token from Vault and place it in the Nomad servers' config directory, within an HCL file. At the end of the playbook a systemd reload is issued, which sends a SIGHUP signal, but Nomad does not pick up the token. It seems to be an edge case, unlikely to happen in real environments; do you still want me to report it? |
@c4milo FYI, we've reproduced the issue with not being able to reload Nomad's Vault configuration via |
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
Nomad v0.8.3 (c85483d)
Operating system and Environment details
AWS Linux
Issue
After running our nomad cluster for a while, the vault token that we give to the nomad server seems to have expired somehow.
The initial token given to Nomad looks something like this:
So tokens last for a week.
My last new deployment of nomad servers was on the 30th of May. This should have given them tokens that shouldn't have expired until 6th of June. However since yesterday (4th of June) it started failing already.
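The date arithmetic behind this expectation can be checked with a short sketch. The helper name `expected_expiry` and the year are assumptions for illustration only; the one-week TTL and the day/month values come from the description above.

```python
from datetime import date, timedelta


def expected_expiry(deployed: date, ttl: timedelta) -> date:
    """Hypothetical helper: date on which a never-renewed token would expire."""
    return deployed + ttl


deployed = date(2018, 5, 30)      # last deployment (year assumed)
first_failure = date(2018, 6, 4)  # first day renewals failed (year assumed)
expiry = expected_expiry(deployed, timedelta(days=7))

print(expiry)                         # 2018-06-06
print((expiry - first_failure).days)  # 2
```

So the failures began about two days before even an unrenewed token should have expired.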
I'm a bit at a loss as to what's happening, and how I should proceed with debugging this.
Nomad Server logs (if appropriate)
With
grep vault
More specifically I see this for every renewal attempt that fails:
In the vault logs I also see lines like this