Replacing all Nomad servers results in Nomad not being able to renew application tokens.
Reproduction steps
Vault config:
auth "aws" {
  type = "aws"
  role "nomad-cluster" {
    policies = "nomad-server"
    auth_type = "ec2"
    max_ttl = "6h"
    period = "1h"
    allow_instance_migration = false
    bound_iam_role_arn = "arn:aws:iam::***"
  }
}
auth "token" {
  type = "token"
  role "nomad-cluster" {
    disallowed_policies = "nomad-server"
    explicit_max_ttl = 0
    name = "nomad-cluster"
    orphan = false
    period = 3600
    renewable = true
  }
}
Replace all Nomad server nodes (EC2 instances) in a proper rolling fashion, replacing the leader last. Shortly after this, some applications are unable to use the token provided by Nomad to talk to Vault, and we have to replace the worker nodes to alleviate the problem.
Nomad Client logs (if appropriate)
This particular entry (numerous times) is only observed after completely replacing Nomad servers.
[ERR] client.vault: renewal of token failed: failed to renew the vault token: Error making API request.
Expanding the above error with more information about why the request failed would be useful.
We also see these:
[ERR] client: failed to renew Vault token for task app on alloc "93e0a494-df1c-e908-6e13-3d6080759e01": failed to renew the vault token: Error making API request.
Sorry you hit this! This likely occurred because the new Nomad servers were given different Vault tokens than the servers they replaced. Since Nomad historically generates tokens for tasks with Orphan set to false, the task tokens are children of the server's token, so when the old Nomad tokens expired, your tasks' tokens also got revoked. This behavior will change in Nomad 0.8: #3992
So I suggest you upgrade when it is released (in a few weeks) and set the token role to allow orphaned tokens.
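For reference, a minimal sketch of what that role change might look like, assuming the same config format as the reproduction steps above and assuming that only the `orphan` field needs to change. This is an illustration, not a tested configuration, and it applies only once the servers are running a Nomad release that includes the change referenced above.

```hcl
auth "token" {
  type = "token"
  role "nomad-cluster" {
    disallowed_policies = "nomad-server"
    explicit_max_ttl = 0
    name = "nomad-cluster"
    # Assumption: this is the only field that changes. With orphan = true,
    # tokens created from this role have no parent, so they are not revoked
    # when the Nomad server token that created them expires or is revoked.
    orphan = true
    period = 3600
    renewable = true
  }
}
```

With orphaned task tokens, replacing the Nomad servers no longer cascades a revocation to the tokens already handed to running tasks. The trade-off is that revoking a server token no longer cleans up the task tokens it issued.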
Thanks @dadgar. Do you happen to know what the easiest workaround is once the Nomad servers have been replaced? Replacing the Nomad clients and rescheduling tasks onto new hosts fixes it, but is there an easier way?
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Nomad version
Nomad v0.7.1
Operating system and Environment details
CentOS 7