Good question. I'm discussing with the Vault team internally what the appropriate behavior should be, as I agree that the current behavior seems problematic.
schmichael changed the title "[question] Window for per-task Vault token renewal time extremely wide by design?" to "Window for per-task Vault token renewal time extremely wide by design?" on Mar 27, 2019
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Nomad version
Nomad v0.8.7 (21a2d93eecf018ad2209a5eab6aae6c359267933+CHANGES)
Vault version
1.0.3+prem.hsm (via /v1/sys/health)
Operating system and Environment details
Ubuntu 16.04 LTS
Issue
We've been using Nomad's native Vault integration to issue tokens with a TTL of 72 hours. Until recently, one of our customers used the token only at bootstrap time. They've now adopted Vault's transit backend and use the token regularly throughout the guest instance's lifespan. As part of that work, the customer also implemented simple monitoring of the token's TTL.
I had provided (inaccurate) guidance that tokens were renewed at approximately 1/2 the lease duration. In our case we expected that to be 36 hours, so a 24-hour threshold was chosen for the monitor. Shortly after configuring alerts against the monitor, the customer discovered that the token TTL was frequently dipping below the 24-hour threshold.
We've dug into this a bit, and it appears Nomad used to attempt token renewal at ~1/2 leaseDuration. We're trying to understand the reason for the current behaviour, which, if I am reading correctly, chooses a renewal time between 10 seconds and (leaseDuration - 10 seconds) in the future (assuming a leaseDuration of >= 30 sec).
While it makes good sense to spread out token renewal times, this window seems very wide. Was the previous renewal attempt time of ~1/2 leaseDuration abandoned intentionally?
(Our current workloads are admittedly a bit of a funny fit: we inject per-task Vault tokens into VMs via cloud-init, hence our interest in reducing the risk that token renewal is delayed until near end-of-life and a brief Vault service disruption then leaves the token expired.)
Reproduction steps
Schedule any job with Vault integration.
Nomad Server logs (if appropriate)
n/a
Nomad Client logs (if appropriate)
n/a
Job file (if appropriate)
n/a