[Feature] Expose retry behavior for template stanza #3866
Comments
@SoMuchToGrok thanks for reporting this. It sounds like the default retry behavior in consul-template (which gives you 12 attempts and about 6 minutes before the job transitions to failed) was not sufficient for your use case. We could bring just this set of options from consul-template into the Nomad template stanza, but there will always be more knobs you can change in consul-template than what Nomad supports. We are moving towards a plugin architecture that lets us expand to fully supporting these external tools without having to keep updating our config options for each runtime tool and driver. We will be able to address this then.
Linking these as they are the same issue, but one for Vault and one for Consul: #2623
Is this issue just about retrying an allocation after killing it due to an unreachable Vault? We would very much like to be able to configure Nomad to not kill an allocation when re-rendering the template fails (e.g. due to unreachable Vault servers). Is this behavior in scope for this issue? With dynamic secrets, even when the secret expires, the service might degrade gracefully (e.g. continue running without a database connection). Static secrets read from Vault would in most cases remain valid until the connection to Vault is re-established.
Nomad version
Nomad v0.7.1
Operating system and Environment details
Ubuntu 16.04.03 LTS
Issue
When a running job encounters a template rendering failure because Vault is inaccessible, the entire job is marked as "dead" and no rescheduling attempts are ever made. When this happens during a "net new" deploy, it is not a major problem, since someone will be investigating the failure almost immediately. However, for successfully deployed, long-lived jobs that renew their secrets every N hours or days, this behavior becomes especially problematic.
The desired behavior already exists in consul-template itself (for both Consul and Vault communication). See the relevant config:
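For reference, a minimal sketch of the consul-template retry configuration being referred to (addresses and values are illustrative; see the consul-template documentation for exact defaults):

```hcl
# consul-template configuration (illustrative values)
vault {
  address = "https://vault.service.consul:8200"

  retry {
    # Whether to retry failed requests to Vault at all.
    enabled = true
    # 0 means retry indefinitely; a positive number caps the attempts.
    attempts = 0
    # Initial backoff between retries, growing up to max_backoff.
    backoff     = "250ms"
    max_backoff = "1m"
  }
}

consul {
  address = "127.0.0.1:8500"

  retry {
    enabled     = true
    attempts    = 0
    backoff     = "250ms"
    max_backoff = "1m"
  }
}
```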
Ideally, job operators should be able to configure these within the Nomad template stanza (a hypothetical sketch of what that could look like follows below). I realize not everyone will want their jobs retrying indefinitely, but some jobs are mission-critical, and requiring manual intervention after a Vault outage can be extremely costly.
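As a rough sketch only (this syntax is hypothetical and does not exist in Nomad 0.7.x), exposing those knobs in the job spec might look something like:

```hcl
task "app" {
  template {
    source      = "local/app.conf.tpl"
    destination = "secrets/app.conf"

    # Hypothetical block mirroring consul-template's retry options;
    # shown only to illustrate the requested configuration surface.
    vault_retry {
      attempts    = 0        # 0 = retry indefinitely
      backoff     = "250ms"
      max_backoff = "1m"
    }
  }
}
```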
Reproduction steps
Example template
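The original example template attachment is not reproduced here; a minimal template of the kind involved (a Vault secret rendered via consul-template syntax, with an illustrative secret path) would be:

```hcl
template {
  destination = "secrets/db.env"
  env         = true
  data        = <<EOT
{{ with secret "database/creds/my-role" }}
DB_USERNAME={{ .Data.username }}
DB_PASSWORD={{ .Data.password }}
{{ end }}
EOT
}
```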