Proposal
It would be nice if Nomad re-evaluated system jobs periodically, at a configurable interval. Re-evaluation is currently triggered when a node rejoins the cluster, but not in a few other cases we've observed.
Use-cases
A few times now, Vault (or Consul) has become unavailable on a node. This caused a running system job allocation to fail, and it was not restarted, even though the job spec says the job should never be marked as failed. Eventually you end up in a state where Nomad reports the job status as 'running' but the job has zero allocations.
When Consul (or Vault) came back, the job was not re-evaluated and no allocations were placed. It would be nice if system jobs were re-evaluated when Consul or Vault becomes available again, or at a set interval.
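A minimal sketch of how the "running but zero allocations" state could be detected from outside Nomad, using the HTTP API endpoints GET /v1/jobs and GET /v1/job/<id>/allocations. It assumes the agent is reachable at NOMAD_ADDR, an optional ACL token in NOMAD_TOKEN, and only the default namespace. The function names are hypothetical, and treating the client statuses "pending" and "running" as live is this sketch's own choice.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumptions: the Nomad HTTP API is at NOMAD_ADDR (default port 4646), an
# optional ACL token is in NOMAD_TOKEN, and only the default namespace is checked.
NOMAD_ADDR = os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")
NOMAD_TOKEN = os.environ.get("NOMAD_TOKEN")

# Client statuses this sketch treats as a live allocation.
LIVE_STATUSES = {"pending", "running"}


def nomad_get(path):
    """GET a Nomad HTTP API path and decode the JSON response."""
    req = urllib.request.Request(NOMAD_ADDR + path)
    if NOMAD_TOKEN:
        req.add_header("X-Nomad-Token", NOMAD_TOKEN)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def find_stuck_system_jobs(jobs, allocs_by_job):
    """Return IDs of system jobs with status 'running' but no live allocation.

    jobs: job stubs as returned by GET /v1/jobs.
    allocs_by_job: job ID -> allocation stubs from GET /v1/job/<id>/allocations.
    """
    stuck = []
    for job in jobs:
        if job.get("Type") != "system" or job.get("Status") != "running":
            continue
        allocs = allocs_by_job.get(job["ID"], [])
        if not any(a.get("ClientStatus") in LIVE_STATUSES for a in allocs):
            stuck.append(job["ID"])
    return stuck


def report_stuck_system_jobs():
    """Query the cluster and print every system job stuck in that state."""
    jobs = nomad_get("/v1/jobs")
    allocs_by_job = {
        job["ID"]: nomad_get(
            "/v1/job/%s/allocations" % urllib.parse.quote(job["ID"], safe=""))
        for job in jobs
        if job.get("Type") == "system"
    }
    for job_id in find_stuck_system_jobs(jobs, allocs_by_job):
        print("system job running with no live allocations:", job_id)
```

Calling report_stuck_system_jobs() from a scheduled script would only surface the state; it does not place any allocations.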
Attempted Solutions
We currently work around this with a cron job that runs nomad job eval every 30 minutes, but this isn't ideal.
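As a sketch of what that cron workaround might look like when it should cover every system job rather than a fixed list, the script below lists jobs and asks Nomad to create a new evaluation for each system job via POST /v1/job/<id>/evaluate, the HTTP API counterpart of nomad job eval. It makes the same assumptions as any ad hoc client would (agent at NOMAD_ADDR, optional token in NOMAD_TOKEN, default namespace only), and the function names are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumptions: Nomad HTTP API at NOMAD_ADDR, optional ACL token in NOMAD_TOKEN,
# and jobs in the default namespace only.
NOMAD_ADDR = os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")
NOMAD_TOKEN = os.environ.get("NOMAD_TOKEN")


def nomad_request(path, payload=None):
    """Call the Nomad HTTP API; send payload as a JSON POST body if given."""
    data = None if payload is None else json.dumps(payload).encode()
    req = urllib.request.Request(NOMAD_ADDR + path, data=data,
                                 method="GET" if data is None else "POST")
    req.add_header("Content-Type", "application/json")
    if NOMAD_TOKEN:
        req.add_header("X-Nomad-Token", NOMAD_TOKEN)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def eval_requests(jobs):
    """Build (path, payload) pairs that ask Nomad to re-evaluate each system job."""
    return [
        ("/v1/job/%s/evaluate" % urllib.parse.quote(j["ID"], safe=""),
         {"JobID": j["ID"]})
        for j in jobs
        if j.get("Type") == "system"
    ]


def run_reevaluation():
    """List all jobs, then trigger a new evaluation for every system job."""
    for path, payload in eval_requests(nomad_request("/v1/jobs")):
        result = nomad_request(path, payload)
        print(payload["JobID"], "->", result.get("EvalID"))
```

A cron entry would call run_reevaluation() in place of per-job nomad job eval invocations. This keeps the interval-based approach; it does not provide the trigger on Consul or Vault recovery that the proposal asks for.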
That actually looks exactly like the issue we've been having. I'm currently on "vacation" (yeah, I make GitHub issues on vacation :D) so I won't be able to test right now, but I'm going to go out on a limb and guess this will solve our problem, so I'll close the issue :)
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.