During incident response, operators may find that automated processes elsewhere in the organization are generating new workloads on Nomad clusters that cannot handle them. Add an operator API endpoint that causes all job registration calls to be rejected with an error stating that the cluster is in load-shedding mode.
We've also had an internal discussion about automatically load-shedding when the Nomad servers are overloaded, but that's a good deal more involved than this issue. In any case, whatever state this option holds will most likely be reused by that mechanism anyway.
Add a new field, RejectJobRegistration, to SchedulerConfig, which we already have wired up to be persisted to Raft.
Job.Register and Job.Dispatch will check this field's value. If it is true, they'll return an error unless the request's ACL token is a management token.
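The check described above can be sketched as a single guard that both Job.Register and Job.Dispatch would call before doing any work. This is a minimal Go sketch, not Nomad's implementation: the trimmed-down `SchedulerConfig`, the `ACLToken` type, the `checkRegistrationAllowed` helper, and the error text are all hypothetical. Treating a nil token as non-management is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// SchedulerConfig is a hypothetical, trimmed-down stand-in for the scheduler
// configuration; only the proposed field is shown.
type SchedulerConfig struct {
	// RejectJobRegistration, when true, puts the cluster into load-shedding
	// mode for non-management callers.
	RejectJobRegistration bool
}

// ACLToken is a hypothetical, minimal view of the caller's resolved token.
type ACLToken struct {
	Management bool // true for management tokens
}

// ErrLoadShedding is a hypothetical error returned while load shedding is on.
var ErrLoadShedding = errors.New("cluster is in load-shedding mode: job registration is disabled")

// checkRegistrationAllowed returns nil if the call may proceed.
// Assumption: a nil token is treated as non-management and is rejected.
func checkRegistrationAllowed(cfg *SchedulerConfig, token *ACLToken) error {
	if cfg == nil || !cfg.RejectJobRegistration {
		return nil // load shedding is off; allow everything
	}
	if token != nil && token.Management {
		return nil // management tokens bypass the check
	}
	return ErrLoadShedding
}

func main() {
	cfg := &SchedulerConfig{RejectJobRegistration: true}
	fmt.Println(checkRegistrationAllowed(cfg, &ACLToken{Management: false}))
	fmt.Println(checkRegistrationAllowed(cfg, &ACLToken{Management: true}))
}
```

Keeping the check in one helper means Job.Register and Job.Dispatch cannot drift apart in how they interpret the flag.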
We don't currently allow SchedulerConfig to be set via CLI... we may have a couple more upcoming features for SchedulerConfig, so let's put off adding a CLI until we know what they're all going to be so that we have a sensible/ergonomic CLI for all of them together.