Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: allow configuring default preemption for system scheduler #6935

Merged
merged 2 commits into from
Jan 28, 2020

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented Jan 13, 2020

Some operators want a greater control over when preemption is enabled,
especially during an upgrade to limit potential side-effects.

However, I lean towards exposing it as a general flag. The preemption code is relatively new and had few panic bugs (e.g. #6792 and #5794), so it's understandable that operators would like to control its enablement more. Additional, current server config parameters[1] have many esoteric knobs, and I suspect this flag passes that threshold. Granted, disabling preemption can lead to surprising behavior (e.g. system jobs not running on every node), and operators should be warned in our docs/guides. I'm open for alternative views.

[1] https://www.nomadproject.io/docs/configuration/server.html#server-parameters

Some operators want a greater control over when preemption is enabled,
especially during an upgrade to limit potential side-effects.
@notnoop notnoop added this to the 0.10.3 milestone Jan 13, 2020
@notnoop notnoop self-assigned this Jan 13, 2020
@@ -444,6 +444,9 @@ type ServerConfig struct {
// ServerJoin contains information that is used to attempt to join servers
ServerJoin *ServerJoin `hcl:"server_join"`

// SystemSchedulerPreemptionEnabledDefault is used to determin whether to enable system preemption by default in a new cluster
Copy link
Contributor

@preetapan preetapan Jan 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/determin/determine here and everywhere else below

@notnoop
Copy link
Contributor Author

notnoop commented Jan 14, 2020

BTW - I would love a better name than system_scheduler_preemption_enabled_default. Alternatively, instead of a single flag for system scheduler, we can add a default_scheduler_config:

server {
  default_scheduler_config = {
      "PreemptionConfig": {
        "SystemSchedulerEnabled": true,
        "BatchSchedulerEnabled": false,
        "ServiceSchedulerEnabled": true
      }
  }
}

It's a bit more generic and can handle future scheduler configurations and support more use cases (e.g. enabling batch and service preemption).

@@ -444,6 +444,9 @@ type ServerConfig struct {
// ServerJoin contains information that is used to attempt to join servers
ServerJoin *ServerJoin `hcl:"server_join"`

// SystemSchedulerPreemptionEnabledDefault is used to determin whether to enable system preemption by default in a new cluster
SystemSchedulerPreemptionEnabledDefault *bool `hcl:"system_scheduler_preemption_enabled_default"`
Copy link
Member

@schmichael schmichael Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we added a Defaults struct for this field so we can reuse it for all API configured settings? I think we should provide file-configurable-defaults for all scheduler configuration parameters to better support Packer & TF users who don't want to have to run some commands to tweak their cluster post deployment.

Then hcl files baked into golden images would look something like:

server {
  enabled = true

 # ... a bunch of other stuff ...

  defaults {
    system_scheduler_preemption = false
  }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, just noticed your comment above saying almost the exact same thing @notnoop. 😅

Another naming idea based off of yours:

server {
  enabled = true

 # ... a bunch of other stuff ...

  scheduler_defaults {
    system_job_preemption = false
  }
}

We can drop _enabled as other bool fields don't have it. Otherwise I think some sort of qualifier is necessary for "system" as shortening it to system_preemption may not be immediately obvious to users that its referring to system jobs since the word "system" is quite generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree: a generic defaults make sense. Though, I would prefer to use the same json format or key names that the API accepts. Given that the API is the canonical way for configuration, I opted to use the same API terms as-is. Making them hcl-friendly risks hcl format being behind/divergent from the api format and complicates documentation/implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, we should probably follow the jobspec's lead here:

  1. The style should vary by format (underscores for HCL, Caps for JSON)
  2. Otherwise the names should match

@notnoop notnoop force-pushed the b-default-preemption-flag branch from 2d84f54 to 31025d6 Compare January 28, 2020 19:51
@schmichael
Copy link
Member

Don't forget docs and changelog (of course I only remembered those after hitting "Approved")

@notnoop
Copy link
Contributor Author

notnoop commented Jan 28, 2020

I'll follow up with docs in follow up PR.

@notnoop notnoop merged commit eb0acc3 into master Jan 28, 2020
@notnoop notnoop deleted the b-default-preemption-flag branch January 28, 2020 20:11
@tgross
Copy link
Member

tgross commented Feb 5, 2020

ref #7010 for docs

notnoop pushed a commit that referenced this pull request May 1, 2020
scheduler: allow configuring default preemption for system scheduler
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants