
Add cpu_hard_limit and cpu_cfs_period options #149

Merged
merged 3 commits into from
Feb 15, 2022


optiz0r
Contributor

@optiz0r optiz0r commented Jan 6, 2022

Nomad's docker driver lets Nomad apply CPU resources as hard limits
rather than soft limits via the `cpu_hard_limit` driver config option.
This is useful to restrict how much CPU time a Nomad job is permitted to
consume, even when the host is underutilised, or when predictability is
more important than speed.

Default behaviour is unchanged. When `cpu_hard_limit` is set to `true`,
the podman container is started with the API equivalents of the podman
CLI options `--cpu-quota` and `--cpu-period`.
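A minimal sketch of that gating, in Go. The `TaskConfig` and `LinuxCPU` types and the `applyHardLimit` function are hypothetical stand-ins, not the driver's actual code; `LinuxCPU` loosely mirrors the optional pointer fields of the OCI runtime spec's CPU section, and the quota and period are passed in already computed.

```go
package main

import "fmt"

// LinuxCPU is a hypothetical stand-in for the CPU section of a container
// spec. As in the OCI runtime spec, a nil pointer means "not set", so the
// runtime keeps its defaults for that field.
type LinuxCPU struct {
	Shares *uint64
	Quota  *int64
	Period *uint64
}

// TaskConfig is a hypothetical driver config holding only the toggle.
type TaskConfig struct {
	CPUHardLimit bool
}

// applyHardLimit sets quota and period only when cpu_hard_limit is true.
// Otherwise the spec is left untouched and only soft limits apply.
func applyHardLimit(cfg TaskConfig, quota int64, period uint64, cpu *LinuxCPU) {
	if !cfg.CPUHardLimit {
		return // default behaviour: unchanged
	}
	cpu.Quota = &quota   // API equivalent of podman --cpu-quota
	cpu.Period = &period // API equivalent of podman --cpu-period
}

func main() {
	var soft, hard LinuxCPU
	applyHardLimit(TaskConfig{}, 50000, 100000, &soft)
	applyHardLimit(TaskConfig{CPUHardLimit: true}, 50000, 100000, &hard)
	fmt.Println(soft.Quota == nil, *hard.Quota, *hard.Period) // true 50000 100000
}
```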

Nomad currently provides a `PercentTicks` value, which is used to convert
the `cpu` resource from the job specification into the number of
microseconds of CPU time the job may consume in each `period` interval.
Use of this value is discouraged outside of the docker driver; however, the
desired long-term replacement (exposing CPU quota and CPU period as options
in the task `resources` block) is not yet available. Since this driver
emulates the docker interface where possible, I think it's acceptable to
use the `PercentTicks` value in spite of the warning to the contrary.
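As a rough illustration, the conversion amounts to quota = PercentTicks × period. This sketch assumes `PercentTicks` is the task's share of the node's total CPU expressed as a fraction between 0 and 1; `quotaFromPercentTicks` is a hypothetical helper and not necessarily the exact formula the driver uses (for example, it does not consider how the quota relates to the number of cores).

```go
package main

import "fmt"

// quotaFromPercentTicks is a hypothetical helper. It assumes percentTicks is
// the task's fraction (0..1) of the node's CPU and returns the CFS quota:
// microseconds of CPU time allowed in each period of periodUs microseconds.
func quotaFromPercentTicks(percentTicks float64, periodUs uint64) int64 {
	return int64(percentTicks * float64(periodUs)) // truncates toward zero
}

func main() {
	// A task entitled to a quarter of the node, with the default period.
	fmt.Println(quotaFromPercentTicks(0.25, 100000)) // 25000
}
```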

The `cpu_cfs_period` option defaults to 100,000 microseconds, in both the
Linux kernel and this driver, when not specified by the user. Users may
override it with a value between 1,000 and 1,000,000. Any value below
1,000 is silently raised to 1,000 to prevent the job from failing to run
with an `Invalid Operation` error.
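The period rules can be sketched like this. `normalizePeriod` and the constant names are hypothetical. Treating an unset option as `0`, and returning an error for values above 1,000,000, are assumptions of this sketch, since the description does not say what happens in that case.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	defaultCFSPeriod = 100000  // kernel and driver default, microseconds
	minCFSPeriod     = 1000    // smaller values are silently raised to this
	maxCFSPeriod     = 1000000 // upper end of the accepted range
)

// normalizePeriod is hypothetical: 0 (assumed to mean "not set") selects the
// default, values below the minimum are raised, and values above the maximum
// are rejected (an assumption; the PR text does not cover this case).
func normalizePeriod(requested uint64) (uint64, error) {
	switch {
	case requested == 0:
		return defaultCFSPeriod, nil
	case requested > maxCFSPeriod:
		return 0, errors.New("cpu_cfs_period must not exceed 1000000")
	case requested < minCFSPeriod:
		return minCFSPeriod, nil
	default:
		return requested, nil
	}
}

func main() {
	for _, p := range []uint64{0, 500, 50000, 2000000} {
		got, err := normalizePeriod(p)
		fmt.Println(p, got, err)
	}
}
```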

The minimum value for the quota is also 1,000. Since this value is
computed by this driver, it is silently raised to 1,000 if the computed
value is lower. Jobs with a very small `cpu` resource in the job spec,
equating to less than 1% of the total available CPU, may therefore
consume more CPU time than intended.
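To see why very small tasks can overshoot, here is a hypothetical `clampQuota` helper applied to a task entitled to 1/128 (about 0.78%) of the CPU with the default 100,000 microsecond period. It assumes the quota is simply that share multiplied by the period; once raised to 1,000, the quota allows 1% of each period.

```go
package main

import "fmt"

const minCFSQuota = 1000 // microseconds; computed quotas below this are raised

// clampQuota is a hypothetical helper that raises a computed quota to the
// minimum, as described above.
func clampQuota(quota int64) int64 {
	if quota < minCFSQuota {
		return minCFSQuota
	}
	return quota
}

func main() {
	const period = 100000 // default period, microseconds
	share := 0.0078125    // hypothetical task share: 1/128 of the CPU
	computed := int64(share * period)
	q := clampQuota(computed)
	fmt.Printf("computed=%d clamped=%d share=%.2f%%\n",
		computed, q, 100*float64(q)/period) // computed=781 clamped=1000 share=1.00%
}
```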

Fixes #147

@optiz0r
Contributor Author

optiz0r commented Jan 6, 2022

@towe75 Could I trouble you to approve the workflow so I can see if the tests pass here? (As noted in #148 I can't get the tests to run locally). Thanks!

@optiz0r optiz0r force-pushed the 147_cpu_hard_limits branch 4 times, most recently from 6ca5b47 to a1d19d3, January 7, 2022 10:21
@optiz0r
Contributor Author

optiz0r commented Jan 7, 2022

Lint issues fixed and force pushed.

@lgfa29
Contributor

lgfa29 commented Jan 7, 2022

Thanks for the PR @optiz0r!

I thought that approving it once would do the trick, but GH requires approval at every commit (which makes sense but it's kind of annoying).

To help unblock you, I poked at #148 and was able to run the tests using the provided Vagrantfile in the repo, but I had to run some manual commands. I will post a comment there to keep the discussion separate.

@optiz0r
Contributor Author

optiz0r commented Jan 7, 2022

Perhaps it's not just me - the tests are now failing on GHA with similar results as I was seeing locally.

I think the workflow will still require approval on each push until a change authored in my name is merged in.

@towe75
Collaborator

towe75 commented Jan 29, 2022

@optiz0r please rebase to fix the tests.

@optiz0r optiz0r force-pushed the 147_cpu_hard_limits branch from a1d19d3 to a793e7c, January 29, 2022 14:35
@optiz0r
Contributor Author

optiz0r commented Feb 7, 2022

@towe75 Any other changes you want to see before merge?

@towe75
Collaborator

towe75 commented Feb 7, 2022

@lgfa29 can you have a look, please?

Contributor

@lgfa29 lgfa29 left a comment


Sorry for the delay in getting to this review. I pushed a small refactoring of `setCPUResources` (fewer arguments and less indentation) and also updated the changelog entry to fix a merge conflict and the PR link.

Thanks for the PR @optiz0r 🎉

@lgfa29 lgfa29 merged commit 96e41ef into hashicorp:main Feb 15, 2022
@optiz0r optiz0r deleted the 147_cpu_hard_limits branch February 15, 2022 22:40
Development

Successfully merging this pull request may close these issues.

Add support for CPU hard limits
3 participants