Add cpu_hard_limit and cpu_cfs_period options #149

optiz0r · 2022-01-06T16:33:46Z

Nomad's docker driver can apply a task's cpu resources as hard limits
rather than soft limits via the `cpu_hard_limit` driver config option.
This is useful to restrict how much CPU time a nomad job is permitted
to consume, even if the host is underutilised, or when predictability
is more important than speed.

Default behaviour is unchanged. When `cpu_hard_limit` is set to `true`,
the podman container is started with the API options equivalent to the
podman CLI's `--cpu-quota` and `--cpu-period`.
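To make the CLI equivalence concrete, here is a minimal sketch, assuming a hypothetical `hardLimitConfig` struct standing in for the two new options (the driver's real struct and field names may differ) and a quota that has already been computed. It only renders the podman flags that correspond to the API options; with the hard limit off it returns nothing, so the default behaviour is untouched.

```go
package main

import (
	"fmt"
	"strconv"
)

// hardLimitConfig is a hypothetical stand-in for the new task driver
// options; it is not the driver's actual config type.
type hardLimitConfig struct {
	CPUHardLimit bool  // cpu_hard_limit
	CPUCFSPeriod int64 // cpu_cfs_period, in microseconds
}

// equivalentPodmanFlags returns the podman CLI flags matching the API
// options set on the container. quota is assumed to be precomputed.
// When the hard limit is disabled, no flags are produced, leaving the
// default (soft limit) behaviour unchanged.
func equivalentPodmanFlags(cfg hardLimitConfig, quota int64) []string {
	if !cfg.CPUHardLimit {
		return nil
	}
	return []string{
		"--cpu-period=" + strconv.FormatInt(cfg.CPUCFSPeriod, 10),
		"--cpu-quota=" + strconv.FormatInt(quota, 10),
	}
}

func main() {
	cfg := hardLimitConfig{CPUHardLimit: true, CPUCFSPeriod: 100000}
	fmt.Println(equivalentPodmanFlags(cfg, 50000))
	// [--cpu-period=100000 --cpu-quota=50000]
}
```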

Nomad currently provides a `PercentTicks` value which is used to convert
the cpu resource from the specification into a number of microseconds
of CPU time the job is allowed to consume in each `period` interval.
Use of this value is discouraged outside of the docker driver. The
desired long-term replacement is to expose `cpu quota` and `cpu period`
as options in the task `resources` block, but these are not yet exposed.
Since this driver emulates the docker interface where possible, I think
it's acceptable to use the `PercentTicks` value despite the warning.
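As a rough illustration of that conversion, the sketch below treats `PercentTicks` as the task's share of the host's total CPU (0 to 1), which is my understanding of how Nomad computes it. The kernel counts the CFS quota across all CPUs, so the share is multiplied by the core count: half of a 4-core host at the default 100,000 µs period becomes 200,000 µs per period, i.e. two CPUs' worth. The function name and the explicit `numCores` parameter are assumptions for illustration; a driver might read the core count itself (for example via `runtime.NumCPU`).

```go
package main

import "fmt"

// quotaFromPercentTicks is a hypothetical helper converting a task's
// share of total host CPU (percentTicks, assumed 0..1) into a CFS quota
// in microseconds per period. The quota spans all CPUs, so the share is
// scaled by the number of cores.
func quotaFromPercentTicks(percentTicks float64, periodUS int64, numCores int) int64 {
	return int64(percentTicks*float64(periodUS)) * int64(numCores)
}

func main() {
	// Half of a 4-core host at the default 100,000 µs period.
	fmt.Println(quotaFromPercentTicks(0.5, 100000, 4)) // 200000
}
```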

The `cpu_cfs_period` defaults to 100,000 (microseconds) in both the Linux
kernel and this driver when not specified by the user. Users may override
it with a value between 1,000 and 1,000,000. Any value less than 1,000 is
silently increased to 1,000 to prevent the job failing to run with an
`Invalid Operation` error.
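The period rules above can be sketched as follows. The function and constant names are hypothetical, and rejecting values above 1,000,000 with an error is an assumption of this sketch: the description gives the allowed range but does not say how out-of-range large values are handled.

```go
package main

import "fmt"

const (
	defaultCFSPeriodUS = 100000  // kernel and driver default
	minCFSPeriodUS     = 1000    // smallest period the kernel accepts
	maxCFSPeriodUS     = 1000000 // largest period the kernel accepts
)

// effectivePeriod applies the described rules to a user-supplied
// cpu_cfs_period: 0 (unset) selects the default and values below the
// minimum are silently raised. Erroring above the maximum is an
// assumption made for this sketch.
func effectivePeriod(userPeriod int64) (int64, error) {
	switch {
	case userPeriod > maxCFSPeriodUS:
		return 0, fmt.Errorf("cpu_cfs_period %d exceeds %d", userPeriod, maxCFSPeriodUS)
	case userPeriod == 0:
		return defaultCFSPeriodUS, nil
	case userPeriod < minCFSPeriodUS:
		return minCFSPeriodUS, nil
	default:
		return userPeriod, nil
	}
}

func main() {
	for _, p := range []int64{0, 500, 50000, 2000000} {
		got, err := effectivePeriod(p)
		fmt.Println(p, got, err)
	}
}
```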

The minimum value for the quota is also 1,000. Since this value is
computed by this driver, it is silently increased to 1,000 if the
computed value is lower. Jobs with very small cpu resources in the job
spec, equating to less than 1% of total available cpu, may therefore
consume more cpu time than intended.
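To show the effect of the quota clamp, the sketch below (names hypothetical) raises a computed quota to the 1,000 µs minimum. At the default 100,000 µs period that minimum is 1% of one CPU, so a task whose computed quota is 400 µs ends up allowed 2.5 times the CPU time it asked for.

```go
package main

import "fmt"

const minCFSQuotaUS = 1000 // smallest quota the kernel accepts

// clampQuota silently raises a computed quota to the kernel minimum,
// mirroring the behaviour described above.
func clampQuota(computed int64) int64 {
	if computed < minCFSQuotaUS {
		return minCFSQuotaUS
	}
	return computed
}

func main() {
	// Hypothetical task whose computed quota is 400 µs per 100,000 µs
	// period, i.e. 0.4% of one CPU.
	computed := int64(400)
	quota := clampQuota(computed)
	fmt.Printf("computed=%d quota=%d ratio=%.1fx\n",
		computed, quota, float64(quota)/float64(computed))
	// computed=400 quota=1000 ratio=2.5x
}
```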

Fixes #147

optiz0r · 2022-01-06T16:34:54Z

@towe75 Could I trouble you to approve the workflow so I can see whether the tests pass here? (As noted in #148, I can't get the tests to run locally.) Thanks!

optiz0r · 2022-01-07T10:22:07Z

Lint issues fixed and force pushed.

lgfa29 · 2022-01-07T16:39:33Z

Thanks for the PR @optiz0r!

I thought that approving it once would do the trick, but GH requires approval at every commit (which makes sense but it's kind of annoying).

To help unblock you, I poked at #148 and was able to run the tests using the provided Vagrantfile in the repo, but I had to run some manual commands. I will post a comment there to keep the discussion separate.

optiz0r · 2022-01-07T17:28:44Z

Perhaps it's not just me: the tests are now failing on GHA with results similar to what I was seeing locally.

I think the workflow will still require approval on each push until a change authored in my name is merged in.

towe75 · 2022-01-29T13:08:52Z

@optiz0r please rebase to fix the tests.

optiz0r · 2022-02-07T16:35:53Z

@towe75 Any other changes you want to see before merge?

towe75 · 2022-02-07T20:54:04Z

@lgfa29 can you have a look, please?

lgfa29

Sorry for the delay in getting this reviewed. I pushed a small refactoring of `setCPUResources` (fewer arguments and less indentation) and also updated the changelog entry to fix a merge conflict and the PR link.

Thanks for the PR @optiz0r 🎉

optiz0r requested review from jazzyfresh, jrasell, lgfa29 and towe75 as code owners January 6, 2022 16:33

optiz0r force-pushed the 147_cpu_hard_limits branch 4 times, most recently from 6ca5b47 to a1d19d3, January 7, 2022 10:21

lgfa29 mentioned this pull request Jan 13, 2022

config: map host devices into container #129

Merged

optiz0r force-pushed the 147_cpu_hard_limits branch from a1d19d3 to a793e7c, January 29, 2022 14:35

lgfa29 added 2 commits February 15, 2022 17:20

minor refactoring of setCPUResources

dea0de0

Merge remote-tracking branch 'origin/main' into 147_cpu_hard_limits

5a99650

lgfa29 approved these changes Feb 15, 2022

lgfa29 merged commit 96e41ef into hashicorp:main Feb 15, 2022

optiz0r deleted the 147_cpu_hard_limits branch February 15, 2022 22:40