Make cpu.cfs_period_us configurable #4456

omame · 2018-06-28T09:15:10Z

Nomad version

Nomad v0.8.1 (46aa11b)

Operating system and Environment details

Linux 4.4.0-111-generic x86_64 - Ubuntu 14.04

Issue

In #3825 and #3915, cpu hard limits were introduced in the Docker driver. This is working great (thanks @jaininshah9!), but the inability to configure cpu.cfs_period_us is causing some bizarre behaviour.

When a task has very short cpu spikes (on the order of a fraction of a tenth of a second), it gets constantly throttled, and sometimes this prevents it from even starting before the timeout triggers. If it manages to boot, then the task gets constantly throttled even if the cpu usage is very low.
The solution for this is relatively easy: increase cpu.cfs_period_us and set cpu.cfs_quota_us proportionally for the cgroup. This allows the short cpu spike to complete while leaving enough cpu time for everything else.
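
As a rough illustration of "set cpu.cfs_quota_us proportionally", here is a minimal Python sketch. The function name `scale_quota` is hypothetical and not part of Nomad or Docker; the 1 ms to 1 s bounds are the limits the kernel places on cpu.cfs_period_us.

```python
def scale_quota(quota_us: int, period_us: int, new_period_us: int) -> int:
    """Return the quota (in microseconds) that keeps quota/period unchanged
    when the period is changed to new_period_us.

    Hypothetical helper for illustration only.
    """
    # The kernel accepts cpu.cfs_period_us values between 1 ms and 1 s.
    if not 1_000 <= new_period_us <= 1_000_000:
        raise ValueError("cfs_period_us must be between 1000 and 1000000")
    if period_us <= 0 or quota_us <= 0:
        raise ValueError("quota_us and period_us must be positive")
    return quota_us * new_period_us // period_us


# A 20 ms quota per 100 ms period, stretched to a 1 s period:
print(scale_quota(20_000, 100_000, 1_000_000))
```

The ratio (here 20% of one cpu) stays the same; only the window over which it is enforced grows.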

Ideally, I'd like to have an option in the Docker driver to set cpu.cfs_period_us for the task's cgroup.

Reproduction steps

Hard to tell: it depends on the application running in the container. We may have found that an Elixir VM shows this behaviour, but we don't have enough data to prove this yet.

jippi · 2018-07-02T16:03:09Z

> If it manages to boot, then the task gets constantly throttled even if the cpu usage is very low.

Why is that? My understanding is that it should "refill" its quota and eventually stop throttling if it uses well below its hard limit.

omame · 2018-07-02T20:51:53Z

But then you'd be wasting resources to overcome an inefficient configuration of the cgroup.
Not to mention that after the startup, the process will ideally (at least in our scenarios) start serving traffic, which would add more pressure to the resource quota. Eventually this may even lead to the task becoming unhealthy and getting killed.

We've seen throttling happen constantly because the process's load spikes lasted more than the cpu.cfs_period_us period, which right now is fixed at 100 ms. Expanding the window beyond 300 ms eliminates throttling in our scenarios entirely.
Consider a task spike that uses 100% of cpu for 30 ms. Setting the quota/period to 20/100 ms lets the task run for 20 ms, throttles it for the remaining 80 ms of that period, and lets it finish the last 10 ms in the next period, so it completes in roughly 110 ms. Bumping the quota/period to, say, 200/1000 ms will not throttle this task at all, and it'll complete in 30 ms. All this while keeping the same quota/period ratio.
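
The arithmetic in this example can be checked with a small Python sketch. It is a simplified model rather than the kernel's actual accounting: it assumes a single-threaded burst that starts exactly at a period boundary, no other load in the cgroup, and a quota no larger than the period. The name `completion_time_ms` is hypothetical.

```python
def completion_time_ms(burst_ms: int, quota_ms: int, period_ms: int) -> int:
    """Wall-clock time needed to finish a cpu burst under a CFS quota.

    Simplified model (assumption): the burst starts at a period boundary,
    runs on one cpu at 100%, and nothing else shares the cgroup.
    """
    if burst_ms <= 0:
        return 0
    # Periods whose whole quota is consumed, plus leftover work.
    full_periods, remainder = divmod(burst_ms, quota_ms)
    if remainder == 0:
        # The burst ends exactly when the last quota slice is used up.
        return (full_periods - 1) * period_ms + quota_ms
    # Each exhausted period costs a full period of wall-clock time;
    # the leftover work runs at the start of the following period.
    return full_periods * period_ms + remainder


print(completion_time_ms(30, 20, 100))    # 110
print(completion_time_ms(30, 200, 1000))  # 30
```

Both settings give the cgroup 20% of a cpu, but only the longer period lets the 30 ms burst run uninterrupted.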

github-actions · 2022-11-29T02:18:18Z

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

chelseakomlo added type/enhancement theme/driver/docker labels Jun 28, 2018

omame mentioned this issue Jul 2, 2018

Add support for specifying cpu_cfs_period in the Docker driver #4462

Merged

omame changed the title ~~Make cpu.cfs_period_us and cpu.cfs_quota_us configurable~~ Make cpu.cfs_period_us configurable Jul 3, 2018

schmichael closed this as completed in #4462 Jul 25, 2018

github-actions bot locked as resolved and limited conversation to collaborators Nov 29, 2022
