Make cpu.cfs_period_us configurable #4456

Closed
omame opened this issue Jun 28, 2018 · 3 comments

Comments

@omame
Contributor

omame commented Jun 28, 2018

Nomad version

Nomad v0.8.1 (46aa11b)

Operating system and Environment details

Linux 4.4.0-111-generic x86_64 - Ubuntu 14.04

Issue

In #3825 and #3915, CPU hard limits were introduced in the Docker driver. This is working great (thanks @jaininshah9!), but the inability to configure cpu.cfs_period_us is causing some bizarre behaviour.

When a task has very short CPU spikes (on the order of a fraction of a tenth of a second), it gets constantly throttled, and sometimes this prevents it from even starting before the timeout triggers. If it manages to boot, the task still gets constantly throttled even if its CPU usage is very low.
The solution is relatively easy: increase cpu.cfs_period_us and set cpu.cfs_quota_us proportionally for the cgroup. This allows the short CPU spike to complete while leaving enough CPU time for everything else.
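As a rough illustration of "set cpu.cfs_quota_us proportionally", here is a minimal Python sketch that rescales a quota when the period changes. The function name `scale_cfs_limits` is hypothetical and not part of Nomad or Docker. The bounds (1 ms minimum for quota and period, 1 s maximum period) are taken from the Linux kernel's CFS bandwidth control documentation as I understand it; treat them as an assumption to verify against your kernel.

```python
from typing import Tuple

# Assumed kernel limits (CFS bandwidth control docs): values are in microseconds.
MIN_US = 1_000          # 1 ms minimum for both quota and period
MAX_PERIOD_US = 1_000_000  # 1 s maximum period


def scale_cfs_limits(quota_us: int, period_us: int, new_period_us: int) -> Tuple[int, int]:
    """Return (new_quota_us, new_period_us) keeping the quota/period ratio.

    Hypothetical helper for illustration only.
    """
    if quota_us <= 0 or period_us <= 0:
        raise ValueError("quota and period must be positive")
    if not MIN_US <= new_period_us <= MAX_PERIOD_US:
        raise ValueError("new period is outside the assumed kernel bounds")
    # Integer arithmetic avoids float rounding; the ratio is preserved
    # exactly whenever the division has no remainder.
    new_quota_us = quota_us * new_period_us // period_us
    if new_quota_us < MIN_US:
        raise ValueError("scaled quota is below the assumed 1 ms minimum")
    return new_quota_us, new_period_us


if __name__ == "__main__":
    # 20 ms / 100 ms scaled to a 1000 ms period.
    print(scale_cfs_limits(20_000, 100_000, 1_000_000))
```

The resulting pair would then be written to cpu.cfs_quota_us and cpu.cfs_period_us for the task's cgroup; how that is done depends on the cgroup layout on the host, so it is left out here.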

Ideally, I'd like to have an option in the Docker driver to set cpu.cfs_period_us for the task's cgroup.

Reproduction steps

Hard to tell: it depends on the application being containerized. We may have found that an Elixir VM shows this behaviour, but we don't have enough data to prove this yet.
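One way to gather that data might be to sample the cgroup's cpu.stat file, which under cgroup v1 reports nr_periods, nr_throttled and throttled_time (in nanoseconds). The sketch below only parses the file contents; the helper names are hypothetical, and the location of cpu.stat for a given container depends on the host's cgroup setup, so no path is assumed here.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the 'key value' lines of a cgroup v1 cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict) -> float:
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    # Made-up sample contents, for illustration only.
    sample = "nr_periods 1000\nnr_throttled 250\nthrottled_time 5000000000\n"
    print(throttled_ratio(parse_cpu_stat(sample)))
```

Comparing two samples taken some time apart (rather than the absolute counters) would show whether throttling keeps happening while the task's CPU usage is low.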

@jippi
Contributor

jippi commented Jul 2, 2018

If it manages to boot, then the task gets constantly throttled even if the cpu usage is very low.

Why is that? My understanding is that it should "refill" its quota and eventually stop throttling if it uses well below its hard limit.

@omame
Contributor Author

omame commented Jul 2, 2018

But then you'd be wasting resources to overcome an inefficient configuration of the cgroup.
Not to mention that after the startup, the process will ideally (at least in our scenarios) start serving traffic, which would add more pressure to the resource quota. Eventually this may even lead to the task becoming unhealthy and getting killed.

We've seen throttling happen constantly because the process's load spikes lasted longer than the cpu.cfs_period_us period, which is currently fixed at 100 ms. Expanding the window beyond 300 ms eliminates throttling in our scenarios entirely.
Consider a task spike that uses 100% of one CPU for 30 ms. Setting the quota/period to 20/100 ms lets the task run for 20 ms, throttles it for the remaining 80 ms of the period, and lets it finish the last 10 ms of work in the next period, so it completes in roughly 110 ms. Bumping the quota/period to, say, 200/1000 ms will not throttle this task at all, and it'll complete in 30 ms. All this while keeping the same quota/period ratio.
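The arithmetic in this example can be checked with a small Python model. This is a simplified sketch, not how the kernel scheduler is implemented: it assumes a single-threaded burst that starts exactly at the beginning of a period, with nothing else running in the cgroup. The function name `completion_time_ms` is hypothetical.

```python
def completion_time_ms(burst_ms: int, quota_ms: int, period_ms: int) -> int:
    """Wall-clock time needed to finish `burst_ms` of CPU work.

    Simplified model: the task may run for at most `quota_ms` in each
    `period_ms` window and is throttled until the next window begins.
    """
    if burst_ms <= 0 or quota_ms <= 0 or period_ms < quota_ms:
        raise ValueError("expected burst > 0 and 0 < quota <= period")
    full_periods, remainder = divmod(burst_ms, quota_ms)
    if remainder == 0:
        # The work ends exactly when the quota of the last needed period runs out.
        return (full_periods - 1) * period_ms + quota_ms
    # Whole periods are consumed first, then the leftover work runs
    # at the start of the following period.
    return full_periods * period_ms + remainder


if __name__ == "__main__":
    for quota, period in ((20, 100), (200, 1000)):
        total = completion_time_ms(30, quota, period)
        print(f"quota/period {quota}/{period} ms: done in {total} ms, "
              f"throttled for {total - 30} ms")
```

Under these assumptions the 20/100 ms setting finishes the 30 ms burst after 110 ms, while 200/1000 ms finishes it after 30 ms with no throttling.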

@omame omame changed the title Make cpu.cfs_period_us and cpu.cfs_quota_us configurable Make cpu.cfs_period_us configurable Jul 3, 2018
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 29, 2022