Set limits and requests for exclusive pool containers #25

Closed
Levovar opened this issue Jul 26, 2019 · 6 comments · Fixed by #55

Levovar (Collaborator) commented Jul 26, 2019

Similarly to shared pool Pods, we need to take care of setting requests and limits for exclusive pool containers.
Request: explicitly setting 0 is required, similarly to how it was done for the shared pool in #14.

Limit: we should either set an explicit 0 to avoid artificially limiting the CPU time of a core that is physically isolated anyway, or we need to set it to number_of_exclusive_cores * 1000 (millicores).
Reasoning for 0: even if the limit is higher than what the container can actually get, it still adds a CFS quota to the container, and there are anecdotal reports indicating that the mere presence of a CFS quota can negatively affect performance.
Reasoning for explicitly setting the limit: some admission controllers in K8s mandate that a Pod defines requests and limits. These mandated values are of course pointless for an exclusive user, but failing to comply results in failed Pod admission.
So, if the presence of a CFS quota does not affect performance, setting the limit explicitly is actually the safer option!

@TimoLindqvist : WDYT Timo?
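
For illustration, a minimal sketch of the two options expressed with the Kubernetes Go API types; the nokia.k8s.io/exclusive_caas resource name and the two-core figures are placeholders, not the actual CPU-Pooler naming:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// exclusiveResources sketches the container resources for a Pod asking for two
// exclusive cores. The "nokia.k8s.io/exclusive_caas" name is only a placeholder
// for the exclusive pool's device-plugin resource.
func exclusiveResources(setExplicitLimit bool) corev1.ResourceRequirements {
	res := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			// Explicit 0 CPU request, as already done for the shared pool in #14.
			corev1.ResourceCPU: *resource.NewMilliQuantity(0, resource.DecimalSI),
			corev1.ResourceName("nokia.k8s.io/exclusive_caas"): resource.MustParse("2"),
		},
		Limits: corev1.ResourceList{
			// Extended resources must have requests == limits.
			corev1.ResourceName("nokia.k8s.io/exclusive_caas"): resource.MustParse("2"),
		},
	}
	if setExplicitLimit {
		// Option 2: limit = number_of_exclusive_cores * 1000 millicores (adds a CFS quota).
		res.Limits[corev1.ResourceCPU] = *resource.NewMilliQuantity(2000, resource.DecimalSI)
	} else {
		// Option 1: explicit 0 CPU limit, intended to avoid any artificial cap
		// on cores that are physically isolated anyway.
		res.Limits[corev1.ResourceCPU] = *resource.NewMilliQuantity(0, resource.DecimalSI)
	}
	return res
}
```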

TimoLindqvist (Collaborator) commented
I'm under the impression that the CFS quota can affect performance/latency, so we should configure things so that the CFS quota is disabled. Is it disabled if we set the CPU limit to zero?

Levovar (Collaborator, Author) commented Aug 27, 2019

The K8s community actually proposed kernel patches to correct this issue :)
So I think we shouldn't disable CFS quotas; they should work much better from 4.14, and as intended in the latest versions.

Levovar (Collaborator, Author) commented Aug 27, 2019

Sorry, the first improvement is available from 4.18:
torvalds/linux@512ac99#diff-1c5364196d98130348bddabaad0a701f

And this one should totally fix them:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=de53fd7aedb100f03e5d2231cfce0e4993282425

BTW, based on the description, the CFS quota was only misbehaving for workloads which frequently idle and only consume some slices.
Our exclusive workloads are quite the opposite: they always use 100% of the core.

TimoLindqvist (Collaborator) commented
I'm still a bit against setting the limit to something other than 0 and thus enabling the CFS quota. If a container allocates exclusive core(s), it must have full access to those core(s). On the other hand, it cannot use other cores, so why is the quota needed?

Levovar (Collaborator, Author) commented Sep 10, 2019

On the other hand, it cannot use other cores, so why is the quota needed?

Purely for the Kubernetes compatibility reasons I described in the opening post. If the user has the LimitRanger admission plugin enabled in their cluster, instantiating Pods in their CPU-Pooler enhanced cluster will fail:
https://kubernetes.io/docs/concepts/policy/limit-range/#overview-of-limit-range

I think the same might be an issue when the operator sets resource constraints on the Namespace object.

TimoLindqvist (Collaborator) commented
We can add the limits and requests to exclusive containers to avoid failures in Pod instantiation, but would it be OK to set the limit to zero?

Levovar added a commit that referenced this issue Jan 6, 2021
This commit solves Issue #25.
When a container is using shared pool resources, the CFS quota is set to its limit value
With exclusive users it is set to the total amount of all exclusive cores * 1000
When both are requested the overall quota is set to exclusive*1000 + 1.2*shared
In this hybrid scenario we leave a 20% safety margin on top of the originally requested shared resources,
  to avoid accidentally throttling the higher priority exclusive thread when the lower priority shared threads are overloaded.
Levovar added a commit that referenced this issue Jan 7, 2021
This commit solves Issue #25.
When a container is using shared pool resources, the CFS quota is set to its limit value
With exclusive users it is set to the total amount of all exclusive cores * 1000
When both are requested the overall quota is set to exclusive*1000 + 1.2*shared
In this hybrid scenario we leave a 20% safety margin on top of the originally requested shared resources,
  to avoid accidentally throttling the higher priority exclusive thread when the lower priority shared threads are overloaded.
Levovar added a commit that referenced this issue Jan 12, 2021
This commit solves Issue #25.
When a container is using shared pool resources, the CFS quota is set to its limit value
With exclusive users it is set to the total amount of all exclusive cores * 1000 + 100
  (constant 100 is added to avoid activating throttling mechanisms near 100% utilization)
When both are requested the overall quota is set to exclusive*1000 + 1.2*shared
In this hybrid scenario we leave a 20% safety margin on top of the originally requested shared resources,
  to avoid accidentally throttling the higher priority exclusive thread when the lower priority shared threads are overloaded.
balintTobik pushed a commit to balintTobik/CPU-Pooler that referenced this issue Jul 5, 2021
This commit solves Issue nokia#25.
When a container is using shared pool resources, the CFS quota is set to its limit value
With exclusive users it is set to the total amount of all exclusive cores * 1000 + 100
  (constant 100 is added to avoid activating throttling mechanisms near 100% utilization)
When both are requested the overall quota is set to exclusive*1000 + 1.2*shared
In this hybrid scenario we leave a 20% safety margin on top of the originally requested shared resources,
  to avoid accidentally throttling the higher priority exclusive thread when the lower priority shared threads are overloaded.
nxsre pushed a commit to nxsre/CPU-Pooler that referenced this issue Apr 14, 2024
This commit solves Issue nokia#25.
When a container is using shared pool resources, the CFS quota is set to its limit value
With exclusive users it is set to the total amount of all exclusive cores * 1000 + 100
  (constant 100 is added to avoid activating throttling mechanisms near 100% utilization)
When both are requested the overall quota is set to exclusive*1000 + 1.2*shared
In this hybrid scenario we leave a 20% safety margin on top of the originally requested shared resources,
  to avoid accidentally throttling the higher priority exclusive thread when the lower priority shared threads are overloaded.
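
For reference, a minimal self-contained sketch of the quota arithmetic described in the commit message above, in millicores; the helper name is made up for illustration and is not the actual CPU-Pooler code:

```go
package main

import "fmt"

// cpuQuotaMillis mirrors the rule from the commit above: shared-only containers
// keep their shared limit as the quota, exclusive-only containers get
// cores*1000 + 100 (the +100 keeps the quota just above 100% utilization so
// throttling does not trigger at full load), and hybrid containers get the
// exclusive part plus a 20% safety margin on the shared part.
func cpuQuotaMillis(exclusiveCores, sharedMillis int64) int64 {
	switch {
	case exclusiveCores > 0 && sharedMillis > 0:
		return exclusiveCores*1000 + sharedMillis*12/10
	case exclusiveCores > 0:
		return exclusiveCores*1000 + 100
	default:
		return sharedMillis
	}
}

func main() {
	fmt.Println(cpuQuotaMillis(0, 500)) // shared only:    500
	fmt.Println(cpuQuotaMillis(2, 0))   // exclusive only: 2100
	fmt.Println(cpuQuotaMillis(2, 500)) // hybrid:         2600
}
```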