Add a memory and cpu limits on deployments #20193
Conversation
This also doesn't handle updating the value if the settings change. Right now we don't edit the deployment because it's hooked into the worker start. Not sure if that's something we also want for the first pass on this.
Hmmm, weird,
The only info in the BZ (https://bugzilla.redhat.com/show_bug.cgi?id=MANAGEIQ-5096) was "To avoid CPU Alarms, throttle getting units of work from MiqQueue based on configurable CPU threshold". It would skip delivering queue messages in the generic worker if it reached 90% CPU usage. I believe this is from when workers polled the queue on their own. Looking to see why it was changed to 0 and 100%... 👀
It was changed to 99% in 326ea6579ab573e4dd5d3288fb6a9baf0962e502 for:
then to 100% along with other settings in 15de028089b8661077a8ede57091883e90e635b6 for https://bugzilla.redhat.com/show_bug.cgi?id=MANAGEIQ-5749 (oh, that was me... but they said to do it).
2e59d858b5646b9229fecf54ebce6a2a532038a4 changed the event handler to have a 0 threshold (effectively skipping the checks).
So yes, it's effectively disabled right now. Note, it's only checked (and always skipped because of the 0 or 100 setting) in queue workers. We can enable it or change it if we need to repurpose it for pods.
The other "correct" answer is to delete it and add it back if and when we want something like it. ✂️ 🗑️ 🚽 Let me know if you agree on the "correct" answer...
I guess I want to keep it, but it will only do anything in pods, as opposed to memory, which will be enforced everywhere. So my question now is: if I change it to use millicores as Kubernetes does, should I change the name to something pod-specific, or keep it matching memory even though we'll probably never enforce it on the appliance?
Force-pushed from f6f5d60 to f2d2bdc
So this now adds a CPU and memory limit to all the workers, but unfortunately this makes the requests default to the limits (ref). This means that by default you would need a cluster with at least 6 cores and 6 GiB of memory just to schedule the default set of workers, never mind adding provider workers or scaling. This also doesn't cover the case where you set requests on the rest of the pods in the project.

When we deploy the non-worker containers (postgres, httpd, orchestrator, etc.) we default to no limits/requests, but they can easily be set in the CR at deploy time. I'm not sure how we would make a similar change for the workers. I don't think it's a good experience to have to deploy the whole app and then go into advanced settings to mess about with resource constraints.

Maybe we have a single flag that tells the orchestrator whether to add the constraints or not? Then that could be set in settings at deploy time?
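To make the scheduling math above concrete, here is a minimal sketch (not the PR's actual code) of a worker container's resources section with only limits set. The 500m/500Mi values are just examples; the point is that when requests are omitted, Kubernetes copies the limits into the requests, so the cluster has to reserve the full amount up front:

```ruby
# Minimal sketch, assuming example values of half a core and 500Mi per worker.
# With only :limits given, Kubernetes sets each request equal to its limit,
# so roughly a dozen default workers end up requesting ~6 cores / ~6 GiB just to schedule.
def container_resources(cpu_limit: "500m", memory_limit: "500Mi")
  {
    :resources => {
      :limits => {
        :cpu    => cpu_limit,
        :memory => memory_limit
      }
      # no :requests key here, so requests default to the limits above
    }
  }
end
```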
Force-pushed from af8f9d3 to ff24b2c
Settled on adding yet another flag to disable this by default. It can be enabled in the CR at deploy time (ManageIQ/manageiq-pods#532).
@carbonin systemd does have a CPUQuota setting (ref) which takes the percentage of a single CPU the unit is allowed to use, so it might be possible to enforce this on the appliance as well in the future. We should be able to convert between millicores and percent (I think 500 millicores = 0.5 CPU = 50% of a single CPU).
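For what it's worth, a tiny sketch of that conversion (the helper name is made up); it just assumes 1000 millicores equals one full CPU, i.e. CPUQuota=100%:

```ruby
# Hypothetical helper: convert Kubernetes-style millicores to a systemd CPUQuota value.
# 1000 millicores == 1 CPU == 100%, so 500 millicores => "CPUQuota=50%".
def millicores_to_cpu_quota(millicores)
  "CPUQuota=#{(millicores / 1000.0 * 100).round}%"
end

millicores_to_cpu_quota(500) # => "CPUQuota=50%"
```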
Yup, that makes sense @agrare. I guess I meant that we won't be enforcing it in the server process. Pushing the resource monitoring to systemd makes a lot of sense.
I also have some pseudo-real-world data that might help with better values for CPU usage: this env has ~125 real VMware VMs, 1,000 simulated VMware VMs (vcsim), and ~250 AWS instances, but isn't doing any app-side work (reporting, provisioning, automate, etc.). Based on that, half a core seems reasonable for things like the generic worker, but maybe we can tone it down for refresh workers? @agrare what do you think?
I can't help but see the orchestrator taking up 1/6 of the memory and roughly 1/6 or 1/7 of the CPU for the project. ✨ Shiny object.
This is based on the worker's memory_threshold setting. Fixes ManageIQ#20163
This was only ever set to either 100% or 0%, which means it was never used in either case.
For now, we'll start with half a core for all the worker types; we can change this for individual workers as we get more data.
This setting will be set by the orchestrator startup process, which is in turn driven by a parameter in the CR. The default for this is false in the operator, so it will also be false in the settings. Fixes ManageIQ#20163
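Putting those commits together, the shape of the change is roughly the following sketch. The enforce_resource_constraints flag and the per-worker memory_threshold are named in this PR; the method name, settings path, and lookup shown here are only guesses at how it could be wired up:

```ruby
# Illustrative sketch only; the exact settings path and method name are assumptions.
def resource_constraints(worker_settings)
  # Skip limits entirely unless the operator-driven flag was turned on.
  return {} unless Settings.server.worker_monitor.enforce_resource_constraints

  {
    :limits => {
      :cpu    => "500m", # half a core for every worker type, for now
      :memory => "#{worker_settings[:memory_threshold] / 1024 / 1024}Mi" # assumes memory_threshold is in bytes
    }
  }
end
```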
Force-pushed from 07c5295 to b06cb7d
Checked commits carbonin/manageiq@ec269f7~...b06cb7d with ruby 2.5.7, rubocop 0.69.0, haml-lint 0.28.0, and yamllint
LGTM, will merge when green.
hmm, @carbonin failed with
seems unrelated
Yeah that's me, I'm on it
#20330 should resolve that
Merged, sorry for the interruption
Merging. It's enforce_resource_constraints: false to start, so it shouldn't be turned on yet... ManageIQ/manageiq-pods#532 will enable and use it.
Add a memory and cpu limits on deployments (cherry picked from commit 7c7d7f7)
Jansa backport details:
This is based on the worker's memory_threshold setting
Fixes #20163
This really needs to be added to, but I'm not sure where to get the other settings from.
We have something that looks like CPU settings already, but they look either broken or unused as the only values we have are 0 or 100 percent.
I'm also not sure how to handle requests; these are used for scheduling pods on Kubernetes nodes. For the rest of the components in github.com/ManageIQ/manageiq-pods I don't set a request by default, so that the containers will be schedulable out of the box even on small environments like minikube.
In this repo we would need to expose settings for the requests, which would be 0 (or some other flavor of "unset") by default, to have similar behavior.
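One possible shape for that (purely illustrative; cpu_request and memory_request are made-up setting names, not anything in this PR) would be to only merge :requests into the resources hash when the settings are actually present, so a default deployment keeps the same schedulable-out-of-the-box behavior as the other components:

```ruby
# Hypothetical sketch: requests stay unset unless the (made-up) request settings are given.
def resources_for(worker_settings)
  resources = {:limits => {:cpu => "500m", :memory => "500Mi"}}

  requests = {
    :cpu    => worker_settings[:cpu_request],
    :memory => worker_settings[:memory_request]
  }.compact

  resources[:requests] = requests unless requests.empty?
  resources
end
```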