-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add documentation on handling node resources #2992
Add documentation on handling node resources #2992
Conversation
Hi @itskingori. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
I actually think this is great, and a really wonderful explanation. The only thing I would love to see is some sort of "rule of thumb", i.e. if you're setting a value and it works, say "Users have reported good results setting X,Y,Z". The idea being that first of all we can gradually promote that to a recommendation, and then to an automated setting. But also we expect users will test various values, and - critically - will report if X is not right for them. If we don't pick a value we won't get those reports :-) |
docs/node_resource_handling.md
Outdated
|
||
### Understanding Node Resources | ||
|
||
Each node in a cluster has resources available to it and pods scheduled to run on the node may or may not have resource requests or limits set on them. Kubernetes schedules pods on nodes that have resources that satisfy the pod's specified requirements. As a result, pods are sort of [bin-packed][4] onto the nodes in a best effort attempt to utilize as much of the resources available with as few nodes as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may or may not have resource requests or limits set on them
you probably want to say something like ...(although for a reliable cluster you really should set limits)
Also, nit-picking, but inserting line-breaks makes it easier, especially for github commenting :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead of the fuzzy "As a result, pods are sort of [bin-packed]..." we say "Broadly, pods are [bin-packed]...` ? But just a style thing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@justinsb About your feedback ...
you probably want to say something like
...(although for a reliable cluster you really should set limits)
I see you've seen that I addressed that in the list at the bottom. 😄
inserting line-breaks makes it easier, especially for github commenting :-)
Addressed in 1435965.
Maybe instead of the fuzzy ...
Addressed in 1bd329a.
docs/node_resource_handling.md
Outdated
But, it seems fitting to recommend the following: | ||
|
||
1. Always set requests with some breathing room – do not set requests to match your application's resource profile during idle time too closely. | ||
2. Always set limits – so that your application doesn't hog all the memory on a node during a spike. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - I see you got the "set limits" advice here :-)
/ok-to-test @itskingori I'm happy to merge this as-is. I didn't know whether you wanted to add a straw-man setting though, in the hopes of spurring some discussion! |
Line-breaks makes it easier for reviews, especially Github commenting.
@justinsb About #2992 (comment) ... I see your point. I've been a bit swamped on my end so haven't had the time to test the reserved resources feature (but I plan to). And that might be a while ... 😅 ... but I always come back with some feedback/reports if our previous interactions are anything to go by. I can give some numbers/recommendations then? |
That's great @itskingori - thank you so much for this, and if you want to come back with specific numbers, that's great too :-) /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: itskingori, justinsb The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
At a minimum, this is meant to give more context on why the feature in #2982 was added and attempts to give some recommendations of what to consider when evaluating node system resources.
I hope this spurs some discussion and that the recommendations I make maybe be assessed further. For example ... in one of the links I referenced, we're advised to set
system-reserved
only if we know what we are doing (which I can't say I do 💯% ... 🤷♂️) and we're even warned to only set it if you really need to.