
Reduce guarantee memory to avoid going beyond the allocatable value #804

Merged
damianavila merged 1 commit into master from fix_meom_very_large on Nov 5, 2021

Conversation

@damianavila (Contributor):

Each node has some allocatable resources [1]. If you try to schedule a pod
with guarantee requirements above the allocatable values, it will fail to
spawn. With the "very large" option, we are requesting 110G at a minimum,
and that is dangerously close to the theoretical allocatable memory for the
n1-standard-32 node. So let's give that guarantee value some breathing
room ;-)

[1] https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#memory_cpu

More details about the whole debugging live in the corresponding Freshdesk support thread.
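As a reference point, a node's allocatable memory (as opposed to its raw capacity) can be inspected directly with kubectl; a minimal sketch, where the node name is hypothetical:

# Print only the allocatable memory Kubernetes reports for a node
# (the node name below is hypothetical)
kubectl get node gke-meom-ige-default-pool-abc123 \
  -o jsonpath='{.status.allocatable.memory}'

# Or compare the Capacity and Allocatable sections side by side
kubectl describe node gke-meom-ige-default-pool-abc123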

@damianavila (Contributor, Author):

I have manually deployed this one to meom staging and it seems to work as intended!

@choldgraf (Member) left a comment:

hmmm - I gave this a shot and ran into the "pod didn't trigger scale-up" issue :-/

Also a couple of quick comments!
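For anyone hitting the same wall: "pod didn't trigger scale-up" comes from the cluster autoscaler as an event on the pending pod, so it can be inspected with something like the sketch below (pod name and namespace are hypothetical):

# Show scheduling and autoscaler events for the pending user pod
# (pod name and namespace are hypothetical)
kubectl describe pod jupyter-someuser -n meom-ige-staging

# Or list recent events in the namespace, oldest first
kubectl get events -n meom-ige-staging --sort-by=.lastTimestamp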

config/hubs/meom-ige.cluster.yaml
@@ -75,7 +75,7 @@ hubs:
           description: "~32 CPU, ~128G RAM"
           kubespawner_override:
             mem_limit: 128G
-            mem_guarantee: 110G
+            mem_guarantee: 100G
           node_selector:
             node.kubernetes.io/instance-type: n1-standard-32
       - display_name: "Huge"
@choldgraf (Member):

Is there something similar we should do for "huge"?

@damianavila (Contributor, Author):

Fetching info from the nodes:

small      >> 5758056Ki   >> 5.75 GB allocatable   (mem_guarantee: 5GB)
medium     >> 27155328Ki  >> 27.15 GB allocatable  (mem_guarantee: 25GB)
large      >> 56183024Ki  >> 56.18 GB allocatable  (mem_guarantee: 50GB)
very large >> 114336712Ki >> 114.33 GB allocatable (mem_guarantee: 100GB with this PR)
huge       >> 235367296Ki >> 235.37 GB allocatable (mem_guarantee: 220G)

I think we are OK with the huge one, but happy to discuss all the other values 😉.
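For reproducibility, numbers like these can be pulled for the whole cluster in one go; a sketch of one way to do it (not necessarily the exact commands used above):

# Show each node together with its instance-type label
kubectl get nodes -L node.kubernetes.io/instance-type

# Show each node's allocatable memory (reported in Ki, as in the list above)
kubectl get nodes -o custom-columns='NAME:.metadata.name,ALLOCATABLE_MEM:.status.allocatable.memory'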

@damianavila (Contributor, Author) commented on Nov 4, 2021:

> hmmm - I gave this a shot and ran into the "pod didn't trigger scale-up" issue :-/

You tried it on staging, right? https://staging.meom-ige.2i2c.cloud/

@choldgraf (Member):

Confirmed that this does work with staging!

@consideRatio (Contributor) left a comment:

+1 for updating the description in the spawn options to say ~110 GB of memory, but this LGTM and can be merged.

@damianavila (Contributor, Author):

I will merge it as is, because this is what I validated on staging, and open a follow-up issue to correct all the values in the profile descriptions (those actually need better calculations first).

damianavila merged commit 11278c2 into master on Nov 5, 2021.
damianavila deleted the fix_meom_very_large branch on Nov 5, 2021 at 19:06.
@damianavila (Contributor, Author):

Follow-up: #809
