
Reduce guarantee memory to avoid going beyond the allocatable value #804

Merged
damianavila merged 1 commit into master from fix_meom_very_large on Nov 5, 2021

Conversation

@damianavila (Contributor):

Each node has some allocatable resources [1]. If you try to schedule a pod
with guarantee requirements above the allocatable values, it will fail to
spawn. With the "very large" option, we are requesting 110G at a minimum,
and that is dangerously close to the theoretical allocatable memory for the
n1-standard-32 node. So let's give that guarantee value some breathing
room ;-)

[1] https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#memory_cpu

More details about the whole debugging live in the corresponding Freshdesk support thread.
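As a reference point, a node's allocatable memory (as opposed to its raw capacity) can be inspected directly with kubectl; a minimal sketch, where the node name is hypothetical:

# Print only the allocatable memory Kubernetes reports for a node
# (the node name below is hypothetical)
kubectl get node gke-meom-ige-default-pool-abc123 \
  -o jsonpath='{.status.allocatable.memory}'

# Or compare the Capacity and Allocatable sections side by side
kubectl describe node gke-meom-ige-default-pool-abc123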

@damianavila (Contributor, Author):

I have manually deployed this one to meom staging and it seems to work as intended!

@choldgraf (Member) left a comment:

hmmm - I gave this a shot and ran into the "pod didn't trigger scale-up" issue :-/

Also a couple of quick comments!
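For anyone hitting the same wall: "pod didn't trigger scale-up" comes from the cluster autoscaler as an event on the pending pod, so it can be inspected with something like the sketch below (pod name and namespace are hypothetical):

# Show scheduling and autoscaler events for the pending user pod
# (pod name and namespace are hypothetical)
kubectl describe pod jupyter-someuser -n meom-ige-staging

# Or list recent events in the namespace, oldest first
kubectl get events -n meom-ige-staging --sort-by=.lastTimestamp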

config/hubs/meom-ige.cluster.yaml
@@ -75,7 +75,7 @@ hubs:
           description: "~32 CPU, ~128G RAM"
           kubespawner_override:
             mem_limit: 128G
-            mem_guarantee: 110G
+            mem_guarantee: 100G
           node_selector:
             node.kubernetes.io/instance-type: n1-standard-32
       - display_name: "Huge"
@choldgraf (Member):

Is there something similar we should do for "huge"?

@damianavila (Contributor, Author):

Fetching info from the nodes:

small      >> 5758056Ki   >> 5.75 GB allocatable   (mem_guarantee: 5GB)
medium     >> 27155328Ki  >> 27.15 GB allocatable  (mem_guarantee: 25GB)
large      >> 56183024Ki  >> 56.18 GB allocatable  (mem_guarantee: 50GB)
very large >> 114336712Ki >> 114.33 GB allocatable (mem_guarantee: 100GB with this PR)
huge       >> 235367296Ki >> 235.37 GB allocatable (mem_guarantee: 220G)

I think we are OK with the huge one, but happy to discuss all the other values 😉.
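For reproducibility, numbers like these can be pulled for the whole cluster in one go; a sketch of one way to do it (not necessarily the exact commands used above):

# Show each node together with its instance-type label
kubectl get nodes -L node.kubernetes.io/instance-type

# Show each node's allocatable memory (reported in Ki, as in the list above)
kubectl get nodes -o custom-columns='NAME:.metadata.name,ALLOCATABLE_MEM:.status.allocatable.memory'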

@damianavila (Contributor, Author) commented on Nov 4, 2021:

> hmmm - I gave this a shot and ran into the "pod didn't trigger scale-up" issue :-/

You tried it on staging, right? https://staging.meom-ige.2i2c.cloud/

@choldgraf (Member):

Confirmed that this does work with staging!

@consideRatio (Contributor) left a comment:

+1 for updating the description in the spawn options to say ~110 GB of memory, but this LGTM and can be merged.

@damianavila (Contributor, Author):

I will merge it as is, because this is what I validated on staging, and open a follow-up issue to correct all the values in the profile descriptions (those actually need better calculations first).

damianavila merged commit 11278c2 into master on Nov 5, 2021.
damianavila deleted the fix_meom_very_large branch on Nov 5, 2021 at 19:06.
@damianavila (Contributor, Author):

Follow-up: #809
