[INCREASED HUB ACTIVITY] 2i2c climatematch #2753
Comments
I'll handle the increase in nodes for Wednesday. |
Thank you @pnasrat! |
Looking at the climatematch configuration: kubespawner allocates notebook servers with a memory limit of 7G and a guarantee of 5G onto a pool of n1-highmem-2 nodes (2 vCPUs, 13G). So we can only fit 2 users per node at guarantee, and if both burst memory it'll cause a reschedule. That seems low. I'm wondering why this isn't larger nodes, more densely packed, with some overhead capacity. At 2 users per node, 700 active users at peak would need about 350 nodes, which isn't ideal in my mind. A user profile isn't suitable for this setup as it's a class. CC @consideRatio for thoughts on sizing. |
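For reference, the packing arithmetic in the comment above can be sketched as follows. This is a rough illustration only: the helper names are hypothetical, and it assumes the full 13G is schedulable (`reserved_gb=0`), which is optimistic since real nodes reserve some memory for system pods.

```python
import math


def users_per_node(node_mem_gb: float, guarantee_gb: float, reserved_gb: float = 0.0) -> int:
    """Servers that fit on one node when scheduling by memory guarantee."""
    return int((node_mem_gb - reserved_gb) // guarantee_gb)


def can_all_burst(node_mem_gb: float, limit_gb: float, users: int, reserved_gb: float = 0.0) -> bool:
    """True if every server on the node can reach its memory limit at once."""
    return users * limit_gb <= node_mem_gb - reserved_gb


def nodes_needed(total_users: int, per_node: int) -> int:
    """Nodes required to host total_users at per_node servers each."""
    if per_node < 1:
        raise ValueError("a node must fit at least one user server")
    return math.ceil(total_users / per_node)


# Numbers from the thread: n1-highmem-2 (13G), guarantee 5G, limit 7G.
per_node = users_per_node(13, 5)
print(per_node, can_all_burst(13, 7, per_node), nodes_needed(700, per_node))
```

With these inputs, two servers fit per node, both cannot burst to 7G at the same time (14G > 13G), and 700 concurrent users come to 350 nodes.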
Created initial pull request in #2757 to handle the instructor case but we should resolve the plan for actual node sizing for next week and the event itself. |
Legacy reasons, I presume. When there are 100+ users, having at minimum 10 users per node is essential, I'd say, to avoid running out of quota for node disks or public IPs and to avoid slow startup times.
I'm approving #2757, but like you I think there is a lot of room for improvement. I'd say that we should go for n2-highmem-16 machines with 128 GB RAM if the requests/limits are 5/7 GB. |
@pnasrat @consideRatio I've documented that here now: #2765. I agree we should make the node bigger now. |
Note they have also requested notebook memory to be 16G. @yuvipanda, you mentioned quota issues for using n2 machines yesterday; do you have more specific information on that? 128/16 minus some overhead means about 7 notebooks per node, so for 700 concurrent users we'd need 100 nodes for a synchronous workshop. It's unclear whether 700 active users is sustained load or more intermittent usage by 700 unique users. |
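A small sketch of how the node count depends on that open question (sustained versus intermittent load). The function name, the 10 GB per-node overhead, and the concurrency fractions are assumptions chosen for illustration, not values from the hub configuration.

```python
import math


def nodes_for_event(unique_users: int, concurrency: float,
                    node_mem_gb: float, request_gb: float,
                    overhead_gb: float) -> tuple[int, int]:
    """Return (servers per node, nodes needed) for a given concurrency fraction."""
    per_node = int((node_mem_gb - overhead_gb) // request_gb)
    if per_node < 1:
        raise ValueError("node too small for a single user server")
    concurrent_users = math.ceil(unique_users * concurrency)
    return per_node, math.ceil(concurrent_users / per_node)


# 128 GB nodes, 16G per notebook, assumed 10 GB overhead per node.
for fraction in (1.0, 0.5, 0.25):
    per_node, nodes = nodes_for_event(700, fraction, 128, 16, 10)
    print(f"{fraction:.0%} concurrent: {per_node} per node, {nodes} nodes")
```

Under these assumptions a fully synchronous workshop needs 100 nodes, while half or a quarter of the users being active at once needs 50 or 25.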
I've created #2775 with n1-highmem-32 (which would support 10 user servers per node); happy for feedback on machine type. I've also communicated with the community via support that we'll be changing the pool. Once that's done I'll increase the memory request to 16G as requested. |
I requested a quota increase for n2 machines, if that arrives, then I suggest use of A less impactful idea in my mind is to have affinity towards choosing subset of all the available |
Approved! Apparently the same minute it was received. For reference about quotas: the reason we had to request it is that this project is relatively old. Newer projects are initialized with a non-zero quota. For example, a new project I created had 300 n2 and n1 quota in a few regions, but had 600 n1 quota in a few as well. |
Recommendation: grow the min pool to 10 in the early morning (America/New_York TZ) on 2023-07-17. I'll be on support so I can take that. |
@damianavila assigning to you for potential end of event eng work after August 1 |
Given that we are past August 1st, we should probably go back to the previous state although it is not clear to me what that previous state would look like. It seems we had these PRs related to the increased activity:
It would be nice to get some agreement from @2i2c-org/engineering about the state we want to achieve before @GeorgianaElena works on it (btw, Georgiana, if you are opinionated about this one, feel free to move forward when you have some time). |
@damianavila, we might also want to double check the decommission date with them, which seems to be in two weeks per https://github.com/2i2c-org/leads/issues/110 |
Ok, so I've been going through tickets and PRs related to this event and my opinion about what the next steps should be is that we should:
Note that there's also an ongoing support discussion about enabling other features on this hub. I will close this issue now and ping partnerships about this lead in https://github.com/2i2c-org/leads/issues/110. Anyone on @2i2c-org/engineering , please feel free to re-open if you don't agree with the conclusion about keeping the current state of the infra for this hub. |
Awesome, @GeorgianaElena, I concur with your opinions and actions so far. Thanks!! |
Summary
The Climatematch community is expecting increased hub activity starting Wednesday this week and lasting until August 1st.
Event Info
There are multiple triggers for the increased hub activity as specified in the https://2i2c.freshdesk.com/a/tickets/804 ticket:
Hub info