Tweak some of the neurohackademy hub resources #1554
Conversation
```yaml
memory:
  guarantee: 6G
  limit: 8G
```
I see that the current machine choice is n1-highmem-4, with 4 CPUs and 26 GB of memory. The machine type used historically was m1-ultramem-40, with 40 CPUs and 961 GB of memory.
We planned to support up to 24 GB of memory use, but that was too much and not needed. The user with the most memory used was ~6 GB. Given that, I think it makes sense to plan for each user using 3 GB of memory on average and to limit them at 6 GB.
Currently, if 6 GB of memory is guaranteed and each machine has ~26 GB of memory, four users fit on each machine. Autoscaling is limited to 10 machines, so a total of ~40 users would be supported in this configuration.
Something should change; exactly how is not obvious, but I'd suggest using bigger machines than 4-core ones, such as n1-highmem-16 with 16 CPUs and 104 GB of RAM. With ~30 users per machine at 3 GB of RAM each on average, ~4 such machines would cover 120 users, leaving room for more and some ability to scale up and down.
Concrete suggestion (a config sketch follows the list):
- Use n1-highmem-16 machines.
- Limit memory to, for example, 8 GB, and guarantee 3 GB of memory per user, which makes an n1-highmem-16 machine end up at ~104/3 ≈ 34 users per machine.
- Use a very high CPU limit, perhaps 50%-100% of the machine's total CPU, to make sure the CPU is used properly; I can't think of any drawbacks. Guarantee something low for CPU, such as 0.1 CPU, since it isn't important as long as the number of users is already capped by the memory guarantee.
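For reference, this is roughly what that suggestion would look like in z2jh-style singleuser config. The key nesting and file layout here are illustrative, not taken from this repo's actual files:

```yaml
jupyterhub:
  singleuser:
    memory:
      guarantee: 3G   # ~34 users fit in an n1-highmem-16's 104 GB
      limit: 8G
    cpu:
      guarantee: 0.1  # low; user count is already capped by the memory guarantee
      limit: 8        # roughly 50% of the machine's 16 vCPUs
```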
Overall, I think using a few larger machines, where many (30-100) users fit on a single machine, makes better use of the machine's CPU per user than having a smaller machine house a smaller number of users, because of how intermittently people actually use the CPU. Less overhead is also needed, since the deviation from the mean RAM usage shrinks as the number of users grows.
It's tricky to come up with sensible estimates about this overall, but I think the above should be fine.
Thank you for these suggestions, @consideRatio. I've implemented all of these, with some minor tweaks. I do think it's important to set non-trivial CPU guarantees; otherwise it only takes two users on the same node using up to their CPU limit to leave all other users with essentially only their guaranteed level of CPU. So I've set it to 0.5.
Should we document that recommendation somewhere in our infra docs?
terraform plan is:
- Use n1-highmem-16 nodes
- Provide 8GB limit but 4GB requests, so everyone is guaranteed at least 4GB
- Set a high CPU limit and a low CPU request. The request will make sure that everyone gets at least that much CPU - it only takes two users on the same VM to eat up all CPU if we don't set these requests
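In z2jh terms, those settings amount to something like the following sketch. The key layout is illustrative, and the CPU limit value is a placeholder, since the actual number isn't quoted in this thread:

```yaml
jupyterhub:
  singleuser:
    memory:
      guarantee: 4G   # the Kubernetes request; everyone gets at least 4 GB
      limit: 8G
    cpu:
      guarantee: 0.5  # non-trivial request so two busy users can't starve everyone else
      limit: 14       # placeholder for a "high" limit on a 16-vCPU node
```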
Force-pushed from 7f641ed to 46442c4.
🎉🎉🎉🎉 Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/workflows/deploy-hubs.yaml?query=branch%3Amaster
Related to #1532