[New Hub] University of Washington - NASA SnowEx Hackweek 2022 #1309
I've updated the issue with the TODOs that need to happen. I'm going to use the existing uwhackweeks infra to prototype GPU support as well as the scratch bucket support.
I've asked for an increase in quota on the uwhackweeks AWS account for GPU instances, and can proceed once that comes through. I've also asked @scottyhq for a credits voucher to kickstart a new account.
We're going to use a new account so that costs for this are separate from costs for the current uwhackweeks setup.
Can we list here the specific quotas you requested an increase for (so we can later document the specific requirement and process)? Thanks!
- Document howto set up GPUs on AWS - Temporarily add a GPU profile to the uwhackweeks hub, until we setup an account for the snowex hackweek Ref 2i2c-org#1309
@damianavila documented as part of #1314
- Bump up AWS provider version, as there had been a few deprecations in the IAM resources - Mimic the GCP setup as much as possible Ref 2i2c-org#1309
- Bump up AWS provider version, as there had been a few deprecations in the IAM resources - Mimic the GCP setup as much as possible - Write some docs on how to enable this for S3 - Setup IRSA with Terraform so users can be granted IAM roles - Use the uwhackweeks to test Ref 2i2c-org#1309
Hey all, thanks for getting this going! I think we're set now with credits for a new account for this. A couple more details and questions below:
For the URL, we currently have … For "Community representatives" we can add @jomey, who is going to be the main point of contact during the week of the event.
@scottyhq ah, so how about this:
Does this sound acceptable to you?
Sounds great, with the condition that we do this transition after June 1 (we told people that's the end date for the icesat-2 hub!)
@scottyhq sounds good! I'll set up the snowex hub first, and we can separately decomm the existing hub later.
- Moves current hackweeks hub (which is really icesat hackweek hub) config out of common and into staging / prod.yaml - Add new config for snowex hackweek - Add scratch bucket for snowex hackweek Ref 2i2c-org#1309
@scottyhq ok I've set up https://snowex.uwhackweeks.2i2c.cloud now! Take it for a spin? I've also set up what is needed for https://snowex.uwhackweeks.2i2c.cloud to work - if you install …
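For reference, 2i2c hubs typically expose a per-user scratch location to user servers through a `SCRATCH_BUCKET` environment variable. A minimal sketch of how a user might find and use it from a notebook on the hub (the variable name and the `s3fs` usage are assumed conventions, not confirmed in this thread):

```python
import os

# 2i2c hubs typically set SCRATCH_BUCKET to a per-user S3 prefix
# (assumed convention here; the fallback value is a placeholder).
scratch = os.environ.get("SCRATCH_BUCKET", "s3://<scratch-bucket>/<username>")

# Writing to it needs an S3 filesystem library installed in the image,
# e.g. s3fs (hypothetical usage, commented out so this runs anywhere):
# import s3fs
# fs = s3fs.S3FileSystem()
# with fs.open(scratch.replace("s3://", "") + "/hello.txt", "w") as f:
#     f.write("hello from the snowex hub")

print(scratch)
```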
@scottyhq you'd also need to give access to the JupyterHub (https://infrastructure.2i2c.org/en/latest/howto/configure/auth-management.html?highlight=github#follow-up-github-organization-administrators-must-grant-access) for it to allow logins based on GitHub teams.
@scottyhq ok, I think this is completely set up on https://snowex.uwhackweeks.2i2c.cloud/. Try it and let me know if it works out ok? I opened #1336 for decommissioning the existing hub.
@scottyhq also, we only have GPU quota for 2 concurrent GPU instances now. Can you tell me how many you would want to support, and I'll ask for a quota increase right away?
Awesome! Will kick the tires today.
Ideally we want to support up to 100 simultaneous users, but we'll make do with whatever we get. A minimum viable number would be ~20 (where one person per group in the hackweek has guaranteed access to a GPU node).
Accidental close! @scottyhq ok, I'll ask! Note that the GPU nodes will only have 4 CPUs & about 55 GB of RAM. Is that ok? And I'm getting K80 GPUs.
CPU and RAM are fine; generally I just assume the instance resource ratios are optimized in some way for most workloads. Not too picky about GPU type, but it would be nice to have more modern options (T4, A100, V100), and perhaps a second type would also help with increased quota.
@yuvipanda, do you have any news about the quota increase you have requested?
@damianavila @scottyhq so I heard back, and our quota increase was approved only up to 32 - so that gives us just 8 GPUs :( The quota covers 'P and G' types together, and that's all the GPU types - so we can't split it up among multiple types either. I also asked if paying for premium support would increase the chances of the quota being granted, and was told it would not. The specific response is:

> Hello, thank you for your patience. We partially fulfilled your quota increase request. Your new quota for All P and G instances is 32. We can reassess a higher quota increase at a later stage. In the meantime, consider alternative instance types, or spreading your instances across AWS Regions. For a full list of our alternative instance types, see: To avoid processing delays, submit quota increase requests in the Service Quotas console. Feel free to ask if you have any other questions. Have an awesome week ahead.

This is a bit frustrating, as I had explained to them why we wanted the quota increase we did.
Multiple regions don't really work for us since we want to stay in us-west-2, where our home directories are. Maybe we could ask for a spot instance quota increase too?
But would that help with the imposed limit?
@damianavila the quota is for CPU count, so bigger instances won't help.
OK... let me see if I understand this... You requested a raise and they gave you 32 CPUs, which translates to 32/4 = 8 GPUs. If you use the p2.8xlarge, you will have the 32 CPUs and 8 GPUs, but your quota will most likely (I am supposing that will be the case, I might be wrong - it is something to ask their support team) be bigger than that, I presume between 32 and 64 (with a max of 64)... because at 64 CPUs they will "force" you again to use p2.16xlarge. Does that make sense, or am I talking nonsense 😉?
The ratio of CPU to GPU is 4 for all p2 instances, so we can only get 8 GPUs with a 32 CPU quota regardless of the size of instances we use. So we can get 32 total CPUs, which will only provide us with at most 8 GPUs in whatever configuration.
OK... but that is assuming the 32 quota is fixed across instance sizes... what I am saying here is that the quota may be fixed for the smaller instances, BUT I think they might raise the limit if you use a bigger instance.
@damianavila the quota is fixed across the instance family - it applies to all P and G type instances put together. With a 32 quota, trying to launch an instance with 64 CPUs will just fail with an error message about insufficient quota.
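To make the arithmetic in this thread concrete, here is a small sketch (plain arithmetic, no AWS calls; the 4-vCPUs-per-GPU ratio is the p2 family's, as stated above):

```python
# The "All P and G instances" quota is counted in vCPUs and shared across
# the whole family. p2 instances have 4 vCPUs per GPU at every size
# (p2.xlarge: 4/1, p2.8xlarge: 32/8, p2.16xlarge: 64/16), so the GPU count
# is the vCPU quota divided by 4, regardless of instance size.
VCPUS_PER_GPU = 4

def max_gpus(vcpu_quota, vcpus_per_gpu=VCPUS_PER_GPU):
    """GPUs launchable under a given vCPU quota."""
    return vcpu_quota // vcpus_per_gpu

def quota_needed(gpus_wanted, vcpus_per_gpu=VCPUS_PER_GPU):
    """vCPU quota required to guarantee a number of GPUs."""
    return gpus_wanted * vcpus_per_gpu

print(max_gpus(32))       # granted quota -> 8 GPUs
print(quota_needed(20))   # minimum viable (~20 users) -> 80 vCPUs
print(quota_needed(100))  # ideal (100 users) -> 400 vCPUs, the amount requested
```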
But you agree with me this is total nonsense, right? Or am I missing something else?
@damianavila that they gave us only a 32 CPU quota (given I had asked for 400) is definitely nonsense, and I agree! But I'm not sure I fully understand what the suggested next step is. In particular, I don't think trying different instance sizes matters - so I'm a little confused there!
@yuvipanda, I would probably further ask AWS support IF changing to a bigger instance actually gives us the opportunity to be granted a higher quota/limit (more than 32).
Frustrating that the quota is so low! I'd reply with something like the following:
@damianavila I can ask them specifically, but based on my experience so far I think it's a dead end. They generally don't care how many instances you use the CPUs in. But if your experience with AWS has been different, I can ask them!
In general, I did not have a different experience than you with their support. |
We got a bigger GPU quota thanks to @cgentemann! I think this issue can be closed now.
Follow-up: I opened up the following issue to discuss these issues around new AWS organizations and resource limits/quotas: |
Hub Description
The hub has the following needs:
https://quay.io/repository/uwhackweek/snowex
Community Representative(s)
@scottyhq
Important dates
Hub Authentication Type
GitHub Authentication (e.g., @MyGitHubHandle)
Hub logo information
Hub user image
Extra features you'd like to enable
Other relevant information
GPU support on AWS is not yet available in our infrastructure, and will need to be developed for this hub.
Hub URL
TBD.TBD.2i2c.cloud
Hub Type
daskhub
Tasks to deploy the hub