Enable spot for only certain InstanceTypes #277
Comments
What would the config look like for that? I think that the request makes sense. Right now spot and on-demand are all or nothing. I'm thinking that each entry in the Include section could be extended to be either an instance type/family, or a dictionary with additional configuration that is specific to the type/family. For example:
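A hypothetical sketch of that idea, written as Python literals rather than the project's real config syntax. The list name `include_instance_types`, the helper `normalize_entry`, and the instance type names are assumptions made for illustration; only `UseSpot` is a parameter named in this thread. It shows how an entry that is either a bare name or a one-key dictionary could be reduced to a common `(name, overrides)` form.

```python
from typing import Any, Dict, Tuple, Union

# Hypothetical entry shape: a bare instance type/family name, or a
# single-key dict mapping that name to its per-entry overrides.
IncludeEntry = Union[str, Dict[str, Dict[str, Any]]]


def normalize_entry(entry: IncludeEntry) -> Tuple[str, Dict[str, Any]]:
    """Return (name, overrides) for either form of Include entry."""
    if isinstance(entry, str):
        # Bare name: no overrides, so the global defaults would apply.
        return entry, {}
    if isinstance(entry, dict) and len(entry) == 1:
        name, overrides = next(iter(entry.items()))
        return name, dict(overrides or {})
    raise ValueError(f"Unsupported Include entry: {entry!r}")


# Illustrative entries only, not taken from a real cluster config.
include_instance_types = [
    "r6i.large",                           # plain entry
    {"r6i.32xlarge": {"UseSpot": False}},  # entry with an override
]

normalized = [normalize_entry(e) for e in include_instance_types]
```

Accepting both forms would keep existing configs valid, since a plain string behaves exactly as it does today.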
Your proposal is elegant. I think I would use it. And none of this would be necessary if the ParallelCluster team didn't put a limit on the number of compute resources. If you know of changes coming there, maybe this can be deferred/will-not-implement? That said, if you supported it now, I would use it today.
I don't know of any plans to increase the number of compute resources. I think that it is related to the use of EC2 fleets to manage CRs and limits on the number of fleets, but I'm not sure. All I know is that currently the limit is 50, and if you need more than that your best option is multiple clusters. This was part of my rationale for supporting the use of your virtual desktop as a login node for multiple clusters.
Right now, UseOnDemand, UseSpot, and DisableSimultaneousMultithreading are global parameters that affect all instance types. Add a new configuration option to use the existing parameters as defaults for each instance type, but allow them to be configured for each included instance family and each included instance type. This allows admins to reduce the number of compute resources by, for example, only configuring spot for small instance types, but not for larger ones. Resolves #277
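As a minimal sketch of the default-plus-override behaviour described in that commit message: the three parameter names come from the message, while the function `effective_settings`, the override dictionaries, the instance names, and the precedence order (instance type over family over global default) are assumptions for illustration, not the project's actual implementation.

```python
from typing import Any, Dict

# The three global parameters named in the commit message, used as defaults.
# The values chosen here are illustrative.
GLOBAL_DEFAULTS: Dict[str, Any] = {
    "UseOnDemand": True,
    "UseSpot": True,
    "DisableSimultaneousMultithreading": True,
}


def effective_settings(
    instance_type: str,
    defaults: Dict[str, Any],
    family_overrides: Dict[str, Dict[str, Any]],
    type_overrides: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Merge settings; assumed precedence is type > family > default."""
    family = instance_type.split(".")[0]  # e.g. "r6i" from "r6i.32xlarge"
    settings = dict(defaults)
    settings.update(family_overrides.get(family, {}))
    settings.update(type_overrides.get(instance_type, {}))
    return settings


# Hypothetical overrides: no spot for the r6i family in general,
# except for the small r6i.large type.
family_overrides = {"r6i": {"UseSpot": False}}
type_overrides = {"r6i.large": {"UseSpot": True}}

small = effective_settings("r6i.large", GLOBAL_DEFAULTS, family_overrides, type_overrides)
large = effective_settings("r6i.32xlarge", GLOBAL_DEFAULTS, family_overrides, type_overrides)
other = effective_settings("m5.xlarge", GLOBAL_DEFAULTS, family_overrides, type_overrides)
```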
Right now, UseOnDemand, UseSpot, and DisableSimultaneousMultithreading are global parameters that affect all instance types. Add a new configuration option to use the existing parameters as defaults for each instance type, but allow them to be configured for each included instance family and each included instance type. This allows admins to reduce the number of compute resources by, for example, only configuring spot for small instance types, but not for larger ones. Resolves #277

Add documentation of manual commands for deconfiguring before deleting cluster. Resolves #282

=========================================================================

Go through everything and change the original term I used, Submitter, to External Login Node. Just need to make things consistent.
I am happy with the way you support multiple clusters. Most of my users will use a specific cluster as I mentioned above, and I can have different windows open with different cluster access, so I'm happy happy.
I'm trying to set up one cluster for all my EDA tool and SW team needs. This might be too grand a goal, and I might ultimately just need multiple clusters for different "tool" types, i.e. simulation, physical design, software team. But let me document this request and see if you find value in it or have other suggestions.
I like using spot for my simulation jobs, which require smaller machines with less memory. ParallelCluster limits us to 50 compute resources. So if we generate a spot compute resource for every on-demand one, we can only have 25 instance types. Here is what I presently have enabled.
Is it possible to extend the configuration language to allow me to specify which machines are enabled for spot? Or maybe another section? I don't want to have another cluster for spot, as my understanding is the users would have to bounce between clusters with different module loads.
Maybe I just build a "small" machine cluster and a "big" machine cluster and not worry about it, as most PD engineers who use the bigger machines never use the smaller machines... This is what I've done in the past.
Your thoughts are appreciated as you have seen more clusters in action than I have...
Arguably this should also be a ticket with the ParallelCluster team to raise the CR limit to unlimited. I'm missing why there is a hard limit.
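To make the compute resource arithmetic above concrete, here is a rough sketch. It assumes one compute resource per enabled purchase option per instance type, which matches the "25 instance types" figure in this issue but is a simplification. The function name and the placeholder type names are hypothetical.

```python
from typing import Dict

CR_LIMIT = 50  # compute resource limit quoted in this issue


def count_compute_resources(settings_by_type: Dict[str, Dict[str, bool]]) -> int:
    """Count one compute resource per enabled purchase option per type."""
    return sum(
        int(s["UseOnDemand"]) + int(s["UseSpot"])
        for s in settings_by_type.values()
    )


# All-or-nothing: every type gets both on-demand and spot.
all_both = {
    f"type{i}": {"UseOnDemand": True, "UseSpot": True} for i in range(25)
}

# Per-type control: spot only on 10 small types, 30 large types on-demand only.
mixed = {
    **{f"small{i}": {"UseOnDemand": True, "UseSpot": True} for i in range(10)},
    **{f"large{i}": {"UseOnDemand": True, "UseSpot": False} for i in range(30)},
}

all_both_crs = count_compute_resources(all_both)  # 25 types * 2 options
mixed_crs = count_compute_resources(mixed)        # 10 * 2 + 30 * 1
```

Under these assumptions both layouts use the full 50 compute resources, but the mixed one covers 40 instance types instead of 25.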