Historically, this event checklist was inside the event issue template, and a small part of it still is.
However, the longer discussion here involves the resource allocation script, the profile options, and node sharing concepts, so I believe it needs a proper documentation page.
Proposal
Ideally, for events, we would use the resource allocation script to update the infrastructure if needed, and this docs page would just describe how to use it.
We are not there yet, but events still happen, and for each one we need to understand whether and how to prepare the infrastructure beforehand.
I propose we document a checklist for engineers to go through before events, even if it's incomplete or imperfect. For example:
- Check the quotas
- Depending on the type of cluster:
  - create a separate nodepool for the hub if on a shared cluster
  - consider adding/changing the hub's profile options based on their expected usage
  - consider scheduling users on bigger nodes to decrease startup times
- etc.
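As a rough illustration of the quota and node-size items in the checklist, here is a minimal sketch of the arithmetic an engineer might do before an event. Everything in it is an assumption made for illustration: the `EventEstimate` fields, the helper names, and the example numbers are hypothetical and do not come from the resource allocation script or any existing tooling.

```python
import math
from dataclasses import dataclass


@dataclass
class EventEstimate:
    """Hypothetical inputs for a pre-event capacity check."""
    expected_users: int      # peak concurrent users expected at the event
    mem_per_user_gb: float   # memory guaranteed to each user server
    node_mem_gb: float       # allocatable memory on one node
    node_quota: int          # maximum nodes the cloud quota allows
    current_nodes: int = 0   # nodes already counted against the quota


def users_per_node(e: EventEstimate) -> int:
    """How many user servers fit on one node, by memory only."""
    return math.floor(e.node_mem_gb / e.mem_per_user_gb)


def nodes_needed(e: EventEstimate) -> int:
    """Nodes required to host all expected users at once."""
    per_node = users_per_node(e)
    if per_node < 1:
        raise ValueError("a single user does not fit on this node size")
    return math.ceil(e.expected_users / per_node)


def quota_ok(e: EventEstimate) -> bool:
    """True if the event fits within the node quota."""
    return e.current_nodes + nodes_needed(e) <= e.node_quota


if __name__ == "__main__":
    small = EventEstimate(expected_users=100, mem_per_user_gb=4,
                          node_mem_gb=26, node_quota=20, current_nodes=3)
    big = EventEstimate(expected_users=100, mem_per_user_gb=4,
                        node_mem_gb=104, node_quota=20, current_nodes=3)
    print(nodes_needed(small), quota_ok(small))
    print(nodes_needed(big), quota_ok(big))
```

The sketch only considers memory. A real check would also need to account for CPU, per-node overhead, and whichever quota actually applies on the cloud provider. Comparing the two example inputs shows why bigger nodes can help: fewer nodes have to start before all users are scheduled.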
I believe this information is already scattered across various issues, PRs, and comments. We need to gather it in one place so we can iterate on it and build a shared understanding of the challenges events can pose for the infrastructure.
Updates and actions
No response
GeorgianaElena changed the title from "[Documentation] Add a checklist for events that engineers can follow to decide if infrastructure needs to be updated" to "[Documentation] Add a checklist to decide if infrastructure needs to be updated before an event" on Nov 30, 2023.
Context
Follow-up to #3436.