From 5bba928355274e78e21187997a1cf72f61782fd4 Mon Sep 17 00:00:00 2001
From: Chris Holdgraf <choldgraf@berkeley.edu>
Date: Thu, 12 May 2022 14:06:51 +0200
Subject: [PATCH 01/13] Updating service objectives page

---
 about/overview.md                    |   2 +-
 about/service-objectives.md          | 109 +++++++++++++++++++++++++++
 about/strategy/index.md              |   1 -
 about/strategy/service-objectives.md |  75 ------------------
 conf.py                              |   1 +
 index.md                             |   1 +
 6 files changed, 112 insertions(+), 77 deletions(-)
 create mode 100644 about/service-objectives.md
 delete mode 100644 about/strategy/service-objectives.md

diff --git a/about/overview.md b/about/overview.md
index a1cc74f..ad98d55 100644
--- a/about/overview.md
+++ b/about/overview.md
@@ -41,7 +41,7 @@ The service is currently in an **alpha phase**, and may evolve as we learn more
 2i2c will operate and manage a 2i2c JupyterHub deployment for use by you and your community, accessible via a web URL.
 2i2c will handle the design, configuration, development, and ongoing operation of the hub infrastructure.
 The following sections describe several common activities that the 2i2c team will perform as a part of your managed JupyterHub.
-In addition, see our [Service Level Objectives](strategy/service-objectives.md) for an explanation of what we aim to accomplish in terms of uptime, reliability, and support for these services.
+In addition, see our [Service Level Objectives](service-objectives.md) for an explanation of what we aim to accomplish in terms of uptime, reliability, and support for these services.

 ### JupyterHub Setup

diff --git a/about/service-objectives.md b/about/service-objectives.md
new file mode 100644
index 0000000..dbb2126
--- /dev/null
+++ b/about/service-objectives.md
@@ -0,0 +1,109 @@
+# Service Level Objectives
+
+This page describes the **Service Level Objectives** (SLOs) of 2i2c's infrastructure services[^slos].
+These are our goals in running infrastructure for the communities that we serve.
+They indicate what our users can expect when using the infrastructure we support.
+They will evolve over time as we get feedback from communities we serve, and learn more about how to best deliver impact via our services.
+
+:::{note}
+2i2c does not currently have a **Service Level Agreement** (SLAs), and the SLOs here are not legally-binding.
+We aim to create SLAs once we learn more about our capacity to fulfill them sustainably.[^zenodo]
+:::
+
+
+(objectives:stability)=
+## Availability and uptime
+
+The infrastructure that 2i2c runs should be available to its communities 24/7, and with minimal human intervention needed to maintain this level of performance.
+We invest in continous development to improve the resiliency and efficiency of the infrastructure that we run, following best-practices in service design and engineering in the cloud.
+
+- Communities should feel comfortable relying on 2i2c's services for critical educational and research needs.
+- There should not be prolonged periods of service disruption for any community.
+
+:::{admonition} To be refined...
+It is a known anti-pattern to define an ambiguous SLO like "24/7".
+Truly meeting such an objective is nearly impossible and extremely costly.
+In the future, we plan to run an audit of our infrastructure and practices, and design quantifiable uptime targets for our SLOs.
+:::
+
+
+(objectives:intentional-downtime)=
+### Intentional downtime
+
+In some cases there may be intentional downtime for the infrastructure that we run.
+For example, if we need to undergo major maintenance of infrastructure transitions, it may necessitate bringing down the infrastructure for a few hours.
+
+- We will communicate with communities before any intentional downtime.
+- We will aim for downtime windows that happen outside of heavy usage.
+- We will communicate with communities when the expected downtime is over.
+
+(objectives:reduced-capacity)=
+### Reduced capacity
+
+There are some periods of time when we have **expected reduced capacity**.
+These are periods of time when we are less strict about adhering to the service objectives on this page.
+This ensures that our work practices are sustainable for our team and avoid burnout.
+
+Here are periods of expected reduced capacity:
+
+- Weekends
+- The first and last weeks of the year.
+- Periods of overlapping international holidays.
+
+If this is disruptive to your community's activies, please reach out and we can discuss.
+However, we encourage you to avoid planning mission-critical events or actions during periods of expected reduced capacity.
+
+(objectives:support)=
+## Support responsiveness
+
+Support is one of the most important services that 2i2c provides, especially when there are problems or outages.
+For this reason, we commit to developing a support process that is efficient in responding to issues that communities bring to us.
+
+- We have a dedicated communications channel for support (see [](../support.md)).
+- At least one team member is always tasked with monitoring this channel.
+- We will acknowledge receipt of this ticket within 24 working hours.
+- We will triage support requests within 24 working hours.
+- Support requests related to degraded user experience will be prioritized over changes and enhancement requests.
+- For major or complex outages, we will re-direct capacity on our engineering team to resolve them.
+
+:::{seealso}
+See [](../support.md) for more information about contacting support.
+See [](tc:support:process) for our team's support process.
+:::
+
+
+(objectives:cost)=
+## Costs and cloud flexibility
+
+Our communities rely on us to keep their cloud costs as low as possible.
+They also rely on us to provide infrastructure that is dynamic and meets the needs of diverse communities.
+
+There is an inherent tension between doing things quickly (which generally requires using extra resources to encourage speed) and cost efficiency (because you pay for those extra resources).
+This is particularly relevant during sharp increases in hub usage.
+
+- Communities should feel comfortable that moderate increases in usage will not result in instability.
+- Communities should feel comfortable that this flexibility does not result in unexpected cloud costs.
+- We should provide this flexibility in a way that is sustainable for our team.
+- If infrastructure requires steady, but semi-random usage, we should prioritize cost efficiency.
+- If infrastructure will have known spikes of activity, we may temporarily favor speed over cost by asking for extra resources from the cloud provider.
+- If spikes in activity will come just after a holiday or weekend, we may make these changes a few days early to avoid working off-hours.
+
+:::{seealso}
+See [](pricing/index.md) for more information about costs.
+:::
+
+
+(objectives:updates)=
+## Upgrades and maintenance
+
+By continuously upgrading the cloud infrastructure and software environments that our hubs offer, we improve the experience of the communities that we serve by giving them new features, enhancements, and bug and security fixes.
+
+We aim to continuously upgrade this infrastructure in a way that minimizes the risk of instability or outages.
+
+- We will keep our hubs relatively up-to-date with the latest [JupyterHub](https://jupyterhub.readthedocs.io) and [Zero to JupyterHub](https://z2jh.jupyter.org) releases.
+- We will ensure that our hub infrastructure is compatible with the latest software releases in the common open source ecosystems we provide.
+- We will support open source communities in making regular updates and releases to their tools.
+
+[^slos]: For more about the difference between Service Level Objectives, Agreements, and Indicators, see [the Google SRE handbook](https://sre.google/sre-book/service-level-objectives/).
+
+[^zenodo]: This practice is inspired by [Zenodo's intentional lack of Service Level Agreements](https://about.zenodo.org/principles/).
diff --git a/about/strategy/index.md b/about/strategy/index.md
index 45c61bd..2be9590 100644
--- a/about/strategy/index.md
+++ b/about/strategy/index.md
@@ -8,7 +8,6 @@ We aim to run this pilot for several months, gaining experience and sharpening o
 This page describes the major strategy of the 2i2c Managed JupyterHubs pilot.

 ```{toctree}
-service-objectives.md
 roadmap.md
 ```

diff --git a/about/strategy/service-objectives.md b/about/strategy/service-objectives.md
deleted file mode 100644
index 796f46a..0000000
--- a/about/strategy/service-objectives.md
+++ /dev/null
@@ -1,75 +0,0 @@
-# Service Level Objectives and Principles
-
-This page describes the **Service Level Objectives** (SLOs) of 2i2c's infrastructure services[^slos].
-These describe our goals in running infrastructure for the communities that we serve.
-They indicate what our users can expect when using the infrastructure we support.
-
-We design our infrastructure, and consistently hone our practices, to meet these objectives.
-They evolve over time as we get feedback from communities we serve, and learn more about how to best deliver impact via our services.
-
-:::{note}
-2i2c does not currently have a **Service Level Agreement** (SLAs), as this is generally a legally-binding document that involves calculation of risk via revenue lost during service outages.
-We currently do not have the capacity to design and litigate strict SLAs, and believe that we will have the most impact by instead committing to service **objectives** that are transparent and follow best practices.[^zenodo]
-
-We may revisit this in the future depending on the feedback we get from other communities!
-:::
-
-## High availability
-
-The infrastructure that 2i2c runs should be available to its communities 24/7, and with minimal human intervention needed to maintain this level of performance.
-We invest in continous development to improve the resiliency and efficiency of the infrastructure that we run, following best-practices in service design and engineering in the cloud.
-
-:::{admonition} To be refined...
-It is a known anti-pattern to define an ambiguous SLO like "24/7".
-Truly meeting such an objective is nearly impossible and extremely costly.
-In the future, we plan to run an audit of our infrastructure and practices, and design quantifiable uptime targets for our SLOs.
-:::
-
-## Balance speed and cost
-
-There is an inherent tension between doing things quickly (which generally requires using extra resources to encourage speed) and cost efficiency (because you pay for those extra resources).
-This is particularly relevant during **scaling events**.
-These are moments when the infrastructure has enough usage that it must grow the cloud resources available to handle the new load.
-
-2i2c strives to build infrastructure that strikes a balance that depends on the particular use-case.
-If infrastructure requires steady, but semi-random usage, we should prioritize cost efficiency.
-If infrastructure will have known spikes of activity at the same time, we may temporarily favor speed over cost by asking for extra resources from the cloud provider.
-
-:::{note}
-If your community requires a change in the infrastructure that occurs over a weekend, we will generally try to do this on the Friday beforehand, rather than over the weekend, even if this means it will cost marginally more in cloud infrastructure.
-If we anticipate the cost to be significant, we will discuss with you ahead of time.
-:::
-
-## Support responsiveness
-
-We have a dedicated communications channel for support at `support@2i2c.org`, and somebody on the engineering team is always tasked with monitoring this channel.
-
-When questions come in on the support channel, we triage them based on whether they cover a major problem for the community (e.g., if there is a major hub outage).
-
-If this is the case, we strive to respond as quickly as possible to mobilize the right team members and fix the problem.
-We will communicate with the Community Representative throughout this process, and let them know when the problem has been resolved.
-In general, we aim to respond to all support questions within 24 hours - though we strive for more quick responses if the issue is critical.
-
-## Intentional downtime
-
-In some cases there may be intentional downtime for the infrastructure that we run.
-For example, if we need to undergo major maintenance of infrastructure transitions, it may necessitate bringing down the infrastructure for a few hours.
-
-In these cases, we will communicate with the Community Representative ahead of time, to inform them of our intentions and give an opportunity for them to tell us when this will be least disruptive.
-We will then carry out our maintenance as quickly as possible, with minimal downtime, and notify the community representative(s) when this has been complete.
-
-## Holidays, weekends, and expected downtime
-
-Expected downtime are periods of time when there is generally less availability from the team (as well as from the communities we serve).
-This includes weekends and heavy holiday periods like the end of the year.
-
-While we strive for our services to be available 24/7, we also believe in the importance of protecting weekends and holiday time for our team.
-During expected downtime periods, you should expect reduced responsiveness in our support channels, and no promises about our ability to respond to questions or issues with your infrastructure.
-We may agree to perform some of these operations during expected downtime, but this should be the exception, not the rule.
-
-If this is disruptive to your community's activies, please reach out and we can discuss.
-However, we encourage you to avoid planning mission-critical events or actions during periods of expected downtime.
-
-[^slos]: For more about the difference between Service Level Objectives, Agreements, and Indicators, see [the Google SRE handbook](https://sre.google/sre-book/service-level-objectives/).
-
-[^zenodo]: This practice is inspired by [Zenodo's intentional lack of Service Level Agreements](https://about.zenodo.org/principles/).
\ No newline at end of file
diff --git a/conf.py b/conf.py
index d27e992..dab32a1 100644
--- a/conf.py
+++ b/conf.py
@@ -61,6 +61,7 @@
 }

 rediraffe_redirects = {
+    "about/strategy/service-objectives.md": "about/service-objectives.md",
 }

 # Disable linkcheck for anchors because it throws false errors for any JS anchors
diff --git a/index.md b/index.md
index 029db3b..297d62c 100644
--- a/index.md
+++ b/index.md
@@ -13,6 +13,7 @@ These sections describe the hub service at an organizational level.
 :caption: About the service
 about/overview
 about/pricing/index
+about/service-objectives
 about/strategy/index
 ```

From 870b57f6a389e1675ac560b23b06e119f15c4ade Mon Sep 17 00:00:00 2001
From: Chris Holdgraf <choldgraf@berkeley.edu>
Date: Fri, 13 May 2022 09:00:37 +0200
Subject: [PATCH 02/13] Update service objectives

---
 about/service-objectives.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/about/service-objectives.md b/about/service-objectives.md
index dbb2126..e29db5e 100644
--- a/about/service-objectives.md
+++ b/about/service-objectives.md
@@ -3,7 +3,7 @@
 This page describes the **Service Level Objectives** (SLOs) of 2i2c's infrastructure services[^slos].
 These are our goals in running infrastructure for the communities that we serve.
 They indicate what our users can expect when using the infrastructure we support.
-They will evolve over time as we get feedback from communities we serve, and learn more about how to best deliver impact via our services.
+They will evolve over time as we get feedback and learn how to best deliver impact via our services.

 :::{note}
 2i2c does not currently have a **Service Level Agreement** (SLAs), and the SLOs here are not legally-binding.
@@ -31,7 +31,7 @@ In the future, we plan to run an audit of our infrastructure and practices, and
 ### Intentional downtime

 In some cases there may be intentional downtime for the infrastructure that we run.
-For example, if we need to undergo major maintenance of infrastructure transitions, it may necessitate bringing down the infrastructure for a few hours.
+For example, if we need to undergo major maintenance of infrastructure transitions.

 - We will communicate with communities before any intentional downtime.
 - We will aim for downtime windows that happen outside of heavy usage.
@@ -42,7 +42,7 @@ For example, if we need to undergo major maintenance of infrastructure transitio

 There are some periods of time when we have **expected reduced capacity**.
 These are periods of time when we are less strict about adhering to the service objectives on this page.
-This ensures that our work practices are sustainable for our team and avoid burnout.
+This ensures that our work practices are sustainable and fair for our team.

 Here are periods of expected reduced capacity:

@@ -61,8 +61,7 @@ For this reason, we commit to developing a support process that is efficient in

 - We have a dedicated communications channel for support (see [](../support.md)).
 - At least one team member is always tasked with monitoring this channel.
-- We will acknowledge receipt of this ticket within 24 working hours.
-- We will triage support requests within 24 working hours.
+- We will triage support requests and respond to them within 24 working hours.
 - Support requests related to degraded user experience will be prioritized over changes and enhancement requests.
 - For major or complex outages, we will re-direct capacity on our engineering team to resolve them.

From 556878d9e41cc990ded31334972c51ea9616b2d4 Mon Sep 17 00:00:00 2001
From: Chris Holdgraf <choldgraf@berkeley.edu>
Date: Fri, 13 May 2022 09:02:56 +0200
Subject: [PATCH 03/13] Outages

---
 about/service-objectives.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/about/service-objectives.md b/about/service-objectives.md
index e29db5e..ec9c902 100644
--- a/about/service-objectives.md
+++ b/about/service-objectives.md
@@ -19,6 +19,7 @@ We invest in continous development to improve the resiliency and efficiency of t

 - Communities should feel comfortable relying on 2i2c's services for critical educational and research needs.
 - There should not be prolonged periods of service disruption for any community.
+- When outages do occur, we will prioritize these over other work that our team is doing.

 :::{admonition} To be refined...
 It is a known anti-pattern to define an ambiguous SLO like "24/7".

From 65964f4b9c42ac76f98013444bb698dece5211f0 Mon Sep 17 00:00:00 2001
From: Chris Holdgraf <choldgraf@berkeley.edu>
Date: Mon, 16 May 2022 11:27:10 +0200
Subject: [PATCH 04/13] Updates to support around incidents

---
 about/service-objectives.md | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/about/service-objectives.md b/about/service-objectives.md
index ec9c902..39b65e9 100644
--- a/about/service-objectives.md
+++ b/about/service-objectives.md
@@ -59,18 +59,37 @@ However, we encourage you to avoid planning mission-critical events or actions d

 Support is one of the most important services that 2i2c provides, especially when there are problems or outages.
 For this reason, we commit to developing a support process that is efficient in responding to issues that communities bring to us.
+We define two types of support with 2i2c:
-- We have a dedicated communications channel for support (see [](../support.md)).
-- At least one team member is always tasked with monitoring this channel.
-- We will triage support requests and respond to them within 24 working hours.
-- Support requests related to degraded user experience will be prioritized over changes and enhancement requests.
-- For major or complex outages, we will re-direct capacity on our engineering team to resolve them.
+
+- **Change Requests** are general requests for changes or improvements to a community's hub. For example, updating the environment or improving an open source tool.
+- **Incidents** are requests connected with significant degraded service for one or more communities. For example, a system-wide outage or inability of users to log-in.
+
+Below are our objectives broken down by the type of support they relate to.

 :::{seealso}
 See [](../support.md) for more information about contacting support.
 See [](tc:support:process) for our team's support process.
 :::

+### General support objectives
+
+- We have a dedicated communications channel for support (see [](../support.md)).
+- At least one team member is always tasked with monitoring this channel.
+- Our support team is communicative, helpful, and [abides by our Code of Conduct](tc:code-of-conduct).
+
+### Incident support objectives
+
+Our goal is to be more rapid in responding, communicating, and resolving support requests during incidents.
+Our ability to meet these objectives will depend on the times they are reported relative to the working hours of our support team.
+
+- We will triage and respond to Incidents within 6 working hours.
+- We will prioritize resolving Inicdents over any other Change requests.
+- For major or complex outages, we will re-direct capacity on our engineering team to resolve them.
+
+### Change Request support objectives
+
+- We will triage support requests and respond to them within 24 working hours.
+- We will prioritize resolving Change Requests by balancing them against our other development priorities as described in {doc}`our Support Team Process documentation <tc:projects/managed-hubs/support>`)

 (objectives:cost)=
 ## Costs and cloud flexibility

From faf88335791bd38a27905cc15cbc0ef4f6f5c8b4 Mon Sep 17 00:00:00 2001
From: Chris Holdgraf <choldgraf@berkeley.edu>
Date: Mon, 16 May 2022 11:28:38 +0200
Subject: [PATCH 05/13] working day

---
 about/service-objectives.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/about/service-objectives.md b/about/service-objectives.md
index 39b65e9..36e5dad 100644
--- a/about/service-objectives.md
+++ b/about/service-objectives.md
@@ -82,7 +82,7 @@ See [](tc:support:process) for our team's support process.
 Our goal is to be more rapid in responding, communicating, and resolving support requests during incidents.
 Our ability to meet these objectives will depend on the times they are reported relative to the working hours of our support team.

-- We will triage and respond to Incidents within 6 working hours.
+- We will triage and respond to Incidents within 8 working hours.
 - We will prioritize resolving Inicdents over any other Change requests.
 - For major or complex outages, we will re-direct capacity on our engineering team to resolve them.

From e7baf139fcc5db91e7333c7a600b1ef9eb6609f3 Mon Sep 17 00:00:00 2001
From: Chris Holdgraf <choldgraf@berkeley.edu>
Date: Mon, 16 May 2022 11:33:56 +0200
Subject: [PATCH 06/13] More notes

---
 about/service-objectives.md | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/about/service-objectives.md b/about/service-objectives.md
index 36e5dad..a870f03 100644
--- a/about/service-objectives.md
+++ b/about/service-objectives.md
@@ -19,6 +19,7 @@ We invest in continous development to improve the resiliency and efficiency of t

 - Communities should feel comfortable relying on 2i2c's services for critical educational and research needs.
 - There should not be prolonged periods of service disruption for any community.
+- We will invest in monitoring and reporting infrastructure to detect outages quickly and before they impact end-users.
 - When outages do occur, we will prioritize these over other work that our team is doing.

 :::{admonition} To be refined...
@@ -39,7 +40,7 @@ For example, if we need to undergo major maintenance of infrastructure transitio
 - We will communicate with communities when the expected downtime is over.

 (objectives:reduced-capacity)=
-### Reduced capacity
+### Reduced team capacity

 There are some periods of time when we have **expected reduced capacity**.
 These are periods of time when we are less strict about adhering to the service objectives on this page.
@@ -54,6 +55,11 @@ Here are periods of expected reduced capacity:
 If this is disruptive to your community's activies, please reach out and we can discuss.
 However, we encourage you to avoid planning mission-critical events or actions during periods of expected reduced capacity.

+:::{admonition} A note on timezones
+Remember that 2i2c's team is distributed globally, and our working time zone may be different from yours.
+We aim to have team members in time zones that are working at the same time as the communities we serve, but there may occasionally be mis-matches in working hours.
+:::
+
 (objectives:support)=
 ## Support responsiveness

@@ -67,8 +73,8 @@ We define two types of support with 2i2c:
 Below are our objectives broken down by the type of support they relate to.

 :::{seealso}
-See [](../support.md) for more information about contacting support.
-See [](tc:support:process) for our team's support process.
+- See [](../support.md) for more information about contacting support.
+- See [](tc:support:process) for our team's support process.
 :::

 ### General support objectives

From b663d20b541e54ae4ac42e790d3b96ac58a94455 Mon Sep 17 00:00:00 2001
From: Chris Holdgraf <choldgraf@berkeley.edu>
Date: Mon, 16 May 2022 11:37:00 +0200
Subject: [PATCH 07/13] updates

---
 about/service-objectives.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/about/service-objectives.md b/about/service-objectives.md
index a870f03..0ca4108 100644
--- a/about/service-objectives.md
+++ b/about/service-objectives.md
@@ -95,7 +95,7 @@ Our ability to meet these objectives will depend on the times they are reported
 ### Change Request support objectives

 - We will triage support requests and respond to them within 24 working hours.
-- We will prioritize resolving Change Requests by balancing them against our other development priorities as described in {doc}`our Support Team Process documentation <tc:projects/managed-hubs/support>`)
+- We will prioritize resolving Change Requests by balancing them against our other development priorities as described in {doc}`our Support Team Process documentation <tc:projects/managed-hubs/support>`.

 (objectives:cost)=
 ## Costs and cloud flexibility
@@ -106,8 +106,7 @@ They also rely on us to provide infrastructure that is dynamic and meets the nee
 There is an inherent tension between doing things quickly (which generally requires using extra resources to encourage speed) and cost efficiency (because you pay for those extra resources).
 This is particularly relevant during sharp increases in hub usage.

-- Communities should feel comfortable that moderate increases in usage will not result in instability.
-- Communities should feel comfortable that this flexibility does not result in unexpected cloud costs.
+- Communities should feel comfortable that moderate increases in usage will not result in instability, and that this flexibility does not result in unexpectedly high cloud costs.
 - We should provide this flexibility in a way that is sustainable for our team.
 - If infrastructure requires steady, but semi-random usage, we should prioritize cost efficiency.
 - If infrastructure will have known spikes of activity, we may temporarily favor speed over cost by asking for extra resources from the cloud provider.

From 1c48ac07ff8831f6a55cca12c543ceeaa50183a8 Mon Sep 17 00:00:00 2001
From: Chris Holdgraf <choldgraf@berkeley.edu>
Date: Mon, 16 May 2022 12:00:15 +0200
Subject: [PATCH 08/13] Change and guidance

---
 about/service-objectives.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/about/service-objectives.md b/about/service-objectives.md
index 0ca4108..e05bb05 100644
--- a/about/service-objectives.md
+++ b/about/service-objectives.md
@@ -67,8 +67,9 @@ Support is one of the most important services that 2i2c provides, especially whe
 For this reason, we commit to developing a support process that is efficient in responding to issues that communities bring to us.
 We define two types of support with 2i2c:

-- **Change Requests** are general requests for changes or improvements to a community's hub. For example, updating the environment or improving an open source tool.
 - **Incidents** are requests connected with significant degraded service for one or more communities. For example, a system-wide outage or inability of users to log-in.
+- **Change Requests** are general requests for changes or improvements to a community's hub. For example, updating the environment or improving an open source tool.
+- **Guidance Requests** are questions or requests for conversations to discuss infrastructure decisions, provide guidance, etc.

 Below are our objectives broken down by the type of support they relate to.

@@ -92,10 +93,10 @@ Our ability to meet these objectives will depend on the times they are reported
 - We will prioritize resolving Inicdents over any other Change requests.
 - For major or complex outages, we will re-direct capacity on our engineering team to resolve them.

-### Change Request support objectives
+### Change and Guidance Request support objectives

-- We will triage support requests and respond to them within 24 working hours.
-- We will prioritize resolving Change Requests by balancing them against our other development priorities as described in {doc}`our Support Team Process documentation <tc:projects/managed-hubs/support>`.
+- We will triage Change and Guidance requests and respond to them within 24 working hours.
+- We will prioritize resolving Change and Guidance Requests by balancing them against our other development priorities as described in {doc}`our Support Team Process documentation <tc:projects/managed-hubs/support>`.

 (objectives:cost)=
 ## Costs and cloud flexibility

From aff4dd464672c927365aa169ca19f7464c98fc98 Mon Sep 17 00:00:00 2001
From: Chris Holdgraf <choldgraf@berkeley.edu>
Date: Mon, 16 May 2022 05:37:06 -0700
Subject: [PATCH 09/13] Update about/service-objectives.md

Co-authored-by: Yuvi Panda <yuvipanda@gmail.com>
---
 about/service-objectives.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/about/service-objectives.md b/about/service-objectives.md
index e05bb05..284c618 100644
--- a/about/service-objectives.md
+++ b/about/service-objectives.md
@@ -65,7 +65,7 @@ We aim to have team members in time zones that are working at the same time as t

 Support is one of the most important services that 2i2c provides, especially when there are problems or outages.
 For this reason, we commit to developing a support process that is efficient in responding to issues that communities bring to us.
-We define two types of support with 2i2c:
+We define three types of support with 2i2c:

 - **Incidents** are requests connected with significant degraded service for one or more communities. For example, a system-wide outage or inability of users to log-in.
 - **Change Requests** are general requests for changes or improvements to a community's hub. For example, updating the environment or improving an open source tool.

From 04b0d71d445e93aaebf20d734bec76cde8f1b29f Mon Sep 17 00:00:00 2001
From: Yuvi Panda <yuvipanda@gmail.com>
Date: Mon, 30 May 2022 14:52:14 +0530
Subject: [PATCH 10/13] Update when we respond to incidents

Co-authored-by: Chris Holdgraf <choldgraf@berkeley.edu>
---
 about/service-objectives.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/about/service-objectives.md b/about/service-objectives.md
index 284c618..82a282e 100644
--- a/about/service-objectives.md
+++ b/about/service-objectives.md
@@ -89,7 +89,7 @@ Below are our objectives broken down by the type of support they relate to.
 Our goal is to be more rapid in responding, communicating, and resolving support requests during incidents.
 Our ability to meet these objectives will depend on the times they are reported relative to the working hours of our support team.

-- We will triage and respond to Incidents within 8 working hours.
+- We will triage and respond to Incidents within 8 working hours **at most**. We will on average respond to Incidents within **2 working hours**.
 - We will prioritize resolving Inicdents over any other Change requests.
 - For major or complex outages, we will re-direct capacity on our engineering team to resolve them.

From 5044a98ff3282c768146ad491ef8a4ec7d225c0b Mon Sep 17 00:00:00 2001
From: Yuvi Panda <yuvipanda@gmail.com>
Date: Mon, 30 May 2022 14:52:27 +0530
Subject: [PATCH 11/13] Fix typo

Co-authored-by: Sarah Gibson <44771837+sgibson91@users.noreply.github.com>
---
 about/service-objectives.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/about/service-objectives.md b/about/service-objectives.md
index 82a282e..f3918e6 100644
--- a/about/service-objectives.md
+++ b/about/service-objectives.md
@@ -90,7 +90,7 @@ Our goal is to be more rapid in responding, communicating, and resolving support
 Our ability to meet these objectives will depend on the times they are reported relative to the working hours of our support team.

 - We will triage and respond to Incidents within 8 working hours **at most**. We will on average respond to Incidents within **2 working hours**.
-- We will prioritize resolving Inicdents over any other Change requests.
+- We will prioritize resolving Incidents over any other Change requests.
 - For major or complex outages, we will re-direct capacity on our engineering team to resolve them.

 ### Change and Guidance Request support objectives

From 210919a0e296385275f3222675c3537db81b87b0 Mon Sep 17 00:00:00 2001
From: Chris Holdgraf <choldgraf@gmail.com>
Date: Fri, 3 Jun 2022 15:08:28 +0200
Subject: [PATCH 12/13] about/2i2c

---
 about/service-objectives.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/about/service-objectives.md b/about/service-objectives.md
index f3918e6..9a5833f 100644
--- a/about/service-objectives.md
+++ b/about/service-objectives.md
@@ -38,6 +38,10 @@ For example, if we need to undergo major maintenance of infrastructure transitio
 - We will communicate with communities before any intentional downtime.
 - We will aim for downtime windows that happen outside of heavy usage.
 - We will communicate with communities when the expected downtime is over.
+:::{admonition} This may change
+We are still exploring how to effectively communicate and schedule work around intentional downtime, and our processes may change.
+[See this issue for example](https://github.com/2i2c-org/team-compass/issues/423).
+:::

 (objectives:reduced-capacity)=
 ### Reduced team capacity
@@ -89,13 +93,15 @@ Below are our objectives broken down by the type of support they relate to.
 Our goal is to be more rapid in responding, communicating, and resolving support requests during incidents.
 Our ability to meet these objectives will depend on the times they are reported relative to the working hours of our support team.

-- We will triage and respond to Incidents within 8 working hours **at most**. We will on average respond to Incidents within **2 working hours**.
+- We will triage and respond to Incidents within **at most one working day**[^working-day]. We will **on average** respond to Incidents significantly faster than this, but do not commit to a specific timeline until we gain more experience.
 - We will prioritize resolving Incidents over any other Change requests.
 - For major or complex outages, we will re-direct capacity on our engineering team to resolve them.

+[^working-day]: We define a "working day" as a continuous 24 hour period between Monday and Friday. Our team and communities we serve are split across many time zones, and thus we use this more general definition of a working day rather than something timezone-specific.
+
 ### Change and Guidance Request support objectives

-- We will triage Change and Guidance requests and respond to them within 24 working hours.
+- We will triage Change and Guidance requests and respond to them within one working day.
 - We will prioritize resolving Change and Guidance Requests by balancing them against our other development priorities as described in {doc}`our Support Team Process documentation <tc:projects/managed-hubs/support>`.
 (objectives:cost)=

From 46c31bec58365abfeffc54cd8c3c3191e4394dc5 Mon Sep 17 00:00:00 2001
From: Chris Holdgraf <choldgraf@gmail.com>
Date: Fri, 3 Jun 2022 15:08:59 +0200
Subject: [PATCH 13/13] Update team link

---
 about/2i2c.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/about/2i2c.md b/about/2i2c.md
index 1da2c30..95aaf7c 100644
--- a/about/2i2c.md
+++ b/about/2i2c.md
@@ -31,7 +31,7 @@ Here are a few of the major projects our team memebers have been involved in ove
 ## 2i2c has expertise in open source workflows and Jupyter

 2i2c's team is comprised of several "[Distinguished Contributors](https://jupyter.org/about)" in the Jupyter ecosystem, which is a crucial technical component of this service.
-We are [core team members of JupyterHub and Binder](https://jupyterhub-team-compass.readthedocs.io/en/latest/team.html), and make regular contributions across the Jupyter ecosystem.
+We are [core team members of JupyterHub and Binder](https://jupyterhub-team-compass.readthedocs.io/en/latest/team/index.html), and make regular contributions across the Jupyter ecosystem.
 Moreover, our team has many years of experience with all aspects of the Jupyter stack and we are comfortable interacting with open source communities everywhere.
 This makes 2i2c uniquely capable of both utilizing and improving this technology through upstream contributions.