Prometheus Alertmanager config for PagerDuty #5290

Merged
sgibson91 merged 7 commits into 2i2c-org:main on Dec 19, 2024

sunu
Contributor

@sunu sunu commented Dec 17, 2024

Send alerts to PagerDuty when jupyterhub-home-nfs disk usage is above 90%. Related to #5062.

In addition to these changes, we need to store the PagerDuty key in helm-charts/support/support.secret.values.yaml, using a structure like this:

alertmanager:
  config:
    receivers:
      - name: pagerduty
        pagerduty_configs:
          - service_key: xxxxxx
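
For context, the other half of such a setup (the alert rule and the route that hands it to the `pagerduty` receiver) could look roughly like the sketch below. This is not the configuration from this PR: the group and alert names, the `mountpoint` label value, the `for` duration, and the placement of `serverFiles` next to `alertmanager` are assumptions for illustration. Only the `pagerduty` receiver name and the 90% threshold come from the description above.

```yaml
# Hypothetical sketch; key placement and names are assumed, not taken from this PR.
serverFiles:
  alerting_rules.yml:
    groups:
      - name: home-nfs-disk-usage          # hypothetical group name
        rules:
          - alert: HomeNFSDiskUsageHigh    # hypothetical alert name
            # Fires when less than 10% of the filesystem is available,
            # which approximates "disk usage above 90%".
            # node_filesystem_* are node_exporter metrics; the mountpoint
            # label value below is an assumption.
            expr: |
              node_filesystem_avail_bytes{mountpoint="/shared-volume"}
                / node_filesystem_size_bytes{mountpoint="/shared-volume"} < 0.1
            for: 5m                        # assumed; avoids paging on brief spikes
            labels:
              severity: critical
            annotations:
              summary: "jupyterhub-home-nfs disk usage is above 90%"

alertmanager:
  config:
    route:
      # Send alerts to the receiver whose service_key lives in the secret values file.
      receiver: pagerduty
      group_by: [alertname]
```

Keeping the receiver's `service_key` in the secret values file and everything else in the plain values file means only the key itself needs to be encrypted.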

cc @sgibson91

Send alerts to PagerDuty when jupyterhub-home-nfs disk usage is above 90%
@sunu sunu marked this pull request as draft December 17, 2024 11:53

github-actions bot commented Dec 17, 2024

Merging this PR will trigger the following deployment actions.

Support and Staging deployments

| Cloud Provider | Cluster Name | Upgrade Support? | Reason for Support Redeploy | Upgrade Staging? | Reason for Staging Redeploy |
| --- | --- | --- | --- | --- | --- |
| gcp | 2i2c-uk | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | smithsonian | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | openscapes | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | gridsst | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| kubeconfig | queensu | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | catalystproject-africa | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | earthscope | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | nasa-veda | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | jupyter-meets-the-earth | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | maap | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | nmfs-openscapes | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | catalystproject-latam | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | hhmi | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | victor | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | kitware | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | strudel | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | nasa-cryo | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | ubc-eoas | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | nasa-ghg | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| kubeconfig | utoronto | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | pangeo-hubs | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | leap | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | jupyter-health | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | projectpythia | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | awi-ciroh | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | opensci | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | 2i2c-aws-us | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | dubois | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | 2i2c | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | cloudbank | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| gcp | 2i2c-uk | lis | Core infrastructure has been modified |
| aws | smithsonian | prod | Core infrastructure has been modified |
| aws | openscapes | prod | Core infrastructure has been modified |
| aws | openscapes | workshop | Core infrastructure has been modified |
| aws | gridsst | prod | Core infrastructure has been modified |
| kubeconfig | queensu | prod | Core infrastructure has been modified |
| aws | catalystproject-africa | nm-aist | Core infrastructure has been modified |
| aws | catalystproject-africa | must | Core infrastructure has been modified |
| aws | catalystproject-africa | uvri | Core infrastructure has been modified |
| aws | catalystproject-africa | wits | Core infrastructure has been modified |
| aws | catalystproject-africa | kush | Core infrastructure has been modified |
| aws | catalystproject-africa | molerhealth | Core infrastructure has been modified |
| aws | catalystproject-africa | aibst | Core infrastructure has been modified |
| aws | catalystproject-africa | bhki | Core infrastructure has been modified |
| aws | catalystproject-africa | bon | Core infrastructure has been modified |
| aws | earthscope | prod | Core infrastructure has been modified |
| aws | nasa-veda | prod | Core infrastructure has been modified |
| aws | nasa-veda | binder | Core infrastructure has been modified |
| aws | jupyter-meets-the-earth | prod | Core infrastructure has been modified |
| aws | maap | prod | Core infrastructure has been modified |
| aws | nmfs-openscapes | prod | Core infrastructure has been modified |
| gcp | catalystproject-latam | unitefa-conicet | Core infrastructure has been modified |
| gcp | catalystproject-latam | cicada | Core infrastructure has been modified |
| gcp | catalystproject-latam | gita | Core infrastructure has been modified |
| gcp | catalystproject-latam | iner | Core infrastructure has been modified |
| gcp | catalystproject-latam | plnc | Core infrastructure has been modified |
| gcp | catalystproject-latam | unam | Core infrastructure has been modified |
| gcp | catalystproject-latam | cabana | Core infrastructure has been modified |
| gcp | catalystproject-latam | nnb-ccg | Core infrastructure has been modified |
| gcp | catalystproject-latam | labi | Core infrastructure has been modified |
| gcp | catalystproject-latam | areciboc3 | Core infrastructure has been modified |
| gcp | catalystproject-latam | valledellili | Core infrastructure has been modified |
| gcp | hhmi | prod | Core infrastructure has been modified |
| gcp | hhmi | spyglass | Core infrastructure has been modified |
| gcp | hhmi | binder | Core infrastructure has been modified |
| aws | victor | prod | Core infrastructure has been modified |
| aws | kitware | prod | Core infrastructure has been modified |
| aws | strudel | prod | Core infrastructure has been modified |
| aws | nasa-cryo | prod | Core infrastructure has been modified |
| aws | ubc-eoas | prod | Core infrastructure has been modified |
| aws | nasa-ghg | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | r-prod | Core infrastructure has been modified |
| gcp | pangeo-hubs | prod | Core infrastructure has been modified |
| gcp | pangeo-hubs | coessing | Core infrastructure has been modified |
| gcp | leap | prod | Core infrastructure has been modified |
| gcp | leap | public | Core infrastructure has been modified |
| aws | jupyter-health | prod | Core infrastructure has been modified |
| aws | projectpythia | prod | Core infrastructure has been modified |
| aws | projectpythia | pythia-binder | Core infrastructure has been modified |
| gcp | awi-ciroh | prod | Core infrastructure has been modified |
| aws | opensci | sciencecore | Core infrastructure has been modified |
| aws | opensci | climaterisk | Core infrastructure has been modified |
| aws | opensci | small-binder | Core infrastructure has been modified |
| aws | opensci | big-binder | Core infrastructure has been modified |
| aws | 2i2c-aws-us | showcase | Core infrastructure has been modified |
| aws | 2i2c-aws-us | ncar-cisl | Core infrastructure has been modified |
| aws | 2i2c-aws-us | itcoocean | Core infrastructure has been modified |
| aws | 2i2c-aws-us | cosmicds | Core infrastructure has been modified |
| gcp | dubois | prod | Core infrastructure has been modified |
| gcp | dubois | ephemeral | Core infrastructure has been modified |
| gcp | 2i2c | imagebuilding-demo | Core infrastructure has been modified |
| gcp | 2i2c | binderhub-ui-demo | Core infrastructure has been modified |
| gcp | 2i2c | demo | Core infrastructure has been modified |
| gcp | 2i2c | ohw | Core infrastructure has been modified |
| gcp | 2i2c | temple | Core infrastructure has been modified |
| gcp | 2i2c | ucmerced | Core infrastructure has been modified |
| gcp | 2i2c | mtu | Core infrastructure has been modified |
| gcp | cloudbank | bcc | Core infrastructure has been modified |
| gcp | cloudbank | ccc | Core infrastructure has been modified |
| gcp | cloudbank | ccsf | Core infrastructure has been modified |
| gcp | cloudbank | chabot | Core infrastructure has been modified |
| gcp | cloudbank | csm | Core infrastructure has been modified |
| gcp | cloudbank | csulb | Core infrastructure has been modified |
| gcp | cloudbank | csum | Core infrastructure has been modified |
| gcp | cloudbank | demo | Core infrastructure has been modified |
| gcp | cloudbank | dvc | Core infrastructure has been modified |
| gcp | cloudbank | elac | Core infrastructure has been modified |
| gcp | cloudbank | elcamino | Core infrastructure has been modified |
| gcp | cloudbank | evc | Core infrastructure has been modified |
| gcp | cloudbank | fresno | Core infrastructure has been modified |
| gcp | cloudbank | foothill | Core infrastructure has been modified |
| gcp | cloudbank | glendale | Core infrastructure has been modified |
| gcp | cloudbank | high | Core infrastructure has been modified |
| gcp | cloudbank | howard | Core infrastructure has been modified |
| gcp | cloudbank | humboldt | Core infrastructure has been modified |
| gcp | cloudbank | lacc | Core infrastructure has been modified |
| gcp | cloudbank | lamission | Core infrastructure has been modified |
| gcp | cloudbank | laney | Core infrastructure has been modified |
| gcp | cloudbank | lbcc | Core infrastructure has been modified |
| gcp | cloudbank | mendocino | Core infrastructure has been modified |
| gcp | cloudbank | merced | Core infrastructure has been modified |
| gcp | cloudbank | mills | Core infrastructure has been modified |
| gcp | cloudbank | miracosta | Core infrastructure has been modified |
| gcp | cloudbank | mission | Core infrastructure has been modified |
| gcp | cloudbank | moreno | Core infrastructure has been modified |
| gcp | cloudbank | norco | Core infrastructure has been modified |
| gcp | cloudbank | palomar | Core infrastructure has been modified |
| gcp | cloudbank | pasadena | Core infrastructure has been modified |
| gcp | cloudbank | reedley | Core infrastructure has been modified |
| gcp | cloudbank | riohondo | Core infrastructure has been modified |
| gcp | cloudbank | sacramento | Core infrastructure has been modified |
| gcp | cloudbank | saddleback | Core infrastructure has been modified |
| gcp | cloudbank | santiago | Core infrastructure has been modified |
| gcp | cloudbank | sbcc | Core infrastructure has been modified |
| gcp | cloudbank | sbcc-dev | Core infrastructure has been modified |
| gcp | cloudbank | sierra | Core infrastructure has been modified |
| gcp | cloudbank | sjcc | Core infrastructure has been modified |
| gcp | cloudbank | sjsu | Core infrastructure has been modified |
| gcp | cloudbank | skyline | Core infrastructure has been modified |
| gcp | cloudbank | srjc | Core infrastructure has been modified |
| gcp | cloudbank | tuskegee | Core infrastructure has been modified |
| gcp | cloudbank | wlac | Core infrastructure has been modified |

@sgibson91
Member

> we need to store the Pagerduty key in a structure like this

The file whose structure needs to change is https://github.com/2i2c-org/infrastructure/blob/main/helm-charts/support/enc-support.secret.values.yaml

@sgibson91
Member

I have committed those changes to this PR!

@sgibson91
Member

sgibson91 commented Dec 18, 2024

Documentation requirements for this:

@sunu sunu marked this pull request as ready for review December 19, 2024 07:22
@sunu sunu requested a review from sgibson91 December 19, 2024 07:22
@sunu
Contributor Author

sunu commented Dec 19, 2024

@sgibson91 I've added documentation on how to enable alerts through Prometheus Alertmanager and how to resize the EBS volume used by the NFS server.

I created and merged #5297 to test the workflow for increasing EBS volume size. But weirdly, that didn't start any deployments at all. Can we look into that if you have time today?

@sgibson91
Member

> I created and merged #5297 to test the workflow for increasing EBS volume size. But weirdly, that didn't start any deployments at all. Can we look into that if you have time today?

We don't automatically do anything when terraform config is changed, because we don't fully trust it not to delete something if the plan isn't inspected.

Member

@sgibson91 sgibson91 left a comment

This is good to merge once you've updated the documentation to state that the terraform changes need to be applied locally (we don't automatically deploy terraform changes in CI/CD).

@sunu
Contributor Author

sunu commented Dec 19, 2024

Thanks @sgibson91! I've updated the PR accordingly.

@sgibson91 sgibson91 merged commit c9ba7b0 into 2i2c-org:main Dec 19, 2024
38 checks passed

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/12412069094
