
Pin prometheus chart version to an older version #720

Merged
merged 1 commit into 2i2c-org:master on Oct 1, 2021

Conversation

GeorgianaElena
Member

Fixes #616

@GeorgianaElena
Member Author

GeorgianaElena commented Sep 29, 2021

I'm currently trying to deploy it locally with:

python3 deployer deploy-support 2i2c

to see if it works.

Update:
It failed with Error: UPGRADE FAILED: context deadline exceeded, and the new support-prometheus-server pod has been in ContainerCreating state for 38m.

Hmm, not sure it's normal for it to take this much time.

@GeorgianaElena
Member Author

Running kubectl describe on that pod showed this error:

Multi-Attach error for volume "pvc-dae5a17d-763f-4dd5-b73a-3006d436a7ed" Volume is already used by pod(s) support-prometheus-server-59b9b6f98c-9x462

So, I guess I need to manually delete the old pod?
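[Editor's note] The manual deletion discussed in this thread can be sketched as follows. The pod name is taken from the error message above; the `support` namespace is an assumption, so check where the prometheus release actually lives in your cluster:

```shell
# List the prometheus server pods to confirm which one is stuck
# ("support" namespace is an assumption here)
kubectl get pods -n support -l app=prometheus

# Delete the old pod so it releases the PVC; the Deployment's
# ReplicaSet will keep the replacement pod, which can now attach
# the persistent volume
kubectl delete pod -n support support-prometheus-server-59b9b6f98c-9x462
```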

@yuvipanda
Member

@GeorgianaElena yep, you should manually delete it. Prometheus should have its deployment strategy configured to prevent this from happening. When the strategy is set to Recreate, the first pod is fully deleted (thus releasing the PVC) before the new pod comes up. The default is RollingUpdate, which tries to bring up the new pod before the old pod is deleted, causing this issue with PVCs. This only happens sometimes, I think. I'm not sure why; perhaps only when the new pod is scheduled on a different node?
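[Editor's note] A minimal sketch of the strategy setting described above, expressed as Helm values for the prometheus chart. The `server.strategy` key follows the upstream prometheus chart's values layout, so treat the exact path as an assumption for other chart versions:

```yaml
# Helm values sketch (assumed prometheus chart layout):
# use Recreate so the old server pod is fully deleted, releasing
# its PVC, before the replacement pod is scheduled.
server:
  strategy:
    type: Recreate
```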

@GeorgianaElena
Member Author

Thanks @yuvipanda! The deployment went ok after deleting the pod 🎉

However, it doesn't look like anything's changed in https://grafana.pilot.2i2c.cloud. Is there anything else I should do, or should the dashboards just have magically appeared if this prometheus pin had fixed the issue?

@yuvipanda
Member

@GeorgianaElena I think github.com/jupyterhub/jupyterhub-grafana needs to be deployed there to see if it works. Can you give that a shot too?

@GeorgianaElena
Member Author

@GeorgianaElena I think github.com/jupyterhub/jupyterhub-grafana needs to be deployed there to see if it works. Can you give that a shot too?

I just did and I think they work 🎉
[screenshot: grafana dashboards]

@yuvipanda
Member

Yay awesome!

We also need to automate deployment of the grafana dashboards

@choldgraf
Member

yesssss dashboards!!! thanks @GeorgianaElena

@yuvipanda I assume you mean that automation should be tackled in the future, not as part of this PR?

@GeorgianaElena GeorgianaElena merged commit da5cd99 into 2i2c-org:master Oct 1, 2021
@yuvipanda
Member

@choldgraf yep!

@damianavila
Contributor

I assume you mean that automation should be tackled in the future, not as part of this PR?

I quickly searched about grafana and prometheus open issues and there are several of them.
Maybe it is time to group those together under the topic "infra to get info about our infra" 😉 .
I know this could be viewed as something belonging to the "Managed JupyterHubs Infrastructure" column in the deliverables board, but maybe we need more granularity there?

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this pull request Oct 5, 2021
During deployment of 2i2c-org#720,
sometimes CI would fail because the prometheus pod would be
stuck in 'ContainerCreating', as the old pod was holding on to
the persistent disk the new pod needs to start. This was temporarily
fixed by deleting the prometheus pod, but this tells kubernetes
to delete the old pod properly first before starting the new one.

Ref 2i2c-org#720 (comment)
@choldgraf
Member

@damianavila I don't think that I follow. Do you mean have a high-level issue about grafana/prometheus specifically?

@yuvipanda what are the things that we need to do in order to automatically deploy Grafana for our hubs? Is that something we should track in a new issue?

@yuvipanda
Member

@choldgraf i opened #739 to track that

@damianavila
Contributor

@damianavila I don't think that I follow. Do you mean have a high-level issue about grafana/prometheus specifically?

I was thinking more about a sort of "topic" label? The Managed JupyterHubs Infrastructure column is pretty big, so some filtering by label would help make related stuff more visible...

@choldgraf
Member

@damianavila ahhh - yes, I totally agree. That column has way more stuff in general than all of the other columns and I also find it hard to parse 😅. Another option is to create a label like

🏷️ reporting

Would that make sense?

@damianavila
Contributor

reporting? How would that work? It is not clear to me...

@choldgraf
Member

I was proposing that the topic we use to describe "infra to get info about our infra" is "reporting"

@damianavila
Contributor

Thanks for the clarification... reporting is too wide and prone to be confusing, IMHO, but I guess it is OK since I can't come up with a better word 😉.

@choldgraf
Member

choldgraf commented Oct 8, 2021

fair enough - what about 🏷️ ops reporting or 🏷️ infrastructure reporting?

@damianavila
Contributor

infrastructure reporting, I guess?

@yuvipanda
Member

yuvipanda commented Oct 10, 2021

I think 'monitoring' is a more common industry standard term for this. https://sre.google/sre-book/monitoring-distributed-systems/ is a nice read

@damianavila
Contributor

Yep, 💯 to monitoring.

@choldgraf
Member

ah I knew there was a better word haha, thanks - will go with that

Development

Successfully merging this pull request may close these issues.

Fix our Grafana / Prometheus data connection
4 participants