Provide some default Prometheus alerts #3754

jmreicha · 2019-02-11T19:06:34Z

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.): prometheus alerts, metric alerts

Is this a BUG REPORT or FEATURE REQUEST? (choose one): Feature request

With the addition of the Grafana dashboard it would also be useful to have a set of "default" rules that can be applied to Prometheus to monitor the Nginx ingress.

In my searching I have been able to find some other projects that provide some level of alerting for various metrics. Kubernetes and kube-prometheus for example provide jsonnet for creating the rules.

There is also the Monitoring Mixin Design Doc which outlines a way to package rules, alerts and dashboards, which may be a way to provide these things.

NGINX Ingress controller version: v0.21.0

Kubernetes version (use kubectl version): 1.10.x

Environment: NA


BrianChristie · 2019-02-13T13:51:07Z

I think it would be great to have a monitoring-mixin for ingress-nginx. It may be a bit tricky to figure out which alerts are appropriate, because I think we might want to keep the following separate:

Control plane alerts for ingress-nginx
Data plane alerts for cluster-applications which use Nginx ingress and are experiencing failures

For example, as a cluster operator, I don't necessarily want to be paged when a cluster user's application is failing. But equally, I'd like to provide an easy way for cluster users to get alerts/pages when their application is failing.

Does anyone have thoughts on how these can be cleanly separated?
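One possible way to sketch the separation described above is to emit two rule groups, one per audience, and tag each alert with a label that Alertmanager could route on. The snippet below is only an illustration under stated assumptions: the metric names `nginx_ingress_controller_config_last_reload_successful` and `nginx_ingress_controller_requests` should be checked against the controller's own metrics endpoint, and the alert names, the `audience` label, the thresholds, and the durations are all hypothetical choices rather than anything the project ships.

```python
import json

# Assumed metric names; verify them against your controller's /metrics output.
RELOAD_OK = "nginx_ingress_controller_config_last_reload_successful"
REQUESTS = "nginx_ingress_controller_requests"


def control_plane_rules():
    """Alerts about the controller itself, intended for cluster operators."""
    return [{
        "alert": "IngressNginxConfigReloadFailed",  # hypothetical alert name
        "expr": f"{RELOAD_OK} == 0",
        "for": "5m",
        "labels": {"severity": "page", "audience": "cluster-operator"},
    }]


def data_plane_rules(error_ratio=0.05):
    """Alerts about applications behind the ingress, intended for cluster users."""
    # Per-ingress rate of 5xx responses divided by the per-ingress total rate.
    errors = f'sum by (namespace, ingress) (rate({REQUESTS}{{status=~"5.."}}[5m]))'
    total = f"sum by (namespace, ingress) (rate({REQUESTS}[5m]))"
    return [{
        "alert": "IngressHighErrorRatio",  # hypothetical alert name
        "expr": f"{errors} / {total} > {error_ratio}",
        "for": "10m",
        "labels": {"severity": "warning", "audience": "cluster-user"},
    }]


def rule_file():
    """Build a Prometheus-style rule file with one group per audience."""
    return {"groups": [
        {"name": "ingress-nginx-control-plane", "rules": control_plane_rules()},
        {"name": "ingress-nginx-data-plane", "rules": data_plane_rules()},
    ]}


if __name__ == "__main__":
    # Printed as JSON, which YAML parsers generally accept.
    print(json.dumps(rule_file(), indent=2))
```

Because the data plane expression keeps the `namespace` and `ingress` labels, a routing layer could in principle send those alerts to the team that owns the namespace, while alerts labelled `audience: cluster-operator` go to the on-call rotation for the cluster itself.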

fejta-bot · 2019-05-14T14:36:00Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

jmreicha · 2019-05-14T14:51:17Z

/remove-lifecycle stale

HaveFun83 · 2019-08-08T08:39:10Z

It would be really nice to have a baseline for this topic.

fejta-bot · 2019-11-06T09:32:24Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

jmreicha · 2019-11-06T15:28:29Z

/remove-lifecycle stale

fejta-bot · 2020-02-04T15:47:05Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

jmreicha · 2020-02-04T18:13:24Z

/remove-lifecycle stale

fejta-bot · 2020-05-04T18:57:34Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2020-06-03T19:42:29Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

jmreicha · 2020-06-04T01:19:18Z

/remove-lifecycle rotten

nlamirault · 2020-07-20T10:01:40Z

Any news?

fejta-bot · 2020-10-18T10:56:40Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

sandhose · 2020-11-17T11:01:40Z

/remove-lifecycle stale

cmluciano · 2021-01-13T19:20:16Z

I'm not sure if this is something that should be provided by default, since alerts are very specific to a given organization. If the request is for a mixin or a doc outlining some general advice, then I'm happy to review a PR.

/triage needs-information

fejta-bot · 2021-04-13T19:56:50Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

fejta-bot · 2021-05-13T20:31:29Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

nlamirault · 2021-05-17T13:06:35Z

/remove-lifecycle stale

fejta-bot · 2021-06-16T13:23:30Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

k8s-ci-robot · 2021-06-16T13:23:37Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the lifecycle/stale label May 14, 2019

k8s-ci-robot removed the lifecycle/stale label May 14, 2019

k8s-ci-robot added the lifecycle/stale label Nov 6, 2019

k8s-ci-robot removed the lifecycle/stale label Nov 6, 2019

k8s-ci-robot added the lifecycle/stale label Feb 4, 2020

k8s-ci-robot removed the lifecycle/stale label Feb 4, 2020

k8s-ci-robot added the lifecycle/stale label May 4, 2020

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 3, 2020

k8s-ci-robot removed the lifecycle/rotten label Jun 4, 2020

k8s-ci-robot added the lifecycle/stale label Oct 18, 2020

k8s-ci-robot removed the lifecycle/stale label Nov 17, 2020

k8s-ci-robot added the triage/needs-information label Jan 13, 2021

k8s-ci-robot added the lifecycle/stale label Apr 13, 2021

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label May 13, 2021

k8s-ci-robot closed this as completed Jun 16, 2021
