
feat(mimir/rules/k8s): retry startup process on failure #5906


hainenber
Contributor

PR Description

Which issue(s) this PR fixes

Fixes #5650

Notes to the Reviewer

Should I make the backoff settings configurable?

Logs showing the component retrying after failed startups:

d2edf8b5faaa6 node_id=tracing duration=3.75µs
ts=2023-12-02T09:55:10.492555426Z level=info msg="finished node evaluation" controller_id="" trace_id=f455aa145d4bb36bfc8d2edf8b5faaa6 node_id=logging duration=1.583µs
ts=2023-12-02T09:55:10.492571676Z level=info msg="finished node evaluation" controller_id="" trace_id=f455aa145d4bb36bfc8d2edf8b5faaa6 node_id=labelstore duration=12.625µs
ts=2023-12-02T09:55:10.492580426Z level=info msg="finished node evaluation" controller_id="" trace_id=f455aa145d4bb36bfc8d2edf8b5faaa6 node_id=otel duration=1µs
ts=2023-12-02T09:55:10.492583634Z level=info msg="finished complete graph evaluation" controller_id="" trace_id=f455aa145d4bb36bfc8d2edf8b5faaa6 duration=956.542µs
ts=2023-12-02T09:55:10.492684676Z level=info msg="scheduling loaded components and services"
ts=2023-12-02T09:55:10.492788093Z level=info msg="starting cluster node" peers="" advertise_addr=127.0.0.1:80
ts=2023-12-02T09:55:10.492997801Z level=info msg="peers changed" new_peers=grafana-agent-test-mzxsb
ts=2023-12-02T09:55:10.493016801Z level=info msg="now listening for http traffic" service=http addr=0.0.0.0:80
ts=2023-12-02T09:55:10.694624218Z level=error msg="failed to list rules from mimir" component=mimir.rules.kubernetes.promRules err="Get \"abc/prometheus/config/v1/rules\": unsupported protocol scheme \"\""
ts=2023-12-02T09:55:10.694697301Z level=error msg="starting up component failed" component=mimir.rules.kubernetes.promRules err="Get \"abc/prometheus/config/v1/rules\": unsupported protocol scheme \"\""
ts=2023-12-02T09:55:12.615748677Z level=error msg="failed to list rules from mimir" component=mimir.rules.kubernetes.promRules err="Get \"abc/prometheus/config/v1/rules\": unsupported protocol scheme \"\""
ts=2023-12-02T09:55:12.61595976Z level=error msg="starting up component failed" component=mimir.rules.kubernetes.promRules err="Get \"abc/prometheus/config/v1/rules\": unsupported protocol scheme \"\""
ts=2023-12-02T09:55:14.992224053Z level=error msg="failed to list rules from mimir" component=mimir.rules.kubernetes.promRules err="Get \"abc/prometheus/config/v1/rules\": unsupported protocol scheme \"\""
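The error in these logs matches what Go's `net/http` client returns when the request URL has no scheme, which suggests the component was pointed at an address like `abc` (an assumption inferred from the log, not stated in the PR). The hypothetical `fetchRules` and `listRulesURL` helpers below reproduce the message without any network I/O, because `net/http` rejects a scheme-less URL before dialing.

```go
package main

import (
	"fmt"
	"net/http"
)

// listRulesURL joins a base address and the rules API path seen in the
// logs. The base address "abc" is an assumption taken from the log output.
func listRulesURL(address string) string {
	return address + "/prometheus/config/v1/rules"
}

// fetchRules issues a GET for the rules endpoint. With no scheme in the
// URL, net/http fails before opening any connection.
func fetchRules(address string) error {
	resp, err := http.Get(listRulesURL(address))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

func main() {
	// prints: Get "abc/prometheus/config/v1/rules": unsupported protocol scheme ""
	fmt.Println(fetchRules("abc"))
}
```

Because this failure is deterministic, every retry in the logs fails the same way; retries help with transient errors (Mimir briefly unreachable) rather than misconfiguration.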

PR Checklist

  • CHANGELOG.md updated

Member

@tpaschalis left a comment


LGTM, thanks a lot @hainenber 🙌

We can make the backoff configurable in a follow-up PR if it's deemed useful.

@tpaschalis enabled auto-merge (squash) December 8, 2023 15:07
@tpaschalis merged commit 7424b92 into grafana:main Dec 8, 2023
8 checks passed
BarunKGP pushed a commit to BarunKGP/grafana-agent that referenced this pull request Feb 20, 2024
@github-actions bot added the frozen-due-to-age label Feb 21, 2024
@github-actions bot locked as resolved and limited conversation to collaborators Feb 21, 2024
@hainenber deleted the retry-failed-startup-for-mimir-rules-kubernetes branch June 25, 2024 17:15
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.
Development

Successfully merging this pull request may close these issues.

mimir.rules.kubernetes does not retry on error
2 participants