feat!: add operator-metrics port #171
Might be worth bumping the version in Chart.yaml (a major one?) and re-running helm-docs.
What are your thoughts on the "shared" ServiceMonitor? We now have a single ServiceMonitor for the pyrra-server and the pyrra-operator. Both containers run as part of a single pod. I think it's acceptable, but the Changing the
Maybe we should just split the configuration between the operator and Pyrra? E.g. pyrra.serviceMonitor, pyrra.prometheusUrl, pyrraOperator.serviceMonitor
I think this is a good idea. But when doing this, we could also split them up into two deployments. On the other hand, this might unnecessarily expand the original problem and should be done in another PR. What do you think?
I can split the ServiceMonitor as part of this PR. This will already allow separate configuration and jobNames. In a follow-up PR it can be changed to use two deployments.
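For illustration, the split suggested above could look roughly like the following values layout. This is only a sketch: the top-level keys come from the suggestion in this thread, while the nested `enabled` and `jobName` fields and all values are hypothetical and may not match what the chart ends up using.

```yaml
# Hypothetical values.yaml layout; nested field names are assumptions.
pyrra:
  # Placeholder URL, not a chart default.
  prometheusUrl: "http://prometheus.example:9090"
  serviceMonitor:
    enabled: true
    jobName: pyrra-server      # hypothetical field and value

pyrraOperator:
  serviceMonitor:
    enabled: true
    jobName: pyrra-operator    # hypothetical field and value
```

With separate blocks like these, each ServiceMonitor can be enabled and named independently even while both containers still run in one pod.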
1bc2716 to 8e958eb
8e958eb to b6bc47f
I finally came back to this and split the ServiceMonitor into one for the operator and one for the server. I moved the config properties under If you want to move forward with the idea to split the operator and server into separate deployments, which I think is the right way to go, then most properties of the values file can be duplicated moved under
@@ -20,6 +20,8 @@ additionalLabels: {}
extraApiArgs: []
# -- Extra args for Pyrra's Kubernetes container
extraKubernetesArgs: []
# -- Address to expose operator metrics
operatorMetricsAddress: ":8080"
I would use either operatorMetricsAddress or operatorMetricsPort, as they have to align. If you want, you can make this possible to overwrite, something like: if operatorMetricsPort is "", then use {{ include "pyrra.operatorMetricsPort" . }}. This could possibly be done in the helpers.
Right now, the service port can be configured independently of the container port via .Values.service.operatorMetricsPort. The container port is taken from operatorMetricsAddress, so they are aligned.
The service port and the container port do not have to align.
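As a small sketch of the configuration described here: the key names are taken from this thread and the diff above, while the service port value is an arbitrary assumption chosen only to show that the two ports may differ.

```yaml
# Address the operator container binds its metrics endpoint to;
# per the comment above, the container port is taken from this value.
operatorMetricsAddress: ":8080"

service:
  # Service port for the operator metrics. It does not have to match
  # the container port. 9090 is an example value, not a chart default.
  operatorMetricsPort: 9090
```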
My idea was to extract operatorMetricsPort from operatorMetricsAddress, but it should also be fine to leave it as it is for now.
Thank you for pushing this, @fstr. I added two small nits; can you please have a look?
LGTM
@rlex what do you think?
So far so good, I think, but it's probably worth bumping it to 1.0.0 since it's a pretty major change.
As we've also discussed splitting the Deployments and subsequently the Services, should we wait with the bump to 1.0.0 until that is done? Then we have separated them fully.
@fstr can you please fix the docs as stated in the CI?
@rlex do you have anything to add?
@rlex can we get this merged?
Sorry for the delay! Will merge as soon as CI passes.
Description
Expose Pyrra Kubernetes operator container metrics on port 8080. With these metrics, we can also get kube-builder metrics like controller_runtime_reconcile_errors_total, on which we can build an alert.
The alert can be optionally enabled. Due to the reconciliation loop interval in the operator, we use a 20 minute interval on the rate function, as it is long enough to avoid dropping to 0 and having a flapping alert while the operator is not reconciling (and reporting a reconciliation error). The alert will resolve with a slight delay because of this, but that is much better than having no alert.
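A rule along these lines might look like the sketch below. The alert name, metric name and 20 minute window come from this description; the exact expression, the threshold, the resource and group names, and the labels and annotations are assumptions and may differ from what the chart ships.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pyrra-operator            # hypothetical name
spec:
  groups:
    - name: pyrra-operator        # hypothetical group name
      rules:
        - alert: PyrraReconciliationError
          # Assumed expression: fire if any reconcile error was counted
          # within the last 20 minutes. The long window keeps the rate
          # above 0 between reconciliation loops, so the alert does not flap.
          expr: sum(rate(controller_runtime_reconcile_errors_total[20m])) > 0
          labels:
            severity: warning     # assumed severity
          annotations:
            summary: The Pyrra operator is failing to reconcile an SLO.
```

Because the rate is taken over 20 minutes, the alert keeps firing for up to that long after the last error, which is the delayed resolution mentioned above.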
This feature is especially useful, as the Pyrra WebUI currently breaks and shows nothing if a broken SLO has been applied to your Kubernetes cluster.
How can this be tested
Enable the PyrraReconciliationError alert via prometheusRule.enabled: true and deploy a broken/invalid SLO. Even if the ValidatingWebhook is active, this SLO will be accepted but won't reconcile, as the errors.metric is not a valid Vector Selector due to the or expr clause.
BREAKING CHANGE: This is a breaking change, as both containers expose the default Golang metrics. Users that built monitoring on the existing metrics now have to separate them by container label.