Having duplicate analysis entries will corrupt /metrics end-point #2374
Comments
We're running into this issue as well. We're currently evaluating options on how best to solve it.
A quick untested fix for this very specific issue could be:

diff --git a/controller/metrics/analysis.go b/controller/metrics/analysis.go
index 12480b76..f4bcc67d 100644
--- a/controller/metrics/analysis.go
+++ b/controller/metrics/analysis.go
@@ -107,8 +107,17 @@ func collectAnalysisTemplate(ch chan<- prometheus.Metric, namespace, name string
 	}
 	addGauge(MetricAnalysisTemplateInfo, 1)
-	for _, metric := range at.Metrics {
+	seen := make(map[string]struct{}, len(at.Metrics))
+	for i, metric := range at.Metrics {
 		metricType := metricproviders.Type(metric)
-		addGauge(MetricAnalysisTemplateMetricInfo, 1, metricType, metric.Name)
+		metricName := metric.Name
+		_, found := seen[metricName]
+		if found {
+			metricName = fmt.Sprintf("%s-%d", metricName, i)
+		} else {
+			seen[metricName] = struct{}{}
+		}
+
+		addGauge(MetricAnalysisTemplateMetricInfo, 1, metricType, metricName)
 	}
 }

But looking at the bigger picture, #1946 is probably the best long-term fix.
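For anyone who wants to try the renaming idea outside the controller, here is a minimal standalone sketch. The function name uniqueNames and the sample metric names are hypothetical and are not part of Argo Rollouts. Unlike the quick diff above, this variant also records renamed entries in the seen set, so a generated name such as "a-2" cannot collide with a metric that already carries that name.

```go
package main

import "fmt"

// uniqueNames is a hypothetical helper (not Argo Rollouts code). It returns a
// copy of names in which each repeated entry is renamed to "<name>-<n>",
// starting with n equal to the entry's index and increasing n until the
// result is unused.
func uniqueNames(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	out := make([]string, 0, len(names))
	for i, name := range names {
		candidate := name
		for n := i; ; n++ {
			if _, dup := seen[candidate]; !dup {
				break
			}
			candidate = fmt.Sprintf("%s-%d", name, n)
		}
		// Record the final name, including renamed ones, so later
		// entries cannot reuse it.
		seen[candidate] = struct{}{}
		out = append(out, candidate)
	}
	return out
}

func main() {
	fmt.Println(uniqueNames([]string{"success-rate", "latency", "success-rate"}))
}
```

Renaming keeps the endpoint serving, but it hides the misconfiguration from the user, which is one reason rejecting the template up front may be preferable.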
@pims I would be interested in either a fix for this specific issue or the bigger fix of moving the validation logic to an admission webhook. Are you interested in contributing?
Let's start with the fix for this issue and discuss a plan for moving the validation logic to an admission webhook in #1946.
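As a rough illustration of the validation side of that discussion, the check itself could be as small as the sketch below. The names duplicateNames and validateMetricNames are hypothetical; this is not the validation code in Argo Rollouts or the design proposed in #1946, and wiring it into an admission webhook is left out entirely.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicateNames is a hypothetical helper that returns, in sorted order,
// every name that appears more than once in names.
func duplicateNames(names []string) []string {
	counts := make(map[string]int, len(names))
	for _, n := range names {
		counts[n]++
	}
	var dups []string
	for n, c := range counts {
		if c > 1 {
			dups = append(dups, n)
		}
	}
	sort.Strings(dups)
	return dups
}

// validateMetricNames returns an error listing the duplicated names,
// or nil when all names are unique.
func validateMetricNames(names []string) error {
	if dups := duplicateNames(names); len(dups) > 0 {
		return fmt.Errorf("duplicate metric names: %v", dups)
	}
	return nil
}

func main() {
	fmt.Println(validateMetricNames([]string{"success-rate", "latency", "success-rate"}))
}
```

Rejecting the object at admission time would surface the mistake to the user immediately instead of letting it reach the metrics collector.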
Describe the bug
If a user accidentally creates an AnalysisTemplate containing two analyses with the same name, the metrics are corrupted and Argo Rollouts ends up in a broken state. The /metrics endpoint returns an error, which causes Kubernetes to put it into an unready state.
To Reproduce
Create an AnalysisTemplate with duplicated analyses (https://gist.github.com/MarkSRobinson/7203d76d265a8217919662c1b57eb9cb). This will cause the metrics endpoint to become and stay broken.
Just kubectl apply -f the supplied file, and Argo Rollouts will have a broken metrics endpoint. Restarting Argo Rollouts does not fix it.
Deleting the analysis template does fix it.
Expected behavior
Posting an invalid analysis template should not break the /metrics endpoint.
Screenshots
Output from /metrics
Version
1.3.1
Logs