Fill JaegerStatus object with the number of spans and/or traces processed #248
Comments
After further reading of the docs, it's not clear whether the status object is meant to store such info, since these values change quite often. Then again, the reconcile result can request that the reconcile loop be called again at a fixed interval (once per second, for instance), so the status might still be suitable.
Working branch: https://github.com/jpkrohling/jaeger-operator/tree/248-JaegerStatus-object, depends on #231 / #249.
It seems a bit odd to return internal metrics. Could it instead return the health check result and the value of the /version endpoint?
In general, I agree, but these specific metrics are a good indication of the cluster state. It can certainly also return the result of health checks, but:
I am more interested in the version than in the whole health check; I agree that the health check is exposed by the k8s API. As for the metrics, I believe there should be monitoring in the cluster where these metrics are shown in dashboards.
Yes, there should indeed be a place where these metrics are collected and displayed. Having them in the status could still be helpful for a quick check, to see if everything is wired correctly and for diagnosing possible issues (is the queue too high? are spans being stored at all?). How about we get this first implementation shipped and track possible improvements via a follow-up issue? I'm not sure which info will be really helpful in the day-to-day operations of the cluster.
My preference would be to determine the information stored in the status based on specific requirements (use cases) for the operator. For example, if the operator needs to know when the current instance is becoming overloaded and needs to be scaled up, what stats does it need to make that decision? In other words, the requirements should be driven by the specific actions the operator needs to take.
The current collector queue size is certainly one metric that can and should be used when making a scale-up/down decision. I'm OK with removing the traces/spans metrics and keeping only the queue size. At this point, the most important thing, IMO, is to have the basic feature implemented and adjust it based on feedback.
Agreed. I just want to make sure we control what information is returned and that it meets a real need. Automated scaling up/down based on load is a good first use case to tackle.
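A minimal Go sketch of the scaling use case discussed above, assuming the operator can already read each collector pod's current queue length and queue capacity (how it obtains them is not shown). The `CollectorQueue` type, the `decide` function, and the 0.8 / 0.2 thresholds are hypothetical illustrations, not part of the operator's code.

```go
package main

import "fmt"

// Decision is a hypothetical scaling outcome for one Jaeger instance.
type Decision string

const (
	ScaleUp   Decision = "scale-up"
	ScaleDown Decision = "scale-down"
	NoChange  Decision = "no-change"
)

// CollectorQueue is a hypothetical per-pod reading of the collector's
// internal queue: spans currently waiting, and how many the queue can hold.
type CollectorQueue struct {
	Length   int64
	Capacity int64
}

// decide sums queue usage across all collector pods of one instance and
// compares the overall fill ratio against the two thresholds.
func decide(pods []CollectorQueue, upAt, downAt float64) Decision {
	var length, capacity int64
	for _, p := range pods {
		length += p.Length
		capacity += p.Capacity
	}
	if capacity == 0 {
		return NoChange // no readings yet, so take no action
	}
	ratio := float64(length) / float64(capacity)
	switch {
	case ratio >= upAt:
		return ScaleUp
	case ratio <= downAt:
		return ScaleDown
	default:
		return NoChange
	}
}

func main() {
	// Example values only: two pods whose queues are both nearly full.
	pods := []CollectorQueue{{Length: 1800, Capacity: 2000}, {Length: 1700, Capacity: 2000}}
	fmt.Println(decide(pods, 0.8, 0.2)) // prints "scale-up" (3500/4000 = 0.875)
}
```

Aggregating across pods before comparing keeps the decision per instance, which matches how the issue proposes summing values over all collector pods. A real autoscaler would also need some cooldown or hysteresis so it does not flap between scaling up and down.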
The operator can return a status object for each instance. The status is currently empty, and it would be helpful to provide at least the number of processed spans/traces, as reported by the `/metrics` endpoint of the `collector` service. Metrics to use:

The value in the status object should be the sum of those metrics across all `collector` pods for the given instance.