Fill JaegerStatus object with the number of spans and/or traces processed #248

jpkrohling · 2019-03-01T09:56:20Z

The operator can return a status object for each instance. The status is currently empty, and it would be helpful to provide at least the number of processed spans/traces as exposed by the /metrics endpoint of the collector service. Metrics to use:

jaeger_collector_spans_received
jaeger_collector_traces_received
jaeger_collector_queue_length
jaeger_collector_spans_dropped_total

The value in the status object should be the sum of those metrics across all collector pods for the given instance.
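A minimal sketch of the aggregation described above, assuming each collector pod's /metrics payload has already been fetched as text in the Prometheus exposition format. The helper names `parseSample` and `sumMetrics` are hypothetical and not part of the operator; samples of the same metric with different label sets are simply added together.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSample extracts the metric name and value from one line of the
// Prometheus text format, ignoring any label set and trailing timestamp.
func parseSample(line string) (string, float64, bool) {
	name, rest := line, ""
	if open := strings.IndexByte(line, '{'); open >= 0 {
		end := strings.LastIndexByte(line, '}')
		if end < open {
			return "", 0, false
		}
		name, rest = line[:open], line[end+1:]
	} else if sp := strings.IndexAny(line, " \t"); sp >= 0 {
		name, rest = line[:sp], line[sp:]
	}
	fields := strings.Fields(rest)
	if len(fields) == 0 {
		return "", 0, false
	}
	value, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return "", 0, false
	}
	return name, value, true
}

// sumMetrics (hypothetical helper) adds up the wanted metrics across the
// payloads of all collector pods that belong to one instance.
func sumMetrics(payloads []string, wanted []string) map[string]float64 {
	totals := make(map[string]float64, len(wanted))
	for _, name := range wanted {
		totals[name] = 0
	}
	for _, payload := range payloads {
		scanner := bufio.NewScanner(strings.NewReader(payload))
		for scanner.Scan() {
			line := strings.TrimSpace(scanner.Text())
			if line == "" || strings.HasPrefix(line, "#") {
				continue // skip blank lines and HELP/TYPE comments
			}
			name, value, ok := parseSample(line)
			if !ok {
				continue
			}
			if _, tracked := totals[name]; tracked {
				totals[name] += value
			}
		}
	}
	return totals
}

func main() {
	// Two made-up payloads standing in for two collector pods.
	pods := []string{
		"jaeger_collector_spans_received{svc=\"a\"} 10\njaeger_collector_queue_length 3\n",
		"jaeger_collector_spans_received{svc=\"a\"} 7\njaeger_collector_queue_length 1\n",
	}
	wanted := []string{"jaeger_collector_spans_received", "jaeger_collector_queue_length"}
	fmt.Println(sumMetrics(pods, wanted))
}
```

Fetching the payloads (listing the collector pods and calling each /metrics endpoint) is left out on purpose, since how that is done depends on the operator's client setup.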

jpkrohling · 2019-03-01T10:18:46Z

After further reading of the docs, it's not clear whether the status object is meant to store such info, as it changes quite often.

The reconcile response can request that the reconciliation be called again after a given interval (once per second, for instance), so the status object might still be suitable.
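As a rough sketch of that idea: the `Result` type below is a local stand-in that mirrors the `Requeue` and `RequeueAfter` fields of controller-runtime's `reconcile.Result`, so the example compiles without external dependencies. The `reconcileStatus` function and its `refresh` callback are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Result is a local stand-in for controller-runtime's reconcile.Result,
// reduced to the two fields relevant here.
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// reconcileStatus (hypothetical) refreshes the status once and then asks
// to be invoked again after the given interval.
func reconcileStatus(refresh func() error, interval time.Duration) (Result, error) {
	if err := refresh(); err != nil {
		// Returning the error leaves retry handling to the caller.
		return Result{}, err
	}
	return Result{RequeueAfter: interval}, nil
}

func main() {
	refresh := func() error {
		// A real implementation would collect metrics and update the status here.
		return nil
	}
	res, err := reconcileStatus(refresh, time.Second)
	fmt.Println(res.RequeueAfter, err)
}
```

A one-second interval means the custom resource status may be written very often, which is the concern raised above; a longer interval, or writing only when the values changed, would reduce that churn.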

jpkrohling · 2019-03-01T15:05:17Z

Working branch: https://github.com/jpkrohling/jaeger-operator/tree/248-JaegerStatus-object (depends on #231 / #249).

pavolloffay · 2019-03-04T09:01:15Z

It seems a bit weird returning internal metrics.

Could it instead return the result of the health check and the value of the /version endpoint?

jpkrohling · 2019-03-04T09:20:34Z

In general, I agree, but these specific metrics are a good indication of the cluster state.

It can certainly also return the result of health checks, but:

Those should be checked by Kubernetes already. Any nodes with a failing health check should be rescheduled by Kubernetes automatically.
What to do with conflicting info? For example, 2 nodes have a failing health check and 2 have an "ok" state. What should the cluster state be? Should we list the state individually, per node? The same question applies to "/version".

pavolloffay · 2019-03-04T09:31:24Z

I am more interested in the version than in the whole health check. I agree that it is exposed by the k8s API.

About the metrics: I believe there should be monitoring in the cluster where these metrics are exposed in dashboards.

jpkrohling · 2019-03-04T09:38:39Z

Yes, there should indeed be a place where these metrics are collected and displayed. Having it in the status could still be helpful for a quick check, to see if everything is wired correctly and for diagnosing possible issues (is the queue too high? are spans being stored at all?)

How about we get this first implementation shipped and track possible improvements via a follow-up issue? I'm not sure which info will be really helpful in the day-to-day operations of the cluster.

objectiser · 2019-03-04T09:54:29Z

My preference would be to determine the information stored in the status based on specific requirements (use cases) for the operator.

For example, if the operator needs to know when the current instance is becoming overloaded and needs to be scaled up, what stats does it need to make that decision? So drive the requirements based on specific actions that need to be taken by the operator.

jpkrohling · 2019-03-04T10:01:28Z

The current collector queue size is certainly one such metric that can/should be used when making a scaling up/down decision. I'm OK with removing the traces/spans metrics, leaving only the queue size.

At this point, the most important thing, IMO, might be to have the basic feature implemented and adjust it based on feedback.
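To illustrate how the queue size could feed a scaling decision, here is a minimal sketch. Everything in it is an assumption for illustration: the function name `desiredReplicas`, the low/high watermarks on the average queue length per pod, and the one-step-at-a-time policy are not taken from the operator.

```go
package main

import "fmt"

// desiredReplicas (hypothetical) proposes a collector replica count from
// the summed queue length. It moves one replica at a time and clamps the
// result to [minReplicas, maxReplicas].
func desiredReplicas(current int, totalQueue, low, high float64, minReplicas, maxReplicas int) int {
	if current < 1 {
		return minReplicas // nothing running yet, start at the minimum
	}
	perPod := totalQueue / float64(current)
	next := current
	switch {
	case perPod > high:
		next = current + 1 // queues are backing up, add a collector
	case perPod < low:
		next = current - 1 // queues are nearly empty, remove a collector
	}
	if next < minReplicas {
		next = minReplicas
	}
	if next > maxReplicas {
		next = maxReplicas
	}
	return next
}

func main() {
	// Made-up numbers: 2 collectors sharing a total queue of 3000 items.
	fmt.Println(desiredReplicas(2, 3000, 100, 1000, 1, 5))
}
```

Using two watermarks rather than a single threshold avoids flapping between replica counts when the load hovers around one value.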

objectiser · 2019-03-04T10:03:29Z

Agreed. I just want to make sure we control what information is returned and ensure it meets a real need. So starting off with the use case of automated scaling up/down based on load is a good first one to tackle.

jpkrohling self-assigned this Mar 1, 2019

jpkrohling mentioned this issue Mar 4, 2019

Store back the CR only if it has changed #249

Merged

jpkrohling mentioned this issue Mar 4, 2019

Added metrics to JaegerStatus object #254

Merged

jpkrohling closed this as completed in #254 Mar 4, 2019

jpkrohling mentioned this issue Mar 5, 2019

JaegerStatus object #259

Closed
