Fill JaegerStatus object with the number of spans and/or traces processed #248

Closed
jpkrohling opened this issue Mar 1, 2019 · 9 comments

@jpkrohling
Contributor

jpkrohling commented Mar 1, 2019

The operator can return a status object for each instance. The status is currently empty, and it would be helpful to provide at least the number of processed spans/traces, as reported by the /metrics endpoint of the collector service. Metrics to use:

jaeger_collector_spans_received
jaeger_collector_traces_received
jaeger_collector_queue_length
jaeger_collector_spans_dropped_total

The value in the status object should be the sum of those metrics across all collector pods for the given instance.
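
A minimal sketch of that aggregation, assuming each collector pod's /metrics body has already been fetched as text in the Prometheus exposition format. The `sumMetric` and `parseSample` helpers and the sample payloads (including the `host` label) are hypothetical, made up for illustration, and are not part of the operator.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSample splits one exposition line into metric name and value.
// It skips an optional {label="..."} set and ignores a trailing timestamp.
func parseSample(line string) (name string, value float64, ok bool) {
	end := strings.IndexAny(line, "{ \t")
	if end < 0 {
		return "", 0, false
	}
	name, rest := line[:end], line[end:]
	if rest[0] == '{' {
		closing := strings.LastIndexByte(rest, '}')
		if closing < 0 {
			return "", 0, false
		}
		rest = rest[closing+1:]
	}
	fields := strings.Fields(rest)
	if len(fields) == 0 {
		return "", 0, false
	}
	v, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return "", 0, false
	}
	return name, v, true
}

// sumMetric adds up every sample of the named metric across all payloads,
// where each payload is the /metrics body of one collector pod.
func sumMetric(name string, payloads []string) float64 {
	var total float64
	for _, payload := range payloads {
		scanner := bufio.NewScanner(strings.NewReader(payload))
		for scanner.Scan() {
			line := strings.TrimSpace(scanner.Text())
			if line == "" || strings.HasPrefix(line, "#") {
				continue // blank line or HELP/TYPE comment
			}
			if n, v, ok := parseSample(line); ok && n == name {
				total += v
			}
		}
	}
	return total
}

func main() {
	// Hypothetical payloads from two collector pods.
	podA := "# TYPE jaeger_collector_queue_length gauge\njaeger_collector_queue_length{host=\"a\"} 3\n"
	podB := "jaeger_collector_queue_length{host=\"b\"} 4\n"
	fmt.Println(sumMetric("jaeger_collector_queue_length", []string{podA, podB}))
}
```

Summing makes sense for counters such as received or dropped spans; for a gauge like the queue length, a per-pod maximum might be more informative, which is a design choice left open here.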

@jpkrohling jpkrohling self-assigned this Mar 1, 2019
@jpkrohling
Contributor Author

After further reading of the docs, it's not clear whether the status object is meant to store such info, as it changes quite often.

The reconcile response can ask for the reconcile function to be called again after a given interval (once per second, for instance), so the status object might still be suitable.
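
A rough sketch of that idea follows. The local `Result` type only mirrors the `Requeue` and `RequeueAfter` fields of controller-runtime's `reconcile.Result`, so that the example compiles without dependencies. `reconcileStatus`, `refresh`, and `errUnreachable` are hypothetical names, not the operator's code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result is a stand-in for controller-runtime's reconcile.Result.
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// errUnreachable is a hypothetical error for a failed metrics scrape.
var errUnreachable = errors.New("collector unreachable")

// reconcileStatus refreshes the status once and asks to be called again
// after the given interval. On failure it returns the error and no delay,
// leaving the retry policy to the caller.
func reconcileStatus(refresh func() error, interval time.Duration) (Result, error) {
	if err := refresh(); err != nil {
		return Result{}, err
	}
	return Result{RequeueAfter: interval}, nil
}

func main() {
	res, err := reconcileStatus(func() error { return nil }, time.Second)
	fmt.Println(res.RequeueAfter, err)

	_, err = reconcileStatus(func() error { return errUnreachable }, time.Second)
	fmt.Println(err)
}
```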

@pavolloffay
Member

It seems a bit weird returning internal metrics.

Could it instead return the health check result and the value of the /version endpoint?

@jpkrohling
Contributor Author

In general, I agree, but these specific metrics are a good indication of the cluster state.

It can certainly also return the result of health checks, but:

  1. Those should be checked by Kubernetes already. Any nodes with a failing health check should be rescheduled by Kubernetes automatically.
  2. What to do with conflicting info? For example, 2 nodes have a failing health check and 2 have an "ok" state. What should the cluster state be? Should we list the state individually, per node? The same question applies to "/version".

@pavolloffay
Member

I am more interested in the version than in the whole health check - I agree that the latter is already exposed by the k8s API.

About the metrics - I believe there should be monitoring in the cluster where these metrics are exposed in dashboards.

@jpkrohling
Contributor Author

Yes, there should indeed be a place where these metrics are collected and displayed. Having it in the status could still be helpful for a quick check, to see if everything is wired correctly and for diagnosing possible issues (is the queue too high? are spans being stored at all?)

How about we get this first implementation shipped and track possible improvements via a follow-up issue? I'm not sure which info will be really helpful in the day-to-day operations of the cluster.

@objectiser
Contributor

objectiser commented Mar 4, 2019

My preference would be to determine the information stored in the status based on specific requirements (use cases) for the operator.

For example, if the operator needs to know when the current instance is becoming overloaded and needs to be scaled up, what stats does it need to make that decision? So drive the requirements based on specific actions that need to be taken by the operator.

@jpkrohling
Contributor Author

The current collector queue size is certainly one such metric that can/should be used when making a scaling up/down decision. I'm OK with removing the traces/spans metrics, leaving only the queue size.

At this point, the most important thing, IMO, might be to have the basic feature implemented and adjust it based on feedback.
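
To illustrate how the queue size alone could drive such a decision, here is a minimal sketch. The `decideScaling` function, the `ScaleAction` values, and the watermark thresholds are all hypothetical; nothing in this thread fixes actual thresholds or a scaling policy.

```go
package main

import "fmt"

// ScaleAction is a hypothetical outcome of a scaling decision.
type ScaleAction string

const (
	ScaleUp   ScaleAction = "scale-up"
	ScaleDown ScaleAction = "scale-down"
	Hold      ScaleAction = "hold"
)

// decideScaling compares queue usage (length / capacity) against a high and
// a low watermark. The watermarks are assumptions chosen by the caller.
func decideScaling(queueLength, queueCapacity int, highWater, lowWater float64) ScaleAction {
	if queueCapacity <= 0 {
		return Hold // no usable capacity figure, so make no decision
	}
	usage := float64(queueLength) / float64(queueCapacity)
	switch {
	case usage >= highWater:
		return ScaleUp
	case usage <= lowWater:
		return ScaleDown
	default:
		return Hold
	}
}

func main() {
	// Example values only: a queue of 1800 items out of a capacity of 2000.
	fmt.Println(decideScaling(1800, 2000, 0.8, 0.2))
}
```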

@objectiser
Contributor

objectiser commented Mar 4, 2019

Agree - just want to make sure we control what information is returned and ensure it is meeting a real need. So starting off with the use case of automated scaling up/down based on load is a good first one to tackle.
