
Large traces cause problems #178

Closed
mabn opened this issue May 25, 2017 · 3 comments

@mabn

mabn commented May 25, 2017

Currently Jaeger has problems with traces that exceed a few hundred spans.
I noticed the following:

  • searching for traces fetches very large payloads - basically /api/traces returns all matching traces along with their spans, tags, and logs, but the UI shows only a summary (number of spans per service, duration, etc.). The search response should contain only the details required to render the screen. Currently the UI becomes very sluggish when the search result contains a large trace, or it does not render the results at all.
  • showing the trace view for a very large trace has similar performance issues - the response payload is very large, and the UI is sluggish or does not render.
    • In this case it would make sense to fetch only part of the trace - e.g. the first 200 spans, or the top 200 spans counted by distance from the root span (see the sketch after this list). Some way to expand additional spans would allow the trace to be navigated gradually. Or maybe some span filters would do the job.
    • another idea would be to dedicate some bits in the span ID to "distance from root", making it possible to search for the top spans efficiently
    • one more improvement would be to fetch spans without tags/logs and fetch those only when the "tags" or "logs" section of a span is expanded
  • payloads for large traces are very large (10 MB, 100 MB and more) and the browser has problems processing them
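As a rough sketch of the "top N spans by distance from root" idea above, here is what a client-side truncation could look like in TypeScript. The span shape (spanID, references, refType CHILD_OF) loosely follows the JSON returned by /api/traces, but the field names and the helper itself are assumptions for illustration, not an existing Jaeger API:

```typescript
// Hypothetical sketch: keep only spans within `maxDepth` hops of the root span.
interface SpanRef { refType: 'CHILD_OF' | 'FOLLOWS_FROM'; spanID: string; }
interface Span { spanID: string; references?: SpanRef[]; }

function truncateByDepth(spans: Span[], maxDepth: number): Span[] {
  // Build child lists keyed by parent spanID.
  const children = new Map<string, Span[]>();
  const hasParent = new Set<string>();
  for (const span of spans) {
    const parent = span.references?.find(r => r.refType === 'CHILD_OF');
    if (parent) {
      hasParent.add(span.spanID);
      const list = children.get(parent.spanID) ?? [];
      list.push(span);
      children.set(parent.spanID, list);
    }
  }
  // Roots are spans with no CHILD_OF reference.
  const roots = spans.filter(s => !hasParent.has(s.spanID));

  // Breadth-first walk, stopping once maxDepth levels have been collected.
  const kept: Span[] = [];
  let frontier = roots;
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    kept.push(...frontier);
    frontier = frontier.flatMap(s => children.get(s.spanID) ?? []);
  }
  return kept;
}
```

The same cut-off could of course be applied server-side instead, so the large payload never reaches the browser.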

I was able to view traces that had more than 1k spans (it was laggy and took some time), but not ~5k.

The traces I encountered were usually created by long iterative processes - e.g. recalculating something across thousands of records.
Another, more pathological cause is bad communication design - for example, a process that emits thousands of messages instead of batching them.

Related post on jaeger-tracing group: here

@yurishkuro
Member

The first issue (search results) can be solved by implementing GraphQL #169.

The second issue (trace view) might be solved by GraphQL plus some serious improvements in the UI to load large traces incrementally.
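For illustration only, a summary-only search query along these lines might look like the following. The schema, field names, and /graphql endpoint are assumptions made up for this sketch, not the design proposed in #169:

```typescript
// Hypothetical GraphQL query: fetch only the fields the search screen renders,
// leaving spans, tags, and logs to a later request when a trace is opened.
const SEARCH_QUERY = `
  query FindTraces($service: String!, $limit: Int) {
    traces(service: $service, limit: $limit) {
      traceID
      duration
      spanCount
      services { name spanCount }
    }
  }
`;

async function searchTraces(service: string, limit = 20) {
  const res = await fetch('/graphql', {            // endpoint is an assumption
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: SEARCH_QUERY, variables: { service, limit } }),
  });
  return res.json();
}
```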

@yurishkuro
Member

@mabn the 0.9.0 release contains significant improvements to the UI when working with large traces. Can you try again?

@black-adder added the ui label Nov 28, 2017
@mabn
Author

mabn commented Dec 11, 2017

I've tried 1.0. It's much better - the search results appeared even though there was a large trace among them; it was shown as having 10k spans (probably truncated?) with around 9 MB of (uncompressed) JSON. I also managed to open the trace view. It is slow and sometimes hangs for a fraction of a second, but it can be used to see something if needed.

Previously it wasn't working at all and was causing the browser's tab to hang. So - yes, it's much better now.
