Jaeger Query can OOM when retrieving large traces #1051

Open
vprithvi opened this issue Sep 5, 2018 · 31 comments
Labels
area/storage, bug, help wanted (Features that maintainers are willing to accept but do not have cycles to implement), performance

Comments

@vprithvi
Contributor

vprithvi commented Sep 5, 2018

Requirement - what kind of business use case are you trying to solve?

  • Retrieve large traces
  • Be resilient against bad instrumentation that uses the same traceID for all traces

Problem - what in Jaeger blocks you from solving the requirement?

Jaeger Query OOMs on retrieval of large traces on Cassandra.
If someone is crafty, they can easily create a trace with millions of spans, and attempt to retrieve it to systematically bring down all jaeger-query instances.

Proposed Solution - Cassandra

We might do some combination of the following:

  • Trace retrieval limits: check that the number of spans in a trace is below a user-defined threshold before retrieving the spans.
  • Protect against large spans submitted to the HTTP POST endpoints by setting a user-defined span size limit.
  • Limit the number of concurrent requests served by the HTTP GET handler so that we can accurately predict and bound worst-case memory utilization (see the sketch below).
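
As a rough illustration of the last bullet, here is a minimal sketch of bounding concurrent trace retrievals with a semaphore, using only the standard library; the handler wrapper and limit are illustrative, not Jaeger's actual code.

package app

import "net/http"

// limitConcurrency wraps an http.Handler so that at most maxInFlight requests
// are processed at once; the rest are rejected with 429. This bounds the
// worst-case memory that concurrent trace retrievals can consume.
func limitConcurrency(h http.Handler, maxInFlight int) http.Handler {
	sem := make(chan struct{}, maxInFlight)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }()
			h.ServeHTTP(w, r)
		default:
			http.Error(w, "too many concurrent trace requests", http.StatusTooManyRequests)
		}
	})
}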

Any open questions to address

  • Does this affect ES as well?
@pavolloffay
Member

A related issue for ES, although it's about overloading ES rather than the query service: #960

@jpkrohling
Contributor

And this is why it's a good idea to have the collector in a different instance for production environments :)

@zigmund

zigmund commented Oct 4, 2018

Same here with ES. One failed microservice caused a retry loop and we got a giant trace with thousands of spans. While serving the request, jaeger-query eats all available memory and gets killed by the OOM killer.

@jpkrohling
Contributor

We probably want to add this scenario to our test cases

cc @jkandasa

@yurishkuro added the "help wanted" label on Oct 4, 2018
@yurishkuro
Member

We may consider adding LIMIT to Cassandra queries by trace ID.
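
A hedged sketch of what that could look like with the gocql driver; the statement, table, and column names here are illustrative and do not match Jaeger's actual Cassandra schema.

package storage

import "github.com/gocql/gocql"

// Illustrative only: table and column names do not match Jaeger's real schema.
const spansByTraceID = `SELECT span_id, operation_name, start_time, duration
                        FROM traces WHERE trace_id = ? LIMIT ?`

// readSpansWithLimit iterates over at most maxSpans rows for one trace, so a
// pathological trace cannot pull an unbounded number of spans into memory.
func readSpansWithLimit(session *gocql.Session, traceID []byte, maxSpans int) error {
	iter := session.Query(spansByTraceID, traceID, maxSpans).Iter()
	var (
		spanID              int64
		operationName       string
		startTime, duration int64
	)
	for iter.Scan(&spanID, &operationName, &startTime, &duration) {
		// process one span at a time; the LIMIT caps how many rows come back
	}
	return iter.Close()
}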

@pavolloffay
Member

Limit on the number of spans in a trace?

@yurishkuro
Member

Well, there are two separate things. We're seeing some traces in production with many millions of spans, so we want to implement a limit on ingestion. But separately, there can be a limit on the reading side to avoid loading too much even if that many spans were saved.

@annanay25
Member

The olivere/elastic library supports sending a range query using From(0) and Size(100) to fetch a predefined number of documents - for ex: https://github.com/olivere/elastic/blob/release-branch.v6/example_test.go#L172

Would it make sense to pass this range as a param in MultiSearch here - https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/spanstore/reader.go#L256 ?
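
For reference, a minimal sketch of such a bounded fetch with olivere/elastic, mirroring the linked example; the index pattern and field name are placeholders rather than Jaeger's exact mapping.

package esquery

import (
	"context"

	elastic "github.com/olivere/elastic"
)

// fetchSpanPage retrieves one bounded page of span documents for a trace
// using From/Size instead of pulling everything at once.
func fetchSpanPage(ctx context.Context, client *elastic.Client, traceID string, from, size int) (*elastic.SearchResult, error) {
	return client.Search().
		Index("jaeger-span-*"). // placeholder index pattern
		Query(elastic.NewTermQuery("traceID", traceID)).
		From(from). // offset into the result set
		Size(size). // page size, e.g. 100
		Do(ctx)
}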

@pavolloffay
Member

Perhaps yes, I am not 100% sure how the current search works. Would you like to submit a patch?

@annanay25
Member

Sure, I'll give it a shot soon.

@annanay25
Member

@pavolloffay could you review this? annanay25@2d167b8
Reduced the default fetch limit from 10,000 to 1,000, and added a FetchNext function to implement scrolling.

@yurishkuro
Member

I think fetching in batches is not the issue here. If a trace has 1 million spans, then whether you fetch them in batches or all at once, the query service still needs to hold all of them in memory before returning to the client, since we do not support streaming of the results (something to think about when designing the protobuf API for query).

I am not sure what impact batched fetching has on ES itself. At least with Cassandra we get a result set that we iterate through, so the driver & storage have the opportunity to load data in a streaming fashion, without a Cassandra node holding everything in memory.

@annanay25
Member

@yurishkuro the idea was that the query nodes would hold only one batch of data in memory at a time and send that batch to the client (assuming the UI). The client could choose to read and then discard the spans in batches.

But since, as you mentioned, Jaeger does not support streaming of results, what approach do you recommend?

@yurishkuro
Member

I think the simplest fix is what is proposed in the description of this ticket. It's not ideal since it will truncate the data, but we can revisit that later when/if we support a streaming API from the query service. At least this fix is fairly straightforward and will protect the query svc from OOM. Most of the analytics will choke on a trace with >100k spans anyway. The limits will be an optional configuration.

@zigmund

zigmund commented Oct 30, 2018

@yurishkuro I agree. Maybe we should just mark the trace in the web interface as truncated due to the span count limit.

@annanay25
Member

annanay25 commented Nov 20, 2018

@yurishkuro truncated search results (with the span count truncated as well, for ES) are implemented for ES and Cassandra - annanay25@9eb7590

Were we waiting on a test case for this? Is someone working on that?

@yurishkuro
Member

Not afaik

@mohit-chawla

Not 100% related, but I have a question:

Edge case: for my use case I need to use in-memory storage, and I was wondering about the edge case where one huge trace takes up the whole memory. Will this cause OOM errors, or will the trace be discarded?

  • Is there a maximum trace size available?
  • How can I increase the memory allocated?

cc: @annanay25

@Sreevani871
Contributor

Sreevani871 commented Sep 24, 2020

Encountering OOM issues while fetching traces from the Jaeger UI.
Details:
Backend storage: Elasticsearch
Instance RAM: 4 GB (~3.7 GB free memory)
Instance vCPUs: 2

We are seeing OOM kills when fetching 20 traces (each consisting of ~20K spans of 1 KB each) from the UI, which causes frequent process restarts.
While going through the code, we observed that the jaeger-query FindTraces API fetches all spans of the resulting traces within that single call.
Would it be more efficient, if feasible, to fetch only the top trace IDs and related stats to display in the UI (for example via ES count queries), and query all spans of a trace only when a particular trace is clicked? Even then, the OOM issue remains if a single trace has a very large set of spans; to avoid that, we could initially load a default number of spans per trace and provide a "load more" option in the UI to scroll through further spans until none are left. (A sketch of the count query idea follows below.)
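
A hedged sketch of the "stats" part of this idea: an ES count query per trace that returns the span count without loading any documents (olivere/elastic; index pattern and field name are placeholders).

package esquery

import (
	"context"

	elastic "github.com/olivere/elastic"
)

// spanCount returns how many span documents are stored for a trace without
// fetching any of them, so the UI could show counts before loading spans.
func spanCount(ctx context.Context, client *elastic.Client, traceID string) (int64, error) {
	return client.Count().
		Index("jaeger-span-*").                          // placeholder index pattern
		Query(elastic.NewTermQuery("traceID", traceID)). // placeholder field name
		Do(ctx)
}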

@yurishkuro
Member

@Sreevani871 yes, we even had tickets somewhere for that, but no volunteers to implement it.

@Sreevani871
Contributor

@yurishkuro Is there a specific reason for fetching all of the data in the FindTraces call itself?

@yurishkuro
Member

If you mean the internal implementation of FindTraces, then yes, the reason is that there is no summary information about the full trace available anywhere else, since collectors receive spans one at a time. Storage is the only place that can provide a full trace, but you need to read that full trace in order to extract a summary. It's possible to build a post-processing step, but it will require additional infrastructure/coordination.

@Sreevani871
Contributor

@Sreevani871 yes, we even had tickets somewhere for that, but no volunteers to implement it.

Can you share the tickets? I will have a look.

@yurishkuro
Member

There is context here: jaegertracing/jaeger-ui#247

@Sreevani871
Contributor

@yurishkuro Any progress on this?

@Sreevani871
Contributor

To avoid frequent OOM kills of the jaeger-query service (due to traces having a large number of spans), we need to avoid fetching all span data for the top-N traces. Below are the approaches we considered to solve the issue.
We are using Elasticsearch as span storage. Please review and share your feedback.

Approach - I:
Fetch a summary of the top-N traces instead of fetching all raw span data for those traces from the backend.
Returning the summary instead of all raw span data to the UI also avoids the summary generation step on the UI side, which in turn saves UI page load time.
Changes Required at Backend:

  • Introduce a new API, /summary, to fetch a summary of traces.
  • Add the code path for the /summary API, which involves changes to the querier and the span reader interface and will require all span storage types to implement a summary method on the reader side.
  • The API returns the response as map[TraceID]TraceSummary{}:
type TraceSummary struct {
    ServiceName        string         // service name of the oldest span of the trace
    OperationName      string         // operation name of the oldest span of the trace
    StartTime          string         // start time of the oldest span of the trace
    Duration           time.Duration  // duration of the oldest span of the trace
    SpanCount          int            // total span count of the trace
    SpanCountByService map[string]int
    ErrorSpanCount     int            // error span count of the trace
    SpanID             string
}

Here the "oldest span" of a trace ideally refers to the parent (root) span. To handle cases where the parent span is missing or not yet indexed at the time of the search, we consider fetching the oldest span (sorted by startTime) of the trace that is available at search time.

Code Changes Required at UI:

  • The UI has to call the new API to get the trace summary data.
  • Use the summary data returned from the backend to render the FindTraces page.
  • Since the UI does not have the full span data of a trace, clicking on a trace should make the UI call the backend for the full span details.

Pros:

  • Memory usage for this approach is minimal: the number of objects returned from the backend to the UI equals the number of traces fetched, and each trace object carries only the summary.
  • Memory usage is better than Approach-2.

Cons:

  • Requires code changes on both the backend and the frontend. Compared to Approach-2, the UI requires considerable code changes to adopt this approach.

Note: the FindTraces page has a Deep Dependency Graph (DDG) option. Since we are avoiding fetching all span data in the FindTraces API call, we need a separate API that fetches only the span fields required to construct the DDG for the traces, which overlaps with the idea of Approach-2.

Approach - II:
Analyze the fieldset of a span required to render the FindTraces UI page, without modifying the existing API response data model.
We observed that the following span fields are used to generate the FindTraces list view and the DDG.

  1. ServiceName
  2. OperationName
  3. References
  4. Tags.Span.Kind
  5. StartTime
  6. Duration

Changes Required at Backend:

  • Change the ES query to fetch only the selected fields from the source (see the sketch below).
  • The following shows the current data flow from ES back to the UI.
  • FindTraces() returns []jaeger.model.Trace:
    ES -> []byte -> MultiSearchResults{} -> es.dbmodel.Span -> jaeger.model.Span -> jaeger.model.Trace
    - ConvertToUI model:
    []jaeger.model.Trace -> []json.Trace
  • In the above flow, the data transformation (serialization/deserialization) steps consume 2 to 3x the memory of the actual data. Confirmed from the heap profile alloc_space data.

(Screenshot: heap profile alloc_space, 2020-10-12)

  • To avoid this extra memory usage we can remove these intermediate steps, giving the following flow:
    ES -> []byte -> MultiSearchResults{} -> []json.Trace
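
A hedged sketch of the selective-field fetch described in the first bullet above, using olivere/elastic source filtering; the index pattern and field names approximate Jaeger's ES schema and are not verified against it.

package esquery

import (
	"context"

	elastic "github.com/olivere/elastic"
)

// findTraceSpansLean asks ES to return only the span fields the FindTraces
// page and the DDG need, instead of the full _source of every span.
func findTraceSpansLean(ctx context.Context, client *elastic.Client, traceID string) (*elastic.SearchResult, error) {
	fields := elastic.NewFetchSourceContext(true).
		Include("process.serviceName", "operationName", "references",
			"tags", "startTime", "duration") // approximate field names
	return client.Search().
		Index("jaeger-span-*"). // placeholder index pattern
		Query(elastic.NewTermQuery("traceID", traceID)).
		FetchSourceContext(fields).
		Size(10000). // ES per-request cap; larger traces would need scroll/search_after
		Do(ctx)
}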

Changes Required at Frontend:
- No considerable code changes are required on the UI side, except for the API call if we introduce a new API for this approach.

Pros:

  • Changes are required only on the backend side; no changes are required on the frontend.
  • It will reduce memory consumption by some percentage, since we avoid fetching unnecessary span fields.

Cons:

  • It will reduce memory by some percentage compared to the existing code, but the number of objects returned from the backend to the UI is not reduced.
    Example: fetching 20 traces, each with 20K spans, still results in 400K objects.
    Approach-1 is comparatively better in terms of memory usage reduction and UI page load time, since it removes the UI work of building a summary from the data returned by the backend.

@yurishkuro
Member

Note: FindTraces page has a Deep Dependency Graph option...

That's a very astute observation, since DDG is built entirely by the UI and requires loading all spans of the trace (but not all data in the spans).

A more flexible variation of approach 2 is to build support for GraphQL. It's going to be difficult to support partial field retrieval at the storage level (some storage implementations may only allow retrieving the whole trace as a blob), so the whole data is still going to be loaded into memory from the database. The memory savings can come from refactoring the storage API to be more like a stream, i.e. instead of the synchronous:

FindTraces(query) ([]*model.Span, error)

implement the streaming version (incidentally, the grpc-plugin storage API already uses streaming under the hood):

FindTracesStreaming(query, handler func(*model.Span) error) error

The handler function can immediately convert model.Span to JSON output object, AND trim it down by filtering only the fields required by GraphQL query.

This will not reduce the number of memory allocations, but will allow query service not to hold onto every single span of the whole result set.

Later this can be further extended into approach 1, which will reduce memory requirements even further (but DDG from search will likely require a second query, which is probably ok since not every search is used to display DDG).
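
A hedged sketch of what that streaming shape could look like; Span, TraceQuery, and the trimmed uiSpan below are minimal stand-ins for Jaeger's real types, not its actual API.

package streaming

import "encoding/json"

// Span and TraceQuery are minimal stand-ins for Jaeger's model types.
type Span struct {
	TraceID       string
	SpanID        string
	OperationName string
}

type TraceQuery struct {
	ServiceName string
	NumTraces   int
}

// Reader pairs the synchronous form quoted above with the proposed
// streaming variant.
type Reader interface {
	FindTraces(query *TraceQuery) ([]*Span, error)
	// FindTracesStreaming calls handler once per span as it is read, so the
	// caller never has to hold the full result set in memory.
	FindTracesStreaming(query *TraceQuery, handler func(*Span) error) error
}

// uiSpan is a trimmed output object holding only the fields a hypothetical
// GraphQL/UI query asked for.
type uiSpan struct {
	TraceID       string `json:"traceID"`
	OperationName string `json:"operationName"`
}

// writeSearchResults converts and emits each span immediately instead of
// accumulating a []*Span slice inside the query service.
func writeSearchResults(r Reader, q *TraceQuery, enc *json.Encoder) error {
	return r.FindTracesStreaming(q, func(s *Span) error {
		return enc.Encode(uiSpan{TraceID: s.TraceID, OperationName: s.OperationName})
	})
}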

@Sreevani871
Contributor

Using json.Unmarshal() instead of json.Decoder.Decode() here https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/spanstore/reader.go#L233 shows an improvement in memory usage, because the decoder buffers the entire JSON value in memory again. Since the data passed to unmarshalJSON() is already in memory, using json.Unmarshal() improves memory usage.
Benchmarking Results:


goos: darwin
goarch: amd64
BenchmarkUnmarshal-8     2251400               527 ns/op             352 B/op          5 allocs/op
BenchmarkDecoder-8       1470145               812 ns/op            1096 B/op          8 allocs/op

package test

import (
	"bytes"
	"encoding/json"
	"fmt"
	"testing"
)

// JSON uses exported fields with tags so that json.Marshal actually
// serializes the data (unexported fields would be silently skipped).
type JSON struct {
	A []int   `json:"a"`
	B string  `json:"b"`
	C float64 `json:"c"`
	D []byte  `json:"d"`
}

var j = JSON{
	A: []int{1, 2, 4, 5, 566, 9},
	B: "Testing Data",
	C: 145.0,
	D: []byte("Testing Data"),
}

func BenchmarkUnmarshal(b *testing.B) {
	for n := 0; n < b.N; n++ {
		unmarshalJson(marshalJson(j))
	}
}

func BenchmarkDecoder(b *testing.B) {
	for n := 0; n < b.N; n++ {
		decodeJson(marshalJson(j))
	}
}

func marshalJson(j JSON) []byte {
	v, err := json.Marshal(j)
	if err != nil {
		fmt.Println(err)
		return nil
	}
	return v
}

// unmarshalJson parses bytes that are already in memory.
func unmarshalJson(data []byte) {
	var j = JSON{}
	if err := json.Unmarshal(data, &j); err != nil {
		fmt.Println(err)
	}
}

// decodeJson wraps the in-memory bytes in a reader and decodes them,
// which buffers the value again inside the decoder.
func decodeJson(data []byte) {
	var j = JSON{}
	d := json.NewDecoder(bytes.NewReader(data))
	d.UseNumber()
	if err := d.Decode(&j); err != nil {
		fmt.Println(err)
	}
}

@yurishkuro
Member

Interesting query benchmark numbers from a post on Slack:

  • 1 trace with approximately 1700 spans returns in 1.5 seconds. (5 unique ES queries (+ 2 ES queries added per any additional trace/limit) all complete in under a second)
  • 10 traces, each with approximately 1700 spans, return in 5 seconds. Memory: 150MB
  • 100 traces, each with approximately 1700 spans, return in 30 seconds. Memory: 700MB - 1.5GB (203 unique ES queries, all of which complete in under 3 seconds)

@jkowall
Contributor

jkowall commented Jun 7, 2024

One possible solution here would be to limit the number of spans returned when pulling up a trace to 50k and allow this to be configurable.

@yurishkuro
Member

Many of the issues discussed in this issue will be alleviated by the Storage v2 API (#5079), which will support streaming of results. It will not address the case where a single trace has millions of spans, since the trace still needs to be loaded by the query service in order to apply adjustments and conversion, but the query service will be in control of pulling as much data as it wants, so it can put a limit on the number of spans it is willing to receive and terminate the load after that.
