Jaeger Query can OOM when retrieving large traces #1051

Open
vprithvi opened this issue Sep 5, 2018 · 31 comments
Labels
area/storage, bug, help wanted (Features that maintainers are willing to accept but do not have cycles to implement), performance

Comments

@vprithvi
Contributor

vprithvi commented Sep 5, 2018

Requirement - what kind of business use case are you trying to solve?

  • Retrieve large traces
  • Be resilient against bad instrumentation that uses the same traceID for all traces

Problem - what in Jaeger blocks you from solving the requirement?

Jaeger Query OOMs on retrieval of large traces on Cassandra.
If someone is crafty, they can easily create a trace with millions of spans, and attempt to retrieve it to systematically bring down all jaeger-query instances.

Proposed Solution - Cassandra

We might do some combination of the following:

  • Trace retrieval limits: check that the number of spans in a trace is below a user-defined threshold before retrieving the spans.
  • Protect against large spans submitted to the HTTP POST endpoints by setting a user-defined span size limit.
  • Limit the number of concurrent requests served by the HTTP GET handler so that we can accurately predict and bound worst-case memory utilization (see the sketch below).
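
As a rough illustration of the last bullet, here is a minimal sketch of bounding concurrent trace retrievals with a semaphore, using only the standard library; the handler wrapper and limit are illustrative, not Jaeger's actual code.

package app

import "net/http"

// limitConcurrency wraps an http.Handler so that at most maxInFlight requests
// are processed at once; the rest are rejected with 429. This bounds the
// worst-case memory that concurrent trace retrievals can consume.
func limitConcurrency(h http.Handler, maxInFlight int) http.Handler {
	sem := make(chan struct{}, maxInFlight)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }()
			h.ServeHTTP(w, r)
		default:
			http.Error(w, "too many concurrent trace requests", http.StatusTooManyRequests)
		}
	})
}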

Any open questions to address

  • Does this affect ES as well?
@pavolloffay
Member

A related issue for ES, although it's about overloading ES rather than the query service: #960

@jpkrohling
Contributor

And this is why it's a good idea to have the collector in a different instance for production environments :)

@zigmund

zigmund commented Oct 4, 2018

Same here with ES. One failed microservice caused a retry loop and we got a giant trace with thousands of spans. While serving the request, jaeger-query eats all available memory and gets killed by the OOM killer.

@jpkrohling
Contributor

We probably want to add this scenario to our test cases

cc @jkandasa

@yurishkuro added the "help wanted" label on Oct 4, 2018
@yurishkuro
Member

We may consider adding LIMIT to Cassandra queries by trace ID.
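
A hedged sketch of what that could look like with the gocql driver; the statement, table, and column names here are illustrative and do not match Jaeger's actual Cassandra schema.

package storage

import "github.com/gocql/gocql"

// Illustrative only: table and column names do not match Jaeger's real schema.
const spansByTraceID = `SELECT span_id, operation_name, start_time, duration
                        FROM traces WHERE trace_id = ? LIMIT ?`

// readSpansWithLimit iterates over at most maxSpans rows for one trace, so a
// pathological trace cannot pull an unbounded number of spans into memory.
func readSpansWithLimit(session *gocql.Session, traceID []byte, maxSpans int) error {
	iter := session.Query(spansByTraceID, traceID, maxSpans).Iter()
	var (
		spanID              int64
		operationName       string
		startTime, duration int64
	)
	for iter.Scan(&spanID, &operationName, &startTime, &duration) {
		// process one span at a time; the LIMIT caps how many rows come back
	}
	return iter.Close()
}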

@pavolloffay
Member

Limit on the number of spans in a trace?

@yurishkuro
Member

Well, there are two separate things. We're seeing some traces in production with many millions of spans, so we want to implement a limit on ingestion. But separately, there can be a limit on the reading side to avoid loading too much even if that many spans were saved.

@annanay25
Member

The olivere/elastic library supports sending a range query using From(0) and Size(100) to fetch a predefined number of documents - for ex: https://github.com/olivere/elastic/blob/release-branch.v6/example_test.go#L172

Would it make sense to pass this range as a param in MultiSearch here - https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/spanstore/reader.go#L256 ?
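
For reference, a minimal sketch of such a bounded fetch with olivere/elastic, mirroring the linked example; the index pattern and field name are placeholders rather than Jaeger's exact mapping.

package esquery

import (
	"context"

	elastic "github.com/olivere/elastic"
)

// fetchSpanPage retrieves one bounded page of span documents for a trace
// using From/Size instead of pulling everything at once.
func fetchSpanPage(ctx context.Context, client *elastic.Client, traceID string, from, size int) (*elastic.SearchResult, error) {
	return client.Search().
		Index("jaeger-span-*"). // placeholder index pattern
		Query(elastic.NewTermQuery("traceID", traceID)).
		From(from). // offset into the result set
		Size(size). // page size, e.g. 100
		Do(ctx)
}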

@pavolloffay
Member

Perhaps yes, I am not 100% sure how the current search works. Would you like to submit a patch?

@annanay25
Member

Sure, I'll give it a shot soon.

@annanay25
Member

@pavolloffay could you review this? annanay25@2d167b8
Reduced the default fetch limit from 10,000 to 1,000, and added a FetchNext function to implement scrolling.

@yurishkuro
Member

I think fetching in batches is not the issue here. If a trace has 1 million spans, then whether you fetch them in batches or all at once, the query service still needs to hold all of them in memory before returning to the client, since we do not support streaming of the results (something to think about when designing the protobuf API for query).

I am not sure what impact batched fetching has on ES itself. At least with Cassandra we get a result set that we iterate through, so the driver & storage have the opportunity to load data in a streaming fashion, without a Cassandra node holding everything in memory.

@annanay25
Member

@yurishkuro the idea was that the query nodes would hold only one batch of data in memory at a time and send that batch to the client (assuming the UI). The client could choose to read and then discard the spans in batches.

But since, as you mentioned, Jaeger does not support streaming of results, what approach do you recommend?

@yurishkuro
Member

I think the simplest fix is what is proposed in the description of this ticket. It's not ideal since it will truncate the data, but we can revisit that later when/if we support a streaming API from the query service. At least this fix is fairly straightforward and will protect the query svc from OOM. Most of the analytics will choke on a trace with >100k spans anyway. The limits will be an optional configuration.

@zigmund

zigmund commented Oct 30, 2018

@yurishkuro I agree. Maybe we should just mark the trace in the web interface as truncated due to the span count limit.

@annanay25
Member

annanay25 commented Nov 20, 2018

@yurishkuro truncated search results (with the span count truncated as well, for ES) are implemented for ES and Cassandra - annanay25@9eb7590

Were we waiting on a test case for this? Is someone working on that?

@yurishkuro
Member

Not afaik

@mohit-chawla

Not 100% related, but I have a question:

Edge case: for my use case I need to use in-memory storage, and I was wondering about the edge case where one huge trace takes up the whole memory. Will this cause OOM errors, or will the trace be discarded?

  • Is there a maximum trace size available?
  • How can I increase the memory allocated?

cc: @annanay25

@Sreevani871
Contributor

Sreevani871 commented Sep 24, 2020

Encountering OOM issues while fetching traces from the Jaeger UI.
Details:
Backend storage: Elasticsearch
Instance RAM: 4 GB (~3.7 GB free memory)
Instance vCPUs: 2

We are seeing OOM kills when fetching 20 traces (each consisting of ~20K spans of 1 KB each) from the UI, which causes frequent process restarts.
While going through the code, we observed that the jaeger-query FindTraces API fetches all spans of the resulting traces within that single call.
Would it be more efficient, if feasible, to fetch only the top trace IDs and related stats to display in the UI (for example via ES count queries), and query all spans of a trace only when a particular trace is clicked? Even then, the OOM issue remains if a single trace has a very large set of spans; to avoid that, we could initially load a default number of spans per trace and provide a "load more" option in the UI to scroll through further spans until none are left. (A sketch of the count query idea follows below.)
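
A hedged sketch of the "stats" part of this idea: an ES count query per trace that returns the span count without loading any documents (olivere/elastic; index pattern and field name are placeholders).

package esquery

import (
	"context"

	elastic "github.com/olivere/elastic"
)

// spanCount returns how many span documents are stored for a trace without
// fetching any of them, so the UI could show counts before loading spans.
func spanCount(ctx context.Context, client *elastic.Client, traceID string) (int64, error) {
	return client.Count().
		Index("jaeger-span-*").                          // placeholder index pattern
		Query(elastic.NewTermQuery("traceID", traceID)). // placeholder field name
		Do(ctx)
}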

@yurishkuro
Member

@Sreevani871 yes, we even had tickets somewhere for that, but no volunteers to implement it.

@Sreevani871
Contributor

@yurishkuro Is there a specific reason for fetching all of the data in the FindTraces call itself?

@yurishkuro
Member

If you mean the internal implementation of FindTraces, then yes, the reason is that there is no summary information about the full trace available anywhere else, since collectors receive spans one at a time. Storage is the only place that can provide a full trace, but you need to read that full trace in order to extract a summary. It's possible to build a post-processing step, but it will require additional infrastructure/coordination.

@Sreevani871
Contributor

@Sreevani871 yes, we even had tickets somewhere for that, but no volunteers to implement it.

Can you share the tickets? I will have a look.

@yurishkuro
Member

There is context here: jaegertracing/jaeger-ui#247

@Sreevani871
Contributor

@yurishkuro Any progress on this?

@Sreevani871
Contributor

To avoid frequent OOM kills of the jaeger-query service (due to traces having a large number of spans), we need to avoid fetching all span data for the top-N traces. Below are the approaches we considered to solve the issue.
We are using Elasticsearch as span storage. Please review and share your feedback.

Approach - I:
Fetch a summary of the top-N traces instead of fetching all raw span data for those traces from the backend.
Returning the summary instead of all raw span data to the UI also avoids the summary generation step on the UI side, which in turn saves UI page load time.
Changes Required at Backend:

  • Introduce a new API, /summary, to fetch a summary of traces.
  • Add the code path for the /summary API, which involves changes to the querier and the span reader interface and will require all span storage types to implement a summary method on the reader side.
  • The API returns the response as map[TraceID]TraceSummary{}:
type TraceSummary struct {
    ServiceName        string         // service name of the oldest span of the trace
    OperationName      string         // operation name of the oldest span of the trace
    StartTime          string         // start time of the oldest span of the trace
    Duration           time.Duration  // duration of the oldest span of the trace
    SpanCount          int            // total span count of the trace
    SpanCountByService map[string]int
    ErrorSpanCount     int            // error span count of the trace
    SpanID             string
}

Here the "oldest span" of a trace ideally refers to the parent (root) span. To handle cases where the parent span is missing or not yet indexed at the time of the search, we consider fetching the oldest span (sorted by startTime) of the trace that is available at search time.

Code Changes Required at UI:

  • The UI has to call the new API to get the trace summary data.
  • Use the summary data returned from the backend to render the FindTraces page.
  • Since the UI does not have the full span data of a trace, clicking on a trace should make the UI call the backend for the full span details.

Pros:

  • Memory usage for this approach is minimal: the number of objects returned from the backend to the UI equals the number of traces fetched, and each trace object carries only the summary.
  • Memory usage is better than Approach-2.

Cons:

  • Requires code changes on both the backend and the frontend. Compared to Approach-2, the UI requires considerable code changes to adopt this approach.

Note: the FindTraces page has a Deep Dependency Graph (DDG) option. Since we are avoiding fetching all span data in the FindTraces API call, we need a separate API that fetches only the span fields required to construct the DDG for the traces, which overlaps with the idea of Approach-2.

Approach - II:
Analyze the fieldset of a span required to render the FindTraces UI page, without modifying the existing API response data model.
We observed that the following span fields are used to generate the FindTraces list view and the DDG.

  1. ServiceName
  2. OperationName
  3. References
  4. Tags.Span.Kind
  5. StartTime
  6. Duration

Changes Required at Backend:

  • Change the ES query to fetch only the selected fields from the source (see the sketch below).
  • The following shows the current data flow from ES back to the UI.
  • FindTraces() returns []jaeger.model.Trace:
    ES -> []byte -> MultiSearchResults{} -> es.dbmodel.Span -> jaeger.model.Span -> jaeger.model.Trace
    - ConvertToUI model:
    []jaeger.model.Trace -> []json.Trace
  • In the above flow, the data transformation (serialization/deserialization) steps consume 2 to 3x the memory of the actual data. Confirmed from the heap profile alloc_space data.

(Screenshot: heap profile alloc_space, 2020-10-12)

  • To avoid this extra memory usage we can remove these intermediate steps, giving the following flow:
    ES -> []byte -> MultiSearchResults{} -> []json.Trace
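
A hedged sketch of the selective-field fetch described in the first bullet above, using olivere/elastic source filtering; the index pattern and field names approximate Jaeger's ES schema and are not verified against it.

package esquery

import (
	"context"

	elastic "github.com/olivere/elastic"
)

// findTraceSpansLean asks ES to return only the span fields the FindTraces
// page and the DDG need, instead of the full _source of every span.
func findTraceSpansLean(ctx context.Context, client *elastic.Client, traceID string) (*elastic.SearchResult, error) {
	fields := elastic.NewFetchSourceContext(true).
		Include("process.serviceName", "operationName", "references",
			"tags", "startTime", "duration") // approximate field names
	return client.Search().
		Index("jaeger-span-*"). // placeholder index pattern
		Query(elastic.NewTermQuery("traceID", traceID)).
		FetchSourceContext(fields).
		Size(10000). // ES per-request cap; larger traces would need scroll/search_after
		Do(ctx)
}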

Changes Required at Frontend:
- No considerable code changes are required on the UI side, except for the API call if we introduce a new API for this approach.

Pros:

  • Changes are required only on the backend side; no changes are required on the frontend.
  • It will reduce memory consumption by some percentage, since we avoid fetching unnecessary span fields.

Cons:

  • It will reduce memory by some percentage compared to the existing code, but the number of objects returned from the backend to the UI is not reduced.
    Example: fetching 20 traces, each with 20K spans, still results in 400K objects.
    Approach-1 is comparatively better in terms of memory usage reduction and UI page load time, since it removes the UI work of building a summary from the data returned by the backend.

@yurishkuro
Member

Note: FindTraces page has a Deep Dependency Graph option...

That's a very astute observation, since DDG is built entirely by the UI and requires loading all spans of the trace (but not all data in the spans).

A more flexible variation of approach 2 is to build support for GraphQL. It's going to be difficult to support partial field retrieval at the storage level (some storage implementations may only allow retrieving the whole trace as a blob), so the whole data is still going to be loaded into memory from the database. The memory savings can come from refactoring the storage API to be more like a stream, i.e. instead of the synchronous:

FindTraces(query) ([]*model.Span, error)

implement the streaming version (incidentally, the grpc-plugin storage API already uses streaming under the hood):

FindTracesStreaming(query, handler func(*model.Span) error) error

The handler function can immediately convert model.Span to JSON output object, AND trim it down by filtering only the fields required by GraphQL query.

This will not reduce the number of memory allocations, but will allow query service not to hold onto every single span of the whole result set.

Later this can be further extended into approach 1, which will reduce memory requirements even further (but DDG from search will likely require a second query, which is probably ok since not every search is used to display DDG).
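
A hedged sketch of what that streaming shape could look like; Span, TraceQuery, and the trimmed uiSpan below are minimal stand-ins for Jaeger's real types, not its actual API.

package streaming

import "encoding/json"

// Span and TraceQuery are minimal stand-ins for Jaeger's model types.
type Span struct {
	TraceID       string
	SpanID        string
	OperationName string
}

type TraceQuery struct {
	ServiceName string
	NumTraces   int
}

// Reader pairs the synchronous form quoted above with the proposed
// streaming variant.
type Reader interface {
	FindTraces(query *TraceQuery) ([]*Span, error)
	// FindTracesStreaming calls handler once per span as it is read, so the
	// caller never has to hold the full result set in memory.
	FindTracesStreaming(query *TraceQuery, handler func(*Span) error) error
}

// uiSpan is a trimmed output object holding only the fields a hypothetical
// GraphQL/UI query asked for.
type uiSpan struct {
	TraceID       string `json:"traceID"`
	OperationName string `json:"operationName"`
}

// writeSearchResults converts and emits each span immediately instead of
// accumulating a []*Span slice inside the query service.
func writeSearchResults(r Reader, q *TraceQuery, enc *json.Encoder) error {
	return r.FindTracesStreaming(q, func(s *Span) error {
		return enc.Encode(uiSpan{TraceID: s.TraceID, OperationName: s.OperationName})
	})
}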

@Sreevani871
Contributor

Using json.Unmarshal() instead of json.Decoder.Decode() here https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/spanstore/reader.go#L233 shows an improvement in memory usage, because the decoder buffers the entire JSON value in memory again. Since the data passed to unmarshalJSON() is already in memory, using json.Unmarshal() improves memory usage.
Benchmarking Results:


goos: darwin
goarch: amd64
BenchmarkUnmarshal-8     2251400               527 ns/op             352 B/op          5 allocs/op
BenchmarkDecoder-8       1470145               812 ns/op            1096 B/op          8 allocs/op

package test

import (
	"bytes"
	"encoding/json"
	"fmt"
	"testing"
)

// JSON uses exported fields with tags so that json.Marshal actually
// serializes the data (unexported fields would be silently skipped).
type JSON struct {
	A []int   `json:"a"`
	B string  `json:"b"`
	C float64 `json:"c"`
	D []byte  `json:"d"`
}

var j = JSON{
	A: []int{1, 2, 4, 5, 566, 9},
	B: "Testing Data",
	C: 145.0,
	D: []byte("Testing Data"),
}

func BenchmarkUnmarshal(b *testing.B) {
	for n := 0; n < b.N; n++ {
		unmarshalJson(marshalJson(j))
	}
}

func BenchmarkDecoder(b *testing.B) {
	for n := 0; n < b.N; n++ {
		decodeJson(marshalJson(j))
	}
}

func marshalJson(j JSON) []byte {
	v, err := json.Marshal(j)
	if err != nil {
		fmt.Println(err)
		return nil
	}
	return v
}

// unmarshalJson parses bytes that are already in memory.
func unmarshalJson(data []byte) {
	var j = JSON{}
	if err := json.Unmarshal(data, &j); err != nil {
		fmt.Println(err)
	}
}

// decodeJson wraps the in-memory bytes in a reader and decodes them,
// which buffers the value again inside the decoder.
func decodeJson(data []byte) {
	var j = JSON{}
	d := json.NewDecoder(bytes.NewReader(data))
	d.UseNumber()
	if err := d.Decode(&j); err != nil {
		fmt.Println(err)
	}
}

@yurishkuro
Member

Interesting query benchmark numbers from a post on Slack:

  • 1 trace with approximately 1700 spans returns in 1.5 seconds. (5 unique ES queries (+ 2 ES queries added per any additional trace/limit) all complete in under a second)
  • 10 traces, each with approximately 1700 spans, return in 5 seconds. Memory: 150MB
  • 100 traces, each with approximately 1700 spans, return in 30 seconds. Memory: 700MB - 1.5GB (203 unique ES queries, all of which complete in under 3 seconds)

@jkowall
Contributor

jkowall commented Jun 7, 2024

One possible solution here would be to limit the number of spans returned when pulling up a trace to 50k and allow this to be configurable.

@yurishkuro
Member

Many of the issues discussed in this issue will be alleviated by the Storage v2 API (#5079), which will support streaming of results. It will not address the case where a single trace has millions of spans, since the trace still needs to be loaded by the query service in order to apply adjustments and conversion, but the query service will be in control of pulling as much data as it wants, so it can put a limit on the number of spans it is willing to receive and terminate the load after that.
