[Jaeger] Support receiving Jaeger spans #2886

Closed
axw opened this issue Nov 7, 2019 · 3 comments

Comments

@axw
Member

axw commented Nov 7, 2019

We have heard from users that they would like to use their existing Jaeger instrumentation with Elastic APM. It is already possible to store and search span data in Elasticsearch, but what is missing is integration with Elastic APM and related features, such as visualisation in the APM UI, central configuration, and log correlation.

There are at least a couple of ways we can go about this:

  • We already have an issue to implement an OpenTelemetry Exporter (Feature: OpenTelemetry exporter, apm#98). We could wait for the OpenTelemetry Collector to be ready, and require users to redirect their Jaeger agents to it.
  • We could add support to the APM Server to ingest Jaeger data directly.

The exporter option has a couple of problems. First and foremost, it's more complex: an additional component is required, complicating users' lives. Second, we would not be able to support Jaeger's remote sampling in this way.

I suggest that we add support directly to the APM Server for receiving spans, and translating them into our data schema. The server would effectively be a drop-in for jaeger-collector. To do this, we'll need to introduce HTTP and gRPC endpoints, for receiving spans directly from clients, and from jaeger-agent, respectively. Jaeger also supports TChannel, but I don't think we need to support this.

It may be useful to reuse some of the machinery inside opentelemetry-collector, so that the server is effectively implementing an OpenTelemetry Exporter within the same process. This may enable us to add support for additional trace data sources in the future. I have prototyped this approach here: 7.x...axw:otel-7.x (sorry, it's a bit of a mess.)

One thing that is yet to be solved is auth. A limited subset of the Jaeger clients (Java and C#) supports the JAEGER_AUTH_TOKEN config, which can be used to send a bearer token along with requests to the collector. This only works when using HTTP. The Jaeger agent only supports client certificate auth. We do support this, but it might also be worth offering a lighter-weight option, such as ferrying a secret/bearer token through tags, using the --jaeger.tags CLI flag.
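To make the tag-based idea concrete, here is a minimal sketch of how the server side might check a token ferried through process tags. Everything in it is an assumption for illustration: the tag key `authorization`, the `Bearer ` prefix, and the helper names are hypothetical and not part of Jaeger or APM Server.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// authTagKey is a hypothetical tag name; the issue does not specify one.
const authTagKey = "authorization"

// tokenFromTags pulls a bearer token out of the process tags and removes
// the tag so that the secret is not stored alongside the trace data.
func tokenFromTags(tags map[string]string) (string, bool) {
	v, ok := tags[authTagKey]
	if !ok {
		return "", false
	}
	delete(tags, authTagKey)
	const prefix = "Bearer "
	if !strings.HasPrefix(v, prefix) {
		return "", false
	}
	token := strings.TrimPrefix(v, prefix)
	return token, token != ""
}

// authorized reports whether the tags carry the expected secret.
// The comparison is constant-time to avoid leaking the secret via timing.
func authorized(tags map[string]string, secret string) bool {
	token, ok := tokenFromTags(tags)
	if !ok {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(token), []byte(secret)) == 1
}

func main() {
	// Tags as they might arrive if the agent were started with something like
	// --jaeger.tags=authorization=Bearer s3cret (value format is an assumption).
	tags := map[string]string{"authorization": "Bearer s3cret", "env": "prod"}
	fmt.Println("authorized:", authorized(tags, "s3cret"))
	fmt.Println("remaining tags:", len(tags))
}
```

Stripping the tag before indexing matters here, since otherwise the secret would end up searchable in Elasticsearch.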

@simitt
Contributor

simitt commented Nov 25, 2019

TODOs:

Docs:

End-to-end Testing:

Follow up topics/concepts:

  • [concept+impl]: When using probabilistic sampling, Jaeger only sends sampled spans. Elastic APM requires 100% of the transaction events and only samples spans. Investigate how to solve sampling with the Jaeger integration ([Jaeger] How to support sampling #3011)
  • [concept]: auth handling: Investigate further options, e.g. using special tags

Out of scope:

  • support for tchannel
  • providing endpoint exposing metrics
  • support on ESS

@axw
Member Author

axw commented Nov 26, 2019

[concept]: When using probabilistic sampling, Jaeger only sends sampled spans. Elastic APM requires 100% of the transaction events and only samples spans. Investigate how to solve sampling with the Jaeger integration.

Outside of the context of Jaeger, there's been some discussion around relying on metrics so we don't need to send and store non-sampled transactions. This is elastic/apm#104.

If we were to do that, we could have the server aggregate metrics based on sampled Jaeger trace data. Jaeger sends up the sampling rate in effect at the time a trace is captured, so we can scale the count/total based on that.
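A rough sketch of that scaling, assuming the sampling rate is read from the `sampler.type` and `sampler.param` tags that Jaeger clients attach to root spans. The `sampledSpan` and `txMetrics` types and the function names are hypothetical, and only the probabilistic sampler is handled; other sampler types are counted as-is.

```go
package main

import "fmt"

// sampledSpan is a hypothetical, simplified view of a sampled root span.
type sampledSpan struct {
	Name         string
	DurationMs   float64
	SamplerType  string  // assumed to come from the "sampler.type" tag
	SamplerParam float64 // assumed to come from the "sampler.param" tag
}

// txMetrics holds the estimated count and total duration for one
// transaction name, including the traces that were not sampled.
type txMetrics struct {
	Count           float64
	TotalDurationMs float64
}

// weight returns how many real transactions one sampled span represents.
// With a probabilistic rate p, each sampled span stands in for 1/p traces.
func weight(s sampledSpan) float64 {
	if s.SamplerType == "probabilistic" && s.SamplerParam > 0 && s.SamplerParam <= 1 {
		return 1 / s.SamplerParam
	}
	return 1 // unknown or non-probabilistic sampler: no scaling
}

// aggregate builds per-name metrics, scaling each span by its weight.
func aggregate(spans []sampledSpan) map[string]txMetrics {
	out := make(map[string]txMetrics)
	for _, s := range spans {
		w := weight(s)
		m := out[s.Name]
		m.Count += w
		m.TotalDurationMs += w * s.DurationMs
		out[s.Name] = m
	}
	return out
}

func main() {
	spans := []sampledSpan{
		{Name: "GET /users", DurationMs: 20, SamplerType: "probabilistic", SamplerParam: 0.25},
		{Name: "GET /users", DurationMs: 40, SamplerType: "probabilistic", SamplerParam: 0.25},
		{Name: "GET /health", DurationMs: 1, SamplerType: "const", SamplerParam: 1},
	}
	for name, m := range aggregate(spans) {
		fmt.Printf("%s count=%.0f total=%.0fms\n", name, m.Count, m.TotalDurationMs)
	}
}
```

Because the weight is taken per span, the estimate stays reasonable even if the sampling rate changes between traces, which is the benefit of Jaeger reporting the rate in effect at capture time.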

@simitt
Contributor

simitt commented Feb 17, 2020

Work for the 7.6 release has been finished. Moving follow-up topics to new meta issue #3340.
