Jaeger v2 based on OpenTelemetry collector #3500

Closed
pavolloffay opened this issue Jan 28, 2022 · 20 comments

Comments

@pavolloffay
Member

pavolloffay commented Jan 28, 2022

Update (Sept 2023)

Superseded by #4843


Proposal (2022)

Creating a top-level issue for Jaeger v2 based on the OpenTelemetry collector.

This has already been discussed a couple of times:

I would like to bring the topic back, as I think it is vital for the future success of Jaeger and for staying relevant. A couple of things have changed since we last worked on v2, so I have created a new proposal on how v2 could be designed and what its impact on the ecosystem would be: https://docs.google.com/document/d/1d7j956tDVYacKHF-l0JL9sVhpbYkt0JECDyGeup-olc/edit?usp=sharing

I would like to open this for discussion. If you are a Jaeger user, please upvote the issue if you like the proposal, including the breaking changes and the impact on deployment (Helm/operator).

@pavolloffay pavolloffay pinned this issue Jan 28, 2022
@pavolloffay pavolloffay changed the title Jaeger v2 based on OpenTelemtery collector Jaeger v2 based on Telemetry collector Jan 28, 2022
@eugeniyk

Direction-wise this makes total sense, as the Jaeger pipeline looks rudimentary. However, I believe some features are not yet available in OTEL collectors.

What's left to Jaeger then? A custom protocol as a receiver (which should go away at some point), the UI as an extension, and custom storage integration? Does that mean just coupling the querier with particular exporters?

@yurishkuro yurishkuro changed the title Jaeger v2 based on Telemetry collector Jaeger v2 based on OpenTelemetry collector Jan 28, 2022
@pavolloffay
Member Author

Jaeger will provide the storage layer and UI.

Once we are based on the OTEL collector it will be easier to write extensions and provide additional functionality.

@yurishkuro
Member

What's left to Jaeger then?

@eugeniyk Think of this not from the architecture point of view but from the user impact: everything about Jaeger is still there, plus some additional capabilities we could inherit from OTEL (the most critical one being the OTLP receiver). Jaeger is an end-to-end tracing platform. The OTEL collector is not; it is just one piece of it.

@jkowall
Contributor

jkowall commented Jan 29, 2022

So, what about the data model used in storage, and how would the terminology in Jaeger evolve to align with Otel rather than OT? I don't think we need to make those changes, but they would help for the future. Unfortunately, they don't really help users so much; they are more for maintainers and the project.

The only other concern is the UI: beyond the recent monitoring tab, we haven't done anything really meaningful on the UI side for the last several years. I have been pushing the OpenSearch Dashboards team to add Jaeger format support. They are already working with an OpenTelemetry schema on OpenSearch, but as you know we are on an OT schema for Jaeger, which is "legacy". I think we need to address the schema and allow the Jaeger format to be more widely used by UIs that will evolve, as the Jaeger UI is likely to stagnate.

@pavolloffay
Member Author

So, what about the data model used in storage, and how would the terminology in Jaeger evolve to align with Otel rather than OT? I don't think we need to make those changes, but they would help for the future. Unfortunately, they don't really help users so much; they are more for maintainers and the project.

The question I would ask here is: what "features" does the OTEL schema give compared to Jaeger/OT? They are pretty much interchangeable. The storage layer can change its schema independently at any point; it's more of an implementation detail given the scope of this issue.

The only other concern is the UI: beyond the recent monitoring tab, we haven't done anything really meaningful on the UI side for the last several years. I have been pushing the OpenSearch Dashboards team to add Jaeger format support. They are already working with an OpenTelemetry schema on OpenSearch, but as you know we are on an OT schema for Jaeger, which is "legacy". I think we need to address the schema and allow the Jaeger format to be more widely used by UIs that will evolve, as the Jaeger UI is likely to stagnate.

I agree the Jaeger UI needs migration work as well. At a minimum we can start by renaming some UI elements (Tags->Attributes, etc.) and at some point migrate the UI to use the Jaeger V3 or another OpenTelemetry-compatible model.

Do you have references to OpenSearch adopting the OTEL model? The OTEL JSON is (or was) not stable yet. In the past we had issues in Jaeger when the storage layer used the Jaeger model directly. The separation gives more flexibility and shielding from breaking changes.
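
To make the "pretty much interchangeable" point concrete, here is a minimal sketch of mapping a Jaeger-style span onto an OTEL-style one. The dict layouts are simplified and the helper `jaeger_to_otel` is hypothetical; this is not the actual translator used by Jaeger or the OpenTelemetry Collector, only an illustration of the renaming involved (tags to attributes, process to resource, logs to events).

```python
def jaeger_to_otel(span: dict) -> dict:
    """Map a simplified Jaeger-style span dict to a simplified OTEL-style one.

    Hypothetical helper for illustration; field names are assumptions.
    """
    # Jaeger "tags" are a list of {key, value}; OTEL calls them "attributes".
    attributes = {t["key"]: t["value"] for t in span.get("tags", [])}

    # Jaeger "process" carries the service name; OTEL puts it on the resource.
    process = span.get("process", {})
    resource = {"service.name": process.get("serviceName", "")}
    resource.update({t["key"]: t["value"] for t in process.get("tags", [])})

    # Jaeger "logs" become OTEL span "events".
    events = [
        {
            "time": log["timestamp"],
            "attributes": {f["key"]: f["value"] for f in log.get("fields", [])},
        }
        for log in span.get("logs", [])
    ]

    return {
        "trace_id": span["traceID"],
        "span_id": span["spanID"],
        "name": span["operationName"],
        "attributes": attributes,
        "resource": resource,
        "events": events,
    }


# Made-up sample data, not taken from a real trace.
jaeger_span = {
    "traceID": "abc123",
    "spanID": "def456",
    "operationName": "GET /orders",
    "tags": [{"key": "http.status_code", "value": 200}],
    "process": {
        "serviceName": "orders",
        "tags": [{"key": "hostname", "value": "node-1"}],
    },
    "logs": [
        {
            "timestamp": 1643328000000000,
            "fields": [{"key": "event", "value": "cache miss"}],
        }
    ],
}

otel_span = jaeger_to_otel(jaeger_span)
print(otel_span["resource"])
```

Keeping a conversion step like this between the domain model and the stored documents is what lets the storage schema change independently of the model used on the wire.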

@jkowall
Contributor

jkowall commented Feb 1, 2022

The question I would ask here is: what "features" does the OTEL schema give compared to Jaeger/OT? They are pretty much interchangeable. The storage layer can change its schema independently at any point; it's more of an implementation detail given the scope of this issue.

Just more about Jaeger being aligned with the current tracing standard versus being aligned with the deprecated tracing standard. The second idea is that if we adopt the Otel schema we could work with OpenSearch Dashboards trace analytics which is using an Otel schema too. More on this in the next paragraph.

I agree the Jaeger UI needs migration work as well. At a minimum we can start by renaming some UI elements (Tags->Attributes, etc.) and at some point migrate the UI to use the Jaeger V3 or another OpenTelemetry-compatible model.

Yes.... and more on this below

Do you have references to OpenSearch adopting the OTEL model? The OTEL JSON is (or was) not stable yet. In the past we had issues in Jaeger when the storage layer used the Jaeger model directly. The separation gives more flexibility and shielding from breaking changes.

Currently, they are using "Data prepper" which does the writing to OpenSearch/ElasticSearch. They are not using Otel collector since they are doing aggregates similar to the spark component in Jaeger. Here is the schema for Data Prepper: https://github.com/opendistro-for-elasticsearch/data-prepper/blob/634a426ef0377ebc2e525e954177b350ebaeabe2/docs/schemas/trace-analytics/otel-v1-apm-span-index-template.md Ideally they will support a Jaeger schema for reading too, but without aggregates there will be a bunch of features that do not work (maps, aggregated metrics, etc).

@yurishkuro
Member

OpenSearch Dashboards trace analytics

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/trace-analytics.html

Fairly primitive so far, but certainly a viable alternative to the Monitor tab (#2954). @jkowall do you know if OS is planning to venture more into trace visualizations like the timeline view, graph view, etc.?

I think this brings up one other crucial question for Jaeger V2: should we just focus on a single storage backend that can actually provide these analytical capabilities? TBH, it's not clear to me if querying OS for aggregate data is preferable over the Monitoring tab approach of using a metrics system for the same aggregates. It's certainly simpler dealing with just one backend, but likely at the expense of query latency.

@jkowall
Contributor

jkowall commented Feb 1, 2022

Still needs work, the plugin is not that old. We had the PM on a Jaeger community call last year. They are planning on timeline and graphical topology. We also want to build metric query capabilities into OpenSearch Dashboards so you can hook it up to a PromQL backend, but no one is actively working on that at this time.

I think fewer backends is a good move, but I also think that the current backends are not very good with metrics. I hope there is something better on the horizon that supports PromQL and unstructured data well. It's hard to tell what that might be and when it might happen.

The number of breaking changes coming in v8 will be interesting: https://www.elastic.co/guide/en/elasticsearch/reference/8.0/migrating-8.0.html Welcome to the ongoing Elastic push to break open source and backwards compatibility.

@albertteoh
Contributor

TBH, it's not clear to me if querying OS for aggregate data is preferable over the Monitoring tab approach of using a metrics system for the same aggregates. It's certainly simpler dealing with just one backend, but likely at the expense of query latency.

Another benefit of the Monitoring tab approach is its tighter integration within Jaeger UI, supporting use cases that rely on aggregated data to help narrow down the search space to the more "interesting" traces (high latency, error rate traces) with a single button click.

I agree, though, that the approach used with the Monitoring tab has quite a few moving parts: an OTEL collector with the correct config to perform the aggregation from traces to metrics, a metrics store to persist the metrics, then Jaeger query + UI to visualize the metrics. I also like the idea of supporting a single storage backend.
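
As a rough illustration of the "aggregation from traces to metrics" step, the sketch below rolls spans up into per-service, per-operation call counts, error counts and total duration. It is a simplified stand-in for what a span-metrics processor does; the function `aggregate_span_metrics` and the span fields are assumptions made for this example, not the collector's actual data model.

```python
from collections import defaultdict


def aggregate_span_metrics(spans):
    """Aggregate spans into request/error/duration counters.

    Hypothetical helper: keys are (service, operation) pairs.
    """
    metrics = defaultdict(
        lambda: {"calls": 0, "errors": 0, "total_duration_ms": 0.0}
    )
    for span in spans:
        entry = metrics[(span["service"], span["operation"])]
        entry["calls"] += 1
        # Spans without an explicit error flag are counted as successful.
        if span.get("error", False):
            entry["errors"] += 1
        entry["total_duration_ms"] += span["duration_ms"]
    return dict(metrics)


# Made-up sample spans.
spans = [
    {"service": "frontend", "operation": "GET /", "duration_ms": 120.0},
    {"service": "frontend", "operation": "GET /", "duration_ms": 480.0, "error": True},
    {"service": "orders", "operation": "lookup", "duration_ms": 30.0},
]

metrics = aggregate_span_metrics(spans)
for key, value in metrics.items():
    print(key, value)
```

In the Monitoring tab setup these counters would be written to a metrics store and queried from there, which is the extra moving part discussed above.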

@pavolloffay
Member Author

Jaeger v2 would reduce the dependency on the OTEL collector for the monitoring feature. We could include the processor that extracts metrics in the Jaeger main distribution.

@gai6948

gai6948 commented Feb 8, 2022

From a user's perspective, I would like to know what "Jaeger collector based on Otel collector" really means: is it a drop-in replacement for the Otel collector?

The folks from Dow Jones have written an excellent blog post detailing how OS, Jaeger and Otel play together in enterprise settings, with Otel acting as a centralized trace pipeline and Jaeger being a sink.

Perhaps Jaeger in the future can remove client libraries/agents and focus on how to deal with the trace data (e.g. adding anomaly detection features?)

On a side note, recent enhancements in Grafana's Jaeger data source plugin are really impressive, not saying that can replace the Jaeger UI, but maybe in the future more users will use Grafana instead of the Jaeger UI.

[Screenshot attached: 2022-02-08, 9:38:32 PM]

@jkowall
Contributor

jkowall commented Feb 9, 2022

From a user's perspective, I would like to know what "Jaeger collector based on Otel collector" really means: is it a drop-in replacement for the Otel collector?

It would have some additional exporters or other capabilities the project needs, but otherwise it would be very similar and mostly upstream code.

The folks from Dow Jones have written an excellent blog post detailing how OS, Jaeger and Otel play together in enterprise settings, with Otel acting as a centralized trace pipeline and Jaeger being a sink.

The problem is that Data Prepper is a pain to handle, since it does the same things the collector does. It creates a lot of complexity. Yes, it can play well, but you are storing the data twice, since it needs to be stored in both the Data Prepper and Jaeger schemas.

Perhaps Jaeger in the future can remove client libraries/agents and focus on how to deal with the trace data (e.g. adding anomaly detection features?)

The problem is we'd have to become more opinionated about the backend. The client libraries would be replaced by Otel, in fact they already have been since we deprecated the old ones: https://www.jaegertracing.io/docs/1.30/client-libraries/

On a side note, recent enhancements in Grafana's Jaeger data source plugin are really impressive, not saying that can replace the Jaeger UI, but maybe in the future more users will use Grafana instead of the Jaeger UI.

The problem is that Grafana's licensing is not friendly to CNCF, and in the future I wouldn't be surprised to see an Elastic move pulled creating a gap in the Prometheus community and potentially a gap in Jaeger. I prefer to see an Apache 2.0 licensed solution like OpenSearch Dashboards or Apache Superset.

@pavolloffay
Member Author

I have some good news for this ticket!

I have created https://github.com/jaegertracing/jaeger-opentelemetry-collector to bootstrap the work on the Jaeger v2 rebase on top of the OpenTelemetry collector. It is a community project; anybody interested can start contributing by migrating Jaeger storage implementations as exporters.

@yurishkuro
Member

@pavolloffay may I suggest adding a tracking issue (or a project) in that repo that lists the overall plan / list of tasks that need to be done to achieve some success criteria?

@pavolloffay
Member Author

That is a great idea - jaegertracing/jaeger-opentelemetry-collector#49

@issraee

issraee commented Aug 5, 2022

Hello, I'm trying to send trace data from a WordPress website to OpenSearch through Jaeger. I'm using a WordPress plugin to send the data to Jaeger, and from there to OpenSearch:

docker run --rm -it -v ${PWD}:/config \
  -e SPAN_STORAGE_TYPE=elasticsearch opensearchproject/opensearch \
  jaegertracing/jaeger-opentelemetry-collector \
  --config-file=/config/config.yaml \
  --es.server-urls=http://IP:9200/ \
  --es.num-shards=3

With config.yaml :

exporters:
  otlp/data-prepper:
    endpoint: http://data-prepper:9200
    insecure: true
processors:
  attributes:
    actions:
      - key: user
        action: delete
service:
  pipelines:
    traces:
      processors: [attributes]

-> It gives the following error

./opensearch-docker-entrypoint.sh: line 140: /usr/share/opensearch/jaegertracing/jaeger-opentelemetry-collector: No such file or directory

I'm looking into this because using OpenTelemetry and Data Prepper separately returns connection errors / "no valid pipeline for execution". So if this works, it would simplify the process a lot. Any suggestions would be appreciated.

@yurishkuro
Member

yurishkuro commented Aug 5, 2022

jaegertracing/jaeger-opentelemetry-collector

Where are you getting instructions to use this ^ name? It's not supported at the moment. If your WordPress installation exports OpenTelemetry data, you can send it to the regular Jaeger collector: https://medium.com/jaegertracing/introducing-native-support-for-opentelemetry-in-jaeger-eb661be8183c

@derek-ho

Hey folks @pavolloffay @yurishkuro @gai6948 @eugeniyk @albertteoh @issraee,

I've been working on Jaeger integration with OpenSearch Dashboards Trace Analytics. Here's a quick demo; can you folks let me know what you think of it? We are hoping to get this into the next release, but would definitely like to hear feedback from the community on what is useful or could be improved for end users, and to work with the Jaeger community to drive future development.

opensearch-project/dashboards-observability#83

The github issue tracking some of the PM/UI/UX work is here: opensearch-project/dashboards-observability#83.

Caveat: This demo is made from Jaeger data ingested in a specific format that is more friendly to OpenSearch: --es.tags-as-fields.all=true. It doesn't work well if the data is not ingested in this format. We may want to bring this up with the community or make this the default, since it is not today. Any thoughts on this?
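
For readers unfamiliar with the flag, the sketch below shows the general idea of storing tags as fields: the list of key/value tag objects on a span document is turned into a flat object whose keys can be queried directly. The document layout, the helper `tags_as_fields` and the `@` replacement for dots in tag keys are assumptions for illustration, and the real Jaeger Elasticsearch writer may differ in detail.

```python
def tags_as_fields(span_doc: dict, dot_replacement: str = "@") -> dict:
    """Return a copy of span_doc with the tag list flattened into an object.

    Hypothetical helper; the layout is a simplified assumption.
    """
    # Copy every field except the nested tag list.
    flat = {k: v for k, v in span_doc.items() if k != "tags"}
    # Dots in keys are replaced so they are not read as object paths.
    flat["tag"] = {
        t["key"].replace(".", dot_replacement): t["value"]
        for t in span_doc.get("tags", [])
    }
    return flat


# Made-up sample document.
doc = {
    "traceID": "abc123",
    "operationName": "GET /orders",
    "tags": [
        {"key": "http.method", "type": "string", "value": "GET"},
        {"key": "error", "type": "bool", "value": True},
    ],
}

flat = tags_as_fields(doc)
print(flat["tag"])
```

A dashboard can then filter on a field such as `tag.http@method` directly instead of running a nested query over the tag list.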

@yurishkuro
Member

For people subscribing to this issue - I have a new PR that I'd like to get feedback on. It currently implements a working all-in-one on top of OTEL Collector framework. #4766

@yurishkuro yurishkuro pinned this issue Sep 26, 2023
yurishkuro added a commit that referenced this issue Sep 27, 2023
## Which problem is this PR solving?
- Third prototype of "Jaeger-v2"
- Another alternative approach to #3500

## Description of the changes
- Adds a new binary `jaeger-v2` using OTEL Collector framework
- Minimal amount of extensions is included, to mimic what
`jaeger-collector` normally has
- It will combine all previous functions of agent/collector/query in one
binary, but controllable via config file

```
$ go run -tags=ui ./cmd/jaeger-v2 --config ./cmd/jaeger-v2/config.yaml
```

## Roadmap


https://docs.google.com/document/d/1s4_6VgAS7qAVp6iEm5KYvpiGw3h2Ja5T5HpKo29iv00/edit

## Design

* the ingestion and storing of traces will be done via standard
receivers/processors/exporters OTEL Collector components
* the jaeger-query and UI are implemented as `jaeger_query` extension
(already working in this PR)
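
As a rough mental model of the receivers/processors/exporters flow, here is a minimal sketch of a pipeline that passes a batch of spans through processors and then hands it to exporters. The `Pipeline` class and `delete_attribute` helper are hypothetical and written only for this illustration; they are not the OTEL Collector API.

```python
from typing import Callable, List

Span = dict
Processor = Callable[[List[Span]], List[Span]]
Exporter = Callable[[List[Span]], None]


def delete_attribute(key: str) -> Processor:
    """Build a processor that drops one attribute from every span."""

    def process(batch: List[Span]) -> List[Span]:
        return [
            {
                **span,
                "attributes": {
                    k: v for k, v in span["attributes"].items() if k != key
                },
            }
            for span in batch
        ]

    return process


class Pipeline:
    """Hypothetical traces pipeline: processors run in order, then exporters."""

    def __init__(self, processors: List[Processor], exporters: List[Exporter]):
        self.processors = processors
        self.exporters = exporters

    def consume(self, batch: List[Span]) -> None:
        # A receiver would call this with each decoded batch.
        for processor in self.processors:
            batch = processor(batch)
        for exporter in self.exporters:
            exporter(batch)


# A list stands in for a storage backend; list.extend acts as the exporter.
stored: List[Span] = []
pipeline = Pipeline([delete_attribute("user")], [stored.extend])
pipeline.consume(
    [{"name": "GET /", "attributes": {"user": "alice", "http.method": "GET"}}]
)
print(stored)
```

In this picture a Jaeger storage backend is just another exporter at the end of the pipeline.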

### Storage
In order to keep the flexibility of mixing & matching storage
implementations, all backends can be configured via `jaeger_storage`
extension (we may need to add `jaeger_metrics_storage` extension in the
future). It might look like this:
```yaml
jaeger_storage:
  memory:  # defines Factory
    memstore:
      max_traces: 100000
  cassandra:
    cassandra_primary:
      servers: [...]
      namespace: jaeger
    cassandra_archive:
      servers: [...]
      namespace: jaeger_archive
```

The `jaeger_query` extension then references specific storage factories
by name:
```yaml
  jaeger_query:
    trace_storage: memstore
    dependencies: something_else
    metrics_store: prometheus_store
```

It's not clear yet if `jaeger_query` extension should simply subsume
`jaeger_storage` extension, because Query is the only one that needs
this _generic_ access to storage, while things like exporters or Kafka
ingester (receiver) always deal with a single implementation (because
OTEL Coll pipeline allows to connect them with each other, which is not
possible with extensions).
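
A minimal sketch of the lookup-by-name idea, assuming the config shape shown above: the storage extension indexes every named backend, and the query side resolves its `trace_storage` reference against that index. The `StorageExtension` class is hypothetical and the `servers` lists are left empty because they are elided in the example config.

```python
class StorageExtension:
    """Hypothetical registry of named storage backends."""

    def __init__(self, config: dict):
        self._backends = {}
        # Config is {backend_type: {name: settings}}, as in the YAML above.
        for backend_type, named in config.items():
            for name, settings in named.items():
                if name in self._backends:
                    raise ValueError(f"duplicate storage name: {name}")
                self._backends[name] = (backend_type, settings)

    def backend(self, name: str):
        """Return (backend_type, settings) for a configured storage name."""
        if name not in self._backends:
            raise LookupError(f"storage {name} is not defined in jaeger_storage")
        return self._backends[name]


storage = StorageExtension(
    {
        "memory": {"memstore": {"max_traces": 100000}},
        "cassandra": {
            # "servers" is elided in the original config, so left empty here.
            "cassandra_primary": {"servers": [], "namespace": "jaeger"},
            "cassandra_archive": {"servers": [], "namespace": "jaeger_archive"},
        },
    }
)

# The query extension only holds names and resolves them at startup.
query_config = {"trace_storage": "memstore"}
backend_type, settings = storage.backend(query_config["trace_storage"])
print(backend_type, settings)
```

Resolving names at startup means a misspelled reference fails fast instead of surfacing on the first query.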

## Trade-offs
- This is not using OTEL Collector builder `ocb`. That means people won't
be able to assemble a different version of the collector with other
extensions.
- We may want to support `ocb` in the future, as it makes it easier to
write custom in-process exporters for custom storage. It will require
converting all the components into their own modules.

## Next steps

* [x] Get feedback from the community on the approach
* [x] Fully implement all-in-one by wiring receivers / exporters
correctly

## Open Questions
* How can we implement all-in-one equivalent that can be run without any
config file?
* Do we want
[healthcheckextension](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/healthcheckextension/README.md)
to be included by default?
* Investigate startup error `2023-09-23T19:55:46.661-0400 warn
zapgrpc/zapgrpc.go:195 [core] [Channel #2 SubChannel #3] grpc:
addrConn.createTransport failed to connect to {Addr: ":16685",
ServerName: "localhost:16685", }. Err: connection error: desc =
"transport: Error while dialing: dial tcp :16685: connect: connection
refused" {"grpc_log": true}`

---------

Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Co-authored-by: Albert <[email protected]>
@yurishkuro
Member

I am closing this in favor of a new issue where the actual roadmap is tracked.

#4843
