Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for query-path OTLP #9233

Closed
yurishkuro opened this issue Jan 6, 2024 · 8 comments
Closed

Support for query-path OTLP #9233

yurishkuro opened this issue Jan 6, 2024 · 8 comments

Comments

@yurishkuro
Copy link
Member

yurishkuro commented Jan 6, 2024

Is your feature request related to a problem? Please describe.

Jaeger-v2 is being reengineered to run as a custom build of OTel-collector (jaegertracing/jaeger#4843). The existing implementations of Jaeger's "exporter"-equivalents (aka Storage implementations) are using Jaeger's internal data model, which requires transformation from incoming OTLP from the receivers. We want to avoid this and change storage implementations to work directly with OTLP.

By being tightly integrated with OTEL pipeline, it means that the write path in our storage implementations will use pdata/ptrace. No particular issues here.

However, Jaeger also implements a query / read path, for example:

message SpansResponseChunk {
  repeated opentelemetry.proto.trace.v1.ResourceSpans resource_spans = 1;
}

service QueryService {
  rpc GetTrace(GetTraceRequest) returns (stream SpansResponseChunk) {}
  rpc FindTraces(FindTracesRequest) returns (stream SpansResponseChunk) {}
}

Ideally, the responses from these RPCs should also return OTLP. However, this requires integration at the level of generated protobuf types, which are hidden in OTel-collector. The only way we were able to do this today is to generate our own otlp types, and then perform multiple transforms: otel-pdata -> binary proto -> jaeger-otlp -> final output.

Describe the solution you'd like

Ideally, I would like to have our internal Storage API (read side) to use pdata, e.g.

type SpanReader interface {
  GetTrace(traceID string) ptrace.Traces
}

and the gRPC definition to remain the same as shown above, and use the most efficient way of translating ptrace.Traces -> SpansResponseChunk protobuf messages. This requires some support from OTEL-collector to make this possible. I don't know what that is exactly, e.g.:

  1. open up internal proto types (I think this is unlikely)
  2. perhaps extend ptrace/ptraceotlp with additional reusable types, similar to how ExportTraceServiceResponse is supported, but a new type that actually contains traces (essentially an equivalent of SpansResponseChunk above)

Describe alternatives you've considered

I've explored a number of alternatives, documented here.

My two possible, but not ideal, solutions right now are:

  • do not use pdata at all in the read path of Storage API, use some internal version of OTLP
  • fork/clone pdata into Jaeger with our own proto types, so that at least at the API level the read and write paths are similar, although with distinct import paths (cloning would need to be automated to stay in sync)

These are both pretty bad as they force us to create a whole parallel machinery that is nearly identical to pdata but has one extra public type wrapper.

I would like to get suggestions from the collector maintainers how best to approach this. Happy to prototype.

Proposal

  1. Similar to opentelemetry-proto/blob/main/opentelemetry/proto/collector/trace/v1/trace_service.proto, define query/trace/v1/trace_service.proto with a single message (for now - in the future this could be extended to a full service definition if Query SIG comes up with a proposal)
message QueryServiceTraceResponse {
  repeated opentelemetry.proto.trace.v1.ResourceSpans resource_spans = 1;
}
  1. Implement a corresponding pdata wrapper for that message similar to ExportResponse:
    // ExportResponse represents the response for gRPC/HTTP client/server.
    type ExportResponse struct {
    orig *otlpcollectortrace.ExportTraceServiceResponse
    state *internal.State
    }
  2. Repeat as needed for logs/metrics (not needed for Jaeger right now)
@atoulme
Copy link
Contributor

atoulme commented Jan 6, 2024

Can 2. be done in a separate package from pdata?

@yurishkuro
Copy link
Member Author

Can 2. be done in a separate package from pdata?

No, because it needs to reference opentelemetry.proto.trace.v1.ResourceSpans, which is internal to pdata.

@yurishkuro
Copy link
Member Author

added concrete proposal

@mx-psi
Copy link
Member

mx-psi commented Jan 8, 2024

Can 2. be done in a separate package from pdata?

No, because it needs to reference opentelemetry.proto.trace.v1.ResourceSpans, which is internal to pdata.

Can it be done on a separate module inside the pdata folder then?

@yurishkuro
Copy link
Member Author

Can it be done on a separate module inside the pdata folder then?

I suppose, as long as it has access to pdata/internal/data/protogen

@mx-psi
Copy link
Member

mx-psi commented Jan 8, 2024

@open-telemetry/collector-maintainers what do you think about having this on a separate module? Could be something like go.opentelemetry.io/pdata/ptrace/ptracequery work?

@bogdandrutu
Copy link
Member

bogdandrutu commented Jan 9, 2024

First proposal

Here is a path that I know about, it may be a bit ugly but tell me if that satisfy your requirement.

Second proposal

Another option is to change a bit your public gRPC definition to be:

message SpansResponseChunk {
  opentelemetry.proto.trace.v1.TracesData traces = 1;
  ... // may have later other properties.
}

Then only in the server that uses the pdata, change the definition to an wire equivalent that allows you to directly use the bytes:

message SpansResponseChunk {
  bytes traces = 1;  // populate using https://github.com/open-telemetry/opentelemetry-collector/blob/main/pdata/ptrace/pb.go
  ... // may have later other properties.
}

yurishkuro added a commit to jaegertracing/jaeger that referenced this issue Jan 15, 2024
## Which problem is this PR solving?
- Part of #5079
- Avoid triple marshaling due to our use of internally generated OTLP
proto types, which prevents us from directly using the output of
jaeger->otlp conversion from collector contrib.

## Description of the changes
- 🛑 breaking: the JSON format is changed to match
[OTLP-JSON][otlp-json], specifically
- the trace & span IDs are returned as hex-encoded strings, not base64
as required by Proto-JSON
  - enums are returned as integers, not strings
- Use Proposal 1 from
open-telemetry/opentelemetry-collector#9233 (comment)
- API-V3 proto is already declared to use official OTLP types; remove
the overrides to our internally generated OTLP proto types
- Replace `SpansResponseChunk` with official `otlp.TracesData`, but
override it internally to use a custom type
- Depends on jaegertracing/jaeger-idl#103

## How was this change tested?
- Unit tests

[otlp-json]:
https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#json-protobuf-encoding

---------

Signed-off-by: Yuri Shkuro <[email protected]>
@yurishkuro
Copy link
Member Author

@bogdandrutu proposal 1 did the trick, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants