Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gRPC query service with OTLP model #76

Merged
merged 12 commits into from
Jun 25, 2021
78 changes: 78 additions & 0 deletions proto/api_v3/query_service.proto
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
// Copyright (c) 2019 The Jaeger Authors.
// Copyright (c) 2018 Uber Technologies, Inc.
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

syntax="proto3";

package jaeger.api_v3;

import "opentelemetry/proto/trace/v1/trace.proto";
import "google/protobuf/timestamp.proto";
import "google/protobuf/duration.proto";

option go_package = "api_v3";
option java_package = "io.jaegertracing.api_v3";

message GetTraceRequest {
bytes trace_id = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for REST/JSON API, which representation of the trace ID should we support? Jaeger's base16 or OTEL's base64?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we expect folks to build tools that can ingest our JSONs as if they were OTLP, we should follow their representation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. I expect people should be able to copy OTEL traceid (e.g. from logs) and query it directly from Jaeger.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro where did you find OTEL uses base64? The spec mentions hex encoding. The logging exporters use hex as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does OTEL have a spec for JSON format? If that format is rendered from proto, then it will be base64 for bytes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, to be honest, this is the reason why I gave up back in the day when trying to make a JSON API backed by proto IDL. I thought OTEL found a solution, but the change in the spec is a total cop out - "it's std proto-JSON except for this field" (which makes std proto-JSON unusable). You mentioned they had prototypes in other languages, how did they solve that?

I am inclined to just support all kinds of formats for IDs in the inputs, i.e. you should be able to paste both base64 and hex ID into the UI. But that doesn't answer what format we return in proto-JSON, and my preference would be to stick with the standard proto-JSON for that, meaning returning base64.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned they had prototypes in other languages, how did they solve that?

custom codes, the difficulty depends on the language so not ideal for the consumers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if we go with the streaming API for the get trace(s) the JSONPb codec will not work OOTB - see #76 (comment). The returned object is wrapped into another object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the upstream issue for the reference grpc-ecosystem/grpc-gateway#1254 (comment) and it's apparently not fixed in v2 (I have asked on their slack).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have switched impl to use base64 for embedded IDs and keep using hex for queries.

}

message SpansResponseChunk {
repeated opentelemetry.proto.trace.v1.ResourceSpans resource_spans = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a resource (maybe this)?

is it possible for jaeger-query to return spans from more than one resource? If so, for my learning, what are some examples?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit tricky to construct this from Jaeger spans because we denormalize the Process into each individual span. We do have some re-assembly logic when we return spans to the UI, but it's probably also valid to just return (resource, span) pairs as denormalized.

}

message TraceQueryParameters {
string service_name = 1;
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
string operation_name = 2;
map<string, string> tags = 3;
google.protobuf.Timestamp start_time_min = 4;
google.protobuf.Timestamp start_time_max = 5;
google.protobuf.Duration duration_min = 6;
google.protobuf.Duration duration_max = 7;
jpkrohling marked this conversation as resolved.
Show resolved Hide resolved
int32 search_depth = 8;
}

message FindTracesRequest {
TraceQueryParameters query = 1;
}

message GetServicesRequest {}

message GetServicesResponse {
repeated string services = 1;
}

message GetOperationsRequest {
string service = 1;
string span_kind = 2;
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
}

message Operation {
string name = 1;
string span_kind = 2;
}

message GetOperationsResponse {
repeated Operation operations = 2;
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
}

service QueryService {
rpc GetTrace(GetTraceRequest) returns (stream SpansResponseChunk) {}
joe-elliott marked this conversation as resolved.
Show resolved Hide resolved

rpc GetTraces(FindTracesRequest) returns (stream SpansResponseChunk) {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro do you remember why streaming was used? An alternative would be to return a list of chunks. Also we could remove chunk to Trace which is more idiomatic.

One issue with the streaming and grpc-gateway is that it wraps the response into result - e.g. {result: {resource_spans: ...}} see https://github.com/jaegertracing/jaeger/pull/3086/files#diff-1429f7cc5a76981a44799039e43d3bc7372808373b9c2b97a333c7dcf650b00aR72

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streaming - because large result sets are difficult to transmit as one response.

Returning chunks manually means implementing some kind of pagination API, which would require support in the storage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one way or another to fully support streaming the storage API would have to change anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storage API already has FindTraceIDs, which allows the query service to load traces one by one


rpc GetServices(GetServicesRequest) returns (GetServicesResponse) {}

rpc GetOperations(GetOperationsRequest) returns (GetOperationsResponse) {}
}
14 changes: 14 additions & 0 deletions proto/api_v3/query_service_http.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# This is an API configuration to generate an HTTP/JSON -> gRPC gateway for the
# OpenTelemetry service using github.com/grpc-ecosystem/grpc-gateway.
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
type: google.api.Service
config_version: 3
http:
rules:
- selector: jaeger.api_v3.QueryService.GetTrace
get: /v3/traces/{trace_id}
- selector: jaeger.api_v3.QueryService.GetTraces
get: /v3/traces
- selector: jaeger.api_v3.QueryService.GetServices
get: /v3/services
- selector: jaeger.api_v3.QueryService.GetOperations
get: /v3/operations