Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gRPC query service with OTLP model #76

Merged
merged 12 commits into from
Jun 25, 2021
94 changes: 94 additions & 0 deletions proto/api_v3/query_service.proto
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
// Copyright (c) 2021 The Jaeger Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

syntax="proto3";

package jaeger.api_v3;

import "opentelemetry/proto/trace/v1/trace.proto";
import "google/protobuf/timestamp.proto";
import "google/protobuf/duration.proto";

option go_package = "api_v3";
option java_package = "io.jaegertracing.api_v3";

// Request object to get a trace.
message GetTraceRequest {
string trace_id = 1;
}

// A single response chunk holds a single trace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In v2 API this description would be inaccurate - a chunk is neither a full trace nor spans from a single trace. A chuck is just a mechanism of delivering large number of spans in smaller batches.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this hold in practice?

To my knowledge, a chunk at the moment always represents a single trace (a trace known at the query time).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have slightly changed the comment

// Note that a subsequent request to get a trace might return a different result
// because spans are reported in asynchronous chunks.
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
message SpansResponseChunk {
// OTLP resource spans
// see https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
repeated opentelemetry.proto.trace.v1.ResourceSpans resource_spans = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a resource (maybe this)?

is it possible for jaeger-query to return spans from more than one resource? If so, for my learning, what are some examples?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit tricky to construct this from Jaeger spans because we denormalize the Process into each individual span. We do have some re-assembly logic when we return spans to the UI, but it's probably also valid to just return (resource, span) pairs as denormalized.

}

// Query parameters to find traces.
message TraceQueryParameters {
string service_name = 1;
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
string operation_name = 2;
map<string, string> attributes = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we clarify which attributes these are supposed to match? We've been pretty loose about it in the original API, leaving the interpretation to the storage. I.e. should all these attributes match on a single span, or could they match across spans? Do they match span attributes only or span logs as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The match should be done on tags and process tags. ES supports match on logs as well when kibana support is enabled (flat schema).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ES they must match on a single span. How is it done in cassandra? I think all storages should follow this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is true in Cassandra, because it takes the tag=value string and looks up an index that just gives trace IDs. If more than one tag is provided, it could easily match on different spans.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a comment on the TraceQueryParameters that clarifies that some storage implementations might deviate.

Alright so C* does deviate as well. What was the original design for attributes? Match any attributes within trace or in a single span?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to see this: what would you expect as a user? If you specify two pairs of attributes, would you expect them to exist throughout the trace, or for all attributes to exist as part of the same span? I'm not quite sure I have an answer here... Here's one case advocating for attributes to exist throughout the trace:

  • root span has "userID=123", no other spans contain this
  • span (not root span) has error=true
  • SRE is looking for traces for user 123 with errors in it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to create an issue to address this, this PR can be merged as is.

google.protobuf.Timestamp start_time_min = 4;
google.protobuf.Timestamp start_time_max = 5;
google.protobuf.Duration duration_min = 6;
google.protobuf.Duration duration_max = 7;
jpkrohling marked this conversation as resolved.
Show resolved Hide resolved
int32 num_traces = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep this as num_traces or use a more vague term like "search depth"? Because Cassandra storage does not guarantee num_traces in the response, which was often a source of confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use num_traces it is what the query API uses. If C* does not implement it (or any other storage) we should document that.

The goal is to make it clear for users what the parameter means.

}

// Request object to search traces.
message FindTracesRequest {
TraceQueryParameters query = 1;
}

// Request object to get service names.
message GetServicesRequest {}

// Response object to get service names.
message GetServicesResponse {
repeated string services = 1;
}

// Request object to search for operation names.
message GetOperationsRequest {
string service = 1;
string span_kind = 2;
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
}

// Operation encapsulates information about operation.
message Operation {
string name = 1;
string span_kind = 2;
}

// Response object to get operation names.
message GetOperationsResponse {
repeated Operation operations = 1;
}

service QueryService {
// GetTraces returns single trace.
rpc GetTrace(GetTraceRequest) returns (stream SpansResponseChunk) {}
joe-elliott marked this conversation as resolved.
Show resolved Hide resolved

// GetTraces searches for traces.
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
rpc GetTraces(FindTracesRequest) returns (stream SpansResponseChunk) {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro do you remember why streaming was used? An alternative would be to return a list of chunks. Also we could remove chunk to Trace which is more idiomatic.

One issue with the streaming and grpc-gateway is that it wraps the response into result - e.g. {result: {resource_spans: ...}} see https://github.com/jaegertracing/jaeger/pull/3086/files#diff-1429f7cc5a76981a44799039e43d3bc7372808373b9c2b97a333c7dcf650b00aR72

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streaming - because large result sets are difficult to transmit as one response.

Returning chunks manually means implementing some kind of pagination API, which would require support in the storage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one way or another to fully support streaming the storage API would have to change anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storage API already has FindTraceIDs, which allows the query service to load traces one by one


// GetServices returns service names.
rpc GetServices(GetServicesRequest) returns (GetServicesResponse) {}

// GetOperations returns operation names.
rpc GetOperations(GetOperationsRequest) returns (GetOperationsResponse) {}
}
14 changes: 14 additions & 0 deletions proto/api_v3/query_service_http.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# This is an API configuration to generate an HTTP/JSON -> gRPC gateway for the
# OpenTelemetry service using github.com/grpc-ecosystem/grpc-gateway.
pavolloffay marked this conversation as resolved.
Show resolved Hide resolved
type: google.api.Service
config_version: 3
http:
rules:
- selector: jaeger.api_v3.QueryService.GetTrace
get: /v3/traces/{trace_id}
- selector: jaeger.api_v3.QueryService.GetTraces
get: /v3/traces
- selector: jaeger.api_v3.QueryService.GetServices
get: /v3/services
- selector: jaeger.api_v3.QueryService.GetOperations
get: /v3/operations