-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add query service with OTLP #3086
Add query service with OTLP #3086
Conversation
@yurishkuro, @joe-elliott could you please have a quick look, especially on the proto definition. |
b558778
to
d59f0b3
Compare
be579e5
to
5e2b9cb
Compare
Codecov Report
@@ Coverage Diff @@
## master #3086 +/- ##
==========================================
- Coverage 95.92% 95.89% -0.04%
==========================================
Files 236 239 +3
Lines 10238 10525 +287
==========================================
+ Hits 9821 10093 +272
- Misses 348 355 +7
- Partials 69 77 +8
Continue to review full report at Codecov.
|
Makefile
Outdated
$(PROTO_INTERMEDIATE_DIR)/trace/v1/trace.proto | ||
|
||
# Revert changes in OTEL proto and modify only package | ||
# The goal here is to import opentelemetry.proto.trace.v1.ResourceSpans in the query service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to change the package again? Can't you have the service in one package and the OTLP proto in another?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added more comments to the makefile
cmd/query/app/otel/grpc_gateway.go
Outdated
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package otel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think otlp
would be more accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use otlp
for the package name where the otlp
is compiled. Not for Jaeger's query service. I will use apiv3
.
PR updated. It's ready for another round. |
ae53586
to
89fc0b9
Compare
# Target proto-prepare-otel modifies OTEL proto to use import path jaeger.proto.* | ||
# The modification is needed because OTEL collector already uses opentelemetry.proto.* | ||
# and two complied protobuf types cannot have the same import path. The root cause is that the compiled OTLP | ||
# in the collector is in private package, hence it cannot be used in Jaeger. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a ticket in OTEL to export that package? It do we not want to use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see any open issue. It has been private from the get-go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a similar process in Tempo which is here:
https://github.com/grafana/tempo/blob/main/Makefile#L129
We have also since forked the collector and written a script to just rename internal
=> external
so we can have access to more of the otel guts. We will probably move to using this soon instead of our current setup.
https://github.com/grafana/opentelemetry-collector/tree/0.29-grafana
I didn't see any open issue. It has been private from the get-go.
It used to be exposed b/c Tempo vendored it initially. They moved it internal and we asked about keeping it exposed. They opted to keep it internal to protect themselves from people forming a dependency on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see it being open, but good to know. Perhaps they have reasons to keep it intenal. They even spent a lot of time on the (ugly) pdata package to make that happen.
PR updated and coverage is back on green. Please review again. |
I gave it a bit more love and added cancelation for the grpc gateway |
resourceSpans, ok := spansByLibrary[res] | ||
if !ok { | ||
resourceSpans = map[instrumentationLibrary]*v1.InstrumentationLibrarySpans{} | ||
resourceSpans[library] = &v1.InstrumentationLibrarySpans{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function looks like it's going to do the majority of work when fetching traces; would it be worth initializing slice capacities where we know the expected sizes to avoid memory realloc. For example, here we could add:
Spans: make([]*v1.Span, len(spans))
same with rss
and rs.InstrumentationLibrarySpans
below, as well as with some of the other j...ToOTLP
funcs which may have the opportunity for this simple optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make([]*v1.Span, len(spans))
Does not seem correct. The function iterates over jaeger spans and creates resource spans slice and instrumentation slice spans. before hand, the size of these two slices is not known.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah okay, although I thought the sizes for rss
and rs.InstrumentationLibrarySpans
could be determined up front as len(spansByLibrary)
& len(libMap)
respectively.
return rss | ||
} | ||
|
||
type instrumentationLibrary struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I learnt from an earlier review that types should be at the top of file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it mostly applies to the exported types. This is a helper type that is bounded to the not exported function below
sort.Strings(keys) | ||
sBuilder := strings.Builder{} | ||
sBuilder.Grow(stringTagsLen) | ||
for k, v := range tagMap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
considering this scenario where span0 has a tag:
key = "a"
val = "bc"
and span1 has a tag:
key = "ab"
val = "c"
could these two result in the same hash? if so, what is the consequence?
similarly, for cases where the value is an empty string:
span0: a="", b="c" -> abc
span1: a="b", c="" -> abc
please ignore if these scenarios won't realistically happen :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corner case :). Then the uniqueness in the map will depend only on the service name which is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is resource
used as the key in spansByLibrary
if uniqueness depends only on the service name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uniqueness does not depend only on the service name but as well on the string tags.
can somebody re-review? |
} | ||
if tag, ok := kvs.FindByKey(tracetranslator.TagStatusCode); ok { | ||
statusExists = true | ||
if code, err := getStatusCodeValFromTag(tag); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although I agree that we should not return an error to the API caller or even return early in this function if there is an error in translating status codes, would it be worth logging (even on debug) the error so the problem can be addressed?
I imagine failure to map a jaeger span status code to OTLP status code should be more likely the exception than the norm, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be an exception state, this is based on OTELcol translator which does not log the error either. Adding logger is not a problem, but it will a bit pollute translator API.
"github.com/jaegertracing/jaeger/model" | ||
commonv1 "github.com/jaegertracing/jaeger/proto-gen/otel/common/v1" | ||
resourcev1 "github.com/jaegertracing/jaeger/proto-gen/otel/resource/v1" | ||
v1 "github.com/jaegertracing/jaeger/proto-gen/otel/trace/v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related to #2988 (comment), i think something like otlptrace.ResourceSpans
is more readable than v1.ResourceSpans
. Perhaps the same for commonv1
and resourcev1
.
likewise, I think it would be easier to read jaeger.Span
than model.Span
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have renamed it to v1
to tracev1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
besides needing to rebase, lgtm
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
3990451
to
0b215b3
Compare
Signed-off-by: Pavol Loffay [email protected]
Supersedes: #3051
Depends on: jaegertracing/jaeger-idl#76
This PR adds support for a new query gRPC and REST (via grpc-gateway) API. The API definition is in jaegertracing/jaeger-idl#76 and it mimics the current gRPC query API, however it uses the OTLP model.
IDs (trace, span, parent) are encoded as hex strings in JSON (see spec https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md#otlphttp) and hex IDs are also accepted as query params e.g. get trace by id. IDs are defined as byte arrays in proto and by default proto serializes them into base64. This implementation uses gogo with the custom type feature (that overrides marshalling) to overcome this. I am fine to keep using gogo and migrate to custom marshaller/vtprotobuf once OTEL does.
Example REST requests:
Notable changes
The GraphQL API will not be used because:
TODOs