-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jaeger mangles (?) Trace IDs sent with Thrift #1951
Jaeger mangles (?) Trace IDs sent with Thrift #1951
Comments
It is more likely that the library you're using to generate the messages is doing something different. I think Jaeger internally is pretty consistent, and we have integration tests in different languages that validate string representation (aside from the 0 padding issue, which is not addressed in all clients, but that doesn't seem to be the main problem in your case). Internally Jaeger always represents trace ID as two |
I was suspecting that but I still cannot figure out what is wrong. In example 1 above the trace id generated by the application was The exporter deals with it using https://github.com/opencensus-beam/opencensus-jaeger/blob/master/src/oc_reporter_jaeger.erl : TraceId = Span#span.trace_id,
<<High:64, Low:64>> = <<TraceId:128/integer>>,
ParentSpanId = case Span#span.parent_span_id of
undefined -> 0;
PSI -> PSI
end,
[#'Jaeger.Thrift.Span'{'traceIdLow' = Low,
'traceIdHigh' = High, Running that gives me:
Everything looks fine here - the first byte is 0e and that is what I see in the logs, but in Jaeger it is e7. That is also interpreting those numbers as big endian. |
When you say "in Jaeger", what do you mean exactly? In the UI? It seems we never fixed #1578 on the backend, let's see if #1956 works. |
Yes, I mean in the UI. :) |
@haljin can you try with latest master? |
I assume the It seems the problems still persist, although there it seems it matches more often now. E.g; Note that the lower half matches
Now the upper half matches? |
Can you post your Elixir marshalling/unmarshalling code? |
The Thrift code is what I posted above, it's an Erlang library: https://github.com/opencensus-beam/opencensus-jaeger/blob/master/src/oc_reporter_jaeger.erl make_spans(Span) ->
TraceId = Span#span.trace_id,
<<High:64, Low:64>> = <<TraceId:128/integer>>,
ParentSpanId = case Span#span.parent_span_id of
undefined -> 0;
PSI -> PSI
end,
[#'Jaeger.Thrift.Span'{'traceIdLow' = Low,
'traceIdHigh' = High,
'spanId' = Span#span.span_id,
'parentSpanId' = ParentSpanId,
'operationName' = Span#span.name,
'references' = to_references(Span#span.links),
'flags' = Span#span.trace_options,
'startTime' = wts:to_absolute(Span#span.start_time),
'duration' = wts:duration(Span#span.start_time, Span#span.end_time),
'tags' = to_tags(Span#span.attributes),
'logs' = to_logs(Span#span.time_events)}]. The code for the logger is here: https://github.com/opencensus-beam/opencensus_elixir/blob/master/lib/opencensus/span_context.ex def hex_span_id(n) when is_integer(n) and n > 0,
do: :io_lib.format("~16.16.0b", [n]) |> to_string() |
What does this mean?
Is the |
It encodes the number on 128 bits and then takes the first 64 bits and puts them as an integer on the first half and the second 64 bits as an integer as the second half. 3> TraceId = 19249984526203758347956588965281933162.
19249984526203758347956588965281933162
4> << TraceId:128/integer>>.
<<14,123,105,131,128,85,135,111,223,139,126,224,236,252,39,106>> %Binary syntax - full Id byte by byte
9> << High:64>>.
<<14,123,105,131,128,85,135,111>> %First half of the binary
10> << Low:64>>.
<<223,139,126,224,236,252,39,106>> %Second half of the binary
6> High.
1043543752181122927
7> Low.
16108107996734433130
8> High bsl 64 + Low.
19249984526203758347956588965281933162 |
These numbers look ok to me:
However, you cannot assign 16108107996734433130 to the field in Thrift as is, because it's too large for int64, you need to convert it to a negative complement. See how this is done in jaeger-client-python: def id_to_int(big_id):
if big_id is None:
return None
# thrift defines ID fields as i64, which is signed,
# therefore we convert large IDs (> 2^63) to negative longs
if big_id > _max_signed_id:
big_id -= _max_unsigned_id
return big_id |
Fixes jaegertracing/jaeger#1951 Thrift IDs are i64, which is a _signed_ 64-bit integer, but the Jaeger protocol says IDs are unsigned, so in our records we store them unsigned. When sending the IDs through Thrift, we need to convert the bits directly from unsigned to signed. This matches the behavior in the Python client (https://github.com/jaegertracing/jaeger-client-python/blob/51d9027c3e0b87f6721dd14e7bc2b3303ce682c2/jaeger_client/thrift.py#L32-L47) as pointed out here (jaegertracing/jaeger#1951 (comment)), but we get to use type specifiers to make it clearer (:tada: Erlang!) instead of bit shifts and masking that Python needs to do.
I assume this has been answered. |
Requirement - what kind of business use case are you trying to solve?
Match Trace IDs output to logs to those in Jaeger
Problem - what in Jaeger blocks you from solving the requirement?
Jaeger interprets Trace IDs in a mysterious way that makes the hexed ID completely different
I have an application that uses OpenCensus to generate spans and traces and then sends those to Jaeger using the Thrift interface. At the same time it logs the current Trace ID with the log entries so that we can match the traces from Jaeger to application logs. Except these IDs don't seem to match. Sometimes they do however.
A couple examples (The application is written in Elixir and I used that to parse so I'll use the byte syntax to show binaries)
Trace ID generated by the application:
0e7b69838055876fdf8b7ee0ecfc276a
Trace ID in Jaeger:
e7b69838055876fa074811f1303d895
If we look at it in binary format:
Application:
00001110011110110110100110000011100000000101010110000111011011111101111110001011011111101110000011101100111111000010011101101010
Jaeger:
11100111101101101001100000111000000001010101100001110110111110100000011101001000000100011111000100110000001111011000100100000101
If you look closely it seems that leading zeros have been removed (I found #1578 but it seems only somewhat related to this problem) while the lower part is inverted? Upper part has leading zeroes while lower part has a leading 1.
Application upper:
0000111001111011011010011000001110000000010101011000011101101111
Jaeger upper:
1110011110110110100110000011100000000101010110000111011011111010
Application lower:
1101111110001011011111101110000011101100111111000010011101101010
Jaeger lower:
0000011101001000000100011111000100110000001111011000100100000101
Trace ID generated by the application:
3b4b7e0eb55df3283a352c86fa0df6af
Trace ID in Jaeger:
3b4b7e0eb55df3283a352c86fa0df6af
Binary:
00111011010010110111111000001110101101010101110111110011001010000011101000110101001011001000011011111010000011011111011010101111
This one seems to match perfectly? I noticed that both lower and upper half have leading zeroes.
Upper:
0011101101001011011111100000111010110101010111011111001100101000
Lower:
0011101000110101001011001000011011111010000011011111011010101111
Trace ID generated by the application:
9fd1aaa32ff34ca5ad7869260f54dc08
Trace ID in Jaeger:
e02e555cd00cb35ad28796d9f0ab23f7
Binary:
Application:
10011111110100011010101010100011001011111111001101001100101001011010110101111000011010010010011000001111010101001101110000001000
Jaeger:
11100000001011100101010101011100110100000000110010110011010110101101001010000111100101101101100111110000101010110010001111110111
Now both lower and upper have leading 1s causing weird things to happen most notable bits invert:
Application upper:
1001111111010001101010101010001100101111111100110100110010100101
Jaeger upper:
1110000000101110010101010101110011010000000011001011001101011010
Application lower:
1010110101111000011010010010011000001111010101001101110000001000
Jaeger lower:
1101001010000111100101101101100111110000101010110010001111110111
Is this some unsigned/signed magic that is happening that causes Jaeger to interpret these numbers differently? Is this some 1s complement magic with a sign bit? I am very confused and don't know if it is something I can change in my application (or rather the reporter library) or is it something in Jaeger? I am running jaeger with
jaegertracing/all-in-one:1.14
Docker image, but have seen the same problem happen on thelatest
image.The text was updated successfully, but these errors were encountered: