-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support XRay features (Sql, Cause, aws) in Encoder #59
Support XRay features (Sql, Cause, aws) in Encoder #59
Conversation
Please don't merge yet. I am still testing some cases since |
144e823
to
493f549
Compare
cd6c570
to
97c3223
Compare
27dfb33
to
c3090d9
Compare
c3090d9
to
eb81fda
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like good work. We are accumulating some tech-debt. Can you try to add a unit test? for example, in the test instantiate a zipkin2.Span with the input data and test the output json on string equality (ex assertThat(json).isEqualTo("{...}").
@@ -55,8 +57,13 @@ | |||
|| span.kind() != Span.Kind.SERVER && span.kind() != Span.Kind.CONSUMER) { | |||
writer.name("type").value("subsegment"); | |||
if (span.kind() != null) writer.name("namespace").value("remote"); | |||
writer.name("name").value(span.remoteServiceName() == null ? "" : span.remoteServiceName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to write empty name? or is leaving out name tolerable? If possible, I'd prefer to not write placeholder values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will test it and let you know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when it is null or empty string, it ignores span so it does not appear at all. There is two possibility when it is null or empty:
- name it "unknown"
- ignore span
Which one seems better for you @adriancole ?
} | ||
writer.name("name").value(span.localServiceName()); | ||
// override with the user remote tag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do it like this. we are setting the value of the xray field "namespace" to the value of a zipkin key "remote".
per amazon "Any calls to a remote service/broker should be namespace as ‘remote’. See ‘Optional Subsegment Fields’ under http://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-subsegments."
So, we should automatically default "namespace" to "remote" when it is a downstream PRODUCER or CLIENT span. If you look in the docs, the other supported choice is "aws". We don't support that in brave, yet, but we could.. in that case, I'd recommend using the zipkin key "xray.namespace" as a way to conditionally override.
make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xray.namespace
sound good but all other variables are using something similar to JSON path dot notation
.
I will change it as you suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I instrument a method (It might not be available at the time of this writing), what kind of span it will be? Since it is not connecting something else, I have some doubts on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im referring about a tag key ex span.tag("xray.namespace", "aws"). Reporter only sends complete spans.. how would we have a problem in routine use? For example there are only two values remote (we know) and aws (solved as soon as we write aws sdk instrumentation or integrate with it). Users will not do this manually in other words
import java.io.IOException; | ||
import okio.Buffer; | ||
|
||
public final class XRayFormatter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docs to how you came up with the format.. also make the type not public unless required to be. If required to be, mention why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a utility class, basically brave.SpanCustomizer#tag
is using <String, String>
we can not pass an exception to it. So I had to prepare JSON to be passed to subsegments. Here is the place which is used https://github.com/openzipkin/zipkin-aws/pull/59/files#diff-5669c8fcacad45292f4bc1d457640d24R243. This is the dirtiest place since it is not using JsonWriter
. Could you review this part again please? If you are agree with this solution I will come up some documents regarding your notes before you stated.
Here is the relevant docs on XRay as well: The structure is deeply nested and there is not an easy solution with the current state.
http://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-errors
I had also decompiled XRay source codes and found that place overly complicated.
That formatter class is not something very nice but it gives some information about exception and also let the user to use another logic.
I will add more test regarding Encoder. |
40a6576
to
ef1fed5
Compare
I would try to get close to user's desire as possible. Usually span name
isnt null, but it could be sent as null when updating a span post factum.
Maybe set to empty string and add a comment we are doing this to avoid the
data being dropped.
…On 8 Nov 2017 9:28 pm, "Cemalettin Koc" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In storage-xray-udp/src/main/java/zipkin2/storage/xray_udp/
UDPMessageEncoder.java
<#59 (comment)>:
> @@ -55,8 +57,13 @@
|| span.kind() != Span.Kind.SERVER && span.kind() != Span.Kind.CONSUMER) {
writer.name("type").value("subsegment");
if (span.kind() != null) writer.name("namespace").value("remote");
+ writer.name("name").value(span.remoteServiceName() == null ? "" : span.remoteServiceName());
when it is null or empty string, it ignores span so it does not appear at
all. There is two possibility when it is null or empty:
1. name it "unknown"
2. ignore span
Which one seems better for you @adriancole <https://github.com/adriancole>
?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#59 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAD61ww8upy_aq5HWXnfwWzZf5lWIYOLks5s0ayJgaJpZM4QSHIk>
.
|
Sorry you said empty is not permitted.. yeah unknown is best but make sure
there is a comment (and a unit test saying why we do this as we might
potentially overwrite a good name with a bad one, unless the way logic is
structured it is unlikely..)
…On 9 Nov 2017 8:11 am, "Adrian Cole" ***@***.***> wrote:
I would try to get close to user's desire as possible. Usually span name
isnt null, but it could be sent as null when updating a span post factum.
Maybe set to empty string and add a comment we are doing this to avoid the
data being dropped.
On 8 Nov 2017 9:28 pm, "Cemalettin Koc" ***@***.***> wrote:
> ***@***.**** commented on this pull request.
> ------------------------------
>
> In storage-xray-udp/src/main/java/zipkin2/storage/xray_udp/UDPM
> essageEncoder.java
> <#59 (comment)>:
>
> > @@ -55,8 +57,13 @@
> || span.kind() != Span.Kind.SERVER && span.kind() != Span.Kind.CONSUMER) {
> writer.name("type").value("subsegment");
> if (span.kind() != null) writer.name("namespace").value("remote");
> + writer.name("name").value(span.remoteServiceName() == null ? "" : span.remoteServiceName());
>
> when it is null or empty string, it ignores span so it does not appear at
> all. There is two possibility when it is null or empty:
>
> 1. name it "unknown"
> 2. ignore span
>
> Which one seems better for you @adriancole
> <https://github.com/adriancole> ?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#59 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AAD61ww8upy_aq5HWXnfwWzZf5lWIYOLks5s0ayJgaJpZM4QSHIk>
> .
>
|
* | ||
* Formating an exception to be consumed by XRay by Spans annotations | ||
* | ||
* @param exceptionId an unique exception id @see brave.internal.Platform#randomLong() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an ID? Sometimes I have seen hashes for this .. ex hash of the stack trace. Either way i would hide this type, using internally in aws specific http and sql parsers. Hard to change public types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc: id – A 64-bit identifier for the exception, unique among segments in the same trace, in 16 hexadecimal digits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably can reuse span ID as zipkin instrumentation only record one error tag per span
* Formating an exception to be consumed by XRay by Spans annotations | ||
* | ||
* @param exceptionId an unique exception id @see brave.internal.Platform#randomLong() | ||
* @param isRemote Any calls to a remote service/broker should be true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presume applies to client and server side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc: remote – boolean indicating that the exception was caused by an error returned by a downstream service.
* | ||
* @param exceptionId an unique exception id @see brave.internal.Platform#randomLong() | ||
* @param isRemote Any calls to a remote service/broker should be true | ||
* @param maxStackTraceElement XRay has a limit of 60KB per segment. Choose a wise number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be global configurable as ex transports have limits that can be much smaller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can I configure this number since the method is static? Should I declare another static variable can be changeable? Or should I make it static class and force users to create an instance of it?
.shared(false) | ||
.build(); | ||
|
||
byte[] bytes = UDPMessageEncoder.doEncode(span); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit i would extract a method that only encodes span json as that lets us in future do json asserts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got your point. I made it JSON assertions friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to defer exception mapping as it is complex to map and includes state assumptions.. it is better imho to get the easier stuff done and landed before harder stuff. What do you think? Punt the stack trace parsing to a separate issue (which we can follow up on)
@@ -121,11 +201,56 @@ | |||
writer.endObject(); | |||
} | |||
|
|||
Integer errorStatus = httpResponseStatus; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have standard http tag for this. http.status_code I believe. For input tags, reuse zipkin ones and namespace (ex xxxx.foo) where exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is an http error and available http.status_code
, I am using it. But there might be some other errors as well. For example a DB exception or something else. For that part our instrumentation codes must provide error_status
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For input tags, reuse zipkin ones and namespace (ex xxxx.foo) where exist
I could not get this, can you expand please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly it is getting cognitively difficult to follow things. Let's try not to move ahead of the ball, as there's an advantage in x-ray which is that it is model based. Instead of creating error_status which isn't defined anywhere in zipkin, for possible correlated use between http and sql, perhaps define them separately, in namespaces such as http.status_code (and whatever it is for sql, ex sql.status_code). Eventhough I have more time than others I'm having trouble following this, so lets try and keep it organized as possible and using zipkin conventions whereever, and without support for hypothetical things (ex only code things you have a test case and actual brave instrumentation to support)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be other services can cause status_code. How can I add status_code for a redis call or a method instrumentation. If you believe that it is confusing. Let's
change error_status
to xray.error_status
since it is a top level property of Subsegments.
By the way, it is indeed easier to review seeing things applied. I think we were discussing making a brave-instrumentation-xray module which would produce the tags consumed here. Seeing together makes it easier to tell style impact. How do you feel about creating that as a maven module? Would you need a hand? Cc @devinsba |
in zipkin, a generic error is the tag "error" and the UI pays attention to it. for redis, it would likely be redis.error_status if there is something more specific. Lets' please stick to things concrete, it will be more smooth for both of us to only focus on things we can test |
OK, I removed it as well. It is better to address it in a separated issue. |
PS I'd like to help more, just trying to finish cassandra which is very
overdue. Notably I'd like to help get the integrated stuff in (ex such that
we can connect brave to this work, and test end-to-end) really appreciate
you blazing trails and patience with feedback.
|
No problem @adriancole. It was my pleasure. You are leading this community, the things we are doing nothing when it is compared with your work. I am just new to this land and had some difficulties to grab context. :) |
@adriancole is there anything else I can provide for this pr? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One test nit. Please add http server and client tests based on data brave produces.
Also remove the error section as it isnt tested or referenced yet. Reintroduce it later.
Good stuff will merge after above
.newBuilder() | ||
.kind(Span.Kind.CLIENT) | ||
.name("test-cemo") | ||
.remoteEndpoint(Endpoint.newBuilder().build()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty endpoint is not normal. Better left unset
} | ||
|
||
@Test | ||
public void doEncodeUnkown() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe encode_nameAsUnknownWhenRemoteServiceNameNull
hi, @cemo sorry if I've asked a lot. Let me know if you need a hand wrapping up last things I mentioned. |
No problem but I could not find time to complete. I would be glad if you polish and deliver it. |
No problem but I could not find time to complete. I would be glad if you
polish and deliver it.
sure thing. that's why I asked :) will polish up.
Thanks for all the help
|
I will do some polishing post-merge.. promise |
// using "unknown" subsegment name will help to detect missing names | ||
writer.name("name").value(span.remoteServiceName() == null ? "unknown" : span.remoteServiceName()); | ||
}else{ | ||
writer.name("name").value(span.localServiceName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey, I am now getting all my traces labelled as "unkown", previously they used to obey the localServiceName()
. How is remoteServiceName supposed to be set? I usually set the localServiceName through Tracing.newBuilder().localServiceName
, but there is no similar method for remote service name.
hey, I am now getting all my traces labelled as "unkown", previously they used to obey the localServiceName(). How is remoteServiceName supposed to be set? I usually set the localServiceName through Tracing.newBuilder().localServiceName, but there is no similar method for remote service name.
you have to scope your http tracing component when you are a client to
assign a remote service name. It is "clientOf"
https://github.com/openzipkin/brave/tree/master/instrumentation/http#span-data-policy
Not sure if it is fine to just leave out the name when we don't know the
remote side or not.. we can ask. If you look in the other PR, the test is
more obvious
https://github.com/openzipkin/zipkin-aws/pull/65/files#diff-cc008c2d18f80edd4dff94bf438c1b4aR72
|
No description provided.