Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require and change description for error.id #1384

Merged
merged 1 commit into from
Sep 19, 2018

Conversation

simitt
Copy link
Contributor

@simitt simitt commented Sep 14, 2018

cc @elastic/apm-agent-devs

@simitt simitt requested a review from watson September 14, 2018 15:14
@roncohen roncohen mentioned this pull request Sep 14, 2018
30 tasks
@Qard
Copy link
Contributor

Qard commented Sep 14, 2018

Was there a discussion somewhere on the switch from 64-bit IDs to 128-bit? I can't seem to find it.

This change would make them incompatible with the TraceContext IDs used in spans and transactions, so we'd need to special-case ID generation for errors. 🤔

@simitt
Copy link
Contributor Author

simitt commented Sep 17, 2018

The agreement has always been to use 128-bit for error.id, I just overlooked that when implementing it. The argument for using 128-bit over 64-bit is that the trace_id is optional for errors, so the ID needs to be globally unique, while transactions and spans always live inside a trace, so their IDs only need to be unique within their trace.

In case we align error.id with transaction.id and span.id we risk that the error.id is not globally unique. If you think the risk is worth the alignment (e.g. in case errors should usually not happen outside of traces?) we can discuss this again.

@watson
Copy link
Contributor

watson commented Sep 17, 2018

If the error id wasn't required, would the APM Server then just generate one before inserting the error into ES?

@felixbarny
Copy link
Member

It takes 5,056,937,541 tries to have a 50% probability of a collision for 64 bits of randomness (see https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions). That means if 1000 Errors/s are created, it takes 58,5 days to reach that.

Whether that's good enough is of course debatable. But since Errors happening outside of a trace is rather the exception than the norm, I'd say it's probably worth to align the error id having 64 bits of randomness with span and transaction ids.

But I don't mean to veto against 128 bit ids for errors.

What would be the worst thing that can happen when there are collisions for an error id?

@watson
Copy link
Contributor

watson commented Sep 17, 2018

I've seen a few comments elsewhere about that not using 64-bit id's for errors will complicate the code in the agents as it's no longer the same as spans and transactions. I can't see exactly how that can be though. Can someone elaborate?

@felixbarny
Copy link
Member

In the Java agent, I have the TraceContext class which holds the fields traceId, id and parentId (and also the traceparent flags). Currently, Spans, Transactions and Errors all have a TraceContext field. When serializing the JSON, this field is flattened, so that I dont have transaction.traceContext.id but transaction.id, for example. When the error.id is not the same as transaction.id, I have to special case errors. Not a huge deal but aligning would be a bit nicer.

@watson
Copy link
Contributor

watson commented Sep 17, 2018

@felixbarny I see, thanks for explaining. Does the Java agent support reporting of errors outside the scope of a trace? And if so, will it then generate a trace-id because of the errors association with TraceContext?

There's a lot of room for optimization in the Node.js agent, and I guess that's part of the reason why I had a hard time understanding the relation. In the Node.js agent errors live 100% outside the world of the instrumentation logic. It's only recently that the agent added an optional error.transaction property so that we had a way of tying the two worlds a littler closer together.

@simitt
Copy link
Contributor Author

simitt commented Sep 17, 2018

@watson the APM Server does not set an error.Id (v1) in case it is missing. Afaik in the APM UI the error.grouping_key is used for showing details. I'm not sure if the error.id is used anywhere for querying - probably @sqren knows more here.

@felixbarny
Copy link
Member

Does the Java agent support reporting of errors outside the scope of a trace?

Yes

And if so, will it then generate a trace-id because of the errors association with TraceContext?

Currently, if an error is tracked outside of a transaction, it will not have an id at all. But if I were to add support that (for example to expose the error id so that users can display the error id for end-users of their application), I would not generate a new trace id as that would imply that this error is part of a particular trace.

@simitt
Copy link
Contributor Author

simitt commented Sep 18, 2018

It looks like @Qard is ok with the changes. However, the discussions on this issue don't look finalized. @roncohen, @alvarolobato could you please state your opinion regarding 64-bit vs 128-bit for error.id.

@roncohen
Copy link
Contributor

OK, let's stick to the decision to make the error ids 128-bit.

Some agents allow users to turn tracing off completely (or are planning to make that possible) and only do error logging, so an error happening outside a trace is not just a crazy edge case. I understand this comes with some complexity for some agents. I can understand that it's annoying to have to special case error ids, but i also understand from the comments here that it's not a huge problem.

@alvarolobato
Copy link

+1

@Qard
Copy link
Contributor

Qard commented Sep 19, 2018

Yep, no problem to me. It was just unexpected, as the spec previously stated 64-bit, so I built my initial error ID stuff to share the same trace context class as everything else, but I've since changed that. 👍

@felixbarny
Copy link
Member

FWIW, I use the same context class but have means to specify the length of the id: TraceContext.with64BitId() for spans and transactions vs TraceContext.with128BitId() for errors.

https://github.com/elastic/apm-agent-java/pull/217/files#diff-4c49e985e6e99cd0dfb577d72a2863f5

roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 7, 2018
roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 15, 2018
roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 15, 2018
roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 15, 2018
roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 15, 2018
roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 15, 2018
roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 16, 2018
@alvarolobato alvarolobato added this to the 6.5 milestone Oct 26, 2018
@simitt simitt deleted the update-error-id branch November 8, 2018 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants