Require and change description for `error.id` #1384

simitt · 2018-09-14T15:14:29Z

cc @elastic/apm-agent-devs

Qard · 2018-09-14T17:53:11Z

Was there a discussion somewhere on the switch from 64-bit IDs to 128-bit? I can't seem to find it.

This change would make them incompatible with the TraceContext IDs used in spans and transactions, so we'd need to special-case ID generation for errors. 🤔

simitt · 2018-09-17T07:32:25Z

The agreement has always been to use 128-bit for error.id, I just overlooked that when implementing it. The argument for using 128-bit over 64-bit is that the trace_id is optional for errors, so the ID needs to be globally unique, while transactions and spans always live inside a trace, so their IDs only need to be unique within their trace.

In case we align error.id with transaction.id and span.id we risk that the error.id is not globally unique. If you think the risk is worth the alignment (e.g. in case errors should usually not happen outside of traces?) we can discuss this again.

watson · 2018-09-17T07:49:29Z

If the error id wasn't required, would the APM Server then just generate one before inserting the error into ES?

felixbarny · 2018-09-17T08:06:04Z

It takes 5,056,937,541 tries to have a 50% probability of a collision for 64 bits of randomness (see https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions). That means if 1000 Errors/s are created, it takes 58,5 days to reach that.

Whether that's good enough is of course debatable. But since Errors happening outside of a trace is rather the exception than the norm, I'd say it's probably worth to align the error id having 64 bits of randomness with span and transaction ids.

But I don't mean to veto against 128 bit ids for errors.

What would be the worst thing that can happen when there are collisions for an error id?

watson · 2018-09-17T08:12:45Z

I've seen a few comments elsewhere about that not using 64-bit id's for errors will complicate the code in the agents as it's no longer the same as spans and transactions. I can't see exactly how that can be though. Can someone elaborate?

felixbarny · 2018-09-17T08:22:29Z

In the Java agent, I have the TraceContext class which holds the fields traceId, id and parentId (and also the traceparent flags). Currently, Spans, Transactions and Errors all have a TraceContext field. When serializing the JSON, this field is flattened, so that I dont have transaction.traceContext.id but transaction.id, for example. When the error.id is not the same as transaction.id, I have to special case errors. Not a huge deal but aligning would be a bit nicer.

watson · 2018-09-17T08:41:27Z

@felixbarny I see, thanks for explaining. Does the Java agent support reporting of errors outside the scope of a trace? And if so, will it then generate a trace-id because of the errors association with TraceContext?

There's a lot of room for optimization in the Node.js agent, and I guess that's part of the reason why I had a hard time understanding the relation. In the Node.js agent errors live 100% outside the world of the instrumentation logic. It's only recently that the agent added an optional error.transaction property so that we had a way of tying the two worlds a littler closer together.

simitt · 2018-09-17T09:16:33Z

@watson the APM Server does not set an error.Id (v1) in case it is missing. Afaik in the APM UI the error.grouping_key is used for showing details. I'm not sure if the error.id is used anywhere for querying - probably @sqren knows more here.

felixbarny · 2018-09-17T11:38:32Z

Does the Java agent support reporting of errors outside the scope of a trace?

Yes

And if so, will it then generate a trace-id because of the errors association with TraceContext?

Currently, if an error is tracked outside of a transaction, it will not have an id at all. But if I were to add support that (for example to expose the error id so that users can display the error id for end-users of their application), I would not generate a new trace id as that would imply that this error is part of a particular trace.

simitt · 2018-09-18T10:48:20Z

It looks like @Qard is ok with the changes. However, the discussions on this issue don't look finalized. @roncohen, @alvarolobato could you please state your opinion regarding 64-bit vs 128-bit for error.id.

roncohen · 2018-09-19T07:43:30Z

OK, let's stick to the decision to make the error ids 128-bit.

Some agents allow users to turn tracing off completely (or are planning to make that possible) and only do error logging, so an error happening outside a trace is not just a crazy edge case. I understand this comes with some complexity for some agents. I can understand that it's annoying to have to special case error ids, but i also understand from the comments here that it's not a huge problem.

alvarolobato · 2018-09-19T07:51:51Z

+1

Qard · 2018-09-19T17:32:46Z

Yep, no problem to me. It was just unexpected, as the spec previously stated 64-bit, so I built my initial error ID stuff to share the same trace context class as everything else, but I've since changed that. 👍

felixbarny · 2018-09-20T07:29:07Z

FWIW, I use the same context class but have means to specify the length of the id: TraceContext.with64BitId() for spans and transactions vs TraceContext.with128BitId() for errors.

https://github.com/elastic/apm-agent-java/pull/217/files#diff-4c49e985e6e99cd0dfb577d72a2863f5

simitt requested a review from watson September 14, 2018 15:14

roncohen mentioned this pull request Sep 14, 2018

Intake v2 support #1237

Closed

30 tasks

mikker mentioned this pull request Sep 14, 2018

Update id formats elastic/apm-agent-ruby#210

Closed

graphaelli assigned simitt Sep 14, 2018

graphaelli added [zube]: Inbox and removed [zube]: Inbox labels Sep 14, 2018

axw mentioned this pull request Sep 17, 2018

Change error.id to 128-bit random ID elastic/apm-agent-go#225

Closed

Qard approved these changes Sep 17, 2018

View reviewed changes

simitt mentioned this pull request Sep 18, 2018

[v2] [dt] Add missing implementation and requirements for distributed tracing #1390

Closed

9 tasks

simitt force-pushed the update-error-id branch from 036462f to a17b385 Compare September 18, 2018 16:15

simitt force-pushed the update-error-id branch from a17b385 to 71b818b Compare September 19, 2018 07:57

felixbarny mentioned this pull request Sep 19, 2018

Change error.id to 128 bit, make it required elastic/apm-agent-java#213

Closed

[v2] Require and change description for error.id

f877537

simitt force-pushed the update-error-id branch from 71b818b to f877537 Compare September 19, 2018 08:20

simitt merged commit 553eb8d into elastic:v2 Sep 19, 2018

zube bot added [zube]: Done and removed [zube]: In Review labels Sep 19, 2018

felixbarny mentioned this pull request Sep 19, 2018

Error id adjustments for intake v2 elastic/apm-agent-java#217

Merged

roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 7, 2018

[v2] Require and change description for error.id (elastic#1384)

4ba80a2

roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 15, 2018

[v2] Require and change description for error.id (elastic#1384)

9b92d76

roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 15, 2018

[v2] Require and change description for error.id (elastic#1384)

aa2c0da

roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 15, 2018

[v2] Require and change description for error.id (elastic#1384)

308e046

roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 15, 2018

[v2] Require and change description for error.id (elastic#1384)

f4948ae

roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 15, 2018

[v2] Require and change description for error.id (elastic#1384)

d8f44af

roncohen pushed a commit to roncohen/apm-server that referenced this pull request Oct 16, 2018

[v2] Require and change description for error.id (elastic#1384)

f310338

roncohen pushed a commit that referenced this pull request Oct 16, 2018

[v2] Require and change description for error.id (#1384)

f2238a1

alvarolobato added this to the 6.5 milestone Oct 26, 2018

alvarolobato removed the [zube]: Done label Oct 26, 2018

simitt deleted the update-error-id branch November 8, 2018 09:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Require and change description for `error.id` #1384

Require and change description for `error.id` #1384

simitt commented Sep 14, 2018

Qard commented Sep 14, 2018

simitt commented Sep 17, 2018

watson commented Sep 17, 2018

felixbarny commented Sep 17, 2018

watson commented Sep 17, 2018

felixbarny commented Sep 17, 2018

watson commented Sep 17, 2018

simitt commented Sep 17, 2018

felixbarny commented Sep 17, 2018

simitt commented Sep 18, 2018

roncohen commented Sep 19, 2018

alvarolobato commented Sep 19, 2018

Qard commented Sep 19, 2018

felixbarny commented Sep 20, 2018

Require and change description for error.id #1384

Require and change description for error.id #1384

Conversation

simitt commented Sep 14, 2018

Qard commented Sep 14, 2018

simitt commented Sep 17, 2018

watson commented Sep 17, 2018

felixbarny commented Sep 17, 2018

watson commented Sep 17, 2018

felixbarny commented Sep 17, 2018

watson commented Sep 17, 2018

simitt commented Sep 17, 2018

felixbarny commented Sep 17, 2018

simitt commented Sep 18, 2018

roncohen commented Sep 19, 2018

alvarolobato commented Sep 19, 2018

Qard commented Sep 19, 2018

felixbarny commented Sep 20, 2018

Require and change description for `error.id` #1384

Require and change description for `error.id` #1384