
v2: Require either span.start or span.timestamp #1340

Closed
5 of 8 tasks
simitt opened this issue Aug 30, 2018 · 41 comments

@simitt
Contributor

simitt commented Aug 30, 2018

A span's timestamp is the absolute time at which the span was captured, while start is its offset relative to the transaction's start.
At the moment span.start is required but span.timestamp is not, as the RUM agent cannot set a span.timestamp.

All agents except the RUM agent do set span.timestamp, so span.start seems redundant for them. The suggestion is therefore to require that at least one of the two fields be set.

Is this ok with all @elastic/apm-agent-devs ?

  • go
  • java
  • nodejs
  • python
  • ruby
  • RUM

And is it ok with @elastic/apm-ui ?

@hmdhk
Contributor

hmdhk commented Aug 30, 2018

A couple of considerations:

  • Currently the apm server sets all missing timestamps (transaction and its spans) to the time it received the request.
  • In the RUM agent span.start is set instead.
  • Because of these limitations, in the v2 spec we agreed that the RUM agent sends a transaction and its spans in the same request (so that all of them get the same timestamp).

@axw
Member

axw commented Aug 30, 2018

It turns out we can't really do this. I misunderstood: a span's timestamp is meant to be exactly the same as the transaction's timestamp. It should probably be called transaction_timestamp. Per @watson:

AFAIK the timestamp is only used in the query when retrieving the spans and transactions from Elasticsearch. When laying out the spans in the waterfall the offset and duration is used.

By using the same timestamp, the agents can re-use these timestamps across all spans belonging to the same transaction. This keeps resource usage at a minimum.

It also means that a time-restricted query will always return all spans belonging to a transaction instead of only the ones falling within the time window.

Additionally, in the v1 protocol the spans didn't have timestamps, and the server would pick the timestamp from the transaction and add it to the span before storing the span in ES. So this basically mimics what the server is already doing in v1.

@axw
Member

axw commented Aug 30, 2018

Discussion still ongoing. A couple of counterpoints:

  • storing the transaction timestamp in the span doc violates the ECS definition of @timestamp
  • even if we have the same timestamp for a span and its transaction, we'll still need to consider transactions crossing time boundaries for a distributed trace

We'll take this to the weekly meeting to flesh it out.

@felixbarny
Member

From a chat conversation with @sqren

@felixbarny

AFAIK the timestamp is only used in the query when retrieving the spans and transactions from Elasticsearch.
Which query for which view in the APM UI are you referring to? The transaction detail view does not need a timeframe, it just needs the transaction ID

@sqren

fyi: The view doesn’t need the timestamp but we currently limit the search to the selected time range:

https://github.com/elastic/kibana/blob/master/x-pack/plugins/apm/server/lib/transactions/spans/get_spans.js#L21-L37

Should be fine to remove the @timestamp range query though. (edited)

@felixbarny

is that the transaction detail view?

@sqren

Yeah, it’s the waterfall view for getting the spans

@felixbarny

so this is basically to improve the performance so that not all indices have to be searched?

@sqren

yes, at least theoretically it should be faster but I don’t have any data to back it up (edited)

@felixbarny

I see. But I think that’s not a good enough reason to set the span’s @timestamp to the transaction’s @timestamp.
I think most of the performance gains (if there are any) can be achieved by only using gte: start without lte: end. This also accounts for the fact that there could be spans created after the transaction ends.

@felixbarny

but probably the query is fast enough by only using the term query on transaction id

@formgeist
Contributor

@simitt I'm not sure what implications it has for the UI (yet), but if @sqren is onboard with it I'm sure it's 👌

@sorenlouv
Member

sorenlouv commented Aug 30, 2018

Currently the UI uses both transaction id and timestamp to query for spans, but as mentioned by @felixbarny we can remove the timestamp constraint, and then it shouldn't make a big difference. If the timestamp is missing, I expect that I can calculate it by adding the transaction's @timestamp and the span's start.

Btw. am I the only one who would prefer offset or transaction_offset instead of start?

@watson
Contributor

watson commented Aug 30, 2018

I agree offset is a better term than start. Consider adding it to the list of API changes we want to consider for v2

@mikker

mikker commented Aug 30, 2018

offset 👍🏼. relative_start also ok

@simitt
Contributor Author

simitt commented Aug 30, 2018

@sqren I suggest avoiding ES naming changes in minor releases unless necessary. We can rename the attributes at the Intake API level.

@sorenlouv
Member

That's fair :)

@Qard
Contributor

Qard commented Aug 30, 2018

How would offset-based times work in a distributed trace? We don't have a way to pass the outer transaction start time into inner services, so I guess the services would need an inner transaction start time to work with? 🤔

@watson
Contributor

watson commented Aug 30, 2018

@Qard It's offset from the start time of the transaction in the same service only

axw added a commit to axw/apm-agent-go that referenced this issue Sep 3, 2018
Until elastic/apm-server#1340 is resolved,
keep setting span.start. The way we set it
is unsound, assuming the transaction timestamp
cannot change after construction. If we do
not go ahead with making span.start optional,
we'll have to adjust the API.
@alvarolobato

Decided on the APM weekly meeting:

  • More investigation required: performance concerns, millisecond precision. @simitt and @sqren to investigate.
  • Will need to cater for no-upper-bound on time for DT anyway

@felixbarny
Member

We could accept epoch micros or an ISO timestamp with microsecond precision. The @timestamp can still only hold milliseconds, but we could also store the epoch micros timestamp in a separate field so that the UI can more accurately align the spans.
With version 7 (IIRC), ES will be able to store timestamps in up to nanosecond precision.

@sorenlouv
Member

sorenlouv commented Sep 19, 2018

Writeup from the meeting today:

Problem
For distributed tracing, spans and transactions will need to be queried and sorted according to their absolute start time.
Transactions currently have @timestamp with millisecond precision since ES doesn't support microsecond precision (will change in 7.0.0). Spans have a relative offset start in microsecond precision. The absolute start time of a span is the sum of transaction.@timestamp and span.start, which results in a timestamp with less than microsecond precision.

Query performance
To query for spans and transactions, it is expected that a range filter on @timestamp will improve query performance (yet to be verified). This necessitates an accurate absolute timestamp on spans.

Ordering of transactions and spans
To visualize transactions and spans for DT in the UI, they are ordered according to their parent/child relationship, and their absolute start time. To avoid incorrect/inconsistent ordering for transactions/spans started within the same millisecond, they should have timestamps with microsecond precision.

Suggestion for ES data model
Transaction and span documents should have @timestamp (millisecond precision) and timestamp.us (microsecond precision).
In addition, the field start should eventually be removed from spans (a possibly breaking change)
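One way the proposed fields could be mapped in Elasticsearch (a hypothetical fragment for illustration, not the actual apm-server index template):

```json
{
  "properties": {
    "@timestamp": { "type": "date" },
    "timestamp": {
      "properties": {
        "us": { "type": "long" }
      }
    }
  }
}
```

A date field holds millisecond precision, while the long field preserves the full microsecond epoch for ordering in the UI.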

Changes for agents and apm-server
Agents should send timestamp with microsecond precision, and the apm-server will convert this to millisecond precision for @timestamp, while keeping microsecond precision for timestamp.us.

Since the RUM agent cannot provide a reliable absolute timestamp it will continue sending spans with relative offset (start), which will be converted to an absolute timestamp by APM-Server. This is possible because the RUM agent will always send spans together with the parent transaction.
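The conversion described above can be sketched as follows (illustrative Java; apm-server itself is written in Go, and all names here are invented):

```java
public class TimestampConversion {
    // Backend agents send an absolute microsecond epoch; it is kept verbatim
    // as timestamp.us and truncated to milliseconds for @timestamp.
    public static long toTimestampUs(long epochMicrosFromAgent) {
        return epochMicrosFromAgent;
    }

    public static long toTimestampMillis(long epochMicrosFromAgent) {
        return epochMicrosFromAgent / 1_000;
    }

    // RUM spans arrive with a relative offset instead; the server can resolve
    // it because the parent transaction is always in the same request.
    public static long rumSpanTimestampUs(long transactionEpochMicros, long spanOffsetUs) {
        return transactionEpochMicros + spanOffsetUs;
    }

    public static void main(String[] args) {
        long fromAgent = 1537388493912421L; // 2018-09-19T20:21:33.912421Z
        System.out.println(toTimestampMillis(fromAgent));          // 1537388493912
        System.out.println(rumSpanTimestampUs(fromAgent, 1_500L)); // 1537388493913921
    }
}
```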

Cons:

  • Having two timestamp fields for every document will increase storage requirements
  • Increased complexity in the APM Server logic converting relative offsets to absolute timestamp

Pros:

  • Query performance should improve when adding a range filter
  • Ordering of transactions and spans will be consistent with microsecond precision timestamps
  • Agents no longer need to keep a reference to the parent transaction to set the relative start field
  • This improves consistency between spans and transactions, and makes it easier to merge them, if we feel so inclined

@watson
Contributor

watson commented Sep 19, 2018

ISO 8601 format

For reference, this is what the ISO 8601 timestamp spec has to say about adding fractional precision to the timestamp:

Decimal fractions may be added to any of the three time elements. However, a fraction may only be added to the lowest order time element in the representation. A decimal mark, either a comma or a dot (without any preference as stated in resolution 10 of the 22nd General Conference CGPM in 2003, but with a preference for a comma according to ISO 8601:2004) is used as a separator between the time element and its fraction. To denote "14 hours, 30 and one half minutes", do not include a seconds figure. Represent it as "14:30,5", "1430,5", "14:30.5", or "1430.5". There is no limit on the number of decimal places for the decimal fraction. However, the number of decimal places needs to be agreed to by the communicating parties. For example, in Microsoft SQL Server, the precision of a decimal fraction is 3, i.e., "yyyy-mm-ddThh:mm:ss[.mmm]".

So to achieve microsecond precision, the timestamp sent from the agents should look like this:

yyyy-mm-ddThh:mm:ss.mmmmmmZ

E.g. a timestamp for 2018-09-19 at 21 minutes, 33 seconds, 912421 microseconds past 20, would look like this:

2018-09-19T20:21:33.912421Z
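For illustration, a formatter producing exactly this shape (a Java sketch; the class and method names are invented):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class IsoMicros {
    // Always emit six fractional digits and 'Z' for the UTC offset.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSX")
                             .withZone(ZoneOffset.UTC);

    public static String format(long epochMicros) {
        // Split the microsecond epoch into whole seconds and the nano remainder.
        Instant instant = Instant.ofEpochSecond(
                epochMicros / 1_000_000, (epochMicros % 1_000_000) * 1_000);
        return FORMAT.format(instant);
    }

    public static void main(String[] args) {
        // The example timestamp above as a microsecond epoch.
        System.out.println(format(1537388493912421L)); // 2018-09-19T20:21:33.912421Z
    }
}
```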

Integer/Float format

I would however prefer if we could switch to transmit the timestamp as a UNIX epoch float or integer value instead (we should choose one or the other of course).

We have 3 options for how to encode the timestamp:

  • As an integer representing microsecond: 1537388964792422
  • As a float representing milliseconds: 1537388964792.422
  • As a float representing seconds: 1537388964.792422

I think many agents would prefer to transmit the timestamp in some form of number format because:

  1. To not generate the timestamp string which might come with a certain overhead
  2. The built-in timestamp generator in the language might only support millisecond precision (this is the case for JavaScript), and so requiring us to use microsecond timestamps would mean that we would have to build a custom timestamp generator, which in turn might also incur a performance hit

@axw
Member

axw commented Sep 20, 2018

+1 to number since epoch. I'd prefer an integer representing nanos just because that's how Go stores things, but I suppose that'll be an issue for JS?

@watson
Contributor

watson commented Sep 20, 2018

@axw Yes unfortunately. All numbers in JavaScript are actually IEEE 754 floats. So the largest integer that can safely be represented is 9007199254740991 which would not allow for a nanosecond epoch.

Side note: BigInt support was just added to the language to combat this limitation, but not all browsers and versions of Node.js support it, and to my knowledge the built-in JSON serializer doesn't work with it either.

The way we normally get around this limitation is to store the number as two integers in a tuple: [seconds, nanoseconds]. That way you could technically calculate the full number by seconds * 1e9 + nanoseconds. But I think it would complicate the API too much to use this way of transferring the timestamp.
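The 53-bit mantissa limit can be demonstrated by round-tripping the two candidate epochs through a double, which is what a plain JSON number becomes in JavaScript (a Java sketch; names are invented):

```java
public class SafeIntegerDemo {
    // JavaScript numbers are IEEE 754 doubles, so an integer survives JSON
    // transport only if it fits in the 53-bit mantissa (<= 9007199254740991).
    public static boolean survivesDoubleRoundTrip(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        long micros = 1537388964792422L;    // microsecond epoch, ~2^50.5
        long nanos  = 1537388964792422000L; // nanosecond epoch, ~2^60.4

        System.out.println(survivesDoubleRoundTrip(micros)); // true
        System.out.println(survivesDoubleRoundTrip(nanos));  // false: rounded
    }
}
```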

@watson
Contributor

watson commented Sep 20, 2018

🤔 I might actually have suggested more than I can deliver. The UNIX epoch in JavaScript only gives you millisecond precision (Date.now()). Node.js does have a sub-millisecond precision timer, but it works based on time offset.

In Node.js it's represented as the number of nanoseconds since the process started. This way we can still work with the number as an IEEE 754 float, as it's normally not too big. It also has the added benefit of not being affected by time shifts.

So I'm actually not sure how to get a UNIX epoch number (or a timestamp for that matter) with sub-millisecond precision at the moment. And if I'm not mistaken we would then have to deal with NTP changes, summer/winter time etc. I'd have to investigate further.

@felixbarny
Member

Before Java 9, there was also no built-in way to generate an epoch timestamp in microseconds. However, combining an epoch ms timestamp with a nanosecond precision timer should work. That makes the timestamp consistent in the same process. The timestamp of another process might even be shifted in the order of milliseconds, so no need to perfectly calibrate the nano timer.

If the platform only supports a millisecond timer and no nanosecond timer at all, just multiply the epoch millis by 1000.

@watson would an integer microsecond epoch timestamp work for nodejs?

@felixbarny
Member

In Java, I'd implement it like that:

public long getEpochMicros() {
    // Absolute anchor: wall-clock time in milliseconds.
    long epochMillis = System.currentTimeMillis();
    // Monotonic nano timer; its origin is arbitrary.
    long nanoTime = System.nanoTime();
    // (nanoTime / 1000) % 1000 is the timer's microsecond component within
    // its current millisecond -- an approximation, since the two clocks'
    // millisecond boundaries are not aligned.
    return epochMillis * 1000 + nanoTime / 1000 % 1000;
}

@roncohen
Contributor

@sqren OK so the idea was that the UI could use the timestamp.us field to draw the spans in the timeline regardless of whether the data came from RUM or from regular backends. Unfortunately, this will not work for old data, because old data does not have that field. In other words, the UI will probably need to know how to use start and timestamp.us depending on what is available, until we can remove start from the Elasticsearch documents in 7.0

@sorenlouv
Member

sorenlouv commented Sep 28, 2018

@roncohen That's a good point. Instead of handling both cases, can we perhaps until 7.0.0 just use @timestamp and span.start.us since they will both be available?

@felixbarny
Member

The problem is that currently, spans from the RUM agent will have @timestamp set to the same value as their transaction, so I'm afraid we'll need to consider the offset, if present.

@sorenlouv
Member

The problem is that currently, spans from the RUM agent will have @timestamp set to the same value as their transaction, so I'm afraid we'll need to consider the offset, if present.

So you agree with #1340 (comment), right?

To make it clear how I'm proposing to calculate the absolute timestamps for transactions and spans:

Until 7.0.0:
For transactions: @timestamp
For spans: Parent transaction @timestamp + span.start.us

After 7.0.0:
For transactions: timestamp.us
For spans: timestamp.us

(I'm aware that @timestamp is in ms and span.start.us is in us)
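The pre-7.0 fallback for spans then amounts to a unit conversion (illustrative Java; the real UI code lives in Kibana, and the names here are invented):

```java
public class AbsoluteSpanTime {
    // Pre-7.0 fallback: the transaction's @timestamp is in milliseconds and
    // span.start.us is in microseconds, so the former must be scaled first.
    public static long spanStartEpochMicros(long transactionTimestampMillis, long spanStartUs) {
        return transactionTimestampMillis * 1_000 + spanStartUs;
    }

    public static void main(String[] args) {
        // Transaction at epoch ms 1537388493912, span starting 1500us later.
        System.out.println(spanStartEpochMicros(1537388493912L, 1_500L)); // 1537388493913500
    }
}
```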

@roncohen
Contributor

this means agents will need to keep sending span.start, right? AFAIK, this discussion was partly motivated by the fact that sending start complicated the go agent (ping @axw).

@axw
Member

axw commented Sep 28, 2018

Until 7.0.0:
For transactions: @timestamp
For spans: Parent transaction @timestamp + span.start.us

I was under the impression that starting from 6.5 we would store @timestamp and timestamp.us, and the issue Ron was raising is that data from an upgraded 6.4 will still have the transaction timestamp and span offset.

this means agents will need to keep sending span.start, right? AFAIK, this discussion was partly motivated by the fact that sending start complicated the go agent (ping @axw).

Right. Not overly complicated, but it changes the exposed API.

@felixbarny
Member

I'd change it to:

Until 7.0.0:
For transactions: timestamp.us if available, otherwise @timestamp
For spans: timestamp.us if available, otherwise Parent transaction @timestamp + span.start.us

After 7.0.0:
For transactions: @timestamp
For spans: @timestamp

In 7.0, the timestamps are able to hold up to nanosecond precision, so we can remove timestamp.us

this means agents will need to keep sending span.start, right?

To be compatible with older agent version, span.start should be supported. Newer versions can just send a micros epoch timestamp instead of span.start.

@sorenlouv
Member

@roncohen @axw @felixbarny Okay, you all make sense to me now :)
I'm good with @felixbarny suggestion (#1340 (comment)).

One minor change: From @simitt I understand that pre-6.5, apm-server would set the span's @timestamp to the same as the transaction's @timestamp. Therefore:

Until 7.0.0:
For transactions: timestamp.us if available, otherwise @timestamp
~~For spans: timestamp.us if available, otherwise Parent transaction @timestamp + span.start.us~~
For spans: timestamp.us if available, otherwise @timestamp (of span) + span.start.us

After 7.0.0:
For transactions: @timestamp
For spans: @timestamp

@roncohen I'm assuming that apm-server will store spans and transactions with timestamp.us regardless of whether they came over v1 or v2. Is that correct?

@roncohen
Contributor

otherwise @timestamp (of span) + span.start.us

That's not how it works in v2 at the moment. In V2, we'll just take whatever the agent sent.
I think we'll need to stick to "otherwise Parent transaction @timestamp + span.start.us" when timestamp.us is not available. I guess that could make things quite complicated?

I'm assuming that apm-server will store spans and transactions with timestamp.us

we can certainly do that.

@sorenlouv
Member

That's not how it works in v2 at the moment. In V2, we'll just take whatever the agent sent.

That's not how I understood it from @simitt .
@simitt can you chime in please :)

I guess that could make things quite complicated?

Yes, it would make it more complex so I would prefer if I could get the transaction timestamp from the span itself.

@simitt
Contributor Author

simitt commented Sep 28, 2018

From my understanding we have 3 cases to distinguish:

(1) old data (ingested with server < 6.5): the only case where no timestamp.us is set. So far the server always sets span.timestamp = transaction.timestamp, see setting timestamp from transaction and timestamp not set when decoding span. Therefore it is safe to use the span's @timestamp for calculation.

(2) v1 (server >= 6.5): agents are not sending timestamps in microsecond precision, but the server can still store timestamp.us by padding the microseconds with 0s.

(3) v2 (server >= 6.5): server will set timestamp.us as previously discussed with full precision.

@roncohen
Contributor

good point @simitt. Sounds like we have a plan 👍

@felixbarny
Member

@watson I think your approach for the microsecond epoch timestamp does not fully work out. The problem is that the nanosecond-precision timestamp can't be used to measure wallclock time. NTP adjustments or leap seconds since the start of the application would not be taken into account. So I'm afraid there is no other option than to get both the epoch millis timestamp as well as the nano timer to compute an estimate of the epoch micros, like that: #1340 (comment).

@felixbarny
Member

Turns out if done right, it's a bit more complicated. During a transaction, there should not be clock drifts. Consider this scenario:

startTransaction(getEpochMicros());
// NTP update sets clock 5 sec behind
startSpan(getEpochMicros());
// start timestamp of span is before start timestamp of transaction!
stopSpan(getEpochMicros());
stopTransaction(getEpochMicros());
// duration = stopEpochMicros - startEpochMicros;
// duration can be negative!

Besides NTP updates, leap seconds can also drift the wallclock time. The general problem is that wallclock time is not suitable for measuring durations.

Therefore, the clock has to be initialized the way @watson proposed here #1340 (comment) for every transaction. Subsequent spans use the same instance of the transaction-scoped clock which measures elapsed time instead of wallclock time. In other words, use wallclock time to initialize a clock which advances in elapsed time as opposed to wallclock time for each transaction and its child spans.

If a span is started after the transaction has ended, there might be no other way than to re-initialize the clock for this span.
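A minimal sketch of such a transaction-scoped clock, assuming Java (the Java agent's actual implementation is in elastic/apm-agent-java#232; the names here are invented):

```java
public class TransactionClock {
    // Wall-clock anchor, captured once when the transaction starts.
    private final long anchorEpochMicros;
    // Monotonic anchor from the nano timer, captured at the same moment.
    private final long anchorNanoTime;

    public TransactionClock() {
        this.anchorEpochMicros = System.currentTimeMillis() * 1_000;
        this.anchorNanoTime = System.nanoTime();
    }

    // All timestamps for the transaction and its child spans derive from
    // elapsed monotonic time, so an NTP step or leap second between two
    // readings cannot reorder them or produce negative durations.
    public long epochMicros() {
        return anchorEpochMicros + (System.nanoTime() - anchorNanoTime) / 1_000;
    }
}
```

The transaction creates one instance at start and hands the same instance to every span it creates; a span started after the transaction ended would need a freshly initialized clock, as noted above.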

@watson
Contributor

watson commented Oct 2, 2018

Very good points @felixbarny. For Node.js I think we can share an instance of a timer object between spans and transactions. We can initialize it when the transaction starts and then reuse it for all subsequent spans

felixbarny added a commit to felixbarny/apm-agent-java that referenced this issue Oct 2, 2018
- Prepares for elastic/apm-server#1340
- Includes guards against clock drift
@felixbarny
Member

For Node.js I think we can share an instance of a timer object between spans and transactions. We can initialize it when the transaction starts and then reuse it for all subsequent spans

That's what I also ended up doing in the Java agent: elastic/apm-agent-java#232

felixbarny added a commit to elastic/apm-agent-java that referenced this issue Oct 4, 2018
- Prepares for elastic/apm-server#1340
- Includes guards against clock drift
@roncohen
Contributor

roncohen commented Oct 7, 2018

closed by #1413

@roncohen roncohen closed this as completed Oct 7, 2018
@alvarolobato alvarolobato added this to the 6.5 milestone Oct 26, 2018
@jalvz jalvz removed the [zube]: Done label Nov 5, 2018