v2: Require either span.start or span.timestamp #1340
Comments
A couple of considerations:
We can't really do this, it turns out. I misunderstood: a span's timestamp is meant to be exactly the same as the transaction timestamp. It should probably be called transaction_timestamp. Per @watson:
Discussion still ongoing. A couple of counterpoints:
We'll take this to the weekly meeting to flesh it out.
From a chat conversation with @sqren
Currently the UI uses both transaction id and timestamp to query for spans, but as mentioned by @felixbarny we can remove the timestamp constraint, and then it shouldn't make a big difference. If the timestamp is missing, I expect that I can calculate it by adding the span's start offset to the transaction timestamp. Btw. am I the only one who would prefer
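A minimal sketch of that calculation, assuming transaction.timestamp is an absolute epoch value in milliseconds and span.start is the span's offset in milliseconds from the transaction start (field shapes assumed for illustration):

```js
// Derive a span's absolute timestamp from its parent transaction.
// Assumes transaction.timestamp is epoch milliseconds and span.start is the
// offset in milliseconds relative to the transaction's start.
function spanTimestamp(transaction, span) {
  return transaction.timestamp + span.start;
}

// A transaction that started at epoch ms 1537388493912 with a span that
// started 42.5 ms into it yields a span timestamp of 1537388493954.5.
console.log(spanTimestamp({ timestamp: 1537388493912 }, { start: 42.5 }));
```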
I agree
@sqren I suggest avoiding ES naming changes in minor releases unless necessary. We can rename the attributes at the Intake API level.
That's fair :)
How would offset-based times work in a distributed trace? We don't have a way to pass the outer transaction start time into inner services, so I guess the services would need an inner transaction start time to work with? 🤔
@Qard It's the offset from the start time of the transaction in the same service only
Until elastic/apm-server#1340 is resolved, keep setting span.start. The way we set it is unsound, assuming the transaction timestamp cannot change after construction. If we do not go ahead with making span.start optional, we'll have to adjust the API.
We could accept epoch micros or an ISO timestamp with microsecond precision. The
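For illustration, the instant used as an example later in the thread (2018-09-19 20:21:33.912421, assumed UTC) in each of the two candidate formats:

```js
// The same instant in both candidate formats.
const epochMicros = 1537388493912421;                // integer microseconds since the UNIX epoch
const isoTimestamp = '2018-09-19T20:21:33.912421Z';  // ISO 8601 with microsecond precision

// The integer stays well below Number.MAX_SAFE_INTEGER (9007199254740991),
// so it is safe to handle as a plain JavaScript number.
console.log(epochMicros, isoTimestamp);
```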
Writeup from the meeting today:

Problem
Query performance
Ordering of transactions and spans

Suggestion for ES data model

Changes for agents and apm-server
Since the RUM agent cannot provide a reliable absolute timestamp it will continue sending spans with a relative offset (span.start).

Cons:
Pros:
ISO 8601 format

For reference, this is what the ISO 8601 timestamp spec has to say about adding fractional precision to the timestamp:
So to achieve microsecond precision, the timestamp sent from the agents should look like this:
E.g. a timestamp for 2018-09-19 at 21 minutes, 33 seconds, and 912421 microseconds past 20, would look like this: 2018-09-19T20:21:33.912421Z
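A short sketch of producing such a string from an integer epoch-microsecond value (Node.js, illustrative only):

```js
// Format integer epoch microseconds as ISO 8601 with six fractional digits.
function toIsoMicros(epochMicros) {
  const millis = Math.floor(epochMicros / 1000);
  const microsInSecond = String(epochMicros % 1000000).padStart(6, '0');
  const withoutFraction = new Date(millis).toISOString().slice(0, -5); // drop '.mmmZ'
  return `${withoutFraction}.${microsInSecond}Z`;
}

console.log(toIsoMicros(1537388493912421)); // '2018-09-19T20:21:33.912421Z'
```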
Integer/Float format

I would however prefer if we could switch to transmitting the timestamp as a UNIX epoch float or integer value instead (we should of course choose one or the other). We have 3 options for how to encode the timestamp:
I think many agents would prefer to transmit the timestamp in some form of number format because:
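A small Node.js sketch of the practical difference between the two styles, using millisecond precision for the moment since that is what the standard library exposes:

```js
// Number form: a single call, and the value sorts and subtracts as a number.
const tsNumber = Date.now();                // e.g. 1537388493912

// String form: formatted on the way out, parsed again on the way in.
const tsString = new Date().toISOString();  // e.g. '2018-09-19T20:21:33.912Z'
const parsedBack = Date.parse(tsString);    // back to epoch milliseconds
console.log(tsNumber, tsString, parsedBack);
```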
+1 to number since epoch. I'd prefer an integer representing nanos just because that's how Go stores things, but I suppose that'll be an issue for JS?
@axw Yes unfortunately. All numbers in JavaScript are actually IEEE 754 floats, so the largest integer that can safely be represented is 9007199254740991, which would not allow for a nanosecond epoch. Side note: BigInt support was just added to the language to combat this limitation, but not all browsers and versions of Node.js support it, and to my knowledge the built-in JSON serializer doesn't work with it either. The way we normally get around this limitation is to store the number as two integers in a tuple:
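A small illustrative sketch of that workaround, assuming a [seconds, nanoseconds] split (the exact tuple shape is my assumption):

```js
// A nanosecond epoch value exceeds what a JavaScript number can represent exactly.
console.log(Number.MAX_SAFE_INTEGER);  // 9007199254740991

// Workaround: carry the value as two safe integers, e.g. [seconds, nanoseconds].
const tuple = [1537388493, 912421000];

// When microsecond precision is enough, the tuple can be collapsed back into
// a single safe integer.
const asMicros = tuple[0] * 1e6 + Math.round(tuple[1] / 1e3); // 1537388493912421
console.log(asMicros);
```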
🤔 I might actually have suggested more than I can deliver. The UNIX epoch in JavaScript only gives you millisecond precision. The high-resolution timer in Node.js is represented as the number of nanoseconds since the process started. This way we can still work with the number as an IEEE 754 float, as it's normally not too big. It also has the added benefit of not being affected by time-shift. So I'm actually not sure how to get a UNIX epoch number (or a timestamp for that matter) with sub-millisecond precision at the moment. And if I'm not mistaken, we would then have to deal with NTP changes, summer/winter time etc. I'd have to investigate further.
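To illustrate the two clocks being discussed (Node.js):

```js
// Wallclock: integer milliseconds since the UNIX epoch, no sub-millisecond precision.
const wallclockMs = Date.now();  // e.g. 1537388493912

// High-resolution timer: [seconds, nanoseconds] relative to an arbitrary point
// in the past (roughly process start); useful for durations, but not an
// absolute timestamp, and not affected by NTP or time-zone changes.
const [sec, nanos] = process.hrtime();
console.log(wallclockMs, sec, nanos);
```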
Before Java 9, there was also no built-in way to generate an epoch timestamp in microseconds. However, combining an epoch ms timestamp with a nanosecond precision timer should work. That makes the timestamp consistent in the same process. The timestamp of another process might even be shifted in the order of milliseconds, so no need to perfectly calibrate the nano timer. If the platform only supports a millisecond timer and no nanosecond timer at all, just multiply the epoch millis by 1000. @watson would an integer microsecond epoch timestamp work for nodejs?
In Java, I'd implement it like that:

```java
public long getEpochMicros() {
    long epochMillis = System.currentTimeMillis();
    long nanoTime = System.nanoTime();
    return epochMillis * 1000 + nanoTime / 1000 % 1000;
}
```
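For comparison, a rough Node.js sketch of the same idea, assuming process.hrtime() as the nanosecond timer (an illustration, not a settled implementation; the drift caveats discussed further down apply here too):

```js
// Approximate epoch microseconds by appending the high-resolution timer's
// sub-millisecond digits to the millisecond wallclock.
function getEpochMicros() {
  const epochMillis = Date.now();
  const [, nanos] = process.hrtime();
  return epochMillis * 1000 + Math.floor(nanos / 1000) % 1000;
}

console.log(getEpochMicros()); // e.g. 1537388493912421
```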
@sqren OK so the idea was that the UI could use the
@roncohen That's a good point. Instead of handling both cases, can we perhaps, until 7.0.0, just use
The problem is that currently, spans from the RUM agent will have the
So you agree with #1340 (comment), right? To make it clear how I'm proposing to calculate the absolute timestamps for transactions and spans:
Until 7.0.0:
After 7.0.0:
(I'm aware that
this means agents will need to keep sending
I was under the impression that starting from 6.5 we would store @timestamp and timestamp.us, and the issue Ron was raising is that data from an upgraded 6.4 will still have the transaction timestamp and span offset.
Right. Not overly complicated, but it changes the exposed API.
I'd like to:
Until 7.0.0:
After 7.0.0:
In 7.0 the timestamps can hold up to nanosecond precision, so we can remove
To be compatible with older agent versions,
@roncohen @axw @felixbarny Okay, you all make sense to me now :) One minor change: from @simitt I understand that pre 6.5, apm-server would set the span's
Until 7.0.0:
After 7.0.0:
@roncohen I'm assuming that apm-server will store spans and transactions with
That's not how it works in v2 at the moment. In V2, we'll just take whatever the agent sent.
we can certainly do that.
That's not how I understood it from @simitt .
Yes, it would make it more complex, so I would prefer if I could get the transaction timestamp from the span itself.
From my understanding we have 3 cases to distinguish:
(1) old data (ingested with server <= 6.5): the only case where no
(2) v1 (server >= 6.5): agents are not sending timestamp in microsecond precision, but the server can still store the
(3) v2 (server >= 6.5): server will set
good point @simitt. Sounds like we have a plan 👍
@watson I think your approach for the microsecond epoch timestamp does not fully work out. The problem is that the nanosecond-precision timestamp can't be used to measure wallclock time. NTP adjustments or leap seconds since the start of the application would not be taken into account. So I'm afraid there is no other option than to get both the epoch millis timestamp as well as the nano timer to compute an estimate of the epoch micros, like that: #1340 (comment).
Turns out if done right, it's a bit more complicated. During a transaction, there should not be clock drifts. Consider this scenario:

```
startTransaction(getEpochMicros());
// NTP update sets clock 5 sec behind
startSpan(getEpochMicros());
// start timestamp of span is before start timestamp of transaction!
stopSpan(getEpochMicros());
stopTransaction(getEpochMicros());
// duration = stopEpochMicros - startEpochMicros;
// duration can be negative!
```

Besides NTP updates, leap seconds can also drift the wallclock time. The general problem is that wallclock time is not suitable for measuring durations. Therefore, the clock has to be initialized the way @watson proposed here #1340 (comment) for every transaction. Subsequent spans use the same instance of the transaction-scoped clock, which measures elapsed time instead of wallclock time. In other words: use wallclock time to initialize a clock which advances in elapsed time, as opposed to wallclock time, for each transaction and its child spans. If a span is started after the transaction has ended, there might be no other way than to re-initialize the clock for this span.
Very good points @felixbarny. For Node.js I think we can share an instance of a timer object between spans and transactions. We can initialize it when the transaction starts and then reuse it for all subsequent spans.
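A minimal sketch of such a shared, transaction-scoped clock (Node.js, assuming process.hrtime() as the elapsed-time source; illustrative, not the agents' actual implementation):

```js
// Anchor to the wallclock once, when the transaction starts; afterwards only
// advance by elapsed (monotonic) time, so later NTP adjustments cannot
// reorder the transaction's spans or produce negative durations.
class TransactionClock {
  constructor() {
    this.startEpochMicros = Date.now() * 1000; // wallclock anchor (ms precision)
    this.startHr = process.hrtime();           // monotonic reference point
  }
  nowEpochMicros() {
    const [sec, nanos] = process.hrtime(this.startHr); // elapsed since the anchor
    return this.startEpochMicros + sec * 1e6 + Math.floor(nanos / 1e3);
  }
}

// The transaction creates the clock and its spans reuse the same instance.
const clock = new TransactionClock();
const spanStartMicros = clock.nowEpochMicros();
console.log(spanStartMicros);
```

As noted above, a span started after its transaction has ended would need a freshly initialized clock.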
- Prepares for elastic/apm-server#1340 - Includes guards against clock drift
That's what I also ended up doing in the Java agent: elastic/apm-agent-java#232
- Prepares for elastic/apm-server#1340 - Includes guards against clock drift
closed by #1413
A span's timestamp is an absolute value of when the span was captured, and start is the offset relative to the transaction's start.

At the moment span.start is required, but span.timestamp is not, as the RUM agent cannot set a span.timestamp. Except for the RUM agent, all other agents do set the span.timestamp, so span.start seems redundant there. The suggestion is therefore to require one of the two fields to be set.

Is this ok with all @elastic/apm-agent-devs?
And is it ok with @elastic/apm-ui?