Side effects of imprecise clock #192
Comments
Nice catch. Are you using the Java client that reports the jaeger-thrift model (recently released)? It should not have the problem with the "ip" tag. The behavior of clock skew adjustment aside, the real issue seems to be the difference in precision between startTime and duration. Unfortunately, I don't know of a good way to get accurate startTime precision. The System.nanoTime() we're using for duration is not an offset from the epoch but from some arbitrary point in time, so it is not suitable for startTime. |
Yes, I use the thrift Sender - As for the clock - I'm not sure. I'd consider keeping offset between currentTimeMillis and nanoTime and use nanoTime + offset, but the offset is not fixed - ntpd adjusts the clock, there might be leap seconds. So it should be updated from time to time... It might be tricky to get it right. |
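The "offset that is updated from time to time" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the method names, and the resync interval are hypothetical and are not part of jaeger-client-java. Note the trade-off it makes visible: every re-anchor drops back to millisecond precision and can move the reported time backwards or forwards by the accumulated drift.

```java
/**
 * Hypothetical sketch: microsecond timestamps derived from System.nanoTime(),
 * re-anchored to the wall clock once the anchor is older than a given interval.
 */
final class ResyncingMicrosClock {
    private final long resyncIntervalNanos;
    private long anchorWallMicros; // wall clock at the last anchor, in microseconds
    private long anchorNanos;      // System.nanoTime() reading at the last anchor

    ResyncingMicrosClock(long resyncIntervalNanos) {
        this(resyncIntervalNanos, System.currentTimeMillis(), System.nanoTime());
    }

    // Explicit initial readings, so the arithmetic can be checked deterministically.
    ResyncingMicrosClock(long resyncIntervalNanos, long wallMillis, long nanoTime) {
        this.resyncIntervalNanos = resyncIntervalNanos;
        this.anchorWallMicros = wallMillis * 1000L;
        this.anchorNanos = nanoTime;
    }

    synchronized long currentTimeMicros() {
        return microsAt(System.currentTimeMillis(), System.nanoTime());
    }

    synchronized long microsAt(long wallMillis, long nanoTime) {
        if (nanoTime - anchorNanos >= resyncIntervalNanos) {
            // Re-anchor: picks up ntpd or manual adjustments made since the last anchor,
            // at the cost of a possible jump in the reported time.
            anchorWallMicros = wallMillis * 1000L;
            anchorNanos = nanoTime;
        }
        return anchorWallMicros + (nanoTime - anchorNanos) / 1000L;
    }
}
```

Between re-anchors the clock advances strictly with nanoTime, so durations and start times stay consistent within that window.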
I mean v0.19.0 of the client - it sends |
Yes, this one. |
@mabn is it actually necessary to adjust the offset? The main requirements for the trace are:
It does not matter if both startTime and endTime are "off" by some delta, clocks across processes are skewed anyway, and the ClockSkewAdjuster is expected to fix those. As long as nanoTime() is monotonic, it doesn't even seem to matter if it's consistently faster or slower than the wall clock time, since nanoTime() is defined as the correct way for measuring elapsed time. So would the following work?
|
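A minimal sketch of the approach described in this comment: capture the wall clock and nanoTime once, then derive every later timestamp from elapsed nanoTime. It is not the code from the original comment; the class and method names are hypothetical.

```java
/**
 * Hypothetical sketch: the wall clock is read once at construction, and all
 * later timestamps are that anchor plus the elapsed System.nanoTime().
 */
final class AnchoredMicrosClock {
    private final long anchorWallMicros; // wall clock at construction, in microseconds
    private final long anchorNanos;      // System.nanoTime() reading at construction

    AnchoredMicrosClock() {
        this(System.currentTimeMillis() * 1000L, System.nanoTime());
    }

    // Explicit anchor values, so the arithmetic can be checked deterministically.
    AnchoredMicrosClock(long anchorWallMicros, long anchorNanos) {
        this.anchorWallMicros = anchorWallMicros;
        this.anchorNanos = anchorNanos;
    }

    /** Microseconds since the epoch, as seen through the fixed anchor. */
    long currentTimeMicros() {
        return microsAt(System.nanoTime());
    }

    long microsAt(long nanoTime) {
        return anchorWallMicros + (nanoTime - anchorNanos) / 1000L;
    }
}
```

Start time and duration then come from the same time source, so a child span can no longer appear to end after its parent merely because of millisecond truncation. The anchor is never refreshed, so any later wall-clock adjustment is not reflected until the process restarts.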
That approach comes to mind, but I'm not sure if it works. My concern is that the clock drifts and is adjusted from time to time, e.g. by ntpd. The rate of drift changes: e.g. it depends on the temperature, but in a cloud environment there are additional factors caused by virtualization. There are also edge cases like leap seconds. The approach you mention basically records the offset between On linux It would be good to test it, but for example POSIX allows CLOCK_MONOTONIC frequency to vary by up to 500ppm (8ms/day). If the offset is not constant it might lead to significant inaccuracies in start_time for long-running processes, which is the normal case for services. So it gets tricky and OS-specific. I'd rather see one of the following solutions:
I also remember some discussion about implementing better time source in some opentracing-related project, but I wasn't able to find it. |
yes, but it still remains monotonic, the adjustments are probably done as smearing of the delta over some larger time interval, making the change unnoticeable to the application.
Not sure how big of a problem this is. In some cases the system's HW timer can be so out of whack that measuring precise elapsed time for tracing is probably not the biggest problem. And for long running processes, 8ms error per day is nothing, and again we'd expect the system to have a proper NTP configuration to compensate for the frequency drift: "This clock's frequency might be adjusted in a PLL control loop once an external reference (NTP, GPS, etc.) has been available long enough to measure the ±500 ppm frequency error and instability of typical motherboard oscillators." So my preference would be to start with this simplified approach. Of the two other approaches you listed, I think the first one is doable, but certainly more complicated, while the second may still result in unaccounted jumps and bad timing for short spans. |
btw, thanks for the references. |
btw. java 9 should provide nanoseconds accuracy: |
Lightstep tracer is doing something funky with time: https://github.com/lightstep/lightstep-tracer-java/blob/master/common/src/main/java/com/lightstep/tracer/shared/ClockState.java#L86 |
I did this: |
@mabn we're thinking of going with your option 1 where the millis wallclock timestamp is captured in the SpanContext and all other timestamps are calculated as offsets using nanos() (e.g. jaegertracing/jaeger-client-node#122 (comment)) Your gist is a simpler version, which might be good enough. I wouldn't expect it to log the differences unless your machine's HW timer is seriously faster or slower than a true clock AND you have ntpd adjustments happening. |
I'm pretty sure that a leap second will make my clock 1-second off until application restart. Same with any manual (non-ntpd) adjustments. |
Doesn't the code in the Gist require Java 8? Or is it about a different clock? https://docs.oracle.com/javase/8/docs/api/java/time/Clock.html I'd be curious to see if this behavior would also be present for recent JVMs (8+), as date/time manipulation is really better and more precise with the new APIs. |
@jpkrohling It's not JDK8 Clock, it's Yes, this behaviour is present in JDK8 and it will also be present in newer ones because On JDK9+ it might be a good idea to simply use the system clock. It now has sufficient precision due to https://bugs.openjdk.java.net/browse/JDK-8068730 - Increase the precision of the implementation of java.time.Clock.systemUTC(). |
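For the JDK9+ suggestion, a minimal sketch of turning java.time.Clock.systemUTC() into microseconds since the epoch might look like this. The class and method names are hypothetical. The same code also runs on Java 8, but there the returned values only change once per millisecond, and on newer JVMs the effective resolution still depends on the operating system.

```java
import java.time.Clock;
import java.time.Instant;

/** Hypothetical helper: epoch microseconds from java.time.Clock.systemUTC(). */
final class InstantMicros {
    private InstantMicros() {}

    /** Converts an Instant to whole microseconds since the epoch. */
    static long toEpochMicros(Instant t) {
        // getNano() is always in [0, 999_999_999], so the division truncates within the second.
        return Math.addExact(
                Math.multiplyExact(t.getEpochSecond(), 1_000_000L),
                t.getNano() / 1_000L);
    }

    static long nowMicros() {
        return toEpochMicros(Clock.systemUTC().instant());
    }
}
```

Unlike a nanoTime-based clock, this one follows every wall-clock adjustment, so start times stay aligned with real time, but two consecutive readings are not guaranteed to be monotonic.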
@olivercf this appears to be a separate, off-topic question. How is it related to the clocks? If it's not, please move to another issue. |
@yurishkuro The start times appear to be snapping/rounding to incorrect times as described in the initial message in this issue:
As you can see from my first post, some of the children are 0ms, and the parent is relatively long. All of the children should be contained within the parent in my example, as the parent is a for loop and the children are each iteration, however it seems all but the first child has been snapped to a particular start time, resulting in a completely incorrect graph. You can see some children's finish times are after the parent's. As they happen consecutively, you'd expect each child's start time to be the finish time of the last. The first post mentions that if the "ip" tag is the same, the clock skew adjuster shouldn't kick in, but it looks like it does in my case. It seems that is relevant information for this issue, and I'm also wondering if there is a workaround for the meantime until it is fixed. |
@olivercf what you are experiencing is because your spans are very short (sub 1ms) but the Java system clock has 1ms precision, so the start time is truncated and they all seem to start at the same time. The workaround I use is this clock implementation: |
@mabn Thanks, that worked. |
I found an interesting problem that took some time to track down.
Basically spans in the jaeger UI were "centered" in a weird way. Initially I thought it was a UI issue, but after comparing the JSON returned by jaeger-query with cassandra it turned out that jaeger-query completely messes up span start times. And by completely I mean that a span taking 1ms was shifted by 3ms. Here's what it looks like:
In this example "child 2" should happen as the last span inside the "parent" - should be stuck to the right edge of the parent. There's no delay because all those spans are reported by the same process.
After digging through the jaeger code I found ClockSkew, and there are two problems here:
1. ClockSkew adjusts span start times if they overlap with their parent. It does so only if parent and child are reported by different hosts, but the host is taken from the "ip" tag of the process. If the tag is missing, the default is to assume that these are different hosts.
2. jaeger-client-java by default uses SystemClock, which has millisecond precision, but timestamps reported to jaeger are in microseconds. To convert the time to micros it multiplies it by 1000:
One can think of it as rounding the exact start time down to milliseconds.
Span duration, on the other hand, is calculated with the more precise System.nanoTime.
As a result of this rounding it's possible that:
For children which end at the same time as the parent it happens very often.
So - child span end time exceeds parent's finish time by between 0 and 1000 micros which in combination with missing process "ip" tag triggers clock skew adjuster. And for very long parents and very short children the adjustment is drastic (around 50% of parent duration).
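To make the overshoot concrete, here is a small worked example with made-up numbers. It models the start time as truncated to whole milliseconds (as with currentTimeMillis() * 1000) and the duration as exact; the class and method names are hypothetical.

```java
/** Hypothetical worked example of the millisecond truncation effect. */
final class TruncationDemo {
    private TruncationDemo() {}

    /** Models a millisecond clock reported in microseconds. */
    static long truncateToMillis(long exactMicros) {
        return (exactMicros / 1_000L) * 1_000L;
    }

    /** End time as reported: truncated start plus precisely measured duration. */
    static long reportedEndMicros(long exactStartMicros, long durationMicros) {
        return truncateToMillis(exactStartMicros) + durationMicros;
    }

    public static void main(String[] args) {
        // Both spans really end at 1_006_000 us.
        long parentEnd = reportedEndMicros(1_000_900L, 5_100L); // 1_000_000 + 5_100
        long childEnd = reportedEndMicros(1_005_100L, 900L);    // 1_005_000 + 900
        System.out.println("parent end: " + parentEnd);             // 1005100
        System.out.println("child end:  " + childEnd);              // 1005900
        System.out.println("overshoot:  " + (childEnd - parentEnd)); // 800
    }
}
```

The parent loses 900 µs to truncation and the child only 100 µs, so the child appears to finish 800 µs after a parent it actually ended together with. In general, when both spans end at the same instant, the overshoot is the difference between the sub-millisecond parts of the two start times.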
There are two options:
I guess that the default makes sense, so a better clock implementation would be a good improvement.