std: Force Instant::now() to be monotonic #56988

Merged
merged 1 commit into from
Jan 8, 2019

alexcrichton
Member

This commit is an attempt to force Instant::now to be monotonic
through any means possible. We tried relying on OS/hardware/clock
implementations, but those seem buggy enough that we can't rely on them
in practice. This commit implements the same hammer Firefox recently
implemented (noted in #56612): keep the latest Instant::now() return
value in memory, and return that instead whenever the OS clock looks
like it's moving backwards.

Closes #48514
Closes #49281
cc #51648
cc #56560
Closes #56612
Closes #56940
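A minimal sketch of the clamping idea described above, written against today's public std rather than std's internals: a process-wide Mutex holds the latest value ever returned, and each call returns the larger of that value and a fresh reading. Assumptions: std's own Instant::now() stands in for the raw OS clock, and `locked_now` is a hypothetical name. It relies on Mutex::new being a const fn (Rust 1.63+); at the time of this PR, std::sync::Mutex had no const constructor.

```rust
use std::sync::Mutex;
use std::time::Instant;

// Latest value ever returned, shared by all threads.
static LAST_NOW: Mutex<Option<Instant>> = Mutex::new(None);

/// Hypothetical wrapper: never returns a value earlier than one already
/// returned on any thread.
fn locked_now() -> Instant {
    // Stand-in for the raw OS reading (an assumption of this sketch).
    let os_now = Instant::now();
    let mut last = LAST_NOW.lock().unwrap();
    // Clamp: if the OS appears to have gone backwards, reuse the stored value.
    let now = match *last {
        Some(prev) => prev.max(os_now),
        None => os_now,
    };
    *last = Some(now);
    now
}
```

Every call takes the same global lock, which is exactly the cost the review discussion below is concerned about.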

@rust-highfive
Collaborator

r? @Kimundi

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 19, 2018
@rust-highfive
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:353f0f89:start=1545244467577054256,finish=1545244468694615649,duration=1117561393
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---
tidy check
[00:03:00] * 568 error codes
[00:03:00] * highest error code: E0721
[00:03:00] * 244 features
[00:03:00] tidy error: /checkout/src/libstd/time.rs:192: platform-specific cfg: cfg!(target_os = "macos")
[00:03:00] tidy error: /checkout/src/libstd/time.rs:193: platform-specific cfg: cfg!(target_os = "linux")
[00:03:00] tidy error: /checkout/src/libstd/time.rs:194: platform-specific cfg: cfg!(target_os = "linux")
[00:03:01] some tidy checks failed
[00:03:01] 
[00:03:01] 
[00:03:01] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:01] 
[00:03:01] 
[00:03:01] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:01] Build completed unsuccessfully in 0:00:45
[00:03:01] Makefile:79: recipe for target 'tidy' failed
[00:03:01] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:262a6f76
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Wed Dec 19 18:37:39 UTC 2018
---
travis_time:end:0cee41d8:start=1545244659972411142,finish=1545244659977875228,duration=5464086
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0ad9f916
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:116c0f86
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:09d86d7a
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@wesleywiser
Member

Thanks for fixing this so quickly @alexcrichton!

let now = cmp::max(LAST_NOW, os_now);
LAST_NOW = now;
Instant(now)
}
Member

A mutex seems a bit heavy-handed. Many uses of rdtsc (where available) are for minimal-overhead, thread-local timing of functions.
Wouldn't rdtsc + some atomic ops that prevent things from going backwards be much lighter than potential thread suspension?

Member Author

I agree that a full-on mutex is quite a heavy hammer for this use case! I wasn't sure, though, how best to minimize the cost here.

The Windows documentation at least "strongly discourages" rdtsc, citing VM migration issues as well as problems on some hardware. If that's the case, I think we probably want to avoid it?

I figured it'd be best to start from a conservative position; we can always come back later as necessary and try atomics and/or other tricks.

Member

I didn't mean to use rdtsc directly. We can still defer to QueryPerformanceCounter/clock_gettime, i.e. the current Instant implementation which in the end boils down to rdtsc on many x86 systems.
Just tack on a sanity check/correction with atomics instead of a mutex.

I'm mostly concerned that thread contention might unexpectedly hit people who litter their code with Instants because it used to be fast. I have no concrete examples, just experience with gratuitous use of timing functions that were fast on Linux suddenly making an application slow on Windows because the OS decided not to use rdtsc.

Member Author

Oh, I'm all for adding a more lightweight implementation; do you have one in mind? Instant has a varying size across platforms, which makes it difficult to select an appropriate atomic and/or a more lightweight method.

Member

I had sizeof checks, some type punning, and AtomicU128/U64 in mind. Beyond that it would be your standard read, check, CAS, similar to what you're now doing in the lock's critical section, except in a loop.

The mutex would still be needed as fallback if the checks don't work out.

Member Author

Ah OK, that's what I thought, and yeah, my worry about that is that it wouldn't solve the original problem of monotonic clocks going backwards, so I'm afraid we'd still end up at the solution proposed in this PR.

We, as far as I know, don't have a great handle on how big the errors are.

Member

I have thought some more about optimization potential:

  1. We can use relaxed atomics everywhere. Justification: one thread cannot observe another thread's Instants without some external synchronization happening, e.g. other ordered loads and stores. Until those happen, only intra-thread ordering is relevant, and Relaxed is sufficient for that.
  2. In the good case, we only need to do a test and return from the perspective of the main sequence of instructions. There's no dependency on the writes to global state happening, so this should be friendly to instruction-level parallelism.
  3. We can limit the compare-exchange (CMPXCHG) loop by bailing out early if it fails because another thread updated the value to something larger than what we are trying to store. It doesn't prevent the cache line from bouncing around, but at least it allows multiple threads to make progress simultaneously.

It could approximately look like this:

static LAST_NOW: AtomicU128 = AtomicU128::new(0);
let mut last_now = LAST_NOW.load(Relaxed);  // insert type punning here
let os_now = time::Instant::now();
if likely(os_now > last_now) {
  loop {
    match LAST_NOW.compare_exchange_weak(last_now, os_now, Relaxed, Relaxed) {
      Ok(_) => break,
      Err(x) if x >= os_now => break, // some other thread is ahead of us, no need to update
      Err(x) => last_now = x, // spurious failure or smaller concurrent update: retry against x
    }
  }
  return os_now;
}
return last_now

It's a somewhat smaller hammer, but still not the rubber kind. To soften it further, we'd either need an "I (don't) care about broken systems" switch somewhere or better platform detection.
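A compilable variant of the pseudocode above, under two assumptions that are not from the thread: instants are encoded as u64 nanoseconds since a process-wide anchor (sidestepping 128-bit atomics and type punning), and std's Instant::now() stands in for the OS reading. `cas_now` is a hypothetical name. Unlike the pseudocode, a thread that loses the race to a larger value returns that larger value rather than its own reading.

```rust
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};
use std::sync::OnceLock;
use std::time::{Duration, Instant};

// Assumed encoding: nanoseconds since a process-wide anchor, so a 64-bit
// atomic is enough.
static ANCHOR: OnceLock<Instant> = OnceLock::new();
static LAST_NANOS: AtomicU64 = AtomicU64::new(0);

fn cas_now() -> Instant {
    let anchor = *ANCHOR.get_or_init(Instant::now);
    // Saturates to 0 if the reading is behind the anchor; the u64 cast only
    // truncates after roughly 584 years of uptime.
    let os_nanos = Instant::now().saturating_duration_since(anchor).as_nanos() as u64;
    let mut last = LAST_NANOS.load(Relaxed);
    let nanos = loop {
        if os_nanos <= last {
            // The OS reading is not ahead of the stored value (the clock went
            // backwards, or another thread is ahead): reuse the stored value.
            break last;
        }
        match LAST_NANOS.compare_exchange_weak(last, os_nanos, Relaxed, Relaxed) {
            Ok(_) => break os_nanos,
            // Spurious failure or concurrent update: retry against what we saw.
            Err(seen) => last = seen,
        }
    };
    anchor + Duration::from_nanos(nanos)
}
```

Relaxed ordering follows the reasoning in point 1 above; all operations touch a single atomic, so each thread still observes its values in the atomic's modification order.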

Member Author

@the8472 yes that's all possible but 128-bit atomics are only available on a few platforms, so we can't use them generally.

Contributor

Ah ok, that's what I thought, and yeah my worry about that is that it wouldn't solve the original problem of monotonic clocks going backwards, so I'm afraid we'd still end up at the solution proposed in this PR.

Wouldn't it still make sense to try the cheaper thread-local version first and switch to a full lock if it does turn out to be insufficient? If we directly go to the lock, then we will not be able to determine whether a cheaper thread-local variant would also have been sufficient. @the8472's theory that this arises only due to migration between cores at least sounds very plausible, so I think it would be worthwhile to try this.

Member Author

@nikic I think it's incorrect to avoid the full lock, though? If the time is less than a thread-local version, then you definitely have to acquire the lock, but even if it's greater than a thread-local version, you need to check the lock for the global one as well. Right now the bug primarily happens on one thread, but the documented guarantees of this API are that it works across all threads.
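For comparison, a sketch of the thread-local variant suggested above, assuming std's Instant::now() as the OS reading and a hypothetical `thread_local_now` name. It needs no lock or atomics, but, as the reply points out, it only guarantees monotonicity within one thread: a value returned here can still be earlier than one another thread has already returned.

```rust
use std::cell::Cell;
use std::time::Instant;

thread_local! {
    // Latest value returned on this thread only; other threads are not consulted.
    static LAST_NOW: Cell<Option<Instant>> = Cell::new(None);
}

fn thread_local_now() -> Instant {
    let os_now = Instant::now();
    LAST_NOW.with(|last| {
        // Clamp against this thread's previous result only.
        let now = last.get().map_or(os_now, |prev| prev.max(os_now));
        last.set(Some(now));
        now
    })
}
```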

@rust-highfive
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:15d5c828:start=1545245110625471104,finish=1545245113003082919,duration=2377611815
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---
[00:06:03]    Compiling syntax_ext v0.0.0 (/checkout/src/libsyntax_ext)
[00:06:08] error: unused import: `Duration`
[00:06:08]   --> src/librustc/util/profiling.rs:15:17
[00:06:08]    |
[00:06:08] 15 | use std::time::{Duration, Instant};
[00:06:08]    |
[00:06:08]    = note: `-D unused-imports` implied by `-D warnings`
[00:06:08] 
[00:06:34] error: aborting due to previous error
---
[00:06:34] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-j" "4" "--release" "--locked" "--color" "always" "--features" "" "--manifest-path" "/checkout/src/rustc/Cargo.toml" "--message-format" "json"
[00:06:34] expected success, got: exit code: 101
[00:06:34] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap build
[00:06:34] Build completed unsuccessfully in 0:03:46
[00:06:34] make: *** [all] Error 1
[00:06:34] Makefile:28: recipe for target 'all' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0af64a28
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Wed Dec 19 18:51:56 UTC 2018


@Mark-Simulacrum
Member

Is there a reason we're using the "internal" Mutex for the implementation though? I'd kind of expect that we could just use std::sync::Mutex?

@alexcrichton
Member Author

@Mark-Simulacrum the sync::Mutex type doesn't have a const constructor, whereas the internal mutex type does

// * https://bugzilla.mozilla.org/show_bug.cgi?id=1487778 - a similar
// Firefox bug
//
// It simply seems that this it just happens so that a lot in the wild

Delete 'this'?

@the8472
Member

the8472 commented Dec 27, 2018

Maybe we should get a perf run on this for a system where actually_monotonic == false?

return Instant(os_now)
}

static LOCK: Mutex = Mutex::new();
Contributor

Do we have dead-code elimination in MIR debug builds? Otherwise, LLVM IR for this code will always be emitted regardless of the result of actually_monotonic, and whether that code ends up generating machine code will depend on the optimization level.

Member Author

We don't right now, but actually_monotonic is a function that'll be trivially inlined, so LLVM will optimize this away.

@alexcrichton
Member Author

r? @sfackler

@rust-highfive rust-highfive assigned sfackler and unassigned Kimundi Jan 3, 2019
@sfackler
Member

sfackler commented Jan 3, 2019

Hardware Is Bad

@bors r+

@bors
Contributor

bors commented Jan 3, 2019

📌 Commit 4ce8d27b0d76157c3c02125c519c08a99c6ef4ed has been approved by sfackler

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 3, 2019
@bors
Contributor

bors commented Jan 5, 2019

⌛ Testing commit 4ce8d27b0d76157c3c02125c519c08a99c6ef4ed with merge 1896be8145237e298d75c1b685fd2ae7dea733f5...

@bors
Contributor

bors commented Jan 5, 2019

💔 Test failed - status-travis

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 5, 2019
@rust-highfive
Collaborator

The job dist-various-2 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[01:00:21]    Compiling panic_unwind v0.0.0 (/checkout/src/libpanic_unwind)
[01:00:21] [RUSTC-TIMING] panic_unwind test:false 0.280
[01:00:21] warning: dropping unsupported crate type `dylib` for target `x86_64-unknown-cloudabi`
[01:00:21] 
[01:00:24] error[E0599]: no function or associated item named `actually_monotonic` found for type `sys::cloudabi::time::Instant` in the current scope
[01:00:24]    --> src/libstd/time.rs:182:27
[01:00:24]     |
[01:00:24] 182 |         if time::Instant::actually_monotonic() {
[01:00:24]     |            |
[01:00:24]     |            |
[01:00:24]     |            function or associated item not found in `sys::cloudabi::time::Instant`
[01:00:24]     | 
[01:00:24]    ::: src/libstd/sys/cloudabi/time.rs:8:1
[01:00:24] 8   | pub struct Instant {
[01:00:24]     | ------------------ function or associated item `actually_monotonic` not found for this
[01:00:26] error: aborting due to previous error
[01:00:26] 
[01:00:26] For more information about this error, try `rustc --explain E0599`.
[01:00:26] [RUSTC-TIMING] std test:false 5.086
---
travis_time:end:0119c2a7:start=1546672709559221610,finish=1546672709566034009,duration=6812399
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:001f5858
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:01862410
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:082b124f
$ dmesg | grep -i kill


@kennytm kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 5, 2019
@rust-highfive
Collaborator

📣 Toolstate changed by #56988!

Tested on commit 2f19f8c.
Direct link to PR: #56988

🎉 rls on linux: test-fail → test-pass (cc @nrc @Xanewok, @rust-lang/infra).

rust-highfive added a commit to rust-lang-nursery/rust-toolstate that referenced this pull request Jan 8, 2019
VardhanThigle pushed a commit to VardhanThigle/rust that referenced this pull request Jan 13, 2019
}
}

pub fn actually_monotonic() -> bool {
Contributor

@edelsonh Can you confirm ppc/ppc64 has a reliable monotonic clock?

@Forty-Bot

Why not switch to CLOCK_MONOTONIC_RAW on Linux? CLOCK_MONOTONIC is affected by "... the incremental adjustments performed by adjtime(3) and NTP," which is likely the cause of some of your monotonicity problems.

@sanxiyn
Member

sanxiyn commented Jan 14, 2019

My understanding of adjtime is that its adjustments specifically do not cause any monotonicity problems; that is, adjtime can't be the cause, if we trust the manual pages.

@Forty-Bot

FWIW CLOCK_MONOTONIC is derived from CLOCK_REALTIME with an offset to keep it monotonic, and CLOCK_MONOTONIC_RAW is from a different tk_read_base.

@spacejam

NTP can just yank a clock backwards of the higher stratum side if the delta is over 128ms. https://github.com/ntp-project/ntp/blob/stable/parseutil/dcfd.c#L1059

@Forty-Bot

Forty-Bot commented Jan 14, 2019

Hm, that should fail with -EINVAL if the new offset is less than the current monotonic offset...

If anyone wants to investigate this further, I made a short program to test the different clocks on Linux. In particular, @IntrepidPig seems to be able to reproduce it.

@programmerjake
Member

I know this is a little late, but an atomic umax would be better than the compare-exchange this currently uses. LLVM has the atomicrmw umax instruction and RISC-V and probably other architectures have an instruction for that. Using the atomic umax instruction allows the atomic operation to be executed wherever the memory is cached (TileLink has special support for that) instead of having to move the cached memory, saving time.
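In today's std, this suggestion maps onto AtomicU64::fetch_max (stable since Rust 1.45), which LLVM lowers to an `atomicrmw umax`; on targets with a native atomic max (e.g. RISC-V's AMOMAXU) that can be a single instruction, while elsewhere it is typically expanded into a CAS loop. A minimal sketch, assuming the same hypothetical u64-nanoseconds-since-anchor encoding and std's Instant::now() as the OS reading; `fetch_max_now` is an illustrative name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;
use std::time::{Duration, Instant};

// Assumed encoding: nanoseconds since a process-wide anchor.
static ANCHOR: OnceLock<Instant> = OnceLock::new();
static LAST_NANOS: AtomicU64 = AtomicU64::new(0);

fn fetch_max_now() -> Instant {
    let anchor = *ANCHOR.get_or_init(Instant::now);
    // Saturates to 0 if the reading is behind the anchor.
    let os_nanos = Instant::now().saturating_duration_since(anchor).as_nanos() as u64;
    // One atomic read-modify-write: stores max(previous, os_nanos) and
    // returns the previous value.
    let prev = LAST_NANOS.fetch_max(os_nanos, Ordering::Relaxed);
    anchor + Duration::from_nanos(prev.max(os_nanos))
}
```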

@vi
Contributor

vi commented Jan 17, 2019

Does it affect cargo bench? Will it start benching a synchronisation primitive instead of actual code?

@programmerjake
Member

It would have an atomic op either way.

@alexcrichton alexcrichton deleted the monotonic-instant branch January 18, 2019 00:53
@alexcrichton
Member Author

@vi cargo bench does indeed use Instant::now(), and AFAIK measurements haven't been done to evaluate the impact of this.
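One crude starting point for such a measurement is a timing loop. This is a hypothetical sketch (`avg_now_cost` is not a std or cargo API) that reports the average cost of Instant::now() on a single thread. Because it is single-threaded, it cannot show contention between threads; a real evaluation would compare several thread counts inside a proper benchmark harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Rough average cost of one `Instant::now()` call over `iters` back-to-back
/// calls on the current thread. Noisy; not a substitute for a benchmark harness.
fn avg_now_cost(iters: u32) -> Duration {
    assert!(iters > 0, "iters must be non-zero");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the call.
        black_box(Instant::now());
    }
    start.elapsed() / iters
}
```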

@Saruspete

Hello there,

It's a bit late, and I'm more on the system side than the dev side, but anyway, here are some hints / events / configurations on x86 & Linux that may (or may not) help you with this matter (if only to be aware of the bad things that can happen).

On Linux, you have multiple timing functions (some only available in more recent kernels). Each has its caveats, and the choice depends on your use case:

CS = Clock Source (incremental monotonic counter)
TO = Time offset (value to add to CS to get the real human time)
ADJ = Minor Adjustments to CS (ntp, ptp)
TZ = Timezone

| Source | Precision | Get from | Value |
|---|---|---|---|
| clock_gettime(CLOCK_REALTIME) | ns | vDSO & syscall fallback | CS + TO + ADJ |
| clock_gettime(CLOCK_REALTIME_COARSE) | ms | vDSO | CS + TO + ADJ |
| clock_gettime(CLOCK_MONOTONIC) | ns | vDSO | CS + ADJ |
| clock_gettime(CLOCK_MONOTONIC_COARSE) | ms | vDSO | CS + ADJ |
| clock_gettime(CLOCK_MONOTONIC_RAW) | | vDSO | CS |
| gettimeofday() | us | vDSO override | CS + TO + ADJ + TZ |
| time() | sec | syscall | compat over gettimeofday |
| Assembly RDTSC | CPU base freq, e.g. 3GHz = 0.33ns | memory read | CS |
| Assembly RDTSCP | CPU base freq | memory read | CS |

On modern x86, most of the selection work is done in arch/x86/entry/vdso/vclock_gettime.c :: __vdso_clock_gettime().
vDSO aims to return results faster than a syscall when available, using shared values contained in the vsyscall_gtod_data structure in arch/x86/include/asm/vgtod.h (for timing, among other things).
This structure and the system-wide clocks are managed by the kernel's timekeeper structure, located in kernel/time/timekeeping.c.
You may want to check the timekeeping_update() code, which updates the system time values (it is called by settimeofday, change_clocksource, and many other events).
You can check/set the underlying timekeeper clocksource through /sys/devices/system/clocksource/clocksource0/current_clocksource.
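The active clocksource can be read from Rust with nothing but std. A minimal, Linux-only sketch; `current_clocksource` is a hypothetical helper, and it returns None wherever the sysfs file is missing (other OSes, some containers).

```rust
use std::fs;

// sysfs path quoted in the comment above (Linux only).
const CURRENT_CLOCKSOURCE: &str =
    "/sys/devices/system/clocksource/clocksource0/current_clocksource";

/// Returns the kernel's active clocksource name (e.g. "tsc" or "hpet"),
/// or None if the file cannot be read.
fn current_clocksource() -> Option<String> {
    fs::read_to_string(CURRENT_CLOCKSOURCE)
        .ok()
        .map(|s| s.trim().to_string())
}
```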

The standard (non-coarse) time calls are meant to deliver precise timing, computing the time delta between the last timekeeper update and the actual function call. To speed things up, these standard calls also try to read the hardware directly: vread_pvclock() (para-virtualized), vread_hvclock() (Hyper-V), vread_tsc(), or vendor-supplied timekeeping in VMware.
The coarse variants simply return the value of the last timekeeper update (whose rate is defined by the kernel's HZ constant, most of the time 1000Hz). This means just one or two values to read, with no syscall and no time delta to compute, so it's super fast... but also far less precise.

So, here's how I choose which one depending on the use-case:

  • REALTIME : precise wall-clock time (precise log, events timediff)
  • REALTIME_COARSE : simple wall-clock time (to the second or ms) like for system logs.
  • MONOTONIC : precise duration measurement (benchmark, security related...)
  • MONOTONIC_COARSE : Not used.
  • MONOTONIC_RAW : system-wide stable counter.
  • RDTSC : precise function timing (but not for micro-benchmark) on stable CPUs (avoid process migration and VM)
  • RDTSCP / RDTSC + [lm]fence : precise benchmark. The serializing instruction prevents CPU reordering, which would skew the micro-benchmark results.

Some pitfalls:

  • Beware of "human time" clocks, as they include the leap-second special case. Behavior also depends on the type of adjustment the sysadmin requested: let the kernel step back, let the NTP daemon step back, slew the clock over the whole day, or add a new second (23:59:60).
  • The 128ms limit for slewing the clock applies only to ntpd; some other daemons, like chrony, do not have this limit.
  • The real-time Linux variant (many of whose features were merged upstream) also has special features, like a dedicated thread for timekeeping, tickless CPUs, and other constraints on the handling of time. You might want to ask its developers, especially Thomas Gleixner and Steven Rostedt.
  • A CPU might go offline, and the processes in its runqueue are then migrated to another one.
  • The clock calls might end up retrying without bound.
  • Some very exotic x86 hardware, like Bullion or SuperDome, consists of multiple independent systems / blades combined into a single one. This might have implications for timing measurement depending on the OS & version.

Whatever the OS, at the lower level, timing may come from multiple hardware sources:

Time Stamp Counter (TSC)

A per-CPU incrementing counter, read with RDTSC or RDTSCP.
As it's CPU-based, when your process is migrated from one socket to another, or when the underlying hardware changes (as with a virtual machine), the TSC may have a very different value.
Some other scenarios can also lead to issues when using it (with the CPU flag to check whether the feature is safe):

  • The CPU uses frequency scaling (P-states) or sleep states (C-states) to dynamically adjust frequency and/or save power while kept in ACPI S0 (cpuflag: constant_tsc).
  • The TSC will stop and reset upon standby (S3) or hibernation (S4) (cpuflag: nonstop_tsc).
  • The underlying hypervisor will stop the VM and resume it, e.g. for live migration, snapshots, etc.
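On x86 Linux, the constant_tsc and nonstop_tsc flags above appear on the "flags" lines of /proc/cpuinfo. A small sketch for checking them; `cpuinfo_has_flag` and `tsc_flags` are hypothetical helpers, and the parser assumes the usual `flags\t\t: fpu vme ...` line format and only looks at the first CPU listed.

```rust
/// True if the first "flags" line of a /proc/cpuinfo dump lists `flag`
/// as a whole word (so "tsc" does not match "constant_tsc").
fn cpuinfo_has_flag(cpuinfo: &str, flag: &str) -> bool {
    cpuinfo
        .lines()
        .find(|line| line.starts_with("flags"))
        .and_then(|line| line.split_once(':'))
        .map(|(_, flags)| flags.split_whitespace().any(|f| f == flag))
        .unwrap_or(false)
}

/// Reads /proc/cpuinfo and returns (constant_tsc, nonstop_tsc).
/// If the file cannot be read (e.g. not Linux), both are reported as false.
fn tsc_flags() -> (bool, bool) {
    let cpuinfo = std::fs::read_to_string("/proc/cpuinfo").unwrap_or_default();
    (
        cpuinfo_has_flag(&cpuinfo, "constant_tsc"),
        cpuinfo_has_flag(&cpuinfo, "nonstop_tsc"),
    )
}
```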

High Precision Event Timer (HPET)

It's composed of a single counter (fixed frequency) and many comparators that generate an interrupt when the counter reaches the value they are waiting for. The counter frequency only needs to be at least 10MHz.
It has multiple issues, like time drift, skew, and missed interrupts.

Real Time Clock (RTC)

Well, not used anymore...

I know you have to cover multiple arches, and OS types, and OS versions, so always keep a skeptical eye when you implement a low-level feature :)

Something I do in every language I use: a timing function/class that asks what I intend to measure and chooses the right function for me. Maybe you can integrate something like it directly.
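In Rust's std, the split such a helper would make is roughly SystemTime (wall clock, may jump) versus Instant (monotonic, meant for durations). A hypothetical sketch of the intent-based API described above; `Intent`, `Reading`, and `read_clock` are illustrative names, and it covers only these two std clocks, not the coarse or raw Linux clocks from the table.

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// What the caller intends to measure (hypothetical API, not part of std).
enum Intent {
    /// Human-readable wall-clock time, e.g. for log timestamps.
    WallClock,
    /// Elapsed time between two points, e.g. for timeouts or benchmarks.
    Elapsed,
}

/// A reading taken with the clock suited to the intent.
enum Reading {
    Wall(SystemTime),
    Mono(Instant),
}

fn read_clock(intent: Intent) -> Reading {
    match intent {
        // SystemTime follows the system wall clock and may jump (NTP, sysadmin).
        Intent::WallClock => Reading::Wall(SystemTime::now()),
        // Instant is std's monotonic clock, intended for measuring durations.
        Intent::Elapsed => Reading::Mono(Instant::now()),
    }
}

impl Reading {
    /// Seconds since the Unix epoch for wall-clock readings; None for
    /// monotonic readings, which have no meaningful absolute value.
    fn unix_secs(&self) -> Option<u64> {
        match self {
            Reading::Wall(t) => t.duration_since(UNIX_EPOCH).ok().map(|d| d.as_secs()),
            Reading::Mono(_) => None,
        }
    }

    /// Time elapsed since a monotonic reading; None for wall-clock readings.
    fn elapsed(&self) -> Option<Duration> {
        match self {
            Reading::Mono(t) => Some(t.elapsed()),
            Reading::Wall(_) => None,
        }
    }
}
```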

Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.