Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: clock drift on Windows 2008r2 w/ version >= 1.9 #24489

Closed
dennisdupont opened this issue Mar 22, 2018 · 11 comments
Closed

time: clock drift on Windows 2008r2 w/ version >= 1.9 #24489

dennisdupont opened this issue Mar 22, 2018 · 11 comments
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Milestone

Comments

@dennisdupont
Copy link

What version of Go are you using (go version)?

Occurs on v1.9 and above

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

windows/amd64
Issue occurs on win2008r2, but not on win2012 (tested) or win2016 (according to consul forum comments)
Issue occurs on domain attached or standalone server.

What did you do?

Clock drift was noticed on a deployed consul cluster (see hashicorp/consul#3925 for excruciating details). Determined it started with consul v0.9.3 and existed in the latest. That was when they switched to go v1.9. So we downgraded to v0.9.2 and problem disappeared.

The major applicable change in go 1.9 seemed to be the monotonic clock changes, so I experimented with the go version. If consul v0.9.3 is built with go 1.8 the problem also does not exist.

With help from a consul contributor we were able to create a small snippet to reproduce the issue:
https://play.golang.org/p/4y79262HSrJ

The clock drift is measured the same way as with the production servers, running w32tm:
>w32tm /stripchart /computer:10.60.1.25 /dataonly /samples:100
As soon as the test starts running you can see drift.

What did you expect to see?

A stable clock

What did you see instead?

Significant clock drift
Here is an example run:

C:\Users\Administrator>w32tm /stripchart /computer:208.88.126.235 /dataonly /samples:100
Tracking 208.88.126.235 [208.88.126.235:123].
Collecting 100 samples.
The current time is 3/22/2018 9:45:25 AM.
09:45:25, +00.0104974s
09:45:27, +00.0085572s
09:45:29, +00.0080007s
09:45:31, +00.0022288s
09:45:33, +00.0070934s
09:45:35, -00.0778244s <== test started
09:45:38, -00.1392391s
09:45:40, -00.3150037s
09:45:42, -00.4225186s
09:45:44, -00.4935759s
09:45:46, -00.6112448s
09:45:48, -00.7180814s
09:45:50, -00.8264958s
09:45:52, -00.9447071s
09:45:54, -01.0553810s <== over 1 second offset in ~20 seconds
09:45:56, -01.1570893s
09:45:58, -01.2324556s

This keeps growing until (S)NTP starts fighting the drift, but we have seen it as high as ~180 seconds, enough to cause kerberos auth failures.

@ALTree ALTree added OS-Windows NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 22, 2018
@ALTree ALTree changed the title Clock drift on Windows 2008r2 w/ version >= 1.9 time: clock drift on Windows 2008r2 w/ version >= 1.9 Mar 22, 2018
@dgryski
Copy link
Contributor

dgryski commented Mar 22, 2018

@alexbrainman

@dennisdupont
Copy link
Author

dennisdupont commented Mar 22, 2018

There are a couple of things I tried against a 1.9 version (in order):

  • commented out osRelax and changed osinit to set _timeBeginPeriod to 1 (as in v1.8)
  • altered os_windows.go to use systime(), nanotime() and unixnano() implementation from 1.8
  • commented out the set _timeBeginPeriod in osinit

None of these fixed the issue (although I am a neophyte, so take it with a grain of salt).

@alexbrainman
Copy link
Member

09:45:54, -01.0553810s <== over 1 second offset in ~20 seconds

That's terrible Muriel.

Reading

hashicorp/consul#3925 (comment)

https://stackoverflow.com/questions/102064/clock-drift-on-windows

https://bugs.java.com/view_bug.do?bug_id=5005837

https://support.microsoft.com/en-us/help/821893/the-system-clock-may-run-fast-when-you-use-the-acpi-power-management-t

the only suggestion that comes to my mind is that staring from go1.9 we call timeBeginPeriod / timeEndPeriod much more often. Perhaps that makes your computer time drift. You can easily test that theory by changing osRelax function in runtime to do nothing.

commented out osRelax and changed osinit to set _timeBeginPeriod to 1 (as in v1.8)

Have you tried to do something like that:

diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index 415ec0c..4295947 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -284,6 +284,8 @@ const osRelaxMinNS = 60 * 1e6
 // if we're already using the CPU, but if all Ps are idle there's no
 // need to consume extra power to drive the high-res timer.
 func osRelax(relax bool) uint32 {
+       return
+
        if relax {
                return uint32(stdcall1(_timeEndPeriod, 1))
        } else {

?

But I am not a real doctor. Perhaps @aclements will be more helpful.

Alex

@andybons andybons added this to the Go1.11 milestone Mar 26, 2018
@Carrotman42
Copy link

I haven't run the example playground, but looking at the code there are two places the code comments don't match the implementation (afaict). I can't speak to what that means for this issue; I just wanted to mention what I noticed.

There's a "default" branch in the select where you're supposed to be waiting for that timer to send something down the channel. This causes the select to not block, and you won't wait for the tick.

At the end of the for loop it says you're trying to clear the timers. I suggest setting the slice to nil instead: setting it to timers[:] unfortunately doesn't do anything useful.

@as
Copy link
Contributor

as commented Mar 28, 2018

Comment doesn't match the code.

// Clear timers slice
timers = timers[:]

Should be

timers = timers[:0]

Additionally, this program seems to create an ever-increasing amount of timers that it iterates through checking for done-ness. As I now see is already stated by @Carrotman42, the default case is triggered if the uncleared value is checked or the timer hasn't fired yet, if the timer hasn't fired yet, it likely gets read on the next iteration and contains the stale timestamp.

@alexbrainman
Copy link
Member

Comment doesn't match the code.

True, but that still does not explain computer clock drift.

@dennisdupont could you change your program to remove lines 30-31 and replace line 38 with timers = timers[:0], and tell us if that makes any difference to computer clock.

Thank you.

A;ex

@aclements
Copy link
Member

I may be confused here, but surely this can't be anything but a Windows kernel bug (not saying that Go isn't tickling it)? No user process should be capable of causing the system clock to skew.

If I were to guess what would trigger a kernel timekeeping issue, I would definitely start with the timeBeginPeriod/timeEndPeriod. But Go definitely isn't the only thing that's constantly switching the time period. Simply retrieving the time would be far down my list of suspects, since that doesn't even enter the kernel except on Wine.

How loaded is the system? Could it be thrashing so bad that it causes huge delays? Do we trust w32tm's report?

@dennisdupont
Copy link
Author

@aclements - the system I test on is not loaded at all, very idle. I have been using w32tm on quite a few servers around the center and pretty sure it is accurate. Also the drift's are consistent with the visible clocks (remote windows vs my desktop vs other servers, etc.)

@dennisdupont
Copy link
Author

@ianlancetaylor - This was marked for a couple of milestones, but I don't see any comments regarding a root cause or solution. Seems it also has been referenced by a couple of others (albeit one was on win2003).

@ianlancetaylor
Copy link
Member

@dennisdupont I don't think anyone knows. Like @aclements , this seems to me like a Windows kernel bug. I don't see how simply fetching the time could cause clock drift.

I also don't see other reports of this problem. If this only affects a ten year old version of Windows, then the reality is that while we would be happy to accept a fix we're unlikely to develop a fix ourselves.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.12, Go1.13 Dec 11, 2018
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@seankhliao
Copy link
Member

Windows 2008 R2 support will be removed in 1.21 #57003

@seankhliao seankhliao closed this as not planned Won't fix, can't repro, duplicate, stale Jan 28, 2023
@golang golang locked and limited conversation to collaborators Jan 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Projects
None yet
Development

No branches or pull requests