time: clock drift on Windows 2008r2 w/ version >= 1.9 #24489
Comments
There are a couple of things I tried against a 1.9 version (in order):
None of these fixed the issue (although I am a neophyte, so take it with a grain of salt). |
That's terrible Muriel. Reading hashicorp/consul#3925 (comment), https://stackoverflow.com/questions/102064/clock-drift-on-windows and https://bugs.java.com/view_bug.do?bug_id=5005837, the only suggestion that comes to my mind is that starting from go1.9 we call timeBeginPeriod / timeEndPeriod much more often. Perhaps that makes your computer time drift. You can easily test that theory by changing the osRelax function in runtime to do nothing.
Have you tried something like this?
diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index 415ec0c..4295947 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -284,6 +284,8 @@ const osRelaxMinNS = 60 * 1e6
 // if we're already using the CPU, but if all Ps are idle there's no
 // need to consume extra power to drive the high-res timer.
 func osRelax(relax bool) uint32 {
+	return 0
+
 	if relax {
 		return uint32(stdcall1(_timeEndPeriod, 1))
 	} else {
But I am not a real doctor. Perhaps @aclements will be more helpful. Alex |
I haven't run the example playground, but looking at the code there are two places the code comments don't match the implementation (afaict). I can't speak to what that means for this issue; I just wanted to mention what I noticed. There's a "default" branch in the select where you're supposed to be waiting for that timer to send something down the channel. This causes the select to not block, and you won't wait for the tick. At the end of the for loop it says you're trying to clear the timers. I suggest setting the slice to nil instead: setting it to |
Comment doesn't match the code.
Should be
Additionally, this program seems to create an ever-increasing number of timers that it iterates through, checking each for done-ness. As I now see @Carrotman42 has already stated, the default case is triggered if the uncleared value is checked or if the timer hasn't fired yet. In the latter case, the timer likely gets read on the next iteration and contains a stale timestamp. |
True, but that still does not explain computer clock drift. @dennisdupont could you change your program to remove lines 30-31 and replace line 38 with Thank you. Alex |
I may be confused here, but surely this can't be anything but a Windows kernel bug (not saying that Go isn't tickling it)? No user process should be capable of causing the system clock to skew. If I were to guess what would trigger a kernel timekeeping issue, I would definitely start with timeBeginPeriod/timeEndPeriod. But Go definitely isn't the only thing that's constantly switching the timer period. Simply retrieving the time would be far down my list of suspects, since that doesn't even enter the kernel except on Wine. How loaded is the system? Could it be thrashing so badly that it causes huge delays? Do we trust w32tm's report? |
@aclements - the system I test on is not loaded at all; it is very idle. I have been using w32tm on quite a few servers around the center and am pretty sure it is accurate. Also, the drifts are consistent with the visible clocks (remote Windows vs. my desktop vs. other servers, etc.) |
@ianlancetaylor - This was marked for a couple of milestones, but I don't see any comments regarding a root cause or solution. Seems it also has been referenced by a couple of others (albeit one was on win2003). |
@dennisdupont I don't think anyone knows. Like @aclements , this seems to me like a Windows kernel bug. I don't see how simply fetching the time could cause clock drift. I also don't see other reports of this problem. If this only affects a ten year old version of Windows, then the reality is that while we would be happy to accept a fix we're unlikely to develop a fix ourselves. |
Windows 2008 R2 support will be removed in 1.21 #57003 |
What version of Go are you using (go version)?
Occurs on v1.9 and above
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
windows/amd64
Issue occurs on win2008r2, but not on win2012 (tested) or win2016 (according to consul forum comments)
Issue occurs on domain attached or standalone server.
What did you do?
Clock drift was noticed on a deployed consul cluster (see hashicorp/consul#3925 for excruciating details). We determined that it started with consul v0.9.3 and still exists in the latest release. v0.9.3 is when they switched to go v1.9, so we downgraded to v0.9.2 and the problem disappeared.
The major applicable change in go 1.9 seemed to be the monotonic clock changes, so I experimented with the go version. If consul v0.9.3 is built with go 1.8 the problem also does not exist.
With help from a consul contributor we were able to create a small snippet to reproduce the issue:
https://play.golang.org/p/4y79262HSrJ
The clock drift is measured the same way as with the production servers, running w32tm:
>w32tm /stripchart /computer:10.60.1.25 /dataonly /samples:100
As soon as the test starts running you can see drift.
What did you expect to see?
A stable clock
What did you see instead?
Significant clock drift
Here is an example run:
C:\Users\Administrator>w32tm /stripchart /computer:208.88.126.235 /dataonly /samples:100
Tracking 208.88.126.235 [208.88.126.235:123].
Collecting 100 samples.
The current time is 3/22/2018 9:45:25 AM.
09:45:25, +00.0104974s
09:45:27, +00.0085572s
09:45:29, +00.0080007s
09:45:31, +00.0022288s
09:45:33, +00.0070934s
09:45:35, -00.0778244s <== test started
09:45:38, -00.1392391s
09:45:40, -00.3150037s
09:45:42, -00.4225186s
09:45:44, -00.4935759s
09:45:46, -00.6112448s
09:45:48, -00.7180814s
09:45:50, -00.8264958s
09:45:52, -00.9447071s
09:45:54, -01.0553810s <== over 1 second offset in ~20 seconds
09:45:56, -01.1570893s
09:45:58, -01.2324556s
This keeps growing until (S)NTP starts fighting the drift, but we have seen it as high as ~180 seconds, enough to cause Kerberos auth failures.