-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
time: Timer reset broken under heavy use since go1.16 timer optimizations added #47329
Comments
Thanks for the good test case and analysis. |
@gopherbot Please open backport to 1.16. This bug also exists in 1.16. It can cause programs that use |
Backport issue(s) opened: #47332 (for 1.16). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
Change https://golang.org/cl/336432 mentions this issue: |
Fixes #26205 Fixes deadlocks in heartbeat scheduler. No changelog needed because this bug only exists in go 1.16+ and v7.14.0 is the first affected beats version The underlying cause is a likely bug in golang, reported here by myself here: golang/go#47329
Fixes #26205 Fixes deadlocks in heartbeat scheduler. No changelog needed because this bug only exists in go 1.16+ and v7.14.0 is the first affected beats version The underlying cause is a likely bug in golang, reported here by myself here: golang/go#47329 (cherry picked from commit 2f62e8f)
Fixes #26205 Fixes deadlocks in heartbeat scheduler. No changelog needed because this bug only exists in go 1.16+ and v7.14.0 is the first affected beats version The underlying cause is a likely bug in golang, reported here by myself here: golang/go#47329 (cherry picked from commit 2f62e8f)
Fixes #26205 Fixes deadlocks in heartbeat scheduler. No changelog needed because this bug only exists in go 1.16+ and v7.14.0 is the first affected beats version The underlying cause is a likely bug in golang, reported here by myself here: golang/go#47329 (cherry picked from commit 2f62e8f) Co-authored-by: Andrew Cholakian <[email protected]>
Change https://golang.org/cl/336689 mentions this issue: |
… adjustTimers is 0 This avoids a race when a new timerModifiedEarlier timer is created by a different goroutine. For #47329 Fixes #47332 Change-Id: I6f6c87b4a9b5491b201c725c10bc98e23e0ed9d1 Reviewed-on: https://go-review.googlesource.com/c/go/+/336432 Trust: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Michael Knyszek <[email protected]> (cherry picked from commit 798ec73) Reviewed-on: https://go-review.googlesource.com/c/go/+/336689
Fixes #26205 Fixes deadlocks in heartbeat scheduler. No changelog needed because this bug only exists in go 1.16+ and v7.14.0 is the first affected beats version The underlying cause is a likely bug in golang, reported here by myself here: golang/go#47329 (cherry picked from commit 2f62e8f) Co-authored-by: Andrew Cholakian <[email protected]> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Change https://golang.org/cl/337309 mentions this issue: |
In CL 336432 we changed adjusttimers so that it no longer cleared timerModifiedEarliest if there were no timersModifiedEarlier timers. This caused some Google internal tests to time out, presumably due to the increased contention on timersLock. We can avoid that by simply not skipping the loop in adjusttimers, which lets us safely clear timerModifiedEarliest. And if we don't skip the loop, then there isn't much reason to keep the count of timerModifiedEarlier timers at all. So remove it. The effect will be that for programs that create some timerModifiedEarlier timers and then remove them all, the program will do an occasional additional loop over all the timers. And, programs that have some timerModifiedEarlier timers will always loop over all the timers, without the quicker exit when they have all been seen. But the loops should not occur all that often, due to timerModifiedEarliest. For #47329 Change-Id: I7b244c1244d97b169a3c7fbc8f8a8b115731ddee Reviewed-on: https://go-review.googlesource.com/c/go/+/337309 Trust: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
Change https://golang.org/cl/338649 mentions this issue: |
In CL 336432 we changed adjusttimers so that it no longer cleared timerModifiedEarliest if there were no timersModifiedEarlier timers. This caused some Google internal tests to time out, presumably due to the increased contention on timersLock. We can avoid that by simply not skipping the loop in adjusttimers, which lets us safely clear timerModifiedEarliest. And if we don't skip the loop, then there isn't much reason to keep the count of timerModifiedEarlier timers at all. So remove it. The effect will be that for programs that create some timerModifiedEarlier timers and then remove them all, the program will do an occasional additional loop over all the timers. And, programs that have some timerModifiedEarlier timers will always loop over all the timers, without the quicker exit when they have all been seen. But the loops should not occur all that often, due to timerModifiedEarliest. For #47329 For #47332 Change-Id: I7b244c1244d97b169a3c7fbc8f8a8b115731ddee Reviewed-on: https://go-review.googlesource.com/c/go/+/337309 Trust: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Michael Pratt <[email protected]> (cherry picked from commit bfbb288) Reviewed-on: https://go-review.googlesource.com/c/go/+/338649
Undoes https://github.com/elastic/beats/pull/27006/files (while preserving the new test), and also cleaning up the syntax using `time.Until`. Since the The golang bug I reported in golang/go#47329 has been fixed since somewhere in go 1.16.x (it's hard to track the exact version). This should be backported to 7.16.x since that already uses go 1.17.x and is safe.
Undoes https://github.com/elastic/beats/pull/27006/files (while preserving the new test), and also cleaning up the syntax using `time.Until`. Since the The golang bug I reported in golang/go#47329 has been fixed since somewhere in go 1.16.x (it's hard to track the exact version). This should be backported to 7.16.x since that already uses go 1.17.x and is safe.
Undoes https://github.com/elastic/beats/pull/27006/files (while preserving the new test), and also cleaning up the syntax using `time.Until`. Since the The golang bug I reported in golang/go#47329 has been fixed since somewhere in go 1.16.x (it's hard to track the exact version). This should be backported to 7.16.x since that already uses go 1.17.x and is safe. (cherry picked from commit 7b095da)
Undoes https://github.com/elastic/beats/pull/27006/files (while preserving the new test), and also cleaning up the syntax using `time.Until`. Since the The golang bug I reported in golang/go#47329 has been fixed since somewhere in go 1.16.x (it's hard to track the exact version). This should be backported to 7.16.x since that already uses go 1.17.x and is safe. (cherry picked from commit 7b095da)
…#29733) Undoes https://github.com/elastic/beats/pull/27006/files (while preserving the new test), and also cleaning up the syntax using `time.Until`. Since the The golang bug I reported in golang/go#47329 has been fixed since somewhere in go 1.16.x (it's hard to track the exact version). This should be backported to 7.16.x since that already uses go 1.17.x and is safe. (cherry picked from commit 7b095da) Co-authored-by: Andrew Cholakian <[email protected]>
What version of Go are you using (
go version
)?Tracked the specific issue to commit b4b0144 via
git bisect
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
Longstanding unit tests (~1.5 years old) started sporadically failing against go 1.16.x
Our program (Heartbeat) rapidly stops and resets a single timer based on user submitted jobs, effectively testing the golang timer. Further investigation revealed that
timer.Reset
was no longer resetting the timer consistently. Every 10-40k iterations or so it would have no effect, resulting in a non-triggering timer, and in our case, a deadlocked program.We tracked the specific issue to a change introduced in golang commit b4b0144 via
git bisect
The failure can be reproduced by running from the special branch below, which contains an enhanced test suite for Heartbeat using a watchdog timer to catch the failed timer.
We are now avoiding
Reset
in favor ofNewTimer
in a workaround PR elastic/beats#27006 . You can validate this by deleting theReset
call here and replacing it with theNewTimer
call hereDigging into the golang source code I discovered that I could fix the issue by commenting out the optimization on this line inside adjusttimers . It seems that the accounting of that variable may have an issue somewhere. The code is quite tricky, heavily concurrent, etc, and could use the eye of someone familiar with it.
What did you expect to see?
I expected the timer to fire consistently when reset.
What did you see instead?
Nothing, after enhancing the test suite for heartbeat to dump traces it was apparent that the program was in an idle state, with no timer scheduled, and no other code blocked.
The text was updated successfully, but these errors were encountered: