fix: TimeoutTicker returns wrong value/timeout pair when timeouts are scheduled at ~approximately the same time #67
fix: TimeoutTicker returns wrong value/timeout pair when timeouts are scheduled at ~approximately the same time (backport cometbft#3092) (cometbft#3107)
cometbft#3091
The problem is an edge case where we should drain the timer channel, but in certain race conditions, when two timeouts are scheduled close together, we "let it slide" and skip the drain. This can produce unsafe timeout behavior, as demonstrated in the GitHub issue, and likely affects other places in consensus as well.
Notice that, aside from NewTimer and OnStop, all timer accesses happen on the same goroutine. In NewTimer we can block until the timer is drained, which completes quickly (subject only to goroutine scheduling). In OnStop we don't need to guarantee draining before the method returns; we can just send something into the channel that will kill it.
In the main timer goroutine we can safely maintain a "timerActive" variable and force a drain whenever it is active. This removes the edge case.
The test I added fails on main (i.e., without this fix).
PR checklist
- Changelog entry added in `.changelog` (we use unclog to manage our changelog)
- Updated relevant documentation (`docs/` or `spec/`) and code comments

This is an automatic backport of pull request cometbft/cometbft#3092 ("fix: TimeoutTicker returns wrong value/timeout pair when timeouts are scheduled at ~approximately the same time"), done by Mergify.