etcd-runner: remove mutex on validate() and release() in global.go #7902
@@ -69,12 +72,16 @@ func runRacerFunc(cmd *cobra.Command, args []string) {
m := concurrency.NewMutex(s, racers)
rcs[i].acquire = func() error { return m.Lock(ctx) }
rcs[i].validate = func() error {
mu.Lock()
How is this functionally different from what's in the code now?
The current code:
mu.Lock()
f()
mu.Unlock()
does the same thing as this new code:
func f() {
	mu.Lock()
	defer mu.Unlock()
	...
}
(panics notwithstanding)
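The reviewer's equivalence can be checked with a small sketch. It assumes a hypothetical `counter` type standing in for the shared state and a `step` method standing in for `f`; one variant locks at the call site, the other locks inside the function with a deferred `Unlock`. Both serialize the work identically; the only behavioral difference is that the call-site version leaves the mutex locked if the work panics.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical shared resource guarded by a non-reentrant mutex.
type counter struct {
	mu sync.Mutex
	n  int
}

// step is the unit of work (the reviewer's f); it does no locking itself.
func (c *counter) step() { c.n++ }

// incCallerLocks mirrors the current code: the caller brackets f() with
// Lock/Unlock. If step panics, Unlock is skipped and mu stays locked.
func (c *counter) incCallerLocks() {
	c.mu.Lock()
	c.step()
	c.mu.Unlock()
}

// incCalleeLocks mirrors the proposed code: the function locks itself and
// defers the release. The deferred call must be Unlock; deferring a second
// Lock would block forever because sync.Mutex is not reentrant.
func (c *counter) incCalleeLocks() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.step()
}

// hammer runs inc from n goroutines against a fresh counter and returns the total.
func hammer(inc func(*counter), n int) int {
	c := &counter{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			inc(c)
		}()
	}
	wg.Wait()
	return c.n
}

func main() {
	fmt.Println(hammer((*counter).incCallerLocks, 100)) // 100
	fmt.Println(hammer((*counter).incCalleeLocks, 100)) // 100
}
```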
OK, never mind: the mutex isn't being applied to the election. Is this the right fix, though? The follower acquires the election, but the doRounds code is assuming exclusive ownership, and that's causing a deadlock. Should acquire always mean exclusive ownership, then, so that reusing it for elections is a hack that should be removed?
Maybe just tear out the entire doRounds acquire/release thing for elections. There's no way to sanely reason about this.
The follower acquires the election but the doRounds code is assuming exclusive ownership and that's causing a deadlock.
A follower can't win the election; only one client can win it. The deadlock happens when the follower obtains mu.Lock() before the leader does and then blocks on rcNextc while still holding the lock. If the leader obtains mu.Lock() before the follower does, everything is fine.
Removing the lock fixes the above issue.
I was talking about whether the doRounds loop is the right abstraction here. I know how elections work.
Tried this with
@heyitsanthony Good catch on the data race. Let me see what's going on.
The race is on var [ I'll have another PR to fix that.
@fanminshi the race should be fixed here
@heyitsanthony Alright, I will fix the race #7903 in this PR too.
Codecov Report
@@            Coverage Diff            @@
##             master    #7902   +/-   ##
=========================================
  Coverage          ?   75.65%
=========================================
  Files             ?      332
  Lines             ?    26316
  Branches          ?        0
=========================================
  Hits              ?    19910
  Misses            ?     4975
  Partials          ?     1431
The election runner can deadlock in the atomic release(). Suppose the election runner has two clients, A and B. If A is the leader and B is a follower, B obtains the lock for release() and waits for A to close(nextc), which signals that the next round is ready. However, A can only close(nextc) after it obtains the lock for release(); hence, deadlock.
This PR removes the atomicity of validate() and release() in global.go and gives the responsibility for locking to each runner.
FIXES etcd-io#7891
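The shape of the change can be sketched as follows, assuming a simplified driver rather than the real global.go: `roundClient`, `doRounds`, and `runLockRacer` are hypothetical names. The driver calls `validate()` and `release()` without a shared mutex, so a client blocked inside `release()` does not hold a lock another client's `release()` needs. A lock-style runner that wants exclusive ownership takes its own lock in `acquire` and drops it in `release`.

```go
package main

import (
	"fmt"
	"sync"
)

// roundClient is a hypothetical stand-in for the per-client hooks a runner
// installs (the rcs[i].acquire / validate / release closures in the diff).
type roundClient struct {
	acquire  func() error
	validate func() error
	release  func() error
}

// doRounds is a simplified, hypothetical driver. It calls validate() and
// release() without wrapping them in a shared mutex; any mutual exclusion
// is the responsibility of the runner's own hooks.
func doRounds(clients []roundClient, rounds int) []error {
	errs := make([]error, len(clients))
	var wg sync.WaitGroup
	for i := range clients {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c := clients[i]
			for r := 0; r < rounds; r++ {
				for _, step := range []func() error{c.acquire, c.validate, c.release} {
					if err := step(); err != nil {
						errs[i] = err // each goroutine writes only its own slot
						return
					}
				}
			}
		}(i)
	}
	wg.Wait()
	return errs
}

// runLockRacer models a lock-style runner that does its own locking:
// acquire takes the runner's mutex and release drops it, so validate
// runs with exclusive ownership. It returns the number of validations.
func runLockRacer(clients, rounds int) (int, error) {
	var mu sync.Mutex
	validated := 0
	rcs := make([]roundClient, clients)
	for i := range rcs {
		rcs[i] = roundClient{
			acquire:  func() error { mu.Lock(); return nil },
			validate: func() error { validated++; return nil },
			release:  func() error { mu.Unlock(); return nil },
		}
	}
	for _, err := range doRounds(rcs, rounds) {
		if err != nil {
			return 0, err
		}
	}
	return validated, nil
}

func main() {
	n, err := runLockRacer(4, 10)
	fmt.Println(n, err) // 40 <nil>
}
```

An election-style runner would instead leave `release()` free to wait on `nextc`, since no driver-level lock is held while it waits.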
Fixed #7903.
I am unsure how the above race can happen. The validateWaiters loop ensures that all followers have been validated before the leader executes:
if observedLeader == v {
	close(nextc)
	nextc = make(chan struct{}) // race here
}
Edit: I observe the race locally as well. Investigating...
Fixed the race in the leader on:
if observedLeader == v {
	close(nextc)
	nextc = make(chan struct{}) // race here
}
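One race-free way to rotate a "next round" channel is sketched below. This is an illustration under assumptions, not necessarily what this PR does: `roundSignal`, `wait`, `advance`, and `runRounds` are hypothetical names. The race comes from the leader reassigning `nextc` while followers read the same variable; guarding the variable with a mutex, and having followers take a snapshot of the current channel before blocking on it, removes that unsynchronized read/write.

```go
package main

import (
	"fmt"
	"sync"
)

// roundSignal is a hypothetical holder for the "next round" channel.
type roundSignal struct {
	mu    sync.Mutex
	nextc chan struct{}
}

func newRoundSignal() *roundSignal {
	return &roundSignal{nextc: make(chan struct{})}
}

// wait returns this round's channel; it is closed when the leader advances.
// The variable is read under mu, so it never races with advance.
func (s *roundSignal) wait() <-chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.nextc
}

// advance is called by the leader: it wakes every follower holding the
// current channel and installs a fresh channel for the next round.
func (s *roundSignal) advance() {
	s.mu.Lock()
	defer s.mu.Unlock()
	close(s.nextc)
	s.nextc = make(chan struct{})
}

// runRounds has `followers` goroutines wait on each of `rounds` rounds
// while the caller acts as leader; it returns the number of wake-ups.
func runRounds(followers, rounds int) int {
	s := newRoundSignal()
	var mu sync.Mutex
	woke := 0
	for r := 0; r < rounds; r++ {
		var snapped, done sync.WaitGroup
		snapped.Add(followers)
		done.Add(followers)
		for f := 0; f < followers; f++ {
			go func() {
				defer done.Done()
				ch := s.wait() // snapshot this round's channel
				snapped.Done()
				<-ch // blocks until the leader advances
				mu.Lock()
				woke++
				mu.Unlock()
			}()
		}
		snapped.Wait() // every follower holds this round's channel
		s.advance()    // leader: close it and install the next one
		done.Wait()
	}
	return woke
}

func main() {
	fmt.Println(runRounds(3, 4)) // 12
}
```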