lint: add check to ensure mutex Unlock() calls are deferred #105366
Comments
This would've caught #106078 (comment) |
Returning to this issue due to #106568. Just reiterating that it could be high-value to introduce this lint since it might point us at the location of the replica mutex leak referred to in that issue. |
re: the UX that results from the linter, it is often cumbersome to introduce anonymous functions to guide the

    // package syncutil

    type IdempotentUnlocker struct{ mu interface{ Unlock() } }

    // IdempotentUnlock releases the lock on the first call; later calls are no-ops.
    func (iu *IdempotentUnlocker) IdempotentUnlock() {
    	if iu.mu == nil {
    		return
    	}
    	iu.mu.Unlock()
    	iu.mu = nil
    }

    func Lock(mu *sync.Mutex) IdempotentUnlocker {
    	mu.Lock()
    	return IdempotentUnlocker{mu}
    }

    // something similar for RLock

    func (r *Replica) foo() {
    	mu := syncutil.Lock(&r.mu) // assumes r.mu is a sync.Mutex
    	defer mu.IdempotentUnlock()
    	if riskyWorkFailed() {
    		return
    	}
    	mu.IdempotentUnlock()
    	// do some more work outside of crit section
    }

I think that one will not allocate (have to check though) so it could be suitable for hot code paths as well. |
@tbg the team is picking this up as part of Code Yellow. Thanks for the pointers above 🙏 we will post updates here! |
I wrote a crude linter using the revive framework. The linter performs super naive flow propagation to track whether a lock is held during function calls. It doesn't perform proper data-flow propagation (needed for precision), nor is it aware of control-flow reachability. Instead, it uses an exclusion list (see
The linter also has another mode which performs a crude analysis to determine if a lock is potentially leaked. It does this by maintaining a stack (using lexical order) and popping each time
Despite being super naive, it was able to find a recently fixed leak [1], as well as some new ones. (I'll open separate issues for those [2].) To run,
As you can imagine, this type of linter will produce a lot of false positives,
A manual audit is probably still warranted. However, with a bit more work (e.g., reachability analysis to exclude all functions that don't panic), I am confident we could reduce the false positives to a more manageable size. Attaching the output for both modes against master, |
@srosenberg this is great! So would the path forward on this issue be adding the reachability analysis, doing the manual audit, and moving this logic into an analyzer so it can be used in CI? |
This commit adds a new pass which checks for `...Unlock()` expressions without `defer` which could result in a lock being held indefinitely. Resolves cockroachdb#105366 Release note: None
Extract a method to enable the use of deferred lock release. Release note: None Part of: cockroachdb#105366
Release note: None Part of: cockroachdb#105366
Allows the use of deferred unlock. Release note: None Part of: cockroachdb#105366
Allows the use of deferred unlocks. Release note: None Part of: cockroachdb#105366
Extract into a method to allow the use of defer unlock. Part of: cockroachdb#105366 Release note: None
Part of: cockroachdb#105366 Release note: None
This commit doesn't change the behavior of the method, but makes it more explicit that the unlock calls will be specifically tied to the replicas that are subsumed. This method intentionally "leaks" an unlocked raftMu, therefore it is still necessary to add `nolint:deferunlockcheck`. Part of: cockroachdb#105366 Release note: None
As part of cockroachdb#107577 an exclusion was added to replica_raft.go. After this PR this exclusion is no longer necessary. Part of: cockroachdb#105366 Release note: None
Is your feature request related to a problem? Please describe.
We recently dealt with some deadlocks due to code written in this style:
A safer way to write this code would be something like:
It would be nice to have a lint rule that enforces that `.Unlock()` calls on mutexes use the `defer` keyword, to help avoid this pitfall in the future (and identify lingering cases of it).

We recognize that in some cases, using `defer` is not ideal, so we should ensure that programmers have the option to use `lint:ignore` for this rule. However, as the organization scales, we should set the default expectation so that this pattern is not used in the future.

Jira issue: CRDB-29001
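
As a rough illustration of the contrast this request describes, here is a hypothetical sketch (not the code from the original report). The `store` type, `getManual`, `getDeferred`, and `stillLocked` are invented for the example. The manual version forgets to unlock on an early return, while the deferred version releases the mutex on every path. `sync.Mutex.TryLock` requires Go 1.18 or later and is used here only to observe whether the lock leaked.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errMissing = errors.New("key not found")

// store is a hypothetical type guarded by a mutex.
type store struct {
	mu   sync.Mutex
	data map[string]int
}

// getManual releases the lock by hand. The early return skips the unlock,
// so the mutex stays held after a miss.
func (s *store) getManual(key string) (int, error) {
	s.mu.Lock()
	v, ok := s.data[key]
	if !ok {
		return 0, errMissing // bug: s.mu is still locked here
	}
	s.mu.Unlock()
	return v, nil
}

// getDeferred releases the lock on every return path, including panics.
func (s *store) getDeferred(key string) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	if !ok {
		return 0, errMissing
	}
	return v, nil
}

// stillLocked reports whether mu is held, releasing it again if it was free.
func stillLocked(mu *sync.Mutex) bool {
	if mu.TryLock() {
		mu.Unlock()
		return false
	}
	return true
}

func main() {
	a := &store{data: map[string]int{}}
	_, _ = a.getManual("absent")
	fmt.Println("manual leaked:", stillLocked(&a.mu))

	b := &store{data: map[string]int{}}
	_, _ = b.getDeferred("absent")
	fmt.Println("deferred leaked:", stillLocked(&b.mu))
}
```

In the manual version, a second call to `getManual` on the same store would block forever, which is the kind of deadlock a defer-enforcing lint is meant to surface early.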