Skip to content

Commit

Permalink
storage: fix deadlock in consistency queue
Browse files Browse the repository at this point in the history
When `CheckConsistency` returns an error, the queue checks whether the
store is draining to decide whether the error is worth logging.

Unfortunately this check was incorrect and would block until the store
actually started draining.

A toy example of this problem is below (this will deadlock). The dual
return form of chan receive isn't non-blocking -- the second parameter
indicates whether the received value corresponds to a closing of the
channel.

Switch to a `select` instead.

```
package main

import (
	"fmt"
)

func main() {
	ch := make(chan struct{})
	_, ok := <-ch
	fmt.Println(ok)
}
```

Touches cockroachdb#21824.

Release note (bug fix): Prevent the consistency checker from
deadlocking. This would previously manifest itself as a steady number of
replicas queued for consistency checking on one or more nodes and would
resolve by restarting the affected nodes.
  • Loading branch information
tbg committed May 13, 2018
1 parent 1ca8e6b commit d4777be
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion pkg/storage/consistency_queue.go
Original file line number Diff line number Diff line change
Expand Up @@ -102,9 +102,17 @@ func (q *consistencyQueue) process(
if q.interval() <= 0 {
return nil
}

// We only need this variable in the `pErr != nil` branch below, but for
// better testing coverage it was moved here.
var shouldQuiesce bool
select {
case <-repl.store.Stopper().ShouldQuiesce():
shouldQuiesce = true
}

req := roachpb.CheckConsistencyRequest{}
if _, pErr := repl.CheckConsistency(ctx, req); pErr != nil {
_, shouldQuiesce := <-repl.store.Stopper().ShouldQuiesce()
if !shouldQuiesce || !grpcutil.IsClosedConnection(pErr.GoError()) {
// Suppress noisy errors about closed GRPC connections when the
// server is quiescing.
Expand Down

0 comments on commit d4777be

Please sign in to comment.