Deadlock on Redpanda topic Recreation #403

Closed
AlexandrosKyriakakis opened this issue Mar 20, 2023 · 1 comment · Fixed by #404

Comments

@AlexandrosKyriakakis

Context

In some very rare cases, a deadlock takes place.

Versions

I use:

This is also related to an issue fixed in the latest Redpanda release -> https://github.com/redpanda-data/redpanda/releases/tag/v22.3.14

A rare issue is fixed where rapidly deleting a topic and recreating a topic with the same name could trigger an assertion by @jcsp in https://github.com/redpanda-data/redpanda/pull/9054

Even though I use older releases, I suspect the deadlock is still there waiting.

Ways to reproduce:

Most often this deadlock takes place when the following sequence of actions happens (a reproduction sketch follows the list).

  1. Produce some data into a topic
  2. Consume all data from that topic (using poll records)
  3. Delete the topic
  4. Rapidly recreate the topic with the same name
  5. Produce some data again into that topic (using produce without promise & same producerID)
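
Below is a minimal reproduction sketch of these steps using franz-go together with its kadm admin package. The broker address, topic name, and partition/replication settings are illustrative assumptions, the topic is assumed to already exist (or be auto-created) before step 1, error handling is mostly omitted, and since the deadlock is a rare race this is not a guaranteed reproducer.

package main

import (
	"context"

	"github.com/twmb/franz-go/pkg/kadm"
	"github.com/twmb/franz-go/pkg/kgo"
)

func main() {
	ctx := context.Background()
	topic := "repro-topic" // hypothetical topic name

	cl, err := kgo.NewClient(
		kgo.SeedBrokers("localhost:9092"), // assumed local broker
		kgo.ConsumeTopics(topic),
	)
	if err != nil {
		panic(err)
	}
	defer cl.Close()
	adm := kadm.NewClient(cl)

	// 1. Produce some data into the topic.
	cl.ProduceSync(ctx, &kgo.Record{Topic: topic, Value: []byte("v1")})

	// 2. Consume all data from that topic (poll records).
	cl.PollRecords(ctx, 100)

	// 3. Delete the topic.
	adm.DeleteTopics(ctx, topic)

	// 4. Rapidly recreate the topic with the same name.
	adm.CreateTopics(ctx, 1, 1, nil, topic)

	// 5. Produce again with the same client (same producer ID),
	//    without a promise (nil promise).
	cl.Produce(ctx, &kgo.Record{Topic: topic, Value: []byte("v2")}, nil)
}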

Deadlock Code:

Next I will explain how and where the deadlock happens in the code path.

First, here is the dump of the goroutines that hit the deadlock.

Process 1

goroutine 3445 [semacquire, 9 minutes]:
sync.runtime_SemacquireMutex(0xc00020ef00?, 0xb0?, 0x12ce814?)
	/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*Mutex).lockSlow(0xc0001453ec)
	/usr/local/go/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
	/usr/local/go/src/sync/mutex.go:90
github.com/twmb/franz-go/pkg/kgo.(*Client).failProducerID(0xc000145000, 0x1, 0x1, {0x150ae00?, 0xc00018a680})
	/go/src/.../vendor/github.com/twmb/franz-go/pkg/kgo/producer.go:679 +0x95
github.com/twmb/franz-go/pkg/kgo.(*sink).handleReqRespBatch(0xc00021a180, 0x0, {0xc000192648, 0x11}, 0x505dd8?, {0x4ac09e?, 0xc000181bd0?}, 0xc000505cf0?, 0x63c5?, 0xffffffffffffffff, ...)
	/go/src/.../vendor/github.com/twmb/franz-go/pkg/kgo/sink.go:812 +0xde5
github.com/twmb/franz-go/pkg/kgo.(*sink).handleReqResp(0xc00021a180, 0xc000462dc0, 0xc000596a50, {0x15226c0?, 0xc000285ec0}, {0x0?, 0x0?})
	/go/src/.../vendor/github.com/twmb/franz-go/pkg/kgo/sink.go:631 +0x895
github.com/twmb/franz-go/pkg/kgo.(*sink).produce.func3(0xd7cb46?, {0x15226c0?, 0xc000285ec0?}, {0x0?, 0x0?})
	/go/src/.../vendor/github.com/twmb/franz-go/pkg/kgo/sink.go:369 +0x85
github.com/twmb/franz-go/pkg/kgo.(*sink).handleSeqResps(0xc00021a180, 0x1e8001c?)
	/go/src/.../vendor/github.com/twmb/franz-go/pkg/kgo/sink.go:411 +0x5f
created by github.com/twmb/franz-go/pkg/kgo.(*sink).doSequenced
	/go/src/.../vendor/github.com/twmb/franz-go/pkg/kgo/sink.go:402 +0x205

Process 2

goroutine 483930 [sync.Mutex.Lock, 9 minutes]:
sync.runtime_SemacquireMutex(0x11a7480?, 0x78?, 0xc0005334d0?)
	/usr/local/go/src/runtime/sema.go:77 +0x26
sync.(*Mutex).lockSlow(0xc000216210)
	/usr/local/go/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
	/usr/local/go/src/sync/mutex.go:90
github.com/twmb/franz-go/pkg/kgo.(*Client).resetAllProducerSequences(0x151db60?)
	/go/src/.../vendor/github.com/twmb/franz-go/pkg/kgo/producer.go:669 +0x19e
github.com/twmb/franz-go/pkg/kgo.(*Client).producerID(0xc00014a000)
	/go/src/.../vendor/github.com/twmb/franz-go/pkg/kgo/producer.go:621 +0x22f
github.com/twmb/franz-go/pkg/kgo.(*sink).produce(0xc000672180, 0xc00046c5a0)
	/go/src/.../vendor/github.com/twmb/franz-go/pkg/kgo/sink.go:278 +0xd1
github.com/twmb/franz-go/pkg/kgo.(*sink).drain(0xc000672180)
	/go/src/.../vendor/github.com/twmb/franz-go/pkg/kgo/sink.go:246 +0x105
created by github.com/twmb/franz-go/pkg/kgo.(*sink).maybeDrain
	/go/src/.../vendor/github.com/twmb/franz-go/pkg/kgo/sink.go:188 +0xf8

The two goroutines above use two mutexes:

  1. The recBuf.mu ->
    mu sync.Mutex // guards r/w access to all fields below
  2. The producer.idMu ->
    idMu sync.Mutex

Each goroutine, however, acquires them in the opposite order.

If these acquisitions happen to run in parallel, the result is a deadlock: each goroutine waits for the lock that the other one holds and will never release.
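
To make the inversion concrete, here is a minimal standalone sketch (not franz-go code; the names merely mirror the stacks above) of the same ABBA pattern: two goroutines taking the same two mutexes in opposite order.

package main

import "sync"

func main() {
	var idMu, recBufMu sync.Mutex // stand-ins for producer.idMu and recBuf.mu
	done := make(chan struct{}, 2)

	// Mirrors producerID -> resetAllProducerSequences:
	// holds idMu, then wants the recBuf's mu.
	go func() {
		idMu.Lock()
		recBufMu.Lock() // blocks if the other goroutine already holds recBufMu
		recBufMu.Unlock()
		idMu.Unlock()
		done <- struct{}{}
	}()

	// Mirrors handleReqRespBatch -> failProducerID:
	// holds the recBuf's mu, then wants idMu.
	go func() {
		recBufMu.Lock()
		idMu.Lock() // blocks if the other goroutine already holds idMu
		idMu.Unlock()
		recBufMu.Unlock()
		done <- struct{}{}
	}()

	<-done
	<-done // if each goroutine grabs its first lock before the other's second, this never returns
}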

Questions

Are you aware of this issue?
Do you think it is still a valid case on the latest release?

Kind Regards

With much respect to your work!

@twmb
Owner

twmb commented Mar 20, 2023

  • Not aware of this issue
  • It almost certainly exists on the latest release
    I'd really like to wire in Go's own priority lock checking someday, but I absolutely don't have the time

Very good analysis above; this can be fixed. I spent a lot of time in the past ensuring that I didn't have any lock inversions, but you found one I missed 😅
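
On the lock-checking point above: Go does not expose lock-order checking for user-level sync.Mutex, so a library would have to hand-roll it. Purely as an illustration of what that could look like (hypothetical names, not franz-go or standard-library API), a rank-checked mutex can turn an ABBA inversion into an immediate panic in tests:

package lockorder

import "sync"

// Mutex is a sync.Mutex with a fixed rank used to assert acquisition order.
type Mutex struct {
	mu   sync.Mutex
	Rank int
}

// Token tracks the ranks one goroutine currently holds; it must be passed
// down the call stack by that goroutine.
type Token struct{ held []int }

// Lock panics if this mutex's rank is not strictly greater than every
// rank the goroutine already holds, i.e. on any ordering violation.
func (m *Mutex) Lock(t *Token) {
	for _, r := range t.held {
		if r >= m.Rank {
			panic("lock order violation")
		}
	}
	m.mu.Lock()
	t.held = append(t.held, m.Rank)
}

// Unlock releases the mutex and forgets its rank on the token.
func (m *Mutex) Unlock(t *Token) {
	m.mu.Unlock()
	for i, r := range t.held {
		if r == m.Rank {
			t.held = append(t.held[:i], t.held[i+1:]...)
			break
		}
	}
}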
