Improve connection lock handling; always use context manager #1895
This is an extension of #1851 and #1854. The underlying issue is that several methods are not handling acquisition and release of the connection lock correctly. I initially tried to address this through more use of `self._lock.release()`, but after sitting on this change for a while I think I agree that we should rely on the `with self._lock:` context manager so that the blocks that are synchronized are more obvious and are also resilient to uncaught exceptions.

I continue to believe that we should use a Lock, not an RLock, and one major reason for that is that the `close()` method calls back into the KafkaClient state change handler. That handler currently acquires the client lock, and so to avoid deadlocks I think we need to make sure that we no longer hold the connection lock when this handler is invoked. If we were to use an RLock, we could not be sure whether releasing the lock fully releases it or not (i.e., we only release our contextual hold, but some outer context may continue to hold the lock at a higher level).
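
To illustrate the RLock concern, here is a minimal standalone sketch using only the standard `threading` module. It is not code from this PR; the helper `other_thread_can_acquire` is a hypothetical name introduced for the illustration. It shows that leaving an inner `with` block on an RLock does not free the lock while an outer context still holds it, whereas a plain Lock is always free once its `with` block exits.

```python
import threading


def other_thread_can_acquire(lock):
    """Hypothetical helper: report whether a different thread can take `lock` now."""
    result = []

    def probe():
        got = lock.acquire(blocking=False)
        if got:
            lock.release()
        result.append(got)

    t = threading.Thread(target=probe)
    t.start()
    t.join()
    return result[0]


rlock = threading.RLock()
with rlock:          # outer hold, e.g. a caller higher up the stack
    with rlock:      # inner hold, e.g. the method being reviewed
        pass
    # The inner block has exited, but the outer hold is still in place,
    # so another thread (or a handler that needs the lock) stays blocked.
    rlock_still_held = not other_thread_can_acquire(rlock)

lock = threading.Lock()
with lock:
    pass
# With a plain Lock there is no nesting, so exiting the block frees it.
lock_is_free = other_thread_can_acquire(lock)
```

With a plain Lock, "we have left the `with` block" and "the lock is released" mean the same thing, which is the property the argument above relies on.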
Because of that, we also need to be careful to release the connection lock before we call `self.close()` and also before we call `future.success()` or `future.failure()`, because these may trigger callback/errback functions that themselves call `close()` or some other method that may attempt to acquire the conn lock. So I restructured many of the affected blocks to move future failure and close handling out of the lock context manager. This makes the code a bit more difficult to read and maintain, but I think it is necessary at this stage. I'd welcome refactoring attempts for sure, and will continue to think about better approaches to this structure that can help keep us sane.
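
The restructuring described above can be sketched roughly as follows. This is an illustrative pattern only, not the actual diff: the `Future` and `Conn` classes and the `in_flight` and `closed` attributes are simplified, hypothetical stand-ins. State is mutated inside `with self._lock:`, and the futures are failed only after the block exits, so an errback that calls `close()` again does not deadlock on the non-reentrant Lock.

```python
import threading


class Future:
    """Hypothetical, simplified future that only supports errbacks."""

    def __init__(self):
        self._errbacks = []

    def add_errback(self, fn):
        self._errbacks.append(fn)

    def failure(self, exc):
        for fn in self._errbacks:
            fn(exc)


class Conn:
    """Hypothetical connection guarded by a plain (non-reentrant) Lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.in_flight = []
        self.closed = False

    def close(self, error=None):
        with self._lock:
            if self.closed:
                return
            self.closed = True
            # Take ownership of pending futures while synchronized.
            pending, self.in_flight = self.in_flight, []
        # The lock is released here; errbacks may now safely call back
        # into close() or any other method that acquires the lock.
        for future in pending:
            future.failure(error)


conn = Conn()
future = Future()
seen = []


def errback(exc):
    seen.append(exc)
    conn.close()  # re-entrant call; would deadlock if the lock were still held


future.add_errback(errback)
conn.in_flight.append(future)
err = RuntimeError("connection lost")
conn.close(err)
```

The cost is the one noted above: the work is split into a "decide under the lock" step and an "act after the lock" step, which is harder to follow than doing everything inside a single block.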