-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid self-deadlock in the translog #29520
Conversation
Today when reading an operation from the current generation fails tragically we attempt to close the translog. However, by invoking close before releasing the read lock we end up in self-deadlock because closing tries to acquire the write lock and the read lock can not be upgraded to a write lock. To avoid this, we move the close invocation outside of the try-with-resources that acquired the read lock. As an extra guard against this, we document the problem and add an assertion that we are not trying to invoke close while holding the read lock.
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
private void closeOnTragicEvent(final Exception ex) { | ||
// we can not hold a read lock here because closing will attempt to obtain a write lock and that would result in self-deadlock | ||
assert readLock.isHeldByCurrentThread() == false : Thread.currentThread().getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @jasontedor. Can you add a note to the PR description that this was introduced with #29401, i.e. affects code that was never released?
private void closeOnTragicEvent(final Exception ex) { | ||
// we can not hold a read lock here because closing will attempt to obtain a write lock and that would result in self-deadlock | ||
assert readLock.isHeldByCurrentThread() == false : Thread.currentThread().getName(); | ||
Objects.requireNonNull(ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we never expect a null value to be passed as parameter to this internal method, I'm not a fan of sprinkling this check here. This is defensive programming at its worst.
Today when reading an operation from the current generation fails tragically we attempt to close the translog. However, by invoking close before releasing the read lock we end up in self-deadlock because closing tries to acquire the write lock and the read lock can not be upgraded to a write lock. To avoid this, we move the close invocation outside of the try-with-resources that acquired the read lock. As an extra guard against this, we document the problem and add an assertion that we are not trying to invoke close while holding the read lock.
* 6.x: [TEST] REST client request without leading '/' (#29471) Prevent accidental changes of default values (#29528) [Docs] Add definitions to glossary (#29127) Avoid self-deadlock in the translog (#29520) Minor cleanup in NodeInfo.groovy Lazy configure build tasks that require older JDKs (#29519) Fix usage of NodeInfo#nodeVersion in build Control max size/count of warning headers (#28427) (#29516) Simplify snapshot check in root build file Make NodeInfo#nodeVersion strongly-typed as Version (#29515) Enable license header exclusions (#29379) Use proper Java version for BWC builds (#29493) Mute TranslogTests#testFatalIOExceptionsWhileWritingConcurrently
* master: [TEST] REST client request without leading '/' (#29471) Using ObjectParser in UpdateRequest (#29293) Prevent accidental changes of default values (#29528) [Docs] Add definitions to glossary (#29127) Avoid self-deadlock in the translog (#29520) Minor cleanup in NodeInfo.groovy Lazy configure build tasks that require older JDKs (#29519) Simplify snapshot check in root build file Make NodeInfo#nodeVersion strongly-typed as Version (#29515) Enable license header exclusions (#29379) Use proper Java version for BWC builds (#29493) Mute TranslogTests#testFatalIOExceptionsWhileWritingConcurrently Enable skipping fetching latest for BWC builds (#29497)
* master: (96 commits) TEST: Mute testEnsureWeReconnect Mute full cluster restart test recovery test REST high-level client: add support for Indices Update Settings API [take 2] (elastic#29327) Plugins: Fix native controller confirmation for non-meta plugin (elastic#29434) Remove PipelineExecutionService#executeIndexRequest (elastic#29537) Fix overflow error in parsing of long geohashes (elastic#29418) Remove unused index.ttl.disable_purge setting (elastic#29527) FullClusterRestartIT.testRecovery should wait for all initializing shards Build: Fail if any libs depend on non-core libs (elastic#29336) [TEST] REST client request without leading '/' (elastic#29471) Using ObjectParser in UpdateRequest (elastic#29293) Prevent accidental changes of default values (elastic#29528) [Docs] Add definitions to glossary (elastic#29127) Avoid self-deadlock in the translog (elastic#29520) Minor cleanup in NodeInfo.groovy Lazy configure build tasks that require older JDKs (elastic#29519) Simplify snapshot check in root build file Make NodeInfo#nodeVersion strongly-typed as Version (elastic#29515) Enable license header exclusions (elastic#29379) Use proper Java version for BWC builds (elastic#29493) ...
Today when reading an operation from the current generation fails tragically we attempt to close the translog. However, by invoking close before releasing the read lock we end up in self-deadlock because closing tries to acquire the write lock and the read lock can not be upgraded to a write lock. To avoid this, we move the close invocation outside of the try-with-resources that acquired the read lock. As an extra guard against this, we document the problem and add an assertion that we are not trying to invoke close while holding the read lock.
Closes #29509, relates #29401