HIVE-28669: Deadlock found when TxnStoreMutex trying to acquireLock #5585

Open
wants to merge 1 commit into
base: master

dengzhhu653
Member

What changes were proposed in this pull request?

Resolve the deadlock that occurs when the backing database is MySQL.

Why are the changes needed?

MySQL's default isolation level is REPEATABLE READ. At this level, SELECT ... FOR UPDATE holds a gap lock, so if multiple clients run FOR UPDATE and then insert into the same gap, a deadlock can occur.
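
For illustration only, here is a minimal sketch of running the locking statements under READ COMMITTED on a plain JDBC connection and restoring the previous level afterwards. `ReadCommittedScope` and `SqlWork` are hypothetical names, not part of Hive, and this is not necessarily how the patch itself switches the level. Only `Connection.getTransactionIsolation` and `Connection.setTransactionIsolation` are real JDBC calls.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper, not Hive code: scopes a unit of work to READ COMMITTED.
final class ReadCommittedScope {

  /** Hypothetical callback type for the work to run on the connection. */
  @FunctionalInterface
  interface SqlWork<T> {
    T run(Connection conn) throws SQLException;
  }

  private ReadCommittedScope() {}

  /**
   * Switches the connection to READ COMMITTED, runs the work, and restores the
   * previous isolation level even if the work throws. Assumption: this is called
   * before a transaction is started, since changing the level mid-transaction
   * is driver-dependent.
   */
  static <T> T withReadCommitted(Connection conn, SqlWork<T> work) throws SQLException {
    int previous = conn.getTransactionIsolation();
    conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    try {
      return work.run(conn);
    } finally {
      conn.setTransactionIsolation(previous);
    }
  }
}
```

Under READ COMMITTED, InnoDB does not take gap locks for ordinary searches and index scans, which is why the FOR UPDATE followed by an insert no longer collides in the way described above.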

Does this PR introduce any user-facing change?

A new configuration property, hive.metastore.have.multiple.leaders. If it is false, instances of the same housekeeping task will not use the DB mutex to block each other.
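
A rough sketch of the idea behind the flag, assuming it is read as a boolean: when there is only one leader, a no-op mutex can stand in for the DB-backed one. `MutexChoice`, its nested interfaces, and `NO_OP` are hypothetical names for illustration and do not mirror the actual classes in the patch.

```java
// Hypothetical illustration, not Hive code.
final class MutexChoice {

  /** Stand-in for a lock handle; close() releases the lock. */
  interface LockHandle extends AutoCloseable {
    @Override
    void close();
  }

  /** Stand-in for a mutex API keyed by task name. */
  interface Mutex {
    LockHandle acquireLock(String key);
  }

  /** Mutex that never touches the database: acquire and release do nothing. */
  static final Mutex NO_OP = key -> () -> { };

  private MutexChoice() {}

  /** Uses the DB-backed mutex only when several leaders may run the same task. */
  static Mutex choose(boolean haveMultipleLeaders, Mutex dbMutex) {
    return haveMultipleLeaders ? dbMutex : NO_OP;
  }
}
```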

Is the change a dependency upgrade?

No

How was this patch tested?

Tested the PR locally: querying MySQL's information_schema.INNODB_TRX table shows that the trx_isolation_level of the new transaction is 'READ COMMITTED'.

@Aggarwal-Raghav
Contributor

Based on the comments on the JIRA, and having tested this patch locally without seeing the error stack trace again,
LGTM +1 (non-binding)

@@ -174,16 +174,13 @@ public void run() {

// Check for timed out remote workers.
recoverFailedCompactions(true);
handle.releaseLocks(startedAt);
Contributor

The locks will not be released if any code throws an exception before this line, since it is inside the try block.
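
To make the concern concrete, here is a small self-contained sketch comparing a release placed at the end of a try body with try-with-resources. `LockReleaseDemo` and `CountingHandle` are hypothetical stand-ins, not Hive classes. The handle only counts how often it is released.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, not Hive code.
final class LockReleaseDemo {

  /** Stand-in for a mutex handle; counts how often it is released. */
  static final class CountingHandle implements AutoCloseable {
    final AtomicInteger releases = new AtomicInteger();

    void releaseLocks() {
      releases.incrementAndGet();
    }

    @Override
    public void close() {
      releaseLocks();
    }
  }

  private LockReleaseDemo() {}

  /** Release is the last statement of the try body, so it is skipped if work throws. */
  static void releaseInsideTry(CountingHandle handle, Runnable work) {
    try {
      work.run();
      handle.releaseLocks();
    } catch (RuntimeException e) {
      // Swallowed so the caller's loop survives; the lock is still held here.
    }
  }

  /** try-with-resources calls close() on both the normal and the exceptional path. */
  static void releaseWithResources(CountingHandle handle, Runnable work) {
    try (CountingHandle h = handle) {
      work.run();
    } catch (RuntimeException e) {
      // The handle has already been closed by the time this catch block runs.
    }
  }
}
```

If the work throws, the first variant leaves the release count at zero, while the second releases exactly once. A `finally` block around the release would give the same guarantee as try-with-resources.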


// Wrap the inner parts of the loop in a catch throwable so that any errors in the loop
// don't doom the entire thread.
try {
handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Initiator.name());
try (TxnStore.MutexAPI.LockHandle handle = mutex.acquireLock(TxnStore.MUTEX_KEY.Initiator.name())) {
Contributor

Looks like we have to use AutoCloseable here as well?

4 participants