Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent endless loop in lock cycle detection #1510 #1635

Closed
wants to merge 1 commit into from

Conversation

trancexpress
Copy link

@trancexpress trancexpress commented Jun 10, 2022

This change prevents an endless loop in: ReentrantCycleDetectingLock.detectPotentialLocksCycle()

Due to how code in ReentrantCycleDetectingLock.lockOrDetectPotentialLocksCycle() is synchronized,
its possible for a thread to both own/hold a lock (according to ReentrantCycleDetectingLock.lockOwnerThread)
and wait on the same lock (according to CycleDetectingLock.lockThreadIsWaitingOn).
In this state, if another thread tries to hold the same lock an endless loop will occur when calling detectPotentialLocksCycle().

With this change detectPotentialLocksCycle() removes the lock owning thread from
ReentrantCycleDetectingLock.lockOwnerThread,
if it detects that "this" lock is both waited on and owned by the same thread.
This prevents the endless loop during cycle detection.

Fix for: #1510

@google-cla
Copy link

google-cla bot commented Jun 10, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@trancexpress
Copy link
Author

I signed the contributor agreement, no idea when the bot will detect this (or whether it must be re-triggered).

@trancexpress
Copy link
Author

trancexpress commented Jun 10, 2022

OK I understand the problem now. ReentrantCycleDetectingLock.lockOrDetectPotentialLocksCycle() has the following code:

        final Thread currentThread = Thread.currentThread();
        synchronized (CycleDetectingLockFactory.class) {
          checkState();
          // Add this lock to the waiting map to ensure it is included in any reported lock cycle.
          lockThreadIsWaitingOn.put(currentThread, this);
          ListMultimap<Thread, ID> locksInCycle = detectPotentialLocksCycle();
          if (!locksInCycle.isEmpty()) {
            // We aren't actually going to wait for this lock, so remove it from the map.
            lockThreadIsWaitingOn.remove(currentThread);
            // potential deadlock is found, we don't try to take this lock
            return locksInCycle;
          }
        }

        // this may be blocking, but we don't expect it to cause a deadlock
        lockImplementation.lock();

        synchronized (CycleDetectingLockFactory.class) {
          // current thread is no longer waiting on this lock
          lockThreadIsWaitingOn.remove(currentThread);
          checkState();

          // mark it as owned by us
          lockOwnerThread = currentThread;
          lockReentranceCount++;
          // add this lock to the list of locks owned by a current thread
          locksOwnedByThread.put(currentThread, this);
        }

What this code (I assume) wants to do is:

  1. add the current thread and "this" lock to lockThreadIsWaitingOn
  2. check for cycles
  3. remove the current thread and "this" from lockThreadIsWaitingOn

And specifically, during 2.:

2.1. calls detectPotentialLocksCycle() and then calls addAllLockIdsAfter()
2.2. the two methods will loop forever, if they iterate over a thread that both owns a lock and waits on that lock, see the respective code:

        /* in detectPotentialLocksCycle() */
        ReentrantCycleDetectingLock<?> lockOwnerWaitingOn = this;
        // try to find a dependency path between lock's owner thread and a current thread
        while (lockOwnerWaitingOn != null && lockOwnerWaitingOn.lockOwnerThread != null) {
          Thread threadOwnerThreadWaits = lockOwnerWaitingOn.lockOwnerThread;
          // in case locks cycle exists lock we're waiting for is part of it
          lockOwnerWaitingOn =
              addAllLockIdsAfter(threadOwnerThreadWaits, lockOwnerWaitingOn, potentialLocksCycle);
        /* in addAllLockIdsAfter() */
        ReentrantCycleDetectingLock<?> unownedLock = lockThreadIsWaitingOn.get(thread);
        // If this thread is waiting for a lock add it to the cycle and return it
        if (unownedLock != null && unownedLock.lockFactory == this.lockFactory) {
          @SuppressWarnings("unchecked")
          ID typed = (ID) unownedLock.userLockId;
          potentialLocksCycle.put(thread, typed);
        }
        return unownedLock;

If the entire sequence was done in 1 synchronized block, there wouldn't be a problem. However, CycleDetectingLockFactory.class is locked to first add to lockThreadIsWaitingOn and check for cycles. Then the synchronized block is left, and another synchronized block is entered, to remove from lockThreadIsWaitingOn.

So what can occur (and occurs in the reproducer), is that 1 thread adds to lockThreadIsWaitingOn with a lock and then acquires the lock (and is so owning the lock). Then another thread does a cycle check, wanting to acquire the same lock. It sees the owner thread of the lock, tries to check for cycles with it and loops forever. While the first thread waits on removing from lockThreadIsWaitingOn - which would prevent the endless loop.

Unfortunately I don't know whether the adding the removal from lockThreadIsWaitingOn in the first synchronized block breaks other code. It might be safer to detect the bad case scenario and just bail out (as in the suggested workaround). I don't know what this means for the cycle detection though. The test added for #915 passes, but seeing that it misses the endless loop reported in #1510 maybe its not extensive enough to catch problems that this change introduces. Though IMO better to have cycle detection that didn't detect a cycle, then an endless loop that eventually results in an OOM.

@trancexpress trancexpress marked this pull request as ready for review June 10, 2022 21:15
@@ -249,6 +249,9 @@ private ListMultimap<Thread, ID> detectPotentialLocksCycle() {
MultimapBuilder.linkedHashKeys().arrayListValues().build();
// lock that is a part of a potential locks cycle, starts with current lock
ReentrantCycleDetectingLock<?> lockOwnerWaitingOn = this;
// Its possible the lock owner thread has not removed itself yet from the waiting-on-lock list.
// So we remove it here, to prevent an endless loop. See #1510.
lockThreadIsWaitingOn.remove(lockOwnerWaitingOn.lockOwnerThread);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.remove(this.lockOwnerThread); ?

Copy link
Author

@trancexpress trancexpress Jun 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean lockThreadIsWaitingOn.remove(this.lockOwnerThread)?

Sure, why not. I prefer to use lockOwnerWaitingOn after this is stored into it, but its whichever.

@trancexpress
Copy link
Author

From my POV this is ready for a merge, IMO it fixes the concurrency problem in a reliable and simple way.

@trancexpress
Copy link
Author

Looks like there are fails in the cycle detection test, will have to check why:

Tests in error: 
  testCycleDetectingLockFactoriesDoNotDeadlock(com.google.inject.internal.CycleDetectingLockTest)
  testCycleReporting(com.google.inject.internal.CycleDetectingLockTest)

…detectPotentialLocksCycle()

Due to how code in ReentrantCycleDetectingLock.lockOrDetectPotentialLocksCycle() is synchronized,
its possible for a thread to both own/hold a lock (according to ReentrantCycleDetectingLock.lockOwnerThread)
and wait on the same lock (according to CycleDetectingLock.lockThreadIsWaitingOn).
In this state, if another thread tries to hold the same lock an endless loop will occur when calling detectPotentialLocksCycle().

With this change detectPotentialLocksCycle() removes the lock owning thread from
ReentrantCycleDetectingLock.lockOwnerThread,
if it detects that "this" lock is both waited on and owned by the same thread.
This prevents the endless loop during cycle detection.

Fix for: google#1510
@iloveeclipse
Copy link

@andrewthehan, @sameb, @cpovirk, @markmarch, anyone : could we please move forward with review?

copybara-service bot pushed a commit that referenced this pull request Apr 18, 2023
…would previously fail:

 * T1: Enter `lockOrDetectPotentialLocksCycle`:
    - Lock CAF.class, add itself to `lockThreadIsWaitingOn`, unlock CAF.class
    - Lock the cycle detecting lock (CDL)
    - Lock CAF.class, mark itself as `lockOwnerThread`, remove itself from `lockThreadIsWaitingOn`
    - Exit `lockOrDetectPotentialLocksCycle`
 * T1: Re-enter `lockOrDetectPotentialLocksCycle`:
    - Lock CAF.class, add itself to `lockThreadIsWaitingOn`, unlock CAF.class
 T2: Enter `lockOrDetectPotentialLocksCycle`
    - Lock CAF.class, invoke `detectPotentialLocksCycle`.

At this point, `detectPotentialLocksCycle` will now loop forever, because the `lockOwnerThread` is also in `lockThreadIsWaitingOn`. During the course of looping forever, it will OOM, because it's building up an in-memory structure of what it thinks are cycles.

The solution is to avoid the re-entrant T1 from adding itself to `lockThreadIsWaitingOn` if it's already the `lockOwnerThread`. It's guaranteed that it won't relinquish the lock concurrently, because it's the exact same thread that owns it.

Fixes #1635 and Fixes #1510

PiperOrigin-RevId: 524376697
copybara-service bot pushed a commit that referenced this pull request Apr 18, 2023
…would previously fail:

 * T1: Enter `lockOrDetectPotentialLocksCycle`:
    - Lock CAF.class, add itself to `lockThreadIsWaitingOn`, unlock CAF.class
    - Lock the cycle detecting lock (CDL)
    - Lock CAF.class, mark itself as `lockOwnerThread`, remove itself from `lockThreadIsWaitingOn`
    - Exit `lockOrDetectPotentialLocksCycle`
 * T1: Re-enter `lockOrDetectPotentialLocksCycle`:
    - Lock CAF.class, add itself to `lockThreadIsWaitingOn`, unlock CAF.class
 T2: Enter `lockOrDetectPotentialLocksCycle`
    - Lock CAF.class, invoke `detectPotentialLocksCycle`.

At this point, `detectPotentialLocksCycle` will now loop forever, because the `lockOwnerThread` is also in `lockThreadIsWaitingOn`. During the course of looping forever, it will OOM, because it's building up an in-memory structure of what it thinks are cycles.

The solution is to avoid the re-entrant T1 from adding itself to `lockThreadIsWaitingOn` if it's already the `lockOwnerThread`. It's guaranteed that it won't relinquish the lock concurrently, because it's the exact same thread that owns it.

Fixes #1635 and Fixes #1510

PiperOrigin-RevId: 524376697
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants