Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

forceTransfer/steal lock request option instead of forceRelease() ? #23

Closed
ralphch0 opened this issue Nov 20, 2017 · 23 comments
Closed
Milestone

Comments

@ralphch0
Copy link

Current API does not allow a page to ensure that it can take over a lock. Inspecting the locks and calling forceRelease() allows that page to release the lock from the current holder, but it does not guarantee that the lock will be free for it to acquire, since other pages may have queued lock requests.

I'm suggesting instead to introduce a "steal" or "forceTransfer" option to LockOptions that allows the current page to not only to force release the lock, but also to take it over ahead of all queued requests. forceRelease() can still be achieved by acquiring the lock with steal=true, and releasing it.

@inexorabletash
Copy link
Member

I was assuming (...) that forceRelease() would abort queued requests as well, but I'm not advocating any particular API shape for this.

Presumably, if two acquire(..., {steal:true}) calls take place around the same time the first caller will be told it has acquired the lock, but then the second caller would immediately steal it away?

@ralphch0
Copy link
Author

Sure, last one wins should be OK. This relies on the developers writing their code to be resilient to lock steals anyways.

@inexorabletash
Copy link
Member

Huh... I wonder if we could make seem like less of a wart by making this a non-negative safe integer priority rather than a boolean. A request/held lock has a priority (default 0); if a request comes through that's higher than held/requested priorities then those are dropped.

But maybe that would just encourage the use in normal application logic rather than just recovery scenarios.

@pwnall - thoughts?

@ralphch0
Copy link
Author

I originally proposed a priority option, but I split it into two: lockPriority & stealPriority, to make stealing explicit. Regular devs would probably expect that trying to acquire a lock with a higher priority should not automatically steal the lock, but rather just put the request ahead of lower priorities on the wait queue.

Alternatively, lock priority can probably be achieved by storing some additional information in the database:

  1. Acquire lock with ifAvailable=true
  2. If not available, check database for last lock holder information.
  3. If last lock holder is of lower priority, request the lock with steal=true.

Obviously, this is not ideal, because technically the lock holder can change between checking the database and issuing the lock request.

@pwnall
Copy link

pwnall commented Nov 22, 2017

It seems to me that (without additional support in other APIs) adding lock stealing to this API would undermine any sort of guarantees one could expect from such a system.

As far as I know, lock stealing works effectively when paired with operations that error out when the lock is broken (stolen). For example, in a distributed filesystem, reads and writes error out when the associated lock is broken.

For stealing to be meaningful in this API, I think it'd have to be complemented with support in:

  • postMessage - take a lock, and don't send the message if the lock is broken; this can be used to make sure that at most one worker is a coordinator in some system
  • indexedDB - associate a lock with a transaction, don't commit if the lock s broken; same as before, but specific to coordinators that revolve around updating database status (example: online sync)
  • fetch (and XHR, if it's not deprecated) - added for completion, because I/O

A (bold, IMO) alternative would be to restrict locks that can be stolen to worker contexts. When a lock stealing request is issued, the current lock owner's event loop is shut down as soon as possible (possibly like Worker.terminate?). When the stealing request is granted, it is guaranteed that the old worker has been terminated. I think this is simpler to reason about than the proposal above, and works for all coordination scenarios above. I'm labeling this proposal as bold because I don't know enough about worker lifecycle to know if it can be implemented in all browsers, though Worker.terminate's broad support makes me fairly optimistic.

@inexorabletash
Copy link
Member

I took an initial stab at this in the proto-spec. steal as an option, and the behavior that any previous held locks are dropped and lock requests are aborted; in either case the promise returned from acquire() rejects with AbortError.

@pwnall
Copy link

pwnall commented Jan 11, 2018

I just looked at the spec change above. First off, it's very surprising to me that steal is true by default.

Second, I would like to see concrete use cases for lock stealing discussed. It would help guide my intuition.

The case I've been thinking of is using locks for master election. For simplicity, assume a system where every tab spawns a worker and sends it a request whenever it needs to do something complex. Each worker first runs a master election algorithm. Workers that don't win forward their requests to the master, and forward the response to their tab.

The master election algorithm runs the following steps in an infinite loop:

  1. Acquire the lock with ifAvailable set to true. If the lock is acquired, break out of the loop and become master.
  2. Attempt to connect to the existing master. (e.g, create a MessageChannel and send one of the MessagePorts to a BroadcastChannel, packed in a connect request)
  3. Wait for some amount of time (1 second?) for the connection to complete. If the connection completes, break out of a loop and become a worker with a connection to the master.
  4. If the wait times out, loop to step 1.

Resilience to masters getting stuck (i.e. in a deadlock) can be added with the following measures:

  1. Add timeouts to requests sent to masters. If a worker fails to get an answer in x seconds, go back to the master election stage. This assumes idempotent requests. Effectively, wrap the earlier system description (master election + request forwarding) in a loop.
  2. Add a large timeout for the entire loop above. If a worker isn't able to get help from the master in (say) 60 seconds, it attempts to steal the lock from the current master and become a master.

@ralphch0: Sorry if this is completely disconnected from what you're considering. I need a use case to reason about.

Now that I've laid out a use case, we can start thinking about how the stealing API would work out there. Specifically, I'm concerned with how a master would detect that a lock was stolen from it. The code below is a sketch for a master's request-handling loop.

let lock_stolen = false;
try {
  await navigator.locks.acquire('master-lock', async () => {
    while (!lock_stolen) {
      // process request from queue
    }
  });
} catch(e) {
  if (IsLockStolenAbortError(e)) {
    lock_stolen = true;
  }
}

First, this looks a bit un-intuitive. Second, I'd have to pepper my code with lock_stolen checks (ideally, there'd be a check before every network / database request). Third, there's still potential for races.

Based on what I know so far, I'd still like to see lock-stealing terminate the worker(s) that held the lock, if any.

@inexorabletash
Copy link
Member

steal being true by default was a typo; and is fixed - thanks for catching that!

@ralphch0
Copy link
Author

Just wondering why we're also dropping other lock requests? If the intent is only to steal the lock, then this might cause an unintended side-effect.

I can also maybe see this feature used intelligently by some sites to transfer the ownership of a lock from one tab to another, without affecting any other pages waiting for the lock.

@inexorabletash
Copy link
Member

Keeping existing requests is doable.

I don't have a great intuition here. Dropping everything seemed straightforward, but dropping only held locks seems simpler.

Aside: what would {mode: 'shared', steal: true} mean if the lock is already held as shared (or more generally: if already grantable).

@inexorabletash
Copy link
Member

And should {steal: true, ifAvailable: true} be an error?

@ralphch0
Copy link
Author

I think in both cases it should succeed and steal as usual. It's not clear how these combinations would be useful, but none of these options are mutually conflicting.

@inexorabletash
Copy link
Member

Thanks. I'll rework the spec and my implementation (just finished a first pass) ASAP next week.

@inexorabletash
Copy link
Member

Proto-spec updated to leave the requests intact. The "steal" request jumps to the top of the request queue and is granted immediately.

The {steal: true, ifAvailable: true} combination ends up being slightly odd (in both spec and impl), since steal will ensure the request is granted and therefore ifAvailable is effectively ignored.

The {mode: 'shared', steal: true} combination causes all previous held locks — shared or exclusive — to be dropped, and the new lock is shared meaning any pending shared requests might be granted. A bit odd, especially if it was held as shared already, but consistent.

@ralphch0
Copy link
Author

Ah, I can see now how ifAvailable & steal are kind of confusing together. And since ifAvailable is completely ignored, maybe it's better to disallow this combination and force devs fix their options?

For steal and shared, I also see the gotcha there. We could either clarify this in the spec, or disallow if we don't think this is likely to be useful.

@pwnall
Copy link

pwnall commented Jan 31, 2018

(general note: I still think stealing is a footgun without something like worker termination)

I like the new approach of throwing if both steal and ifAvailable are specified. Seems like having both true can only happen in case of developer confusion, and correcting perception early in the application development cycle is good.

I find the behavior for stealing shared locks confusing. It seems to me that "steal" means "I want this lock right now without waiting, and I'm willing to step on anyone who gets in my way". In the current context of S/X locks, I would have guessed that stealing an X lock should break the lock for all current holders, while stealing an S lock would only break the lock if it's held in X mode. In a more general model, stealing a lock would break all current holders whose mode conflicts with the stealing request's mode.

Am I confusing "steal" with "preempt"? It seems to me that the current behavior of booting all S lock holders when stealing an S lock is closer to "purge" than to "steal".

@ralphch0
Copy link
Author

I generally don't have strong opinions about shared locks, simply because we currently don't have a scenario where we would use them. So if we decide to restrict this combination, I would not object. There is also a way to simulate this behavior:
Code can take an exclusive lock with steal = true, and then immediately release it and then take a shared lock.

While on the subject, "ifAvailable" and "signal" (abort controller) also seem to fall into the same category as "ifAvailable" & "steal", so maybe that should error out as well.

@inexorabletash
Copy link
Member

Code can take an exclusive lock with steal = true, and then immediately release it and then take a shared lock.

Not if there's already a queued request for an exclusive lock. That would be run ahead of the request for the shared lock.

While on the subject, "ifAvailable" and "signal" (abort controller) also seem to fall into the same category as "ifAvailable" & "steal", so maybe that should error out as well.

Good catch. BTW, I pinged the folks working on the abort stuff; sounds like that code might be in to Blink next week.

@inexorabletash
Copy link
Member

Hrm... and signal + steal is also silly:

  • ifAvailable + steal - steal would always make it available.
  • ifAvailable + signal - the availability check/grant would be processed before the signal could change the outcome
  • steal + signal - the steal would be processed before the signal could change behavior

I'll file another bug about signal edge cases.

@inexorabletash
Copy link
Member

#34 for signal racing.

@ralphch0
Copy link
Author

ralphch0 commented Jan 31, 2018

Not if there's already a queued request for an exclusive lock. That would be run ahead of the request for the shared lock.

Good point.

@inexorabletash
Copy link
Member

Based on offline conversations w/ @pwnall, making steal + shared an error. Easier to forbid now and allow later than the converse.

@inexorabletash inexorabletash added this to the v1 milestone Feb 7, 2018
@inexorabletash
Copy link
Member

Since we seem to have settled on a syntax and semantics, resolving this.

I'm still sympathetic to the argument that breaking locks is crazy, but let's open new bugs for that. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants