rfc: introduce shared locks RFC #101799

Open
wants to merge 1 commit into master

Conversation

arulajmani (Collaborator) commented:

SQL databases support multiple forms of row-level locks, which are locks held by transactions on individual rows. The most common of these are shared locks and exclusive locks, which relate to each other the way reader and writer locks do in a reader-writer mutex.

This RFC proposes introducing SHARED key-level locks in the key-value layer and exposing them externally through the `SELECT FOR SHARE` SQL statement. It also proposes using them internally for operations such as foreign key checks under weaker isolation levels (see: Read Committed).
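As a rough illustration of the user-facing surface area, here is a minimal client-side sketch. The `accounts` table, connection string, and lib/pq driver are illustrative assumptions, not part of the RFC.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	defer tx.Rollback()

	// FOR SHARE acquires a SHARED lock on the matching row: other
	// transactions may concurrently hold SHARED locks on it, but a
	// writer (which needs an exclusive lock or an intent) must wait
	// until this transaction finishes.
	var balance int
	if err := tx.QueryRow(
		`SELECT balance FROM accounts WHERE id = $1 FOR SHARE`, 1,
	).Scan(&balance); err != nil {
		log.Fatal(err)
	}
	fmt.Println("read under a shared lock:", balance)

	if err := tx.Commit(); err != nil {
		log.Fatal(err)
	}
}
```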

Informs: #91545

Release note: None

@arulajmani arulajmani requested a review from a team as a code owner April 18, 2023 21:54

@bdarnell bdarnell (Contributor) left a comment




docs/RFCS/20230209_shared_locks.md line 111 at r1 (raw file):

### Lock re-acquisition

Lock upgrades are points at which deadlocks (and therefore retry errors) can occur. I see that you address deadlock detection below, but this is also a user education concern. When we have only a single strength of lock, you can avoid deadlocks by being careful about the order of lock acquisition to use a consistent order in all transactions. With upgradable locks of multiple strengths, you would also have to be careful to only acquire each lock once (at the highest strength required) to be assured that deadlocks would not occur.
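To make the hazard concrete, here is a sketch of two transactions that each acquire a SHARED lock and then attempt the upgrade. The table `t(k INT PRIMARY KEY, v INT)` and lib/pq driver are assumptions, and the interleaving is timing-dependent; a real test would coordinate the two sessions.

```go
package main

import (
	"database/sql"
	"log"
	"sync"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			tx, err := db.Begin()
			if err != nil {
				log.Fatal(err)
			}
			defer tx.Rollback()
			// Both transactions acquire a SHARED lock on the same row.
			var v int
			if err := tx.QueryRow(
				`SELECT v FROM t WHERE k = 1 FOR SHARE`).Scan(&v); err != nil {
				log.Printf("txn %d: %v", id, err)
				return
			}
			// The upgrade to a write conflicts with the other transaction's
			// SHARED lock; if both reach this point, the deadlock detector
			// must abort one of them with a retry error.
			if _, err := tx.Exec(`UPDATE t SET v = v + 1 WHERE k = 1`); err != nil {
				log.Printf("txn %d aborted to break the deadlock: %v", id, err)
				return
			}
			if err := tx.Commit(); err != nil {
				log.Printf("txn %d: %v", id, err)
			}
		}(i)
	}
	wg.Wait()
}
```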


docs/RFCS/20230209_shared_locks.md line 250 at r1 (raw file):

with all timestamps. This means non-MVCC read latches conflict with writes at all timestamps, and
non-MVCC write latches conflict with all concurrent reads and writes. The read latching behavior
here might be a bit surprising – naively, one would not expect a read latch at timestamp 0 to conflict

The reason for this "surprising" behavior is that it was designed with the assumption that there is an effective distinction between MVCC keys and non-MVCC keys. That is, in the beginning, there were no keys that mixed MVCC and non-MVCC operations. I'm not sure whether that has held true over the years or whether the suggestion of a non-MVCC read latch in the next paragraph would be the first time we'd mixed things in that way.

In any case, I think I agree with the plan to use the maximum timestamp instead of zero, but we also need to be careful with synthetic future timestamps - we don't want a timestamp in the future to be fed back into the HLC (fortunately, the maximum possible timestamp would just cause a crash if this occurred, so there's no chance of it slipping by unnoticed).


docs/RFCS/20230209_shared_locks.md line 260 at r1 (raw file):

more intuitive way to represent a read latch that conflicts with all writers.

The discussion above raises a different question – does declaring latches as read/write operations

Not all reads get locks, so I don't immediately see how the locking system could replace the current read/write latching system. We still need read/write latches to allow reads at historical timestamps to proceed underneath a current exclusive lock, for example.
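For example, a non-locking historical read should be able to evaluate underneath a current exclusive lock, which is information that timestamp-aware latching provides and lock strengths alone do not. A sketch of that, assuming the `database/sql` setup from the earlier snippet and a hypothetical table `t` (`AS OF SYSTEM TIME` is existing CockroachDB syntax):

```go
// historicalRead performs a non-locking read 10 seconds in the past. Even
// if another transaction currently holds an exclusive lock on k = 1, this
// read's timestamp sits below the lock's, so timestamp-aware latching
// should let it proceed without waiting.
func historicalRead(db *sql.DB) (int, error) {
	var v int
	err := db.QueryRow(
		`SELECT v FROM t AS OF SYSTEM TIME '-10s' WHERE k = 1`).Scan(&v)
	return v, err
}
```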


docs/RFCS/20230209_shared_locks.md line 467 at r1 (raw file):

lock requester if we imagine a constant stream of SHARED lock requesters.

We'll deviate from this behavior. If we find that there's a queue formed on a lock, i.e., not all 

👍

I think this implementation detail should be mentioned somewhere in the main body and not just the appendix. I started to write up a comment about the possibility of starvation in the "lock table sequencing" section because the way the process was described there sounded vulnerable to starvation.

@sumeerbhola sumeerbhola (Collaborator) left a comment




docs/RFCS/20230209_shared_locks.md line 112 at r2 (raw file):

[*] conflict behavior is configurable using a cluster setting; defaults to 
"conflicts" if the exclusive lock is held by a serializable transaction; defaults

What are the desired isolation semantics seen by a serializable transaction when mixing read-committed and serializable transactions?
(A pointer to literature is fine).


docs/RFCS/20230209_shared_locks.md line 135 at r2 (raw file):

a transaction that acquires an `Exclusive` lock as part of a `SELECT FOR
UPDATE`, and then proceeds to actually perform the UPDATE (and thus writes an
`Intent`). One could also imagine a scenario where a transaction promotes a

Is an Intent always stronger than an Exclusive lock? Naively, it seems no, since an Intent will not conflict with a reader at a lower timestamp while an Exclusive lock will. Is the answer that this difference doesn't matter since the txn holding the exclusive lock will only monotonically increase its own timestamp, so the Intent serves as well in protecting the timestamp range in which it can commit?


docs/RFCS/20230209_shared_locks.md line 206 at r2 (raw file):

    // Invariant: only the lowest sequence number that has not been rolled back
    //            at which a lock was acquired with a particular lock strength
    //            is stored in the infos array.

Seems like this lowest sequence number scheme is an improvement/optimization over lockHolderInfo.seqs, which was tracking all the TxnSeqs instead of the lowest one.
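A small sketch (hypothetical types, not the actual lock table structs) of what the quoted invariant implies for acquisition and savepoint rollback:

```go
// lockInfo tracks, per lock strength, whether the strength is held and the
// lowest sequence number at which it was acquired and not rolled back.
type lockInfo struct {
	held   bool
	minSeq uint64
}

// acquire records an acquisition at strength s and sequence number seq,
// retaining only the lowest non-rolled-back sequence number.
func acquire(infos []lockInfo, s int, seq uint64) {
	if in := &infos[s]; !in.held || seq < in.minSeq {
		in.held, in.minSeq = true, seq
	}
}

// rollbackTo applies a savepoint rollback to sequence number upTo: any
// strength first acquired after upTo is no longer held.
func rollbackTo(infos []lockInfo, upTo uint64) {
	for i := range infos {
		if infos[i].held && infos[i].minSeq > upTo {
			infos[i].held = false
		}
	}
}
```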


docs/RFCS/20230209_shared_locks.md line 264 at r2 (raw file):

Incoming requests are required to acquire latches over the keyspans they intend
to touch at the request’s timestamp. As of the time of writing, two types of
latches exist – reach latches and write latches. They each have different

nit: read latches


docs/RFCS/20230209_shared_locks.md line 297 at r2 (raw file):

we will need to serialize requests from the same transaction trying to acquire
shared locks, while still allowing different transactions to acquire shared
locks concurrently. This can be done by acquiring a (write) latch on a local,

IIUC, a shared lock acquisition will acquire two latches: a read latch with max timestamp on the key itself, and a write latch on a synthetic key that is a function of the key being locked and the txnID.


docs/RFCS/20230209_shared_locks.md line 371 at r2 (raw file):

liveness one above. The only difference is that instead of detecting a failed
coordinator, the result of the `PushTxn` is that one of the two transactions
that form the deadlock gets aborted.

The argument seems hand-wavy.
Consider transactions T1, T2, T3, and T4 to start with, where we will create a deadlock cycle between T2 and T3:
T1, T2 hold a shared lock on key K1
T3, T4 hold a shared lock on key K2

T3 has an exclusive lock acquisition request waiting at K1 that selects T1 as the pushee
T2 has an exclusive lock acquisition request waiting at K2 that selects T4 as the pushee

We don't see the deadlock cycle yet. If we keep letting new shared lock acquisitions happen while T2 and T3 are waiting (say from transactions T5, T6, ...), and each time T2 and T3 select one of those other transactions as the pushee, they will wait forever.


docs/RFCS/20230209_shared_locks.md line 434 at r2 (raw file):

reservation, the request must check the list of waiting locking requests. If the
list is empty, or if the lowest sequence number from the list of waiters is
higher than the request’s sequence number, it can acquire a reservation.

I think this is what prevents the earlier deadlock scenario I was trying to sketch.

I had written long comments about joint reservations and partial reservation breakage in lock_table.go, but I didn't try to do any proofs since we didn't need those features at that time. I am a bit worried that we will overlook something if we don't try to sketch out some proofs now.

Also, I can't remember why we had reservation breaking in the original lock_table.go -- was it because we were assuming nothing about latch acquisition above the lock table that would prevent two requests getting to the lock table that both wanted exclusive locks? It seems we still have an implicit assumption that after lockTable.ScanAndEnqueue returns a guard that says shouldWait is false, we have reservations, but we no longer concern ourselves with them getting broken and proceed to evaluation to actually place those locks. I wonder whether we should be making some assumptions that the lockTable exists in the context of latching, and use that to simplify some of the lockTable implementation.
I suspect I am forgetting something important.
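For reference, a minimal sketch (illustrative signature, not the actual lock table code) of the reservation rule quoted at the top of this thread, which is what bounds the wait:

```go
// canReserve returns true only if the wait queue is empty or every waiting
// locking request arrived later (i.e. has a higher sequence number) than
// this request; otherwise the request must keep waiting.
func canReserve(reqSeq uint64, waiterSeqs []uint64) bool {
	for _, w := range waiterSeqs {
		if w < reqSeq {
			return false // an older waiter has priority
		}
	}
	return true
}
```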


docs/RFCS/20230209_shared_locks.md line 487 at r2 (raw file):

concurrency when resolving durable shared locks on the same key will hurt; we
may want to revisit latching, in the context of releasing shared locks, as we
design that.

this seems important.
I am wondering whether this RFC should cover all the locking changes we have planned for this next release since they are related.

@arulajmani arulajmani (Collaborator, Author) left a comment


Thanks for the comments @bdarnell and @sumeerbhola; flushing most (not all) replies to your comments.



docs/RFCS/20230209_shared_locks.md line 111 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Lock upgrades are points at which deadlocks (and therefore retry errors) can occur. I see that you address deadlock detection below, but this is also a user education concern. When we have only a single strength of lock, you can avoid deadlocks by being careful about the order of lock acquisition to use a consistent order in all transactions. With upgradable locks of multiple strengths, you would also have to be careful to only acquire each lock once (at the highest strength required) to be assured that deadlocks would not occur.

That's a good point, especially considering we're not planning on adding support for Update locks just yet.


docs/RFCS/20230209_shared_locks.md line 260 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Not all reads get locks, so I don't immediately see how the locking system could replace the current read/write latching system. We still need read/write latches to allow reads at historical timestamps to proceed underneath a current exclusive lock, for example.

You're right, we wouldn't be able to replace the current read/write latching system by just considering lock strengths. We would need other auxiliary information, such as the timestamp of the request.

We already need something like this for locking conflicts as well -- for example, we want to allow non-locking reads at lower timestamps to proceed to evaluation without waiting on intents at higher timestamps. We've defined a new concept, called lock.Modes, to capture such conflict resolution.

The thought here was that we could apply the same conflict resolution rules for latches like we do for locks (theoretically). It's not in our critical path, so this isn't something we plan on changing in the short term at least.


docs/RFCS/20230209_shared_locks.md line 467 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

👍

I think this implementation detail should be mentioned somewhere in the main body and not just the appendix. I started to write up a comment about the possibility of starvation in the "lock table sequencing" section because the way the process was described there sounded vulnerable to starvation.

Yeah, I think both you and Sumeer had similar comments. I'll call this out more explicitly earlier in the text.


docs/RFCS/20230209_shared_locks.md line 135 at r2 (raw file):

Is an Intent always stronger than an Exclusive lock? Naively, it seems no, since an Intent will not conflict with a reader at a lower timestamp while an Exclusive lock will.

An Intent is always stronger than an Exclusive lock. Like you mention, an Intent will not conflict with a reader at a lower timestamp. However, an Exclusive lock will not conflict with any non-locking readers (even those at a higher timestamp).[*]

[*] This behaviour is configurable for serializable transactions using a cluster setting. Exclusive locks can be configured to conflict with non-locking readers at or above the Exclusive lock's timestamp (thus acting with the same strength as an Intent would). This is described in more detail around:

// +------------+---------+-----------+-----------+-------------+----------+
// |            |  None   |  Shared   |  Update   |  Exclusive  |  Intent  |
// +------------+---------+-----------+-----------+-------------+----------+
// | None       |         |           |           |     X^*     |   X^†    |
// +------------+---------+-----------+-----------+-------------+----------+
// | Shared     |         |           |           |      X      |    X     |
// +------------+---------+-----------+-----------+-------------+----------+
// | Update     |         |           |     X     |      X      |    X     |
// +------------+---------+-----------+-----------+-------------+----------+
// | Exclusive  |   X^*   |     X     |     X     |      X      |    X     |
// +------------+---------+-----------+-----------+-------------+----------+
// | Intent     |   X^†   |     X     |     X     |      X      |    X     |
// +------------+---------+-----------+-----------+-------------+----------+
//
// [†] reads under optimistic concurrency control in CockroachDB only conflict
// with Intent locks if the read's timestamp is equal to or greater than the
// lock's timestamp. If the read's timestamp is below the Intent lock's
// timestamp then the two are compatible.
//
// [*] until the re-introduction of weaker isolation levels, all transactions in
// CockroachDB used serializable isolation. Historically, CockroachDB did not
// make a strength distinction between Exclusive locks and Intents. As such,
// reads under concurrency control would conflict with Exclusive locks if the
// read's timestamp was at or above the Exclusive lock's timestamp. Now that
// there is such a distinction, non-locking reads from serializable transactions
// do not block on Exclusive locks, regardless of their timestamp. However,
// there is a desire to let users opt into the old behavior using the
// ExclusiveLocksBlockNonLockingReads cluster setting. Note that this only
// applies when both the non-locking read and Exclusive lock belong to
// serializable transactions, as that's the only "old" behavior to speak of
// here.
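To make the table concrete, here is a minimal Go sketch of the matrix as a conflict predicate, including the two timestamp-dependent cells. The enum and signature are illustrative, not the actual `lock` package API; `exclBlocksReads` stands in for the ExclusiveLocksBlockNonLockingReads cluster setting.

```go
// Strength is an illustrative stand-in for the lock strengths above.
type Strength int

const (
	None Strength = iota // non-locking read
	Shared
	Update
	Exclusive
	Intent
)

// conflicts evaluates the matrix above for two operations a and b at
// timestamps aTS and bTS.
func conflicts(a Strength, aTS int64, b Strength, bTS int64, exclBlocksReads bool) bool {
	// The matrix is symmetric; normalize so that a is the weaker side.
	if a > b {
		a, b = b, a
		aTS, bTS = bTS, aTS
	}
	switch a {
	case None:
		switch b {
		case Intent:
			return aTS >= bTS // [†]: reads below the intent's timestamp pass
		case Exclusive:
			return exclBlocksReads && aTS >= bTS // [*]
		default:
			return false // non-locking reads never conflict otherwise
		}
	case Shared:
		return b == Exclusive || b == Intent // compatible with Shared and Update
	default:
		return true // Update/Exclusive/Intent conflict with Update and above
	}
}
```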


docs/RFCS/20230209_shared_locks.md line 206 at r2 (raw file):

Previously, sumeerbhola wrote…

Seems like this lowest sequence number scheme is an improvement/optimization over lockHolderInfo.seqs which was tracking all the TxnSeqs instead of the lowest one.

One thing @nvanbenschoten and I realized since then (discussion captured in #102269 (comment)) is that we only need to do this tracking for in-memory locks, where the lock table is the source of truth for which locks are and are not held after savepoint rollbacks. It gets incredibly tricky to try to keep the replicated state in sync with what we have here, so we're going to change these data structures a bit to store replicated and un-replicated locks separately. The lowest-sequence-number tracking scheme will stay, albeit just for un-replicated locks.


docs/RFCS/20230209_shared_locks.md line 297 at r2 (raw file):

Previously, sumeerbhola wrote…

IIUC, a shared lock acquisition will acquire two latches: a read latch with max timestamp on the key itself, and a write latch on a synthetic key that is a function of the key being locked and the txnID.

Yeah.
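A minimal sketch of that two-latch declaration (the names here are illustrative; the real implementation lives in the KV concurrency control code):

```go
// span and spanSet are hypothetical stand-ins for latch span declarations.
type span struct {
	key string
	ts  uint64
}

type spanSet struct {
	read, write []span
}

const maxTimestamp = ^uint64(0) // stands in for the maximum HLC timestamp

// declareSharedLockLatches declares latches for a SHARED lock acquisition
// on key by transaction txnID:
//   - a read latch on the key at the maximum timestamp, so the request
//     conflicts with writers at all timestamps but not with other shared
//     lockers, and
//   - a write latch on a synthetic local key derived from (key, txnID),
//     which serializes shared lock acquisitions from the same transaction
//     while leaving other transactions unaffected.
func declareSharedLockLatches(ss *spanSet, key, txnID string) {
	ss.read = append(ss.read, span{key: key, ts: maxTimestamp})
	ss.write = append(ss.write, span{key: "local/" + key + "/" + txnID, ts: maxTimestamp})
}
```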


docs/RFCS/20230209_shared_locks.md line 371 at r2 (raw file):

Previously, sumeerbhola wrote…

The argument seems hand-wavy.
Consider transactions T1, T2, T3, and T4 to start with, where we will create a deadlock cycle between T2 and T3:
T1, T2 hold a shared lock on key K1
T3, T4 hold a shared lock on key K2

T3 has an exclusive lock acquisition request waiting at K1 that selects T1 as the pushee
T2 has an exclusive lock acquisition request waiting at K2 that selects T4 as the pushee

We don't see the deadlock cycle yet. If we keep letting new shared lock acquisitions happen while T2 and T3 are waiting (say from transactions T5, T6, ...), and each time T2 and T3 select one of those other transactions as the pushee, they will wait forever.

Yeah -- this argument relies on the assumption that the list of shared lock holders cannot grow unboundedly. Like you mentioned below, this is guaranteed given how we're deciding when a request can acquire a reservation (and thus eventually acquire a lock) and when it has to wait (even though it may not conflict with a held lock).


docs/RFCS/20230209_shared_locks.md line 434 at r2 (raw file):

I had written long comments about joint reservations and partial reservation breakage in lock_table.go

I've referenced that comment multiple times and found it very useful, so thanks for writing it!

I wonder whether we should be making some assumptions that the lockTable exists in the context of latching, and use that to simplify some of the lockTable implementation.

I think one reason that we can't assume that requests in the lockTable always exist in the context of latching is that a request may have reserved a lock, discovered a conflict (a lock, or a reservation it can't break) on a different key it is also trying to lock, and thus dropped its latches when queueing behind this conflicting lock/reservation. We would be susceptible to deadlock scenarios if we didn't do reservation breaking. I may be missing something, but I don't think we can avoid it.

It seems we still have an implicit assumption that after lockTable.ScanAndEnqueue returns a guard that says shouldWait is false, we have reservations, but we no longer concern ourselves with them getting broken and proceed to evaluation to actually place those locks.

I think we can do that because we're guaranteed to still be holding latches while evaluating, so the request can't have its reservation broken as it evaluates.


docs/RFCS/20230209_shared_locks.md line 487 at r2 (raw file):

Previously, sumeerbhola wrote…

this seems important.
I am wondering whether this RFC should cover all the locking changes we have planned for this next release since they are related.

The plan is to (quickly) follow this RFC up with another one that talks about replicated locks.

@sumeerbhola sumeerbhola (Collaborator) left a comment




docs/RFCS/20230209_shared_locks.md line 135 at r2 (raw file):

[*] conflict behavior is configurable using a cluster setting; defaults to
"conflicts" if the exclusive lock is held by a serializable transaction; defaults

nit: the current text above does not include the "at or above the Exclusive lock's timestamp" part, which is why I was confused and thought that exclusive locks could be configured to conflict with readers at lower timestamps.


docs/RFCS/20230209_shared_locks.md line 206 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

One thing @nvanbenschoten and I realized since then (discussion captured in #102269 (comment)) is that we only need to do this tracking for in-memory locks, where the lock table is the source of truth for which locks are and are not held after savepoint rollbacks. It gets incredibly tricky to try to keep the replicated state in sync with what we have here, so we're going to change these data structures a bit to store replicated and un-replicated locks separately. The lowest-sequence-number tracking scheme will stay, albeit just for un-replicated locks.

Thanks for the additional detail -- this makes sense.


docs/RFCS/20230209_shared_locks.md line 434 at r2 (raw file):

I suspect I am forgetting something important.

I forgot the obvious fact that we release latches when waiting for locks :)
