-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: introduce shared locks RFC #101799
base: master
Are you sure you want to change the base?
rfc: introduce shared locks RFC #101799
Conversation
SQL databases support multiple forms of row-level locks, which are locks held by transactions on individual rows. The most common forms of these locks are shared locks and exclusive locks. Shared locks are to reader locks as exclusive locks are to writer locks in a reader-writer mutex. This RFC proposes introducing SHARED key-level locks in the key-value layer and exposing them through the `SELECT FOR SHARE` SQL statement. This will enable the use of SHARED locks externally through `SELECT FOR SHARE` statements. It will also enable their use internally for operations such as foreign key checks under weaker isolation levels (see: Read Committed). Informs: cockroachdb#91545 Release note: None
e91e88b
to
04d7fd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @arulajmani, @michae2, @nvanbenschoten, and @sumeerbhola)
docs/RFCS/20230209_shared_locks.md
line 111 at r1 (raw file):
### Lock re-acquisition
Lock upgrades are points at which deadlocks (and therefore retry errors) can occur. I see that you address deadlock detection below, but this is also a user education concern. When we have only a single strength of lock, you can avoid deadlocks by being careful about the order of lock acquisition to use a consistent order in all transactions. With upgradable locks of multiple strengths, you would also have to be careful to only acquire each lock once (at the highest strength required) to be assured that deadlocks would not occur.
docs/RFCS/20230209_shared_locks.md
line 250 at r1 (raw file):
with all timestamps. This means non-MVCC read latches conflict with writes at all timestamps, and non-MVCC write latches conflict with all concurrent reads and writes. The read latching behavior here might be a bit surprising – naively, one would not expect a read latch at timestamp 0 conflict
The reason for this "surprising" behavior is that it was designed with the assumption that there is an effective distinction between MVCC keys and non-MVCC keys. That is, in the beginning, there were no keys that mixed MVCC and non-MVCC operations. I'm not sure whether that has held true over the years or whether the suggestion of a non-MVCC read latch in the next paragraph would be the first time we'd mixed things in that way.
In any case, I think I agree with the plan to use the maximum timestamp instead of zero, but we also need to be careful with synthetic future timestamps - we don't want a timestamp in the future to be fed back into the HLC (fortunately the maximum possible timestamp would just cause a crash in this occurred, so there's no chance of it slipping by unnoticed)
docs/RFCS/20230209_shared_locks.md
line 260 at r1 (raw file):
more intuitive way to represent a read latch that conflicts with all writers. The discussion above begs a different question – does declaring latches as read/write operations
Not all reads get locks, so I don't immediately see how the locking system could replace the current read/write latching system. We still need read/write latches to allow reads at historical timestamps to proceed underneath a current exclusive lock, for example.
docs/RFCS/20230209_shared_locks.md
line 467 at r1 (raw file):
lock requester if we imagine a constant stream of SHARED lock requesters. We'll deviate from this behavior. If we find that there's a queue formed on a lock, i.e., not all
👍
I think this implementation detail should be mentioned somewhere in the main body and not just the appendix. I started to write up a comment about the possibility of starvation in the "lock table sequencing" section because the way the process was described there sounded vulnerable to starvation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @arulajmani, @michae2, and @nvanbenschoten)
docs/RFCS/20230209_shared_locks.md
line 112 at r2 (raw file):
[*] conflict behavior is configurable using a cluster setting; defaults to "conflicts" if the exclusive lock is held by a serializable transaction; defaults
What are the desired isolation semantics seen by a serializable transaction when mixing read-committed and serializable transactions?
(A pointer to literature is fine).
docs/RFCS/20230209_shared_locks.md
line 135 at r2 (raw file):
a transaction that acquires an `Exclusive` lock as part of a `SELECT FOR UPDATE`, and then proceeds to actually perform the UPDATE (and thus writes an `Intent`). One could also imagine a scenario where a transaction promotes a
Is an Intent always stronger than an Exclusive lock? Naively, it seems no, since an Intent will not conflict with a reader at a lower timestamp while an Exclusive lock will. Is the answer that this difference doesn't matter since the txn holding the exclusive lock will only monotonically increase its own timestamp, so the Intent serves as well in protecting the timestamp range in which it can commit?
docs/RFCS/20230209_shared_locks.md
line 206 at r2 (raw file):
// Invariant: only the lowest sequence number that has not been rolled back // at which a lock was acquired with a particular lock strength // is stored in the infos array.
Seems like this lowest sequence number scheme is an improvement/optimization over lockHolderInfo.seqs
which was tracking all the TxnSeqs instead of the lowest one.
docs/RFCS/20230209_shared_locks.md
line 264 at r2 (raw file):
Incoming requests are required to acquire latches over the keyspans they intend to touch at the request’s timestamp. As of the time of writing, two types of latches exist – reach latches and write latches. They each have different
nit: read latches
docs/RFCS/20230209_shared_locks.md
line 297 at r2 (raw file):
we will need to serialize requests from the same transaction trying to acquire shared locks, while still allowing different transactions to acquire shared locks concurrently. This can be done by acquiring a (write) latch on a local,
IIUC, a shared lock acquisition will acquire two latches: a read latch with max timestamp on the key itself, and a write latch on a synthetic key that is a function of the key being locked and the txnID.
docs/RFCS/20230209_shared_locks.md
line 371 at r2 (raw file):
liveness one above. The only difference is that instead of detecting a failed coordinator, the result of the `PushTxn` is that one of the two transactions that form the deadlock gets aborted.
The argument seems hand wavy.
Consider the case of transactions T1, T2, T3, T4 to start with, where we will create a deadlock cycle between T2 and T3.
T1, T2 hold a shared lock on key K1
T3, T4 hold a shared lock on key K2
T3 has an exclusive lock acquisition request waiting at K1 that selects T1 as the pushee
T2 has an exclusive lock acquisition request waiting at K2 that selects T4 as the pushee
We don't see the deadlock cycle yet. If we keep letting new shared lock acquisitions to happen while T2 and T3 are waiting (say transactions T5, T6, ...) and each time T2, T3 select that other transaction as the pushee they will wait forever.
docs/RFCS/20230209_shared_locks.md
line 434 at r2 (raw file):
reservation, the request must check the list of waiting locking requests. If the list is empty, or if the lowest sequence number from the list of waiters is higher than the request’s sequence number, it can acquire a reservation.
I think this is what prevents the earlier deadlock scenario I was trying to sketch.
I had written long comments about joint reservations and partial reservation breakage in lock_table.go, but I didn't try to do any proofs since we didn't need those features at that time. I am a bit worried that we will overlook something if we don't try to sketch out some proofs now.
Also, I can't remember why we had reservation breaking in the original lock_table.go -- was it because we were assuming nothing about latch acquisition above the lock table that would prevent two requests getting to the lock table that both wanted exclusive locks? It seems we still have an implicit assumption in that after lockTable.ScanAndEnqueue
returns a guard that says shouldWait
is false, we have reservations, but no longer concern ourselves with them getting broken and proceed to evaluation to actually place those locks. I wonder whether we should be making some assumptions that the lockTable exists in the context of latching, and use that to simplify some of the lockTable implementation.
I suspect I am forgetting something important.
docs/RFCS/20230209_shared_locks.md
line 487 at r2 (raw file):
concurrency when resolving durable shared locks on the same key will hurt; we may want to revisit latching, in the context of releasing shared locks, as we design that.
this seems important.
I am wondering whether this RFC should cover all the locking changes we have planned for this next release since they are related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments @bdarnell and @sumeerbhola; flushing most (not all) replies to your comments.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @bdarnell, @michae2, @nvanbenschoten, and @sumeerbhola)
docs/RFCS/20230209_shared_locks.md
line 111 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Lock upgrades are points at which deadlocks (and therefore retry errors) can occur. I see that you address deadlock detection below, but this is also a user education concern. When we have only a single strength of lock, you can avoid deadlocks by being careful about the order of lock acquisition to use a consistent order in all transactions. With upgradable locks of multiple strengths, you would also have to be careful to only acquire each lock once (at the highest strength required) to be assured that deadlocks would not occur.
That's a good point, especially considering we're not planning on adding support for Update
locks just yet.
docs/RFCS/20230209_shared_locks.md
line 260 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Not all reads get locks, so I don't immediately see how the locking system could replace the current read/write latching system. We still need read/write latches to allow reads at historical timestamps to proceed underneath a current exclusive lock, for example.
You're right, we wouldn't be able to replace the current read/write latching system by just considering lock strengths. We would need other auxiliary information, such as the timestamp of the request as well.
We already need something like this for locking conflicts as well -- for example, we want to allow non-locking reads at lower timestamps to proceed to evaluation without waiting on intents at higher timestamps. We've defined a new concept, called lock.Modes
to capture such conflict resolution.
The thought here was that we could apply the same conflict resolution rules for latches like we do for locks (theoretically). It's not in our critical path, so this isn't something we plan on changing in the short term at least.
docs/RFCS/20230209_shared_locks.md
line 467 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
👍
I think this implementation detail should be mentioned somewhere in the main body and not just the appendix. I started to write up a comment about the possibility of starvation in the "lock table sequencing" section because the way the process was described there sounded vulnerable to starvation.
Yeah, I think both you and Sumeer had similar comments. I'll call this out more explicitly earlier in the text.
docs/RFCS/20230209_shared_locks.md
line 135 at r2 (raw file):
Is an Intent always stronger than an Exclusive lock? Naively, it seems no, since an Intent will not conflict with a reader at a lower timestamp while an Exclusive lock will.
An Intent is always stronger than Exclusive locks. Like you mention, an Intent will not conflict with a reader at a lower timestamp. However, an Exclusive lock will not conflict with any non-locking readers (even those at a higher timestamp)[*]
[*] This behaviour is configurable for serializable transactions using a cluster setting. Exclusive locks can be configured to conflict with non-locking readers at or above their the Exclusive locks timestamp (thus acting with the same strength as an Intent would). This is described in more detail around:
cockroach/pkg/kv/kvserver/concurrency/lock/locking.proto
Lines 153 to 183 in 09ec0ee
// +------------+---------+-----------+-----------+-------------+----------+ | |
// | | None | Shared | Update | Exclusive | Intent | | |
// +------------+---------+-----------+-----------+-------------+----------+ | |
// | None | | | | X^* | X^† | | |
// +------------+---------+-----------+-----------+-------------+----------+ | |
// | Shared | | | | X | X | | |
// +------------+---------+-----------+-----------+-------------+----------+ | |
// | Update | | | X | X | X | | |
// +------------+---------+-----------+-----------+-------------+----------+ | |
// | Exclusive | X^* | X | X | X | X | | |
// +------------+---------+-----------+-----------+-------------+----------+ | |
// | Intent | X^† | X | X | X | X | | |
// +------------+---------+-----------+-----------+-------------+----------+ | |
// | |
// [†] reads under optimistic concurrency control in CockroachDB only conflict | |
// with Intent locks if the read's timestamp is equal to or greater than the | |
// lock's timestamp. If the read's timestamp is below the Intent lock's | |
// timestamp then the two are compatible. | |
// | |
// [*] until the re-introduction of weaker isolation levels, all transactions in | |
// CockroachDB used serializable isolation. Historically, CockroachDB did not | |
// make a strength distinction between Exclusive locks and Intents. As such, | |
// reads under concurrency control would conflict with Exclusive locks if the | |
// read's timestamp was at or above the Exclusive lock's timestamp. Now that | |
// there is such a distinction, non-locking reads from serializable transactions | |
// do not block on Exclusive locks, regardless of their timestamp. However, | |
// there is a desire to let users opt into the old behavior using the | |
// ExclusiveLocksBlockNonLockingReads cluster setting. Note that this only | |
// applies when both the non-locking read and Exclusive lock belong to | |
// serializable transactions, as that's the only "old" behavior to speak of | |
// here. |
docs/RFCS/20230209_shared_locks.md
line 206 at r2 (raw file):
Previously, sumeerbhola wrote…
Seems like this lowest sequence number scheme is an improvement/optimization over
lockHolderInfo.seqs
which was tracking all the TxnSeqs instead of the lowest one.
One thing @nvanbenschoten and I realized since then (discussion captured in #102269 (comment)) is that we only need to do this tracking for in-memory locks, where the lock table is the source of truth for which locks are and are not held after savepoint rollbacks. It gets incredibly tricky to try and keep the replicated state in sync with what we have here, so we're going to change these datastructures a bit to store replicated/un-replicated locks separately. Tracking the lowest sequence number scheme will stay, albeit just for un-replicated locks.
docs/RFCS/20230209_shared_locks.md
line 297 at r2 (raw file):
Previously, sumeerbhola wrote…
IIUC, a shared lock acquisition will acquire two latches: a read latch with max timestamp on the key itself, and a write latch on a synthetic key that is a function of the key being locked and the txnID.
Yeah.
docs/RFCS/20230209_shared_locks.md
line 371 at r2 (raw file):
Previously, sumeerbhola wrote…
The argument seems hand wavy.
Consider the case of transactions T1, T2, T3, T4 to start with, where we will create a deadlock cycle between T2 and T3.
T1, T2 hold a shared lock on key K1
T3, T4 hold a shared lock on key K2T3 has an exclusive lock acquisition request waiting at K1 that selects T1 as the pushee
T2 has an exclusive lock acquisition request waiting at K2 that selects T4 as the pusheeWe don't see the deadlock cycle yet. If we keep letting new shared lock acquisitions to happen while T2 and T3 are waiting (say transactions T5, T6, ...) and each time T2, T3 select that other transaction as the pushee they will wait forever.
Yeah -- this argument relies on the assumption that the list of shared lock holders cannot grow unboundedly. Like you mentioned below, this is guaranteed given how we're deciding when a request can acquire a reservation (and thus eventually acquire a lock) and when it has to wait (even though it may not conflict with a held lock).
docs/RFCS/20230209_shared_locks.md
line 434 at r2 (raw file):
I had written long comments about joint reservations and partial reservation breakage in lock_table.go
I've referenced that comment multiple times and found it very useful, so thanks for writing it!
I wonder whether we should be making some assumptions that the lockTable exists in the context of latching, and use that to simplify some of the lockTable implementation.
I think one reason that we can't assume that requests in the lockTable always exist in the context of latching is that a request may have reserved a lock, discovered a conflict (lock, reservation it can't break) on a different key it is also trying to lock, and thus dropped its latches when queueing behind this conflicting lock/reservation. We would be susceptible to deadlock scenarios if we didn't do reservation breaking. I may be missing something, but I don't think we can avoid it.
It seems we still have an implicit assumption in that after
lockTable.ScanAndEnqueue
returns a guard that saysshouldWait
is false, we have reservations, but no longer concern ourselves with them getting broken and proceed to evaluation to actually place those locks.
I think the reason we can do that is because we're guaranteed to still be holding latches while evaluating, so the request can't have its reservation broken as it evaluates.
docs/RFCS/20230209_shared_locks.md
line 487 at r2 (raw file):
Previously, sumeerbhola wrote…
this seems important.
I am wondering whether this RFC should cover all the locking changes we have planned for this next release since they are related.
The plan is to (quickly) follow this RFC up with another one that talks about replicated locks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @arulajmani, @bdarnell, @michae2, and @nvanbenschoten)
docs/RFCS/20230209_shared_locks.md
line 135 at r2 (raw file):
[*] conflict behavior is configurable using a cluster setting; defaults to
"conflicts" if the exclusive lock is held by a serializable transaction; defaults
nit: the current text above does not say that this "at or above the Exclusive locks timestamp" part, which is why I was confused and thought that exclusive locks could be configured to conflict with readers at lower timestamps.
docs/RFCS/20230209_shared_locks.md
line 206 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
One thing @nvanbenschoten and I realized since then (discussion captured in #102269 (comment)) is that we only need to do this tracking for in-memory locks, where the lock table is the source of truth for which locks are and are not held after savepoint rollbacks. It gets incredibly tricky to try and keep the replicated state in sync with what we have here, so we're going to change these datastructures a bit to store replicated/un-replicated locks separately. Tracking the lowest sequence number scheme will stay, albeit just for un-replicated locks.
Thanks for the additional detail -- this makes sense.
docs/RFCS/20230209_shared_locks.md
line 434 at r2 (raw file):
I suspect I am forgetting something important.
I forgot the obvious fact that we release latches when waiting for locks :)
SQL databases support multiple forms of row-level locks, which are locks held by transactions on individual rows. The most common forms of these locks are shared locks and exclusive locks. Shared locks are to reader locks as exclusive locks are to writer locks in a reader-writer mutex.
This RFC proposes introducing SHARED key-level locks in the key-value layer and exposing them through the
SELECT FOR SHARE
SQL statement. This will enable the use of SHARED locks externally throughSELECT FOR SHARE
statements. It will also enable their use internally for operations such as foreign key checks under weaker isolation levels (see: Read Committed).Informs: #91545
Release note: None