storage: introduce LockTableIterator #110319
Conversation
Informs cockroachdb#100193.

This commit introduces a new form of iterator, the LockTableIterator, which is an EngineIterator that iterates over locks in the lock table keyspace. It performs no translation of input or output keys or values, so it is used like a normal EngineIterator, with the limitation that it can only be used to iterate over the lock table keyspace.

The benefit of using a LockTableIterator is that it performs filtering of the locks in the lock table, only returning locks that match the configured filtering criteria and transparently skipping past locks that do not. The filtering criteria is expressed as a logical disjunction of two configuration parameters, at least one of which must be set:

- MatchTxnID: if set, the iterator returns locks held by this transaction.
- MatchMinStr: if set, the iterator returns locks held by any transaction with this strength or stronger.

Expressed abstractly as a SQL query, the filtering criteria is:

SELECT * FROM lock_table WHERE
  (MatchTxnID != 0 AND txn_id = MatchTxnID) OR
  (MatchMinStr != 0 AND strength >= MatchMinStr)

Pushing this filtering logic into the iterator is a convenience for its users. It also allows the iterator to use its knowledge of the lock table keyspace structure to efficiently skip past locks that do not match the filtering criteria. It does this by seeking past many ignored locks when appropriate to avoid cases of O(ignored_locks) work, instead performing at most O(matching_locks + locked_keys) work.

A common case where this matters is with shared locks. If the iterator is configured to ignore shared locks, a single key with a large number of shared locks can be skipped over with a single seek. Avoiding this unnecessary work is essential to avoiding quadratic behavior during shared lock acquisition and release.

The LockTableIterator will be used by `intentInterleavingIter`, `MVCCCheckLock`, `MVCCAcquireLock`, `mvccPutUsingIter`, `MVCCResolveWriteIntent`, `MVCCResolveWriteIntentRange`, `computeMinIntentTimestamp`, `SeparatedIntentScanner`, and more.

Release note: None
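To make the disjunction concrete, here is a minimal, runnable Go sketch of the filtering predicate described above. The type names, option struct, and strength ordering are illustrative assumptions rather than the actual pkg/storage and lock package definitions.

```go
// Illustrative sketch only: the real LockTableIterator options and lock
// strengths live in CockroachDB's pkg/storage and lock packages.
package main

import "fmt"

// TxnID is a stand-in for a transaction UUID; the zero value means "unset".
type TxnID [16]byte

// Strength is a stand-in for a lock strength, assumed ordered weakest to strongest.
type Strength int

const (
	None Strength = iota
	Shared
	Exclusive
	Intent
)

// LockFilter mirrors the two configuration parameters described above.
type LockFilter struct {
	MatchTxnID  TxnID    // if set, return locks held by this transaction
	MatchMinStr Strength // if set, return locks with at least this strength
}

// matches implements the SQL disjunction:
//
//	(MatchTxnID != 0 AND txn_id = MatchTxnID) OR
//	(MatchMinStr != 0 AND strength >= MatchMinStr)
func (f LockFilter) matches(txnID TxnID, str Strength) bool {
	matchTxn := f.MatchTxnID != (TxnID{}) && txnID == f.MatchTxnID
	matchStr := f.MatchMinStr != None && str >= f.MatchMinStr
	return matchTxn || matchStr
}

func main() {
	txn := TxnID{1}
	f := LockFilter{MatchMinStr: Intent}
	fmt.Println(f.matches(txn, Shared)) // false: shared locks are ignored
	fmt.Println(f.matches(txn, Intent)) // true
}
```

Under this sketch, a consumer like the intentInterleavingIter described below, which only cares about intents, would correspond to a filter with MatchMinStr set to Intent, so shared and exclusive locks are skipped.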
…marks All intents are now separated. Epic: None Release note: None
Informs cockroachdb#100193.

This commit updates the intentInterleavingIter to ignore locks in the lock table keyspace with strengths other than lock.Intent (i.e. shared and exclusive locks). Future versions of the iterator may expose information to users about whether any non-intent locks were observed and, if so, which keys they were found on. For now, no such information is exposed.

This change is needed for replicated locks. Non-locking reads will not interact with these locks at all, so we want the intentInterleavingIter to ignore them. Meanwhile, locking reads and writes will use a different abstraction to peek into the lock table and look for conflicts.

Release note: None
I didn't look at the tests, but the interface and integration with intentInterleavingIter look good.
Reviewed 3 of 9 files at r1, 1 of 1 files at r2, 1 of 5 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/storage/lock_table_iterator.go
line 61 at r1 (raw file):
// and release.
type LockTableIterator struct {
	iter EngineIterator
There was some commentary in intentInterleavingIter about avoiding dynamic dispatch and using *pebbleIterator. Consider doing that here.
pkg/storage/lock_table_iterator.go
line 139 at r1 (raw file):
// SeekEngineKeyGE implements the EngineIterator interface.
func (i *LockTableIterator) SeekEngineKeyGE(key EngineKey) (valid bool, err error) {
	if err := checkLockTableKey(key.Key); err != nil {
Is checkLockTableKey cheap enough, or should it be hidden behind invariants.Enabled?
pkg/storage/lock_table_iterator.go
line 252 at r1 (raw file):
) (state pebble.IterValidityState, err error) {
	for {
		engineKey, err := i.iter.UnsafeEngineKey()
nit: can this use i.LockTableKeyVersion()?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @sumeerbhola)
pkg/storage/lock_table_iterator.go
line 61 at r1 (raw file):
Previously, sumeerbhola wrote…
There was some commentary in intentInterleavingIter about avoiding dynamic dispatch and using *pebbleIterator. Consider doing that here.
I hoped to do that, but unfortunately, this won't always be a *pebbleIterator. There will be cases where it's a spanset.EngineIterator, when constructed from a spanset.spanSetReader. We don't have this problem in newIntentInterleavingIterator because iterators are constructed from a Reader below spanset.spanSetReader in the Reader.NewMVCCIterator hierarchy.
Dynamic dispatch on the intentIter (as opposed to the mvcc iter) doesn't seem to make much of a difference on the benchmarks we care about, though:
➜ benchstat full_static.txt full_dynamic.txt
name old time/op new time/op delta
MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 25.8µs ± 3% 26.2µs ± 3% +1.72% (p=0.000 n=49+44)
name old speed new speed delta
MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 248MB/s ± 3% 244MB/s ± 3% -1.76% (p=0.000 n=49+45)
➜ benchstat full_static.txt static_mvcc_dynamic_intent.txt
name old time/op new time/op delta
MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 25.8µs ± 3% 25.8µs ± 2% ~ (p=0.163 n=49+45)
name old speed new speed delta
MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 248MB/s ± 3% 248MB/s ± 2% ~ (p=0.163 n=49+45)
That's because the benchmarks we were concerned about when making this optimization don't even write any intents. However, I also don't see a difference with IntentInterleavingSeekGEAndIter/separated=true/version=1/intentStride=1/keyLen=10/seekStride=1, which should exercise the intentIter more than any other benchmark we have. So I'll leave this as is.
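For readers unfamiliar with the tradeoff being weighed here, the following is a hedged, self-contained Go illustration of dynamic versus static dispatch on a child iterator. The types are stand-ins invented for this sketch, not pkg/storage's.

```go
// Illustration only: a wrapper holding its child as an interface pays for
// dynamic dispatch on every call, while a wrapper holding a concrete type
// allows inlining but can only ever wrap that one implementation.
package main

import "fmt"

type childIter interface {
	Next() bool
}

type concreteIter struct{ pos, limit int }

func (c *concreteIter) Next() bool { c.pos++; return c.pos < c.limit }

// dynamicWrapper can hold any childIter (e.g. a span-asserting test wrapper),
// but every Next call goes through the interface method table.
type dynamicWrapper struct{ child childIter }

func (w *dynamicWrapper) Next() bool { return w.child.Next() }

// staticWrapper calls the concrete method directly, so the compiler can
// inline it, but it is tied to *concreteIter.
type staticWrapper struct{ child *concreteIter }

func (w *staticWrapper) Next() bool { return w.child.Next() }

func main() {
	d := dynamicWrapper{child: &concreteIter{limit: 3}}
	s := staticWrapper{child: &concreteIter{limit: 3}}
	for d.Next() {
	}
	for s.Next() {
	}
	fmt.Println("both wrappers exhausted their iterators")
}
```

As the benchstat results above suggest, the dispatch cost on the intent iterator is not measurable in the scans that matter here, which is why the interface-typed field is kept.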
pkg/storage/lock_table_iterator.go
line 139 at r1 (raw file):
Previously, sumeerbhola wrote…
Is checkLockTableKey cheap enough, or should it be hidden behind invariants.Enabled?
It's inlined all the way down through the call to bytes.HasPrefix and the underlying bytes.Equal of two bytes, so it's just a few branches that should play well with a branch predictor.
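For context on why this check is cheap, here is a minimal, runnable sketch of the kind of prefix validation being discussed. The prefix bytes, error text, and names are illustrative assumptions, not CockroachDB's actual keys package constants or the real checkLockTableKey.

```go
// Sketch only: validates that a key lies in the lock table keyspace by
// checking a fixed key prefix.
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// lockTablePrefix is a placeholder for the real lock table key prefix.
var lockTablePrefix = []byte{0x01, 'z', 'k'}

var errNotLockTableKey = errors.New("key is not a lock table key")

func checkLockTableKey(key []byte) error {
	// bytes.HasPrefix is a length check plus a bytes.Equal over a tiny slice,
	// so this amounts to a few predictable branches per call.
	if !bytes.HasPrefix(key, lockTablePrefix) {
		return errNotLockTableKey
	}
	return nil
}

func main() {
	ltKey := append(append([]byte{}, lockTablePrefix...), []byte("some-key")...)
	fmt.Println(checkLockTableKey(ltKey))          // <nil>
	fmt.Println(checkLockTableKey([]byte("some"))) // key is not a lock table key
}
```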
pkg/storage/lock_table_iterator.go
line 252 at r1 (raw file):
Previously, sumeerbhola wrote…
nit: can this use i.LockTableKeyVersion()?
It can right now, but to address the TODO below, I'll need access to the EngineKey.Key. I'll leave this as is to avoid the churn when addressing that TODO.
Reviewed 5 of 9 files at r1, 4 of 5 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/storage/lock_table_iterator_test.go
line 580 at r3 (raw file):
// TestLockTableIteratorEquivalence is a quickcheck test that verifies that a
// LockTableIterator is equivalent to a raw engine iterator with non-matching
// lock keys filtered before writing to the engine.
nice
Reviewed 9 of 9 files at r1, 1 of 1 files at r2, 5 of 5 files at r3, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/storage/lock_table_iterator.go
line 332 at r1 (raw file):
// LockTableKeyVersion returns the strength and txn ID from the version of the
// current key.
func (i *LockTableIterator) LockTableKeyVersion() (lock.Strength, uuid.UUID, error) {
My IDE is telling me this isn't used anywhere -- is that because one of the subsequent commits makes use of this?
pkg/storage/lock_table_iterator_test.go
line 160 at r1 (raw file):
}

// TestLockTableIterator is a datadriven test consisting of two commands:
nit: should we move this test to the top of the file?
pkg/storage/lock_table_iterator_test.go
line 580 at r3 (raw file):
Previously, sumeerbhola wrote…
nice
+1
TFTRs!
bors r=sumeerbhola,arulajmani
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @arulajmani and @sumeerbhola)
pkg/storage/lock_table_iterator.go
line 332 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
My IDE is telling me this isn't used anywhere -- is that because one of the subsequent commits makes use of this?
Yes, a few of the subsequent PRs make use of this. It's not tripping up the linter, so I'll just leave it as is for now.
pkg/storage/lock_table_iterator_test.go
line 160 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: should we move this test to the top of the file?
I was following the example set by TestIntentInterleavingIter in intent_interleaving_iter_test.go. Since this is the only comment, I'll leave as is to avoid a round of CI.
Build succeeded.
Informs cockroachdb#100193.

This commit addresses a TODO left by cockroachdb#110319 to implement an "iter before seek" optimization in the LockTableIterator, similar to the one that exists in the pebbleMVCCScanner. The optimization places an upper bound on the number of iterations that a LockTableIterator that is configured to ignore some or all shared locks will perform across the shared locks on a single user key before seeking past them. This is used to avoid iterating over all shared locks on a key when not necessary.

The optimization achieves the goal of avoiding cases of O(ignored_locks) work in the LockTableIterator, instead performing at most O(matching_locks + locked_keys) work. This is important for iteration over the lock table (e.g. intentInterleavingIter), lock acquisition (MVCCAcquireLock), and lock release (mvccReleaseLockInternal).

There is a caveat to these complexity bounds, however, in that they do not consider LSM tombstones. This is being discussed in cockroachdb#110324.

Release note: None
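As a rough illustration of the heuristic, here is a self-contained Go toy: it walks a sorted slice of (key, lockID) entries and, after a bounded number of linear steps over ignored entries on one key, falls back to a single binary-search "seek" past that key. The names and the constant value are illustrative only; the real optimization operates on an EngineIterator over the lock table keyspace, not a slice.

```go
// Toy model of the "iter before seek" heuristic: a few cheap Next-style steps
// first, then one seek if the ignored locks on a key turn out to be numerous.
package main

import (
	"fmt"
	"sort"
)

type lockEntry struct {
	key    string
	lockID int
}

const itersBeforeSeek = 5 // illustrative bound on linear steps per key

// nextKeyIndex returns the index of the first entry whose key differs from
// entries[i].key, stepping linearly at most itersBeforeSeek times before
// falling back to a single binary-search seek.
func nextKeyIndex(entries []lockEntry, i int) int {
	cur := entries[i].key
	for n := 0; n < itersBeforeSeek; n++ {
		i++
		if i >= len(entries) || entries[i].key != cur {
			return i // exhausted the ignored locks with cheap steps
		}
	}
	// Many ignored locks on this key: one "seek" skips the rest in O(log n).
	return i + sort.Search(len(entries)-i, func(j int) bool {
		return entries[i+j].key > cur
	})
}

func main() {
	var entries []lockEntry
	for id := 0; id < 100; id++ {
		entries = append(entries, lockEntry{key: "a", lockID: id}) // 100 shared locks on "a"
	}
	entries = append(entries, lockEntry{key: "b", lockID: 0})

	fmt.Println(nextKeyIndex(entries, 0)) // 100: skipped "a"'s locks with 5 steps + 1 seek
}
```

The tradeoff mirrors the pebbleMVCCScanner variant of this optimization referenced above: a handful of cheap steps usually suffice, while the seek bounds the worst case when a key holds many ignored shared locks.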
110754: storage: implement iter-before-seek optimization for LockTableIterator r=nvanbenschoten a=nvanbenschoten

Informs #100193. This commit addresses a TODO left by #110319 to implement an "iter before seek" optimization in the LockTableIterator, similar to the one that exists in the pebbleMVCCScanner. The optimization places an upper bound on the number of iterations that a LockTableIterator that is configured to ignore some or all shared locks will perform across the shared locks on a single user key before seeking past them. This is used to avoid iterating over all shared locks on a key when not necessary.

The optimization achieves the goal of avoiding cases of O(ignored_locks) work in the LockTableIterator, instead performing at most O(matching_locks + locked_keys) work. This is important for iteration over the lock table (e.g. intentInterleavingIter), lock acquisition (MVCCAcquireLock), and lock release (mvccReleaseLockInternal). There is a caveat to these complexity bounds, however, in that they do not consider LSM tombstones. This is being discussed in #110324.

Release note: None

111126: pkg/util/log: introduce new metrics to the logging package r=knz a=abarganier

See individual commit messages to review. This PR introduces two new metrics to the logging package:

- `log.messages.count`
- `log.buffered.messages.dropped`

**log.messages.count**: This metric measures the count of messages logged on the node since startup. Note that this does not measure the fan-out of single log messages to the various configured logging sinks. This metric can be helpful in understanding log rates and volumes.

**log.buffered.messages.dropped**: Buffered network logging sinks have a `max-buffer-size` attribute, which determines, in bytes, how many log messages can be buffered. Any `fluent-server` or `http-server` log sink that makes use of a `buffering` attribute in its configuration (enabled by default) qualifies as a buffered network logging sink. If this buffer becomes full and an additional log message is sent to the buffered log sink, the buffer would exceed this `max-buffer-size`. Therefore, the buffered log sink drops older messages in the buffer in order to make room for the new.

This PR also renames the metric `fluent.sink.conn.errors` to `log.fluent.sink.conn.errors`, for consistency.

Fixes: #72453

----

Release note (ops change): This patch sets the Metric Type on the metric `log.fluent.sink.conn.errors`. Previously, the Metric Type was incorrectly left unset. Note that this is simply an update to the metric's metadata. The behavior and purpose of the metric remains unchanged.

----

Release note (ops change): This patch introduces the metric `log.messages.count`. This metric measures the count of messages logged on the node since startup. Note that this does not measure the fan-out of single log messages to the various configured logging sinks. This metric can be helpful in understanding log rates and volumes.

----

Release note (ops change): This patch introduces a new metric, `log.buffered.messages.dropped`. Buffered network logging sinks have a `max-buffer-size` attribute, which determines, in bytes, how many log messages can be buffered. Any `fluent-server` or `http-server` log sink that makes use of a `buffering` attribute in its configuration (enabled by default) qualifies as a buffered network logging sink. If this buffer becomes full and an additional log message is sent to the buffered log sink, the buffer would exceed this `max-buffer-size`. Therefore, the buffered log sink drops older messages in the buffer in order to make room for the new. `log.buffered.messages.dropped` counts the number of messages dropped from the buffer. Note that the count is shared across all buffered logging sinks.

111603: server: always START SERVICE SHARED in testserver.StartSharedProcessTenant() r=knz a=msbutler

Previously, StartSharedProcessTenant() would hang if it were run on a tenant that was created by a replication stream. This patch fixes this bug by ensuring `ALTER TENANT $1 START SERVICE SHARED` is run even if the tenant was already created.

Epic: none
Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Informs #100193.
Closes #109644.
Closes #109570.
This PR introduces a new form of iterator, the `LockTableIterator`, which is an `EngineIterator` that iterates over locks in the lock table keyspace. It performs no translation of input or output keys or values, so it is used like a normal `EngineIterator`, with the limitation that it can only be used to iterate over the lock table keyspace.

The benefit of using a `LockTableIterator` is that it performs filtering of the locks in the lock table, only returning locks that match the configured filtering criteria and transparently skipping past locks that do not. The filtering criteria is expressed as a logical disjunction of two configuration parameters, at least one of which must be set:

- MatchTxnID: if set, the iterator returns locks held by this transaction.
- MatchMinStr: if set, the iterator returns locks held by any transaction with this strength or stronger.

Expressed abstractly as a SQL query, the filtering criteria is:

SELECT * FROM lock_table WHERE
  (MatchTxnID != 0 AND txn_id = MatchTxnID) OR
  (MatchMinStr != 0 AND strength >= MatchMinStr)

Pushing this filtering logic into the iterator is a convenience for its users. It also allows the iterator to use its knowledge of the lock table keyspace structure to efficiently skip past locks that do not match the filtering criteria. It does this by seeking past many ignored locks when appropriate to avoid cases of O(ignored_locks) work, instead performing at most O(matching_locks + locked_keys) work.

A common case where this matters is with shared locks. If the iterator is configured to ignore shared locks, a single key with a large number of shared locks can be skipped over with a single seek. Avoiding this unnecessary work is essential to avoiding quadratic behavior during shared lock acquisition and release.

The LockTableIterator will be used by `intentInterleavingIter`, `MVCCCheckLock`, `MVCCAcquireLock`, `mvccPutUsingIter`, `MVCCResolveWriteIntent`, `MVCCResolveWriteIntentRange`, `computeMinIntentTimestamp`, `SeparatedIntentScanner`, and more.

The PR introduces the abstraction of a `LockTableIterator`, but doesn't yet implement the logic to avoid O(ignored_locks) work in the iterator. This is left as a TODO which will be addressed in the next few weeks, but doesn't need to block this PR or use of the `LockTableIterator`.

The PR then copies over the 2 commits from #109570, which ignore non-intent locks in `intentInterleavingIter`. This could have been done in a separate PR, but it felt useful to demonstrate how much simpler and cleaner #109644 gets with this new abstraction. I can extract it back out if reviewers would like.

Release note: None