kvstreamer: improve the stability #82160

Closed
Tracked by #54680
yuzefovich opened this issue May 31, 2022 · 3 comments
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. sync-me T-sql-queries SQL Queries Team

Comments

@yuzefovich
Member

yuzefovich commented May 31, 2022

Currently, the usage of the streamer library seems to make things less stable memory-wise than before, which goes against the goal of the library. We should achieve a higher number in the tpch_concurrency roachtest with the streamer enabled. I think the root cause is that the streamer creates many more objects in general (increasing pressure on the GC) and that large slices are not yet accounted for.

We should:

  • remove allocations as much as possible
  • make sure everything is accounted for in the memory accounting
  • consider making Result an interface so that, depending on Hints and Mode, we could have smaller-size implementations (a rough sketch follows below).
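
For the last item, here is a minimal Go sketch of what mode-specific `Result` implementations could look like. All names and fields below are hypothetical and only illustrate the idea; they are not the actual `kvstreamer` types.

```go
package main

import "fmt"

// Result as an interface with mode-specific implementations, so that
// single-row Get results can use a smaller struct than Scan results.
// Everything here is illustrative only.
type Result interface {
	// Position identifies which enqueued request this result satisfies.
	Position() int
}

// getResult backs results of Get requests (small, fixed size).
type getResult struct {
	pos   int
	value []byte
}

func (r getResult) Position() int { return r.pos }

// scanResult backs results of Scan requests, which may carry per-range
// completion state and larger payloads.
type scanResult struct {
	pos          int
	rows         [][]byte
	scanComplete bool
}

func (r scanResult) Position() int { return r.pos }

func main() {
	results := []Result{
		getResult{pos: 0, value: []byte("v")},
		scanResult{pos: 1, rows: [][]byte{[]byte("a")}, scanComplete: true},
	}
	for _, r := range results {
		fmt.Printf("result for request %d is a %T\n", r.Position(), r)
	}
}
```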

Jira issue: CRDB-16320

@yuzefovich yuzefovich added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 31, 2022
@yuzefovich yuzefovich self-assigned this May 31, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label May 31, 2022
craig bot pushed a commit that referenced this issue Jun 2, 2022
81830: kv: only scan intents span for QueryIntent request r=nvanbenschoten a=andrewbaptist

Change QueryIntent to only read from the lock table. Previously, the request required merging the read from the MVCC iterator with the lock table. This should improve performance.

Also, add unit testing for QueryIntent

Fixes #69716

Release note: None

81981: sql: add logictest config that disables locality optimized search r=rharding6373 a=rharding6373

This PR increases test coverage by disabling locality optimized search
for some tests. Locality optimized search is enabled by default.

Fixes: #73633

Release note: None

82327: roachtest: add no-streamer config to tpch_concurrency r=yuzefovich a=yuzefovich

This commit adds another configuration to `tpch_concurrency` roachtest
which disables the streamer. My goal is to compare the concurrency
numbers with the streamer on and off while I'm working on stabilizing
the streamer.

Informs: #82160.

Release note: None

82330: workload/tpch: augment Q15 with streamID for concurrency r=yuzefovich a=yuzefovich

According to the TPC-H spec, Q15 can be augmented with the streamID
identifier so that Q15 can run with concurrency, and this commit
implements that. This ability is now used by the `tpch_concurrency`
roachtest in order to eliminate the retries that were occurring when
trying to create and drop the view named `revenue0`.

Fixes: #79870.

Release note: None

Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot pushed a commit that referenced this issue Jun 6, 2022
82422: kvstreamer: clean up Result struct r=yuzefovich a=yuzefovich

This commit cleans up the `Result` struct in order to reduce its memory
size, bringing it down to the 48-byte size class.
```
name                             old time/op    new time/op    delta
IndexJoin/Cockroach-24             6.29ms ± 1%    6.22ms ± 2%   -1.06%  (p=0.024 n=9+9)
IndexJoin/MultinodeCockroach-24    7.99ms ± 1%    7.93ms ± 2%     ~     (p=0.165 n=10+10)

name                             old alloc/op   new alloc/op   delta
IndexJoin/Cockroach-24             1.64MB ± 1%    1.48MB ± 0%   -9.25%  (p=0.000 n=9+10)
IndexJoin/MultinodeCockroach-24    2.37MB ± 1%    2.20MB ± 1%   -7.06%  (p=0.000 n=10+10)

name                             old allocs/op  new allocs/op  delta
IndexJoin/Cockroach-24              8.15k ± 1%     7.15k ± 1%  -12.28%  (p=0.000 n=8+10)
IndexJoin/MultinodeCockroach-24     12.7k ± 1%     11.7k ± 1%   -8.18%  (p=0.000 n=10+10)
```

The main change of this commit is the removal of the concept of "enqueue
keys" from the Streamer API in favor of relying on `Result.Position`.
When requests are unique, a single `Result` can only satisfy a single
enqueue key; however, for non-unique requests a single `Result` can
satisfy multiple requests and, thus, can have multiple enqueue keys. At
the moment, only unique requests are supported. Once non-unique
requests are supported too, we'll need to figure out how to handle
those (maybe we'll return a `Result` `N` times if it satisfies `N`
original requests with different values for `Position`).
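
To make the `Result.Position` contract concrete, here is a small illustrative Go sketch (the struct and its fields are stand-ins, not the actual kvstreamer API): `Position` is simply the index of the originally enqueued request that the result satisfies, which is what replaces the "enqueue keys".

```go
package main

import "fmt"

// Hypothetical sketch: Position is the index of the originally enqueued
// request that the Result satisfies.
type Result struct {
	Position int    // index into the enqueued requests slice
	Value    []byte // payload for that request
}

func main() {
	requests := []string{"a", "b", "c"} // stand-ins for enqueued KV requests
	// Results may arrive out of order; Position ties each one back to its
	// original request.
	results := []Result{
		{Position: 2, Value: []byte("vc")},
		{Position: 0, Value: []byte("va")},
		{Position: 1, Value: []byte("vb")},
	}
	output := make([][]byte, len(requests))
	for _, r := range results {
		output[r.Position] = r.Value
	}
	fmt.Printf("%q\n", output)
}
```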

Also, support for multiple "enqueue keys" was initially envisioned for
the case of `multiSpanGenerator` in lookup joins (i.e. multi-equality
lookup joins); however, I believe we should push that complexity out of
the streamer (into `TxnKVStreamer`), which is what this commit does.

Other changes done in this commit:
- unexport `ScanResp.Complete` field since this is currently only
used within the `kvstreamer` package
- reorder all existing fields so that the footprint of the struct is
minimized (in particular, `scanComplete` field is moved to the bottom
and `ScanResp` anonymous struct is removed)
- make `subRequestIdx` `int32` rather than `int`. This value is bounded
by the number of ranges in the cluster, so max int32 is more than
sufficient.

Addresses: #82159.
Addresses: #82160.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
@yuzefovich
Member Author

With 10 runs of the tpch_concurrency test (on master with #82384, #82387, #82865, #83010 cherry-picked), I got 78.5 on average with the streamer OFF and 84.9 with the streamer ON.

craig bot pushed a commit that referenced this issue Jun 28, 2022
78085: storage: add `MVCCStats` for range keys r=jbowens a=erikgrinaker

**storage: export some MVCC key encoding functions**

Release note: None

**roachpb: add `Key.Prevish()` to find a previous key**

This patch adds `Key.Prevish()`, which returns a previous key in
lexicographical sort order. This is needed to expand a latch span
leftwards to peek at any left-adjacent range keys.

It is impossible to find the exact immediate predecessor of a key,
because it can have an infinite number of `0xff` bytes at the end, so
this returns the nearest previous key right-padded with `0xff` up to the
given length. It is still possible for an infinite number of keys to
exist between `Key` and `Key.Prevish()`, as keys have unbounded length.
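
A simplified sketch of the idea, assuming a hypothetical `prevish` helper over plain byte slices (this is not the actual `Key.Prevish` implementation):

```go
package main

import (
	"bytes"
	"fmt"
)

// prevish illustrates the idea: return a key that sorts before `key`,
// right-padded with 0xff up to `length` bytes, since in general the exact
// immediate predecessor does not exist (it could always append more 0xff).
func prevish(key []byte, length int) []byte {
	if len(key) == 0 {
		return nil // nothing sorts before the empty key
	}
	prev := append([]byte(nil), key...)
	if prev[len(prev)-1] == 0x00 {
		// "foo\x00" -> "foo": dropping a trailing zero byte yields the exact
		// immediate predecessor, so no padding is needed.
		return prev[:len(prev)-1]
	}
	// Otherwise decrement the last byte and right-pad with 0xff, which yields
	// a strictly smaller key close to the original.
	prev[len(prev)-1]--
	for len(prev) < length {
		prev = append(prev, 0xff)
	}
	return prev
}

func main() {
	k := []byte("b")
	p := prevish(k, 4)
	fmt.Printf("%q < %q: %v\n", p, k, bytes.Compare(p, k) < 0) // "a\xff\xff\xff" < "b": true
}
```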

Release note: None

**storage: add `MVCCStats` for range keys**

This patch adds `MVCCStats` tracking for range keys. Four new fields are
added to `MVCCStats`:

* `RangeKeyCount`: the number of (fragmented) range keys, not counting
  historical versions.

* `RangeKeyBytes`: the logical encoded byte size of all range keys.
  The latest version contributes the encoded key bounds, and all
  versions contribute encoded timestamps. Unlike point keys, which for
  historical reasons use a fixed-size timestamp contribution, this uses
  the actual variable-length timestamp size.

* `RangeValCount`: the number of (fragmented) range key versions.

* `RangeValBytes`: the encoded size of all range key values across
  all versions. The same value can be stored in multiple range keys
  due to fragmentation, which will be counted separately. Even though
  all range keys are currently MVCC range tombstones with no value, the
  `MVCCValueHeader` contribution can be non-zero due to e.g. a local
  timestamp.

`ComputeStatsForRange()` has been extended to calculate the above
quantities, and additionally account for range tombstones themselves in
`GCBytesAge` along with their effect on point keys. All relevant call
sites have been updated to surface range keys for the MVCC iterators
passed to `ComputeStatsForRange()`.

Most MVCC operations have been updated to correctly account for MVCC
range tombstones, e.g. during point key writes and intent resolution.
KV APIs are not yet updated; this will be addressed later.

Range key stats are also adjusted during range splits and merges, which
will split and merge any range keys that straddle the split key. This
requires a single range key seek to the left and right of the split key
during these operations.

Touches #70412.

Release note: None

82384: sql: reuse the slice of RequestUnion objects between fetches r=yuzefovich a=yuzefovich

This commit teaches `txnKVFetcher` and `txnKVStreamer` to reuse the same
slice of `RequestUnion` objects between different fetches. It is now
extremely easy to do given the recent refactor. We do perform memory
accounting for this slice (against a memory account bound to an
unlimited memory monitor). Additionally, a similar optimization is
applied to how resume requests are populated by the Streamer.

Addresses: #82160.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot pushed a commit that referenced this issue Jun 28, 2022
82384: sql: reuse the slice of RequestUnion objects between fetches r=yuzefovich a=yuzefovich

This commit teaches `txnKVFetcher` and `txnKVStreamer` to reuse the same
slice of `RequestUnion` objects between different fetches. It is now
extremely easy to do given the recent refactor. We do perform memory
accounting for this slice (against a memory account bound to an
unlimited memory monitor). Additionally, a similar optimization is
applied to how resume requests are populated by the Streamer.

Addresses: #82160.

Release note: None

83480: vendor: bump Pebble to 85bb1c759894 r=jbowens a=nicktrav

```
85bb1c75 rangekey: create `rangekey` package
0d272ec1 db: fix bug in deletion estimate computations
```

Release note: None.

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Nick Travers <[email protected]>
craig bot pushed a commit that referenced this issue Jun 30, 2022
82387: kvstreamer: remove temporary allocations for []Result r=yuzefovich a=yuzefovich

**kvstreamer: refactor the loop of processing the batch response**

This commit refactors the code which processes the batch response in
order to separate out two different concerns:
- processing non-empty responses in order to create `Result`s
- processing incomplete responses to populate the resume request.

Now each of these "concerns" is handled in a separate loop making it
easier to reason about, especially so in the following commit.

This commit also extracts out multiple return arguments of a function
into a struct as well as updates some of the comments and moves some of
the error-checking to an earlier stage.

Release note: None

**kvstreamer: remove temporary allocations for []Result**

Previously, the worker goroutine would accumulate all `Result`s it
could create based on the KV response into a slice, and then the
slice would be passed into the results buffer. At that point, the
slice would be discarded since the results buffer would copy all
`Result`s into its internal state.

This commit refactors the streamer as well as the results buffer to
avoid this temporary allocation of `[]Result`. The main idea is that
`Result`s are now passed one-by-one into the results buffer. The worker
goroutine now acquires the results buffer's mutex, processes the KV
responses one at a time, and whenever a `Result` is created, it is added
into the results buffer right away. However, in order to prevent the
results buffer from eagerly returning a single `Result` on a
`GetResults` call, the streamer's user goroutine won't be woken up
until a newly introduced `doneAddingLocked` method is called by the
worker goroutine.

Some care needs to be taken to prevent deadlocks with all of the
mutexes. Now, since we're finalizing the results one at a time, we might
need to hold the streamer's mutex (so that we can set `scanComplete`
correctly), and that mutex must be acquired before the results
buffer's mutex.

This change shows a modest improvement on the microbenchmarks but is
a lot more important on analytical, TPCH-like queries, where this
`[]Result` is one of the largest sources of garbage.
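
A minimal sketch of this hand-off, assuming hypothetical types and method names that mirror the description above (`addLocked`, `doneAddingLocked`); this is not the actual results buffer implementation:

```go
package main

import (
	"fmt"
	"sync"
)

type Result struct{ Position int }

// resultsBuffer sketches "add one-by-one, signal once": the worker adds
// Results under the buffer's mutex, but the consumer blocked in getResults
// is only woken up by doneAddingLocked, so it never sees a partial batch.
type resultsBuffer struct {
	mu      sync.Mutex
	cond    *sync.Cond
	results []Result
	ready   bool
}

func newResultsBuffer() *resultsBuffer {
	b := &resultsBuffer{}
	b.cond = sync.NewCond(&b.mu)
	return b
}

func (b *resultsBuffer) Lock()   { b.mu.Lock() }
func (b *resultsBuffer) Unlock() { b.mu.Unlock() }

// addLocked appends a single Result; the mutex must already be held.
func (b *resultsBuffer) addLocked(r Result) { b.results = append(b.results, r) }

// doneAddingLocked wakes up the consumer once the whole KV response has
// been processed; the mutex must already be held.
func (b *resultsBuffer) doneAddingLocked() {
	b.ready = true
	b.cond.Signal()
}

// getResults blocks until doneAddingLocked has been called, then drains.
func (b *resultsBuffer) getResults() []Result {
	b.mu.Lock()
	defer b.mu.Unlock()
	for !b.ready {
		b.cond.Wait()
	}
	res := b.results
	b.results, b.ready = nil, false
	return res
}

func main() {
	b := newResultsBuffer()
	go func() {
		b.Lock()
		for pos := 0; pos < 3; pos++ {
			b.addLocked(Result{Position: pos}) // one Result per response entry
		}
		b.doneAddingLocked()
		b.Unlock()
	}()
	fmt.Println(len(b.getResults()), "results")
}
```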

```
name                                                    old time/op    new time/op    delta
IndexJoin/Cockroach-24                                    5.98ms ± 1%    5.95ms ± 1%    ~     (p=0.079 n=9+10)
IndexJoin/MultinodeCockroach-24                           7.55ms ± 1%    7.59ms ± 1%  +0.47%  (p=0.015 n=8+9)
IndexJoinColumnFamilies/Cockroach-24                      8.68ms ± 3%    8.56ms ± 2%    ~     (p=0.133 n=9+10)
IndexJoinColumnFamilies/MultinodeCockroach-24             11.8ms ± 5%    11.7ms ± 3%    ~     (p=0.315 n=10+10)
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24             6.67ms ± 1%    6.69ms ± 1%    ~     (p=0.315 n=10+9)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24    7.87ms ± 1%    7.92ms ± 1%  +0.73%  (p=0.015 n=10+10)
LookupJoinEqColsAreKeyOrdering/Cockroach-24               9.30ms ± 2%    9.31ms ± 4%    ~     (p=0.796 n=10+10)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24      10.9ms ± 4%    10.9ms ± 2%    ~     (p=0.971 n=10+10)
LookupJoinNoOrdering/Cockroach-24                         8.99ms ± 1%    9.03ms ± 4%    ~     (p=0.549 n=9+10)
LookupJoinNoOrdering/MultinodeCockroach-24                12.1ms ± 4%    11.9ms ± 6%    ~     (p=0.143 n=10+10)
LookupJoinOrdering/Cockroach-24                           10.9ms ± 3%    10.8ms ± 3%    ~     (p=0.243 n=10+9)
LookupJoinOrdering/MultinodeCockroach-24                  14.2ms ± 5%    13.9ms ± 3%    ~     (p=0.113 n=10+9)

name                                                    old alloc/op   new alloc/op   delta
IndexJoin/Cockroach-24                                    1.36MB ± 1%    1.31MB ± 0%  -3.61%  (p=0.000 n=10+9)
IndexJoin/MultinodeCockroach-24                           2.07MB ± 2%    2.04MB ± 3%    ~     (p=0.063 n=10+10)
IndexJoinColumnFamilies/Cockroach-24                      1.43MB ± 1%    1.38MB ± 0%  -3.56%  (p=0.000 n=9+9)
IndexJoinColumnFamilies/MultinodeCockroach-24             2.27MB ± 1%    2.22MB ± 2%  -2.09%  (p=0.000 n=8+10)
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24             1.71MB ± 0%    1.67MB ± 0%  -2.70%  (p=0.000 n=9+10)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24    2.43MB ± 5%    2.35MB ± 1%  -3.31%  (p=0.000 n=10+10)
LookupJoinEqColsAreKeyOrdering/Cockroach-24               1.72MB ± 1%    1.62MB ± 1%  -6.20%  (p=0.000 n=10+10)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24      2.39MB ± 2%    2.30MB ± 3%  -3.53%  (p=0.000 n=10+10)
LookupJoinNoOrdering/Cockroach-24                         1.79MB ± 1%    1.74MB ± 1%  -2.80%  (p=0.000 n=10+9)
LookupJoinNoOrdering/MultinodeCockroach-24                2.35MB ± 3%    2.32MB ± 2%    ~     (p=0.079 n=10+9)
LookupJoinOrdering/Cockroach-24                           1.63MB ± 1%    1.53MB ± 1%  -5.77%  (p=0.000 n=10+10)
LookupJoinOrdering/MultinodeCockroach-24                  2.30MB ± 4%    2.23MB ± 2%  -3.41%  (p=0.002 n=9+8)

name                                                    old allocs/op  new allocs/op  delta
IndexJoin/Cockroach-24                                     7.15k ± 1%     7.16k ± 1%    ~     (p=0.888 n=10+9)
IndexJoin/MultinodeCockroach-24                            11.9k ± 2%     11.9k ± 2%    ~     (p=0.968 n=10+9)
IndexJoinColumnFamilies/Cockroach-24                       11.9k ± 0%     11.9k ± 0%    ~     (p=0.075 n=9+10)
IndexJoinColumnFamilies/MultinodeCockroach-24              17.6k ± 1%     17.5k ± 1%    ~     (p=0.566 n=10+10)
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24              9.86k ± 1%     9.88k ± 1%    ~     (p=0.150 n=9+10)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24     14.1k ± 0%     14.1k ± 1%    ~     (p=0.055 n=8+10)
LookupJoinEqColsAreKeyOrdering/Cockroach-24                12.6k ± 1%     12.5k ± 1%  -0.77%  (p=0.005 n=10+10)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24       17.2k ± 1%     17.0k ± 0%  -0.88%  (p=0.000 n=10+8)
LookupJoinNoOrdering/Cockroach-24                          12.3k ± 1%     12.3k ± 1%    ~     (p=0.929 n=10+10)
LookupJoinNoOrdering/MultinodeCockroach-24                 16.8k ± 1%     16.8k ± 1%    ~     (p=0.968 n=9+10)
LookupJoinOrdering/Cockroach-24                            14.5k ± 1%     14.5k ± 1%    ~     (p=0.271 n=10+10)
LookupJoinOrdering/MultinodeCockroach-24                   19.4k ± 1%     19.3k ± 1%    ~     (p=0.056 n=9+8)
```

Addresses: #82160.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot pushed a commit that referenced this issue Jul 1, 2022
83197: kvserver: bump the in-memory GC threshold as a pre-apply side effect r=aayushshah15 a=aayushshah15

This commit changes where the in-memory GC threshold on a Replica is
bumped during the application of a GC request.

Previously, the in-memory GC threshold was bumped as a post-apply side
effect. Additionally, GC requests do not acquire latches at the
timestamp that they are garbage collecting, and thus, readers need to
take additional care to ensure that results aren't served off of a
partial state.

Readers today rely on the invariant that the in-memory GC threshold is
bumped before the actual garbage collection. This today is true because
the mvccGCQueue issues GC requests in 2 phases: the first that simply
bumps the in-memory GC threshold, and the second one that performs the
actual garbage collection. If the in-memory GC threshold were bumped in
the pre-apply phase of command application, this usage quirk wouldn't
need to exist. That's what this commit does.

Relates to #55293

Release note: None



83659: kvstreamer: optimize the results buffers a bit r=yuzefovich a=yuzefovich

**kvstreamer: make InOrder-mode structs implement heap logic on their own**

This commit refactors the `inOrderRequestsProvider` as well as the
`inOrderResultsBuffer` to make them maintain the min heap over the
requests and results, respectively, on their own. This allows us to
avoid allocations for `interface{}` objects that occur when using
`heap.Interface`. The code was copied, with minor adjustments, from the
standard library.
```
name                                                  old time/op    new time/op    delta
LookupJoinEqColsAreKeyOrdering/Cockroach-24             8.83ms ± 3%    8.58ms ± 3%   -2.82%  (p=0.002 n=10+10)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24    10.3ms ± 1%    10.1ms ± 2%   -1.86%  (p=0.009 n=10+10)
LookupJoinOrdering/Cockroach-24                         10.7ms ± 5%    10.5ms ± 3%     ~     (p=0.123 n=10+10)
LookupJoinOrdering/MultinodeCockroach-24                14.0ms ± 5%    13.6ms ± 5%   -3.16%  (p=0.043 n=10+10)

name                                                  old alloc/op   new alloc/op   delta
LookupJoinEqColsAreKeyOrdering/Cockroach-24             1.77MB ± 2%    1.59MB ± 1%   -9.94%  (p=0.000 n=10+10)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24    2.57MB ± 3%    2.38MB ± 1%   -7.57%  (p=0.000 n=10+10)
LookupJoinOrdering/Cockroach-24                         1.67MB ± 0%    1.50MB ± 1%  -10.08%  (p=0.000 n=9+9)
LookupJoinOrdering/MultinodeCockroach-24                2.53MB ± 2%    2.34MB ± 2%   -7.78%  (p=0.000 n=10+9)

name                                                  old allocs/op  new allocs/op  delta
LookupJoinEqColsAreKeyOrdering/Cockroach-24              12.5k ± 1%     10.4k ± 1%  -16.54%  (p=0.000 n=9+10)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24     17.6k ± 0%     15.5k ± 0%  -11.64%  (p=0.000 n=8+10)
LookupJoinOrdering/Cockroach-24                          14.4k ± 1%     12.4k ± 1%  -14.17%  (p=0.000 n=9+9)
LookupJoinOrdering/MultinodeCockroach-24                 20.6k ± 1%     18.5k ± 1%  -10.19%  (p=0.000 n=10+9)
```
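
An illustrative sketch of the specialization: instead of going through `container/heap` (which boxes elements into `interface{}` and allocates), the buffer keeps a concrete slice and implements the sift operations directly. The element type and the ordering key below are made up:

```go
package main

import "fmt"

type result struct{ position int }

// minHeap keeps results ordered by position without heap.Interface boxing.
type minHeap []result

func (h *minHeap) push(r result) {
	*h = append(*h, r)
	// Sift up.
	i := len(*h) - 1
	for i > 0 {
		parent := (i - 1) / 2
		if (*h)[parent].position <= (*h)[i].position {
			break
		}
		(*h)[parent], (*h)[i] = (*h)[i], (*h)[parent]
		i = parent
	}
}

func (h *minHeap) pop() result {
	old := *h
	min := old[0]
	n := len(old) - 1
	old[0] = old[n]
	*h = old[:n]
	// Sift down.
	i := 0
	for {
		left, right, smallest := 2*i+1, 2*i+2, i
		if left < n && (*h)[left].position < (*h)[smallest].position {
			smallest = left
		}
		if right < n && (*h)[right].position < (*h)[smallest].position {
			smallest = right
		}
		if smallest == i {
			break
		}
		(*h)[i], (*h)[smallest] = (*h)[smallest], (*h)[i]
		i = smallest
	}
	return min
}

func main() {
	var h minHeap
	for _, p := range []int{3, 1, 2} {
		h.push(result{position: p})
	}
	for len(h) > 0 {
		fmt.Println(h.pop().position) // 1, 2, 3
	}
}
```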

Addresses: #82159.

Release note: None

**kvstreamer: refactor the OutOfOrder requests provider a bit**

This commit refactors the `requestsProvider` interface, renaming a
couple of methods: `s/firstLocked/nextLocked/` and
`s/removeFirstLocked/removeNextLocked/`. The idea is that for the
InOrder mode "first" == "next", but for the OutOfOrder mode we can pick
an arbitrary request that is not necessarily "first". This commit then
makes the OutOfOrder requests provider pick the last request in the
queue, which should reduce memory usage in case resume requests are
added later.

Previously, we would slice "next" requests off the front and append to
the end, which could trigger reallocation of the underlying slice. Now
we'll only update the "tail" of the queue, and since a single request
can result in at most one resume request, the initial slice provided in
`enqueue` will definitely have enough capacity.

Release note: None

**kvstreamer: reduce allocations in the InOrder results buffer**

This commit improves the InOrder results buffer by reusing the same
scratch space for returning the results on `get()` calls as well as by
reducing the size of `inOrderBufferedResult` struct from the 80 bytes
size class to the 64 bytes size class.

It also cleans up a couple of comments and clarifies `GetResults` method
a bit.
```
name                                                  old time/op    new time/op    delta
LookupJoinEqColsAreKeyOrdering/Cockroach-24             8.47ms ± 3%    8.54ms ± 2%    ~     (p=0.182 n=9+10)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24    10.2ms ± 2%    10.1ms ± 2%    ~     (p=0.280 n=10+10)
LookupJoinOrdering/Cockroach-24                         10.4ms ± 3%    10.3ms ± 4%    ~     (p=0.393 n=10+10)
LookupJoinOrdering/MultinodeCockroach-24                13.8ms ± 5%    13.7ms ± 5%    ~     (p=0.739 n=10+10)

name                                                  old alloc/op   new alloc/op   delta
LookupJoinEqColsAreKeyOrdering/Cockroach-24             1.59MB ± 2%    1.49MB ± 1%  -6.20%  (p=0.000 n=10+9)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24    2.36MB ± 2%    2.27MB ± 2%  -3.49%  (p=0.000 n=10+10)
LookupJoinOrdering/Cockroach-24                         1.51MB ± 1%    1.42MB ± 1%  -6.27%  (p=0.000 n=9+10)
LookupJoinOrdering/MultinodeCockroach-24                2.37MB ± 6%    2.26MB ± 2%  -4.77%  (p=0.000 n=10+9)

name                                                  old allocs/op  new allocs/op  delta
LookupJoinEqColsAreKeyOrdering/Cockroach-24              10.4k ± 1%     10.3k ± 1%    ~     (p=0.055 n=8+9)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24     15.5k ± 0%     15.5k ± 1%  -0.34%  (p=0.037 n=10+10)
LookupJoinOrdering/Cockroach-24                          12.4k ± 1%     12.3k ± 1%  -0.98%  (p=0.000 n=9+10)
LookupJoinOrdering/MultinodeCockroach-24                 18.5k ± 1%     18.5k ± 2%    ~     (p=0.743 n=8+9)
```

Addresses: #82160.

Release note: None

83705: teamcity: `grep` for pebble version in `go.mod` rather than `DEPS.bzl` r=nicktrav,jbowens a=rickystewart

The `grep` can fail in the case that there is no more recent version of
`pebble` than the one that is vendored in-tree. This fixes that case.

Release note: None

Co-authored-by: Aayush Shah <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
craig bot pushed a commit that referenced this issue Jul 6, 2022
82865: kvcoord: optimize batch truncation loop r=yuzefovich a=yuzefovich

**kvcoord: truncate BatchRequest to range boundaries serially**

This commit refactors the DistSender's loop of iterating over the range
descriptors so that the truncation of the BatchRequest happens
serially. This incurs a minor performance hit when the requests are sent
in parallel, but it makes it possible to apply the optimizations to this
iteration in the following commits.

Release note: None

**kvcoord: merge truncate_test into batch_test**

This commit moves `TestTruncate` as well as `BenchmarkTruncate` into
`batch_test.go` file in the preparation for the refactor done in the
following commit. The code itself wasn't changed at all.

Release note: None

**kvcoord: introduce batch truncation helper**

This commit introduces a batch truncation helper that encompasses the
logic of truncating requests to the boundaries of a single range as
well as returning the next key to seek the range iterator to. The
helper is now used both in the DistSender and in the Streamer. No
modification to the actual logic of the `Truncate` or `Next`/`prev`
functions has been made other than incorporating the return of the next
seek key into the `Truncate` function itself. This is needed since the
following commit will tightly couple the truncation process with the
next seek key determination in order to optimize it.

The helper can be configured with a knob indicating whether `Truncate`
needs to return requests in the original order. This behavior is
necessary for BatchRequests that contain writes, since in several spots
we rely on ordering assumptions (e.g. increasing values of `Sequence`).
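
Conceptually, the helper does something like the following sketch, using plain string spans for keys and handling only the ascending direction; this is a simplified illustration, not the kvcoord implementation:

```go
package main

import "fmt"

type span struct{ start, end string }

// truncate returns the portions of reqs overlapping [rangeStart, rangeEnd)
// together with the earliest key past this range that any request still
// needs, i.e. the next seek key for an ascending scan ("" means done).
func truncate(reqs []span, rangeStart, rangeEnd string) (truncated []span, seekKey string) {
	for _, r := range reqs {
		if r.end > rangeEnd {
			// Part of the request lies beyond this range: remember the
			// earliest key past rangeEnd that is still needed.
			next := r.start
			if next < rangeEnd {
				next = rangeEnd
			}
			if seekKey == "" || next < seekKey {
				seekKey = next
			}
		}
		s, e := r.start, r.end
		if s < rangeStart {
			s = rangeStart
		}
		if e > rangeEnd {
			e = rangeEnd
		}
		if s < e {
			truncated = append(truncated, span{s, e})
		}
	}
	return truncated, seekKey
}

func main() {
	reqs := []span{{"a", "c"}, {"b", "f"}}
	got, seek := truncate(reqs, "a", "d") // range [a, d)
	fmt.Println(got, "next seek key:", seek)
}
```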

The following adjustments were made to the tests:
- `BenchmarkTruncate` has been renamed to `BenchmarkTruncateLegacy`
- `TestTruncate` has been refactored to exercise the new and the old
code-paths
- `TestBatchPrevNext` has been refactored to run through the new code
path, also a few test cases have been adjusted slightly.

This commit also introduces some unit tests for the new code path when
it runs in a loop over multiple ranges as well as a corresponding
benchmark.

Release note: None

**kvcoord: optimize batch truncation loop**

This commit optimizes the batch truncation loop for the case when the
requests only use global keys.

The optimized approach sorts the requests according to their start keys
(with the Ascending scan direction) or in the reverse order of their
end keys (with the Descending scan direction). Then, on each `Truncate`
call, it looks only at the subset of requests that haven't been fully
processed yet and don't start after the current range, allowing us to
avoid many unnecessary key comparisons. Please see the comment on the
new `truncateAsc` and `truncateDesc` functions for more details and
examples.

The optimized approach actually has worse time complexity in the
absolute worst case (when each request is a range-spanning one and
actually spans all of the ranges against which the requests are
truncated), because of the need to sort the requests upfront, but in
practice it is much faster, especially with point requests.
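
An illustrative sketch of the sorted, ascending-direction variant (names and types are made up; the real `truncateAsc` is more involved): requests are sorted by start key once, and each call only walks the not-yet-finished requests whose start key precedes the current range's end.

```go
package main

import (
	"fmt"
	"sort"
)

type span struct{ start, end string }

type truncHelper struct {
	reqs     []span
	startIdx int // requests before this index are fully processed
}

func newTruncHelper(reqs []span) *truncHelper {
	sorted := append([]span(nil), reqs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })
	return &truncHelper{reqs: sorted}
}

// truncateAsc clips the relevant requests to [rangeStart, rangeEnd) and
// advances startIdx past the leading requests that end within this range.
func (h *truncHelper) truncateAsc(rangeStart, rangeEnd string) []span {
	var out []span
	i := h.startIdx
	for ; i < len(h.reqs) && h.reqs[i].start < rangeEnd; i++ {
		s, e := h.reqs[i].start, h.reqs[i].end
		if s < rangeStart {
			s = rangeStart
		}
		if e > rangeEnd {
			e = rangeEnd
		}
		if s < e {
			out = append(out, span{s, e})
		}
	}
	for h.startIdx < i && h.reqs[h.startIdx].end <= rangeEnd {
		h.startIdx++
	}
	return out
}

func main() {
	h := newTruncHelper([]span{{"e", "f"}, {"a", "b"}, {"c", "g"}})
	fmt.Println(h.truncateAsc("a", "d")) // [{a b} {c d}]
	fmt.Println(h.truncateAsc("d", "z")) // [{d g} {e f}]
}
```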

```
name                                                               old time/op    new time/op    delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                       59.8µs ± 1%    33.9µs ± 3%   -43.38%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                      83.0µs ± 4%    69.7µs ± 7%   -15.98%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                       865µs ± 1%      62µs ± 1%   -92.84%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                     1.07ms ± 5%    0.46ms ± 8%   -56.95%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                     7.09ms ± 0%    5.99ms ± 1%   -15.56%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                    9.22ms ± 0%   10.52ms ± 1%   +14.08%  (p=0.000 n=9+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                     108ms ± 0%       6ms ± 2%   -94.50%  (p=0.000 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                    129ms ± 1%      71ms ± 1%   -45.06%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24         60.2µs ± 1%    36.0µs ± 2%   -40.14%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24        82.4µs ± 8%    72.1µs ± 4%   -12.51%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24         862µs ± 1%      72µs ± 1%   -91.65%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24       1.06ms ± 4%    0.49ms ± 8%   -53.39%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24       7.24ms ± 0%    6.46ms ± 1%   -10.74%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24      10.1ms ± 2%     9.8ms ± 2%    -2.36%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24       107ms ± 0%       7ms ± 0%   -93.57%  (p=0.000 n=8+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24      122ms ± 0%      80ms ± 1%   -34.32%  (p=0.000 n=9+9)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                      78.9µs ± 1%    36.4µs ± 3%   -53.81%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                     79.4µs ± 4%    52.3µs ± 5%   -34.14%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                     1.16ms ± 1%    0.07ms ± 1%   -94.39%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                    1.01ms ± 4%    0.46ms ± 5%   -54.56%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                    9.42ms ± 0%    6.26ms ± 1%   -33.52%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                   8.41ms ± 1%    9.22ms ± 1%    +9.54%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                    145ms ± 0%       6ms ± 1%   -95.63%  (p=0.000 n=9+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                   125ms ± 1%      67ms ± 1%   -46.31%  (p=0.000 n=9+9)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24        77.8µs ± 1%    39.6µs ± 2%   -49.10%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24       74.0µs ± 3%    63.0µs ± 6%   -14.92%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24       1.16ms ± 1%    0.08ms ± 1%   -93.47%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24      1.04ms ± 5%    0.47ms ± 7%   -54.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24      9.50ms ± 0%    6.73ms ± 1%   -29.21%  (p=0.000 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24     8.88ms ± 1%   13.24ms ± 1%   +49.04%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24      146ms ± 0%       7ms ± 1%   -94.98%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24     125ms ± 1%      74ms ± 1%   -40.75%  (p=0.000 n=10+9)

name                                                               old alloc/op   new alloc/op   delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                       7.58kB ± 0%   21.00kB ±11%  +176.84%  (p=0.000 n=7+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                      39.8kB ± 6%    49.1kB ±15%   +23.46%  (p=0.000 n=9+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                      6.48kB ± 5%   18.25kB ± 2%  +181.79%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                      428kB ±20%     368kB ±13%   -13.85%  (p=0.003 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                     1.60MB ± 0%    2.91MB ± 0%   +82.49%  (p=0.000 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                    5.24MB ± 0%    5.89MB ± 0%   +12.41%  (p=0.000 n=10+8)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                    1.15MB ± 4%    2.41MB ± 1%  +110.09%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                   69.8MB ± 1%    64.6MB ± 1%    -7.55%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24         9.77kB ±22%   21.98kB ± 4%  +125.07%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24        38.9kB ±23%    49.9kB ± 2%   +28.28%  (p=0.000 n=10+8)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24        6.56kB ± 4%   20.20kB ± 3%  +208.11%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24        407kB ±15%     372kB ±13%    -8.68%  (p=0.043 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24       1.65MB ± 0%    3.62MB ± 0%  +118.55%  (p=0.000 n=8+8)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24      6.60MB ± 2%    5.65MB ± 1%   -14.38%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24      1.10MB ± 5%    2.77MB ± 1%  +152.58%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24     60.5MB ± 1%    67.9MB ± 1%   +12.19%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                      17.2kB ±10%    21.5kB ± 1%   +24.91%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                     35.5kB ±12%    29.1kB ± 4%   -17.83%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                      138kB ± 1%      20kB ± 3%   -85.34%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                     344kB ±15%     363kB ±10%    +5.50%  (p=0.035 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                    2.78MB ± 0%    3.35MB ± 0%   +20.24%  (p=0.000 n=8+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                   4.42MB ± 6%    5.29MB ± 0%   +19.67%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                   17.9MB ± 0%     2.7MB ± 2%   -85.21%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                  65.3MB ± 0%    61.0MB ± 1%    -6.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24        15.9kB ± 3%    26.7kB ± 1%   +67.87%  (p=0.000 n=10+9)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24       29.4kB ± 6%    41.6kB ± 5%   +41.50%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24        138kB ± 0%      23kB ± 3%   -83.61%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24       390kB ±19%     350kB ±11%   -10.16%  (p=0.015 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24      2.69MB ± 4%    3.51MB ± 1%   +30.22%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24     4.89MB ± 1%    8.19MB ± 0%   +67.68%  (p=0.000 n=10+9)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24     17.9MB ± 0%     3.0MB ± 2%   -83.34%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24    65.4MB ± 1%    62.9MB ± 1%    -3.81%  (p=0.000 n=10+10)

name                                                               old allocs/op  new allocs/op  delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                         50.0 ± 0%      52.4 ±10%    +4.80%  (p=0.017 n=8+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                         569 ±12%       557 ±15%      ~     (p=0.617 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                         207 ± 4%       210 ± 5%      ~     (p=0.380 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                      6.97k ±13%     6.02k ± 9%   -13.64%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                        126 ± 0%       122 ± 0%    -3.17%  (p=0.002 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                     51.9k ± 1%     42.8k ± 1%   -17.59%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                     1.12k ± 1%     1.12k ± 1%    +0.43%  (p=0.027 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                     786k ± 1%      714k ± 1%    -9.13%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24           41.2 ± 3%      58.0 ± 3%   +40.78%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24           574 ±18%       532 ±11%      ~     (p=0.143 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24           205 ± 2%       234 ± 3%   +14.26%  (p=0.000 n=9+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24        6.70k ± 9%     6.00k ± 9%   -10.40%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24          127 ± 0%       125 ± 0%    -1.57%  (p=0.001 n=8+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24       63.7k ± 1%     27.8k ± 2%   -56.34%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24       1.14k ± 0%     1.14k ± 1%      ~     (p=0.515 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24       696k ± 1%      752k ± 1%    +7.97%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                         554 ± 1%       169 ± 2%   -69.52%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                        519 ± 9%       268 ± 9%   -48.32%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                      8.38k ± 0%     0.33k ± 3%   -96.06%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                     5.90k ±10%     5.89k ± 5%      ~     (p=0.796 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                     65.7k ± 0%     16.5k ± 0%   -74.87%  (p=0.002 n=8+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                    39.5k ± 1%     27.8k ± 2%   -29.54%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                    1.05M ± 0%     0.02M ± 0%   -98.33%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                    741k ± 0%      679k ± 1%    -8.32%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24           559 ± 0%       182 ± 2%   -67.42%  (p=0.000 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24          438 ± 8%       404 ±13%    -7.76%  (p=0.014 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24        8.39k ± 0%     0.36k ± 5%   -95.75%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24       6.38k ±11%     5.56k ± 9%   -12.85%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24       65.7k ± 0%     16.5k ± 0%   -74.84%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24      46.8k ± 1%     67.6k ± 1%   +44.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24      1.05M ± 0%     0.02M ± 0%   -98.33%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24      739k ± 1%      694k ± 1%    -6.08%  (p=0.000 n=10+10)
```

The truncation loops for the optimized strategy are very similar in both
directions, so I tried to extract the differences out into an interface.
However, this showed a non-trivial slowdown and an increase in
allocations, so I chose to have some duplicated code to get the best
performance.
Here is a snippet of the comparison when the interface was prototyped:

```
name                                                 old time/op    new time/op    delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24        36.9µs ± 3%    44.5µs ± 3%  +20.55%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24       74.8µs ± 5%    88.8µs ± 3%  +18.73%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24       64.9µs ± 1%    78.3µs ± 1%  +20.72%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       471µs ± 8%     682µs ±13%  +44.73%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24      6.34ms ± 1%    7.39ms ± 0%  +16.47%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24     11.2ms ± 1%    12.4ms ± 1%  +10.36%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24     6.40ms ± 2%    7.39ms ± 1%  +15.47%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24    70.9ms ± 1%   102.0ms ± 2%  +43.87%  (p=0.000 n=9+9)

name                                                 old alloc/op   new alloc/op   delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24        22.2kB ± 9%    30.4kB ± 0%  +36.55%  (p=0.000 n=10+7)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24       52.2kB ± 5%    67.6kB ± 4%  +29.47%  (p=0.000 n=8+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24       20.0kB ± 2%    32.2kB ± 1%  +60.86%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       372kB ±13%     600kB ±10%  +61.29%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24      3.24MB ± 0%    4.45MB ± 0%  +37.42%  (p=0.000 n=8+7)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24     6.61MB ± 0%    7.86MB ± 0%  +18.90%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24     2.75MB ± 2%    3.74MB ± 1%  +36.03%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24    65.7MB ± 1%    97.2MB ± 1%  +47.95%  (p=0.000 n=10+10)

name                                                 old allocs/op  new allocs/op  delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24           177 ± 2%       314 ± 0%  +77.40%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24          597 ± 8%       847 ± 8%  +41.89%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24          329 ± 3%       531 ± 2%  +61.40%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       6.02k ± 9%     9.56k ± 6%  +58.80%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24       16.5k ± 0%     32.9k ± 0%  +99.17%  (p=0.000 n=8+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24      53.5k ± 1%     73.1k ± 1%  +36.69%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24      17.5k ± 0%     33.9k ± 0%  +94.02%  (p=0.000 n=6+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24      727k ± 1%     1194k ± 1%  +64.31%  (p=0.000 n=10+10)
```

An additional knob is introduced to the batch truncation helper to
indicate whether the helper can take ownership of the passed-in requests
slice and reorder it as it pleases. This is the case for the Streamer,
but the DistSender relies on the order of requests not being modified,
so the helper makes a copy of the slice.

Some tests needed adjustment since requests are no longer necessarily
processed in the original order, so the population of ResumeSpans might
be different.

Fixes: #68536.
Addresses: #82159.

Release note: None

83358: sql: initialize sql instance during instance provider start r=ajwerner a=rharding6373

Before this change, there was a race condition where the instance
provider and the instance reader would start before the instance
provider created a SQL instance, potentially causing the reader to not
cache the instance before initialization was complete. This is
a problem in multi-tenant environments, where we may not be able to plan
queries if the reader does not know of any SQL instances.

This change moves SQL instance initialization into the instance
provider's `Start()` function before starting the reader, so that the
instance is guaranteed to be visible to the reader on its first
rangefeed scan of the `system.sql_instances` table.

Fixes: #82706
Fixes: #81807
Fixes: #81567

Release note: None

83683: kvstreamer: perform more memory accounting r=yuzefovich a=yuzefovich

**kvstreamer: perform more memory accounting**

This commit performs memory accounting for more of the internal state of
the streamer. In particular, it adds accounting for the overhead of
`Result`s in the results buffer (previously, we would only account for
the size of the Get or the Scan response but not for the `Result` struct
itself). It also adds accounting for all slices that are `O(N)` in size
where `N` is the number of requests.
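
The accounting idea can be sketched as follows, with stand-in types for both the `Result` and the memory account (this is illustrative only, not the actual kvstreamer accounting code):

```go
package main

import (
	"fmt"
	"unsafe"
)

type Result struct {
	Position int
	Payload  []byte
}

type memoryAccount struct{ used int64 }

func (a *memoryAccount) grow(n int64) { a.used += n }

const resultOverhead = int64(unsafe.Sizeof(Result{}))

// accountForResult charges both the fixed overhead of the Result struct
// (previously unaccounted for) and the full capacity of its payload.
func accountForResult(acc *memoryAccount, r Result) {
	acc.grow(resultOverhead)
	acc.grow(int64(cap(r.Payload)))
}

// accountForRequestSlices charges any bookkeeping slice that is O(N) in
// the number of enqueued requests.
func accountForRequestSlices(acc *memoryAccount, numRequests int, perElementSize int64) {
	acc.grow(int64(numRequests) * perElementSize)
}

func main() {
	var acc memoryAccount
	accountForResult(&acc, Result{Position: 0, Payload: make([]byte, 0, 128)})
	accountForRequestSlices(&acc, 100, 8)
	fmt.Println("bytes accounted:", acc.used)
}
```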

Addresses: #82160.

Release note: None

**kvstreamer: account for the overhead of each request**

Previously, we didn't account for some of the overhead of Get and Scan
requests, and this commit fixes that.

Here is the list of all things that we need to account for a single Get
request:
- (if referenced by `[]roachpb.RequestUnion`) the overhead of the
`roachpb.RequestUnion` object
- the overhead of `roachpb.RequestUnion_Get` object
- the overhead of `roachpb.GetRequest` object
- the footprint of the key inside of `roachpb.GetRequest`.

Previously, we only accounted for the first and the fourth items, and
now we account for all of them. Additionally, we used the
auto-generated `Size` method for the fourth item, which I believe
represents the size of the serialized protobuf message (possibly
compressed; I'm not sure), but we're interested in the capacities of
the underlying slices.
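
A sketch of the arithmetic, using stand-in structs that only mirror the shape of the wrappers listed above (the real types live in `roachpb`; nothing here is the actual accounting code):

```go
package main

import (
	"fmt"
	"unsafe"
)

// Stand-ins for the layers of wrapping around a single Get request. The
// point is that each layer has its own fixed overhead, and the key is
// charged by the capacity of its backing slice, not by a protobuf Size().
type getRequest struct{ key []byte }
type requestUnionGet struct{ get *getRequest }
type requestUnion struct{ value *requestUnionGet }

func accountedGetSize(key []byte) int64 {
	return int64(unsafe.Sizeof(requestUnion{})) + // 1. wrapper in []RequestUnion
		int64(unsafe.Sizeof(requestUnionGet{})) + // 2. the oneof wrapper
		int64(unsafe.Sizeof(getRequest{})) + // 3. the request itself
		int64(cap(key)) // 4. footprint of the key's backing slice
}

func main() {
	key := make([]byte, 10, 16)
	fmt.Println("bytes to account for one Get:", accountedGetSize(key))
}
```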

Addresses: #82160.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: rharding6373 <[email protected]>
craig bot pushed a commit that referenced this issue Jul 7, 2022
83547: stmtdiagnostics: remove the usage of gossip r=yuzefovich a=yuzefovich

This commit removes the usage of gossip in the stmt diagnostics feature
since it is really an optimization (i.e. not required for the feature
to work) and is in the way of the architecture unification (i.e. the
multi-tenancy effort).

Gossip was previously used to let all nodes in the cluster know about
a new stmt diagnostics request or about the cancellation of an existing
one (in fact, the gossip propagation of the latter was broken anyway
since we forgot to register the corresponding callback). We now rely on
the polling mechanism on each node to read from the system table and
populate the registry of current requests. This polling takes place
every `sql.stmt_diagnostics.poll_interval` seconds (10 by default).
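
A minimal sketch of such a polling loop (names are made up; this is not the actual `stmtdiagnostics` registry code):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type diagRequest struct{ fingerprint string }

// pollRequests periodically re-reads the outstanding requests from the
// system table instead of being notified via gossip.
func pollRequests(
	ctx context.Context,
	interval time.Duration,
	readFromSystemTable func(context.Context) ([]diagRequest, error),
	apply func([]diagRequest),
) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			reqs, err := readFromSystemTable(ctx)
			if err != nil {
				continue // transient error; try again on the next tick
			}
			apply(reqs)
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 250*time.Millisecond)
	defer cancel()
	read := func(context.Context) ([]diagRequest, error) {
		return []diagRequest{{fingerprint: "SELECT _"}}, nil
	}
	pollRequests(ctx, 100*time.Millisecond, read, func(rs []diagRequest) {
		fmt.Println("refreshed", len(rs), "requests")
	})
}
```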

Additionally, this commit changes the class of that setting from
TenantWritable to TenantReadOnly because
1. we don't charge the user for the polling
2. to prevent the abuse where a tenant could set the polling interval
to a very low value incurring no costs to themselves (we excluded the
polling from the tenant cost recently).

Follow-up work remains to bring back the optimization of propagating
new requests quickly, this time using a rangefeed on the system table.

Epic: CRDB-16702
Informs: #47893

Release note: None

83896: ui/cluster-ui: filter out closed sessions from active exec pages  r=xinhaoz a=xinhaoz

Previously, it was possible for the active transactions page
to show txns from closed sessions. The sessions API was
recently updated to return closed sessions, and it is
possible for the active_txn field in a closed session to be
populated. This commit filters out the closed sessions when
retrieving active transactions.

Release note (bug fix): active transactions page no longer
shows transactions from closed sessions

83903: ui: persist stmt view selection in sql activity page r=xinhaoz a=xinhaoz

Previously, the selection of viewing historical or active
executions for statements and transactions tabs in the SQL
activity page did not persist on tab change. This commit
persists the selection between tab changes in the SQL
activity page.

Release note (ui change): In the SQL Activity Page, the
selection to view historical or active executions will
persist between tabs.


https://www.loom.com/share/6990f8e273f64ddcaf971e85f0a8ef89

83926: kvstreamer: reuse the truncation helper and introduce a fast path for a single range case r=yuzefovich a=yuzefovich

**kvstreamer: introduce a single range fast path for request truncation**

This commit introduces a fast path to avoid the usage of the batch
truncation helper when all requests are contained within a single range.
Some modifications needed to be made to the `txnKVStreamer` so that it
didn't nil out the requests slice - we now delay that until right before
the next call to `Enqueue`.
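
The fast path boils down to a containment check along these lines (illustrative only, with plain string keys; not the actual kvstreamer code):

```go
package main

import "fmt"

type span struct{ start, end string }

// singleRangeFastPath reports whether the span covering all enqueued
// requests fits within one range, in which case the batch can be sent
// as-is and the truncation machinery is skipped entirely.
func singleRangeFastPath(allRequests span, rangeStart, rangeEnd string) bool {
	return allRequests.start >= rangeStart && allRequests.end <= rangeEnd
}

func main() {
	reqs := span{start: "b", end: "c"}
	if singleRangeFastPath(reqs, "a", "d") {
		fmt.Println("send the batch without truncating")
	}
}
```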

```
name                                 old time/op    new time/op    delta
IndexJoin/Cockroach-24                  6.21ms ± 1%    5.96ms ± 2%  -4.08%  (p=0.000 n=8+10)
IndexJoinColumnFamilies/Cockroach-24    8.97ms ± 4%    8.79ms ± 7%    ~     (p=0.190 n=10+10)

name                                  old alloc/op   new alloc/op   delta
IndexJoin/Cockroach-24                  1.39MB ± 1%    1.27MB ± 1%  -7.97%  (p=0.000 n=10+10)
IndexJoinColumnFamilies/Cockroach-24    1.46MB ± 1%    1.34MB ± 0%  -8.04%  (p=0.000 n=9+7)

name                                  old allocs/op  new allocs/op  delta
IndexJoin/Cockroach-24                   7.20k ± 1%     7.16k ± 1%  -0.61%  (p=0.022 n=10+10)
IndexJoinColumnFamilies/Cockroach-24     12.0k ± 1%     11.9k ± 0%  -0.83%  (p=0.000 n=9+8)
```

Addresses: #82159.

Release note: None

**kvcoord: refactor the truncation helper for reuse**

This commit refactors the batch truncation helper so that it can be
reused for multiple batches of requests. In particular, that ability is
now utilized by the streamer. Additionally, since the streamer now holds
on to the same truncation helper for possibly a long time, this commit
adds the memory accounting for the internal state of the helper.

Addresses: #82160.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
@yuzefovich
Member Author

The latest 20 runs of tpch_concurrency showed some regression: 75.65 with the streamer OFF vs 68.9 with the streamer ON. I'm currently looking at the core dumps to see what we're not accounting for. This has already exposed #84229 and #84230.

@yuzefovich
Member Author

I did 10 runs of different tpch_concurrency configs on a binary that included #84230 and #84319 (as well as already merged #84229), and here is what I got (the higher, the better):

  • high_refresh_spans_bytes: 78.5
  • no streamer: 107.6
  • with streamer: 104.

At this point, it seems like the stability of the streamer is roughly on par with the non-streamer config, so I'm marking this issue as "done", and will make sure to push #84230 and #84319 over the finish line. I also might look into the stability aspect during the stability period.

craig bot pushed a commit that referenced this issue Jul 22, 2022
84319: storage: be tighter with allocations when TargetBytes is present r=yuzefovich a=yuzefovich

Previously, while `put`ting the key into the `repr`, we could make an
allocation that was too large given the remaining `TargetBytes` budget.
This is because we exponentially increase the capacities of the
buffers up to 128MiB and because we only account for the length of the
slice even though the whole capacity has a memory footprint.

For example, with 10MiB of `TargetBytes` (which is used by SQL in many
cases) and a ScanResponse that exceeds that limit, we'd allocate
capacities that are powers of two, starting at, say, 256B, and would go
all the way up to 8MiB; however, given that 10MiB limit, we'd only use
2MiB of that last 8MiB `repr` when we encounter the target bytes limit
and stop. Such behavior is kinda ok if the response is marshalled by
gRPC to be sent to another node, but it is quite unfortunate in the
local fast-path cases (meaning the SQL and the KV layers are part of
the same physical machine). In the latter scenario, SQL would only
account for the lengths of the slices while keeping the whole slices
alive for a while, leading to significant unaccounted-for memory. In
the example above, on the order of 6MiB would be unaccounted for;
multiply that by some concurrency, and we have unaccounted memory on
the order of hundreds of MiBs.

The current behavior seems especially bad for the streamer use case
where we issue many requests with the `TargetBytes` set and use
`ScanResponse.NumBytes` field (which tracks the lengths of the slices)
for the memory accounting purposes.

In order to improve here, this commit teaches the `put` method about
the maximum capacity it can use. In the example above, the last slice
would be on the order of 2MiB, making everything happy: we stay very
close to the TargetBytes limit and don't have any wasted space.
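
A sketch of the clamping logic described above (this is not the actual storage code; the function and its parameters are made up):

```go
package main

import "fmt"

// nextBufferCap still grows capacities exponentially, but each growth step
// is clamped to what the remaining TargetBytes budget can possibly use
// (while always leaving room for the key/value being appended).
func nextBufferCap(curCap, needed, remainingTargetBytes int) int {
	newCap := curCap * 2
	if newCap < needed {
		newCap = needed
	}
	const maxCap = 128 << 20 // growth was previously capped only at 128MiB
	if newCap > maxCap {
		newCap = maxCap
	}
	if remainingTargetBytes > needed && newCap > remainingTargetBytes {
		newCap = remainingTargetBytes
	}
	return newCap
}

func main() {
	// With ~2MiB of budget left, plain doubling would have jumped from 4MiB
	// to 8MiB; the clamped version allocates only what can still be used.
	fmt.Println(nextBufferCap(4<<20, 1<<10, 2<<20)) // 2097152 (2MiB), not 8MiB
}
```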

Addresses: #64906.
Addresses: #82160.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023