
kvcoord: investigate tpch_concurrency roachtest regression after bumping default value of max_refresh_span_bytes #81451

Closed

arulajmani opened this issue May 18, 2022 · 3 comments

Labels: branch-master (Failures and bugs on the master branch) · C-investigation (Further steps needed to qualify; C-label will change) · GA-blocker · T-kv (KV Team)

arulajmani (Collaborator) commented May 18, 2022

Describe the problem

In #80115 we bumped the default value of kv.transaction.max_refresh_span_bytes to 4MB. As @yuzefovich points out here, the test attempts to find the maximum concurrency of TPCH queries that can be sustained without OOM-ing. The regression isn't entirely surprising given that we're now using more memory to track refresh spans, but it's still worth investigating to confirm that's all that is going on.

We might also want to take this opportunity to add memory accounting for the spans we're tracking. One could imagine a scheme where we dynamically reduce this limit when the node is running close to its memory limit, instead of letting the cluster OOM because the static limit is too high; a rough sketch of that idea follows.
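For illustration, a minimal sketch of such a scheme (hypothetical code, not part of CockroachDB; the function and parameter names are made up):

```go
// Derive the effective refresh-span budget from the remaining memory
// headroom rather than trusting a static cluster setting alone.
package kvcoordsketch

func effectiveRefreshSpanBudget(staticLimit, memUsed, memLimit int64) int64 {
	headroom := memLimit - memUsed
	if headroom <= 0 {
		return 0 // already at or over the memory limit: track as little as possible
	}
	if headroom < staticLimit {
		// Running close to the memory limit: shrink the budget instead of
		// letting the tracked spans push the node toward an OOM.
		return headroom
	}
	return staticLimit
}
```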

Jira issue: CRDB-15176

Epic: CRDB-19172

arulajmani added the C-investigation label May 18, 2022
blathers-crl bot added the T-kv label May 18, 2022
jlinder added and removed the sync-me label May 20, 2022
yuzefovich (Member) commented

I looked into this some time ago but forgot to write the details down. I think there were two problems:

  • Refresh spans not being condensed enough (see the sketch at the end of this comment for what condensing means). At the end of query execution we propagate the refresh spans as metadata, so the smaller they are, the less we need to send and receive over the network. This looks like a KV issue, though I'm not certain there is an actual bug in the code.
  • LeafTxnFinalState metadata not being accounted for on the receiving side. This is a SQL issue.

For context, TPCH queries are analytical in nature, so they read a lot of data, and we need to propagate the refresh spans back to the gateway at the end of query execution. Bumping max_refresh_span_bytes appears to have increased the size of those refresh spans by too much. TPCH Q9 seems to be the query OOMing the nodes most often in the tpch_concurrency/high_refresh_span_bytes roachtest.
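A rough sketch of what condensing amounts to (hypothetical code, not the kvcoord implementation): when the tracked refresh spans exceed the byte budget, overlapping and adjacent spans are merged into fewer, wider ones, trading refresh precision for a smaller memory footprint.

```go
package kvcoordsketch

import (
	"bytes"
	"sort"
)

type span struct{ key, endKey []byte }

// condense merges overlapping and touching spans after sorting by start
// key. The real logic is more selective (it keeps merging until the
// footprint fits the budget), but the effect is the same: fewer, wider spans.
func condense(spans []span) []span {
	sort.Slice(spans, func(i, j int) bool {
		return bytes.Compare(spans[i].key, spans[j].key) < 0
	})
	out := spans[:0]
	for _, s := range spans {
		if n := len(out); n > 0 && bytes.Compare(s.key, out[n-1].endKey) <= 0 {
			// Overlaps or touches the previous span: widen it.
			if bytes.Compare(s.endKey, out[n-1].endKey) > 0 {
				out[n-1].endKey = s.endKey
			}
			continue
		}
		out = append(out, s)
	}
	return out
}
```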

yuzefovich added the release-blocker label Jul 9, 2022
blathers-crl bot commented Jul 9, 2022

Hi @yuzefovich, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

yuzefovich added the branch-master label Jul 9, 2022
craig bot pushed a commit that referenced this issue Jul 29, 2022
83904: rfc: add user defined function RFC r=mgartner a=mgartner

Release note: None

84286: streamingccl: heartbeat persisted frontier r=samiskin a=samiskin

Resolves #84086 

Previously, we would forward the ingestion frontier's resolved timestamp
at the point of heartbeat; however, this could allow the producer's
protected timestamp record to exceed the ingestion job's persisted
frontier.

This PR uses the last persisted frontier value instead.

Release note (bug fix): The protected timestamp of the producer job is
no longer able to exceed the persisted ingestion frontier.


85285: colexec: improve handling of the metadata r=yuzefovich a=yuzefovich

**colmem: introduce a helper to release all reservations from allocator**

Release note: None

**colexec: improve handling of the metadata**

This commit audits all of the places where we're operating with the
producer metadata and improves things a bit. This was prompted by the
fact that some types of the metadata (in particular, the
LeafTxnFinalState) can be of non-trivial footprint, so the sooner we
lose the reference to the metadata objects, the more stable CRDB will
be.

It also adds memory accounting for the LeafTxnFinalState metadata
objects in most (all?) places that buffer metadata (the inbox,
columnarizer, materializer, and parallel unordered synchronizer). The
reservations are released whenever the buffered component is drained;
however, the drainer, which takes over the metadata, might not perform
the accounting itself. The difficulty is that the lifecycle of the
metadata objects is not entirely clear: it's possible that we're at the
end of execution and the whole plan is being drained, in which case the
metadata is pushed into the DistSQLReceiver, imported into the root txn,
and discarded (modulo the increased root txn footprint); it's also
possible that the metadata from the drained component gets buffered
somewhere further up the tree. This commit adds the accounting in most
such places.
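A simplified sketch of the buffering pattern described above (hypothetical types, not the actual vectorized engine interfaces):

```go
package colexecsketch

// memAccount stands in for a memory account in the accounting system.
type memAccount struct{ used int64 }

func (a *memAccount) Grow(n int64)   { a.used += n }
func (a *memAccount) Shrink(n int64) { a.used -= n }

type metadata struct{ payload []byte }

// metadataBuffer is a component that buffers metadata (like an inbox or
// materializer) and tracks the buffered footprint against its account.
type metadataBuffer struct {
	acc      *memAccount
	reserved int64
	buffered []metadata
}

func (b *metadataBuffer) push(m metadata) {
	size := int64(len(m.payload))
	b.acc.Grow(size) // account for the newly buffered metadata
	b.reserved += size
	b.buffered = append(b.buffered, m)
}

// drain hands the metadata over to the consumer and releases this
// component's reservation; as noted above, the consumer may or may not
// re-account for it.
func (b *metadataBuffer) drain() []metadata {
	out := b.buffered
	b.buffered = nil
	b.acc.Shrink(b.reserved)
	b.reserved = 0
	return out
}
```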

Addresses: #64906.
Addresses: #81451.

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Shiranka Miskin <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot pushed a commit that referenced this issue Aug 1, 2022
84230: kvcoord: account for the span overhead when condensing refresh spans r=yuzefovich a=yuzefovich

Previously, we would only account for the lengths of the key and the end
key of each span when estimating memory while condensing the refresh
spans set. However, each span carries the non-trivial overhead (48
bytes) of the `roachpb.Span` object itself, which we previously ignored.
As a result, the actual footprint of the refresh spans could
significantly exceed the target size, especially when the keys are
small. For example, in a recently collected core dump I saw the refresh
spans taking up about 24MB in the heap whereas the target setting is
only 4MiB. This memory is currently not tracked by the memory accounting
system at all, so such overshoots are quite bad, especially given the
recent bump of the setting from 256KiB to 4MiB.
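For illustration, a minimal sketch of the accounting change (simplified types, not the actual CockroachDB code):

```go
package kvcoordsketch

import "unsafe"

// Span mirrors the shape of roachpb.Span: two byte-slice fields.
type Span struct {
	Key    []byte
	EndKey []byte
}

// Two slice headers at 24 bytes each on 64-bit platforms: the 48-byte
// overhead mentioned above.
const spanOverhead = int64(unsafe.Sizeof(Span{}))

// spanSize estimates a span's memory footprint. The old estimate was just
// len(Key)+len(EndKey), so a span over single-byte keys was counted as
// 2 bytes even though its real footprint is at least 50.
func spanSize(s Span) int64 {
	return spanOverhead + int64(len(s.Key)) + int64(len(s.EndKey))
}
```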

Addresses: #64906.
Addresses: #81451.

Release note (ops change): The way we track memory against
`kv.transaction.max_intents_bytes` and
`kv.transaction.max_refresh_spans_bytes` has been adjusted to be more
precise (we no longer ignore some of the overhead). As a result, the
stability of CRDB improves (we're less likely to OOM); however, this
change effectively reduces the budgets determined by those cluster
settings. In practice, this means that
- the intents might be tracked more coarsely (due to span coalescing),
which makes intent resolution less efficient, and
- the refresh spans become more coarse as well, making it more likely
that `ReadWithinUncertaintyIntervalError`s are returned to the user
rather than retried transparently.

85156: changefeedccl: reduce allocations in kvevent blocking buffer  r=jayshrivastava a=jayshrivastava

This change removes a pointer from the kvevent.Event struct, reducing overall allocations. The hope is that this reduces the amount of work the Go GC has to do, which should ultimately reduce SQL latency. During backfills, the allocations in kv events add up quickly, so eliminating even one pointer is significant.

See #84709 for more info. I'm not closing that issue with this PR since we may decide to remove more pointers in future PRs using some of the ideas in the issue comments.
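A hypothetical illustration of this kind of change (simplified structs, not the actual kvevent code): storing a field inline instead of behind a pointer saves one heap allocation per event.

```go
package kveventsketch

type alloc struct{ bytes int64 }

// Before: every event carries a separately allocated *alloc, so building
// an event costs (at least) two allocations.
type eventBefore struct {
	data  []byte
	alloc *alloc
}

// After: the alloc lives inline in the event; the zero value simply means
// "nothing allocated", so no pointer and no extra allocation are needed.
type eventAfter struct {
	data  []byte
	alloc alloc
}
```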

Here are the benchmark results 
```
name          old time/op    new time/op    delta
MemBuffer-10    98.1µs ± 0%    95.8µs ± 1%   -2.35%  (p=0.008 n=5+5)

name          old alloc/op   new alloc/op   delta
MemBuffer-10    76.9kB ± 0%    64.4kB ± 0%  -16.17%  (p=0.008 n=5+5)

name          old allocs/op  new allocs/op  delta
MemBuffer-10       859 ± 0%       675 ± 0%  -21.42%  (p=0.008 n=5+5)

```



85368: roachtest: add KV/YCSB benchmarks with global MVCC range tombstones r=jbowens a=erikgrinaker

**kvserver: add env var to write global MVCC range tombstone**

This patch adds the env var `COCKROACH_GLOBAL_MVCC_RANGE_TOMBSTONE`.
When enabled, it writes a global MVCC range tombstone across the
entire table data keyspan during cluster bootstrapping. This can be used
to test performance and correctness in the presence of MVCC range
tombstones, by activating range key-specific code paths while not
semantically affecting the data above it.
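A minimal sketch of the gating, with a hypothetical helper name (not the actual kvserver code):

```go
package kvserversketch

import "os"

// maybeWriteGlobalMVCCRangeTombstone only writes the test-only tombstone
// when the env var is set, so normal clusters are unaffected.
func maybeWriteGlobalMVCCRangeTombstone(writeTombstone func() error) error {
	if os.Getenv("COCKROACH_GLOBAL_MVCC_RANGE_TOMBSTONE") == "" {
		return nil // default: disabled
	}
	return writeTombstone()
}
```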

Touches #84384.

Release note: None

**roachtest: add KV/YCSB benchmarks with global MVCC range tombstones**

This patch adds a set of benchmark variants that write a single MVCC
range tombstone across the entire SQL keyspan at cluster start, via the
`COCKROACH_GLOBAL_MVCC_RANGE_TOMBSTONE` env var. Even though this range
tombstone will not affect the data written during the benchmarks, it
activates range key-specific code paths in the storage layer which can
have a significant impact on performance.

The new benchmarks are:

* `kv0/enc=false/nodes=3/cpu=32/mvcc-range-keys=global`
* `kv95/enc=false/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/A/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/B/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/C/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/D/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/E/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/F/nodes=3/cpu=32/mvcc-range-keys=global`

Resolves #84384.


Release note: None

85424: cmd/dev: add support for --show-diff flag from logictests r=rytaft a=rytaft

This commit adds support for the `--show-diff` flag when running tests
with `dev`. This flag is used by the logictests in order to show diffs
between the expected and actual output.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
arulajmani self-assigned this Aug 29, 2022
nvanbenschoten added the GA-blocker label and removed the release-blocker label Sep 6, 2022

arulajmani (Collaborator, Author) commented
With @yuzefovich's patch, the TPCH concurrency numbers (TODO: arul add link to roachperf once the view is updated) are roughly the same as those we see in 22.1. Given this, and the other benefits of bumping max_refresh_span_bytes that prompted the change, we should be fine shipping the change in 22.2.

craig bot pushed a commit that referenced this issue Sep 21, 2022
88291: roachtest: use default value of max_refresh_spans_bytes for tpch_concurrency r=yuzefovich a=yuzefovich

This commit makes it so that we use the default value of the `kv.transaction.max_refresh_spans_bytes` cluster setting in the tpch_concurrency roachtest. The idea is that we should be testing what we ship, and we now understand why the increase of that setting's default in 22.2 led to a regression on this roachtest. Many improvements have been made to win that performance back, so we are now on par with 22.1, and the corresponding issue has been closed. As a result, one test config is removed. I decided to keep the "no streamer" config since it still seems useful, at least for the 23.1 release cycle.

Related to #81451.

Release note: None

88349: nightlies: fix cloud unit test nightly script r=rhu713 a=adityamaru

Release note: None

88354: vendor: bump Pebble to 63300403d537 r=nicktrav a=jbowens

```
63300403 db: enable TrySeekUsingNext after Next in external iters
fa910870 db: add ExternalIter_NonOverlapping_SeekNextScan benchmark
1d444f36 sstable: include more blocks' stats in BlockBytes
d8f4eb38 docs: fix date for struct zeroing annotation
c04d1287 metamorphic: always synchronize Clone on underlying Batch
81a4342c docs: add benchmark annotation for #1822
```

Addresses #88329.

Release note: None

88357: bazel,ci: find `compare_test` binary under `bazel-bin` r=healthy-pod a=rickystewart

Since the Go 1.19 upgrade this has been broken, as `realpath` has been receiving the `-test.timeout` argument and getting confused. Also since Go 1.19, it is much easier to find this binary, which sits right under the normal `bazel-bin`.

Release note: None

88359: sql: fix beautiful diagram which gofmt messed up r=Xiang-Gu a=ajwerner

This works. Don't ask me why.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>