
kvcoord: investigate tpch_concurrency roachtest regression after bumping default value of max_refresh_span_bytes #81451

Closed

arulajmani opened this issue May 18, 2022 · 3 comments

Labels: branch-master (Failures and bugs on the master branch) · C-investigation (Further steps needed to qualify; C-label will change) · GA-blocker · T-kv (KV Team)

arulajmani (Collaborator) commented May 18, 2022

Describe the problem

In #80115 we bumped the default value of kv.transaction.max_refresh_span_bytes to 4MB. As @yuzefovich points out here, the test attempts to find the maximum concurrency of TPCH queries that can be sustained without OOM-ing. The regression isn't entirely surprising given that we're now using more memory to track refresh spans, but it's still worth investigating to confirm that's all that is going on.

We might also want to take this opportunity to add memory accounting for the spans we're tracking. One could imagine a scheme where we dynamically reduce this limit when the node is running close to its memory limit, instead of letting the cluster OOM because the static limit is too high; a rough sketch of that idea follows.
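For illustration, a minimal sketch of such a scheme (hypothetical code, not part of CockroachDB; the function and parameter names are made up):

```go
// Derive the effective refresh-span budget from the remaining memory
// headroom rather than trusting a static cluster setting alone.
package kvcoordsketch

func effectiveRefreshSpanBudget(staticLimit, memUsed, memLimit int64) int64 {
	headroom := memLimit - memUsed
	if headroom <= 0 {
		return 0 // already at or over the memory limit: track as little as possible
	}
	if headroom < staticLimit {
		// Running close to the memory limit: shrink the budget instead of
		// letting the tracked spans push the node toward an OOM.
		return headroom
	}
	return staticLimit
}
```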

Jira issue: CRDB-15176

Epic: CRDB-19172

arulajmani added the C-investigation label May 18, 2022
blathers-crl bot added the T-kv label May 18, 2022
jlinder added and removed the sync-me label May 20, 2022
yuzefovich (Member) commented

I looked into this some time ago but forgot to write the details down. I think there were two problems:

  • Refresh spans not being condensed enough (see the sketch at the end of this comment for what condensing means). At the end of query execution we propagate the refresh spans as metadata, so the smaller they are, the less we need to send and receive over the network. This looks like a KV issue, though I'm not certain there is an actual bug in the code.
  • LeafTxnFinalState metadata not being accounted for on the receiving side. This is a SQL issue.

For context, TPCH queries are analytical in nature, so they read a lot of data, and we need to propagate the refresh spans back to the gateway at the end of query execution. Bumping max_refresh_span_bytes appears to have increased the size of those refresh spans by too much. TPCH Q9 seems to be the query OOMing the nodes most often in the tpch_concurrency/high_refresh_span_bytes roachtest.
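A rough sketch of what condensing amounts to (hypothetical code, not the kvcoord implementation): when the tracked refresh spans exceed the byte budget, overlapping and adjacent spans are merged into fewer, wider ones, trading refresh precision for a smaller memory footprint.

```go
package kvcoordsketch

import (
	"bytes"
	"sort"
)

type span struct{ key, endKey []byte }

// condense merges overlapping and touching spans after sorting by start
// key. The real logic is more selective (it keeps merging until the
// footprint fits the budget), but the effect is the same: fewer, wider spans.
func condense(spans []span) []span {
	sort.Slice(spans, func(i, j int) bool {
		return bytes.Compare(spans[i].key, spans[j].key) < 0
	})
	out := spans[:0]
	for _, s := range spans {
		if n := len(out); n > 0 && bytes.Compare(s.key, out[n-1].endKey) <= 0 {
			// Overlaps or touches the previous span: widen it.
			if bytes.Compare(s.endKey, out[n-1].endKey) > 0 {
				out[n-1].endKey = s.endKey
			}
			continue
		}
		out = append(out, s)
	}
	return out
}
```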

yuzefovich added the release-blocker label Jul 9, 2022
blathers-crl bot commented Jul 9, 2022

Hi @yuzefovich, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

yuzefovich added the branch-master label Jul 9, 2022
craig bot pushed a commit that referenced this issue Jul 29, 2022
83904: rfc: add user defined function RFC r=mgartner a=mgartner

Release note: None

84286: streamingccl: heartbeat persisted frontier r=samiskin a=samiskin

Resolves #84086 

Previously, we would forward the ingestion frontier's resolved timestamp
at the point of heartbeat; however, this could allow the producer's
protected timestamp record to exceed the ingestion job's persisted
frontier.

This PR uses the last persisted frontier value instead.

Release note (bug fix): The protected timestamp of the producer job is
no longer able to exceed the persisted ingestion frontier.


85285: colexec: improve handling of the metadata r=yuzefovich a=yuzefovich

**colmem: introduce a helper to release all reservations from allocator**

Release note: None

**colexec: improve handling of the metadata**

This commit audits all of the places where we're operating with the
producer metadata and improves things a bit. This was prompted by the
fact that some types of the metadata (in particular, the
LeafTxnFinalState) can be of non-trivial footprint, so the sooner we
lose the reference to the metadata objects, the more stable CRDB will
be.

It also adds memory accounting for the LeafTxnFinalState metadata
objects in most (all?) places that buffer metadata (the inbox,
columnarizer, materializer, and parallel unordered synchronizer). The
reservations are released whenever the buffered component is drained;
however, the drainer, which takes over the metadata, might not perform
the accounting itself. The difficulty is that the lifecycle of the
metadata objects is not entirely clear: it's possible that we're at the
end of execution and the whole plan is being drained, in which case the
metadata is pushed into the DistSQLReceiver, imported into the root txn,
and discarded (modulo the increased root txn footprint); it's also
possible that the metadata from the drained component gets buffered
somewhere further up the tree. This commit adds the accounting in most
such places.
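A simplified sketch of the buffering pattern described above (hypothetical types, not the actual vectorized engine interfaces):

```go
package colexecsketch

// memAccount stands in for a memory account in the accounting system.
type memAccount struct{ used int64 }

func (a *memAccount) Grow(n int64)   { a.used += n }
func (a *memAccount) Shrink(n int64) { a.used -= n }

type metadata struct{ payload []byte }

// metadataBuffer is a component that buffers metadata (like an inbox or
// materializer) and tracks the buffered footprint against its account.
type metadataBuffer struct {
	acc      *memAccount
	reserved int64
	buffered []metadata
}

func (b *metadataBuffer) push(m metadata) {
	size := int64(len(m.payload))
	b.acc.Grow(size) // account for the newly buffered metadata
	b.reserved += size
	b.buffered = append(b.buffered, m)
}

// drain hands the metadata over to the consumer and releases this
// component's reservation; as noted above, the consumer may or may not
// re-account for it.
func (b *metadataBuffer) drain() []metadata {
	out := b.buffered
	b.buffered = nil
	b.acc.Shrink(b.reserved)
	b.reserved = 0
	return out
}
```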

Addresses: #64906.
Addresses: #81451.

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Shiranka Miskin <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot pushed a commit that referenced this issue Aug 1, 2022
84230: kvcoord: account for the span overhead when condensing refresh spans r=yuzefovich a=yuzefovich

Previously, we would only account for the lengths of the key and the end
key of each span when estimating memory while condensing the refresh
spans set. However, each span carries the non-trivial overhead (48
bytes) of the `roachpb.Span` object itself, which we previously ignored.
As a result, the actual footprint of the refresh spans could
significantly exceed the target size, especially when the keys are
small. For example, in a recently collected core dump I saw the refresh
spans taking up about 24MB in the heap whereas the target setting is
only 4MiB. This memory is currently not tracked by the memory accounting
system at all, so such overshoots are quite bad, especially given the
recent bump of the setting from 256KiB to 4MiB.
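For illustration, a minimal sketch of the accounting change (simplified types, not the actual CockroachDB code):

```go
package kvcoordsketch

import "unsafe"

// Span mirrors the shape of roachpb.Span: two byte-slice fields.
type Span struct {
	Key    []byte
	EndKey []byte
}

// Two slice headers at 24 bytes each on 64-bit platforms: the 48-byte
// overhead mentioned above.
const spanOverhead = int64(unsafe.Sizeof(Span{}))

// spanSize estimates a span's memory footprint. The old estimate was just
// len(Key)+len(EndKey), so a span over single-byte keys was counted as
// 2 bytes even though its real footprint is at least 50.
func spanSize(s Span) int64 {
	return spanOverhead + int64(len(s.Key)) + int64(len(s.EndKey))
}
```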

Addresses: #64906.
Addresses: #81451.

Release note (ops change): The way we track memory against
`kv.transaction.max_intents_bytes` and
`kv.transaction.max_refresh_spans_bytes` has been adjusted to be more
precise (we no longer ignore some of the overhead). As a result, the
stability of CRDB improves (we're less likely to OOM); however, this
change effectively reduces the budgets determined by those cluster
settings. In practice, this means that
- the intents might be tracked more coarsely (due to span coalescing),
which makes intent resolution less efficient, and
- the refresh spans become more coarse as well, making it more likely
that `ReadWithinUncertaintyIntervalError`s are returned to the user
rather than retried transparently.

85156: changefeedccl: reduce allocations in kvevent blocking buffer  r=jayshrivastava a=jayshrivastava

This change removes a pointer from the kvevent.Event struct, reducing overall allocations. The hope is that this reduces the amount of work the Go GC has to do, which should ultimately reduce SQL latency. During backfills, the allocations in kv events add up quickly, so eliminating even one pointer is significant.

See #84709 for more info. I'm not closing that issue with this PR since we may decide to remove more pointers in future PRs using some of the ideas in the issue comments.
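A hypothetical illustration of this kind of change (simplified structs, not the actual kvevent code): storing a field inline instead of behind a pointer saves one heap allocation per event.

```go
package kveventsketch

type alloc struct{ bytes int64 }

// Before: every event carries a separately allocated *alloc, so building
// an event costs (at least) two allocations.
type eventBefore struct {
	data  []byte
	alloc *alloc
}

// After: the alloc lives inline in the event; the zero value simply means
// "nothing allocated", so no pointer and no extra allocation are needed.
type eventAfter struct {
	data  []byte
	alloc alloc
}
```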

Here are the benchmark results 
```
name          old time/op    new time/op    delta
MemBuffer-10    98.1µs ± 0%    95.8µs ± 1%   -2.35%  (p=0.008 n=5+5)

name          old alloc/op   new alloc/op   delta
MemBuffer-10    76.9kB ± 0%    64.4kB ± 0%  -16.17%  (p=0.008 n=5+5)

name          old allocs/op  new allocs/op  delta
MemBuffer-10       859 ± 0%       675 ± 0%  -21.42%  (p=0.008 n=5+5)

```



85368: roachtest: add KV/YCSB benchmarks with global MVCC range tombstones r=jbowens a=erikgrinaker

**kvserver: add env var to write global MVCC range tombstone**

This patch adds the env var `COCKROACH_GLOBAL_MVCC_RANGE_TOMBSTONE`.
When enabled, it writes a global MVCC range tombstone across the
entire table data keyspan during cluster bootstrapping. This can be used
to test performance and correctness in the presence of MVCC range
tombstones, by activating range key-specific code paths while not
semantically affecting the data above it.
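A minimal sketch of the gating, with a hypothetical helper name (not the actual kvserver code):

```go
package kvserversketch

import "os"

// maybeWriteGlobalMVCCRangeTombstone only writes the test-only tombstone
// when the env var is set, so normal clusters are unaffected.
func maybeWriteGlobalMVCCRangeTombstone(writeTombstone func() error) error {
	if os.Getenv("COCKROACH_GLOBAL_MVCC_RANGE_TOMBSTONE") == "" {
		return nil // default: disabled
	}
	return writeTombstone()
}
```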

Touches #84384.

Release note: None

**roachtest: add KV/YCSB benchmarks with global MVCC range tombstones**

This patch adds a set of benchmark variants that write a single MVCC
range tombstone across the entire SQL keyspan at cluster start, via the
`COCKROACH_GLOBAL_MVCC_RANGE_TOMBSTONE` env var. Even though this range
tombstone will not affect the data written during the benchmarks, it
activates range key-specific code paths in the storage layer which can
have a significant impact on performance.

The new benchmarks are:

* `kv0/enc=false/nodes=3/cpu=32/mvcc-range-keys=global`
* `kv95/enc=false/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/A/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/B/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/C/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/D/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/E/nodes=3/cpu=32/mvcc-range-keys=global`
* `ycsb/F/nodes=3/cpu=32/mvcc-range-keys=global`

Resolves #84384.


Release note: None

85424: cmd/dev: add support for --show-diff flag from logictests r=rytaft a=rytaft

This commit adds support for the `--show-diff` flag when running tests
with `dev`. This flag is used by the logictests in order to show diffs
between the expected and actual output.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
arulajmani self-assigned this Aug 29, 2022
nvanbenschoten added the GA-blocker label and removed the release-blocker label Sep 6, 2022

arulajmani (Collaborator, Author) commented
With @yuzefovich's patch, the TPCH concurrency numbers (TODO: arul add link to roachperf once the view is updated) are roughly the same as those we see in 22.1. Given this, and the other benefits of bumping max_refresh_span_bytes that prompted the change, we should be fine shipping the change in 22.2.

craig bot pushed a commit that referenced this issue Sep 21, 2022
88291: roachtest: use default value of max_refresh_spans_bytes for tpch_concurrency r=yuzefovich a=yuzefovich

This commit makes it so that we use the default value of the `kv.transaction.max_refresh_spans_bytes` cluster setting in the tpch_concurrency roachtest. The idea is that we should be testing what we ship, and we now understand why the increase of that setting's default in 22.2 led to a regression on this roachtest. Many improvements have been made to win that performance back, so we are now on par with 22.1, and the corresponding issue has been closed. As a result, one test config is removed. I decided to keep the "no streamer" config since it still seems useful, at least for the 23.1 release cycle.

Related to #81451.

Release note: None

88349: nightlies: fix cloud unit test nightly script r=rhu713 a=adityamaru

Release note: None

88354: vendor: bump Pebble to 63300403d537 r=nicktrav a=jbowens

```
63300403 db: enable TrySeekUsingNext after Next in external iters
fa910870 db: add ExternalIter_NonOverlapping_SeekNextScan benchmark
1d444f36 sstable: include more blocks' stats in BlockBytes
d8f4eb38 docs: fix date for struct zeroing annotation
c04d1287 metamorphic: always synchronize Clone on underlying Batch
81a4342c docs: add benchmark annotation for #1822
```

Addresses #88329.

Release note: None

88357: bazel,ci: find `compare_test` binary under `bazel-bin` r=healthy-pod a=rickystewart

Since the Go 1.19 upgrade this has been broken, as `realpath` has been receiving the `-test.timeout` argument and getting confused. Also since Go 1.19, it is much easier to find this binary, which sits right under the normal `bazel-bin`.

Release note: None

88359: sql: fix beautiful diagram which gofmt messed up r=Xiang-Gu a=ajwerner

This works. Don't ask me why.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>