-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: slow SST NextKey()
scans during restore
#88329
Labels
A-storage
Relating to our storage engine (Pebble) on-disk storage.
branch-release-22.2
Used to mark GA and release blockers, technical advisories, and bugs for 22.2
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
GA-blocker
sync-me
T-storage
Storage Team
Comments
erikgrinaker
added
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
A-storage
Relating to our storage engine (Pebble) on-disk storage.
T-storage
Storage Team
labels
Sep 21, 2022
Fix in progress here: cockroachdb/pebble#1960 |
Hi @erikgrinaker, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
erikgrinaker
added
the
branch-release-22.2
Used to mark GA and release blockers, technical advisories, and bugs for 22.2
label
Sep 21, 2022
This was referenced Sep 21, 2022
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Sep 21, 2022
``` 63300403 db: enable TrySeekUsingNext after Next in external iters fa910870 db: add ExternalIter_NonOverlapping_SeekNextScan benchmark 1d444f36 sstable: include more blocks' stats in BlockBytes d8f4eb38 docs: fix date for struct zeroing annotation c04d1287 metamorphic: always synchronize Clone on underlying Batch 81a4342c docs: add benchmark annotation for cockroachdb#1822 ``` Addresses cockroachdb#88329. Release note: None
craig bot
pushed a commit
that referenced
this issue
Sep 21, 2022
88291: roachtest: use default value of max_refresh_spans_bytes for tpch_concurrency r=yuzefovich a=yuzefovich This commit makes it so that we use the default value of `kv.transaction.max_refresh_spans_bytes` cluster setting in the tpch_concurrency roachtest. The idea is that we should be testing what we ship, and we do understand why the increase of the default for that setting in 22.2 led to regression on this roachtest. Many improvements have been made to get that back, so we now are on par with 22.1, and the corresponding issue has been closed. As a result, one test config is now removed. I decided to keep the "no streamer" config as it still seems useful, at least for 23.1 release cycle. Related to #81451. Release note: None 88349: nightlies: fix cloud unit test nightly script r=rhu713 a=adityamaru Release note: None 88354: vendor: bump Pebble to 63300403d537 r=nicktrav a=jbowens ``` 63300403 db: enable TrySeekUsingNext after Next in external iters fa910870 db: add ExternalIter_NonOverlapping_SeekNextScan benchmark 1d444f36 sstable: include more blocks' stats in BlockBytes d8f4eb38 docs: fix date for struct zeroing annotation c04d1287 metamorphic: always synchronize Clone on underlying Batch 81a4342c docs: add benchmark annotation for #1822 ``` Addresses #88329. Release note: None 88357: bazel,ci: find `compare_test` binary under `bazel-bin` r=healthy-pod a=rickystewart Since the Go 1.19 upgrade this has been broken as `realpath` has been getting the `-test.timeout` argument and been getting confused. Also since Go 1.19 it is must easier to find this binary which is right under the normal `bazel-bin`. Release note: None 88359: sql: fix beautiful diagram which gofmt messed up r=Xiang-Gu a=ajwerner This works. Don't ask me why. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: adityamaru <[email protected]> Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Sep 23, 2022
``` d8728d2a db: fix RangeKeyChanged and -WithLimit interaction 986f0c8d sstable: fix interaction between bpf and monotonic bounds optimization c13723fd db: enable TrySeekUsingNext after Next in external iters d0971a91 db: add ExternalIter_NonOverlapping_SeekNextScan benchmark 5269d612 sstable: include more blocks' stats in BlockBytes 3b11f3dd db: expand iter_histories test coverage 5580541b db: refactor TestRangeKeys into TestIterHistories ``` Close cockroachdb#88329. Close cockroachdb#88296. Release note: None
adityamaru
added a commit
to adityamaru/cockroach
that referenced
this issue
Oct 4, 2022
This setting was previously disabled because of timeouts being observed when restoring our TPCCInc fixtures. The cause of those timeouts has been identified as cockroachdb#88329 making it safe to re-enable merging of spans during restore. This settings prevents restore from over-splitting and leaving the cluster with a merge hangover post restore. Informs: cockroachdb#86470 Release note (sql change): Sets `backup.restore_span.target_size` to default to 384 MiB so that restore merges upto that size of spans when reading from the backup before actually ingesting data. This should reduce the number of ranges created during restore and thereby reduce the merging of ranges that needs to occur post restore.
craig bot
pushed a commit
that referenced
this issue
Oct 4, 2022
87449: workload,ttl: add TTL workload for benchmarking time to finish r=rafiss a=ecwall fixes #88172 Measures time row-level TTL job takes to run on a table: 1) Drop TTL table IF EXISTS. 2) Create a table without TTL. 3) Insert initialRowCount number of rows. 4) Gets number of rows that should expire. 5) Wait for table ranges to stabilize after scattering. 6) Enable TTL on table. 7) Poll table until TTL job is complete. Note: Ops is a no-op and no histograms are used. Benchmarking is done inside Hooks and details are logged. Adds useDistSQL field to TTL job progress protobuf for visibility into which version was run during cluster upgrades. Release justification: Added TTL workload. Release note: None 89317: sql,tree: improve function resolution efficiency r=ajwerner a=ajwerner #### sql: prevent allocations by avoiding some name pointers We don't need pointers for these names. They generally won't escape. #### sql,tree: change SearchPath to avoid allocations The closure-oriented interface was forcing the closures and the variables they referenced to escape to the heap. This change, while not beautiful, ends up being much more efficient. ``` name old time/op new time/op delta SQL/MultinodeCockroach/Upsert/count=1000-16 20.4ms ±11% 18.9ms ± 8% -7.47% (p=0.000 n=20+19) name old alloc/op new alloc/op delta SQL/MultinodeCockroach/Upsert/count=1000-16 10.1MB ±29% 9.8MB ±29% ~ (p=0.231 n=20+20) name old allocs/op new allocs/op delta SQL/MultinodeCockroach/Upsert/count=1000-16 56.3k ± 7% 50.2k ±10% -10.81% (p=0.000 n=19+19) ``` Release note: None 89333: backupccl: enable `restore_span.target_size` r=dt,stevendanna a=adityamaru This setting was previously disabled because of timeouts being observed when restoring our TPCCInc fixtures. The cause of those timeouts has been identified as #88329 making it safe to re-enable merging of spans during restore. This settings prevents restore from over-splitting and leaving the cluster with a merge hangover post restore. Informs: #86470 Release note (sql change): Sets `backup.restore_span.target_size` to default to 384 MiB so that restore merges upto that size of spans when reading from the backup before actually ingesting data. This should reduce the number of ranges created during restore and thereby reduce the merging of ranges that needs to occur post restore. Co-authored-by: Evan Wall <[email protected]> Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: adityamaru <[email protected]>
blathers-crl bot
pushed a commit
that referenced
this issue
Oct 4, 2022
This setting was previously disabled because of timeouts being observed when restoring our TPCCInc fixtures. The cause of those timeouts has been identified as #88329 making it safe to re-enable merging of spans during restore. This settings prevents restore from over-splitting and leaving the cluster with a merge hangover post restore. Informs: #86470 Release note (sql change): Sets `backup.restore_span.target_size` to default to 384 MiB so that restore merges upto that size of spans when reading from the backup before actually ingesting data. This should reduce the number of ranges created during restore and thereby reduce the merging of ranges that needs to occur post restore.
exalate reopened |
blathers-crl bot
pushed a commit
that referenced
this issue
Oct 24, 2022
This setting was previously disabled because of timeouts being observed when restoring our TPCCInc fixtures. The cause of those timeouts has been identified as #88329 making it safe to re-enable merging of spans during restore. This settings prevents restore from over-splitting and leaving the cluster with a merge hangover post restore. Informs: #86470 Release note (sql change): Sets `backup.restore_span.target_size` to default to 384 MiB so that restore merges upto that size of spans when reading from the backup before actually ingesting data. This should reduce the number of ranges created during restore and thereby reduce the merging of ranges that needs to occur post restore.
blathers-crl bot
pushed a commit
that referenced
this issue
Oct 25, 2022
This setting was previously disabled because of timeouts being observed when restoring our TPCCInc fixtures. The cause of those timeouts has been identified as #88329 making it safe to re-enable merging of spans during restore. This settings prevents restore from over-splitting and leaving the cluster with a merge hangover post restore. Informs: #86470 Release note (sql change): Sets `backup.restore_span.target_size` to default to 384 MiB so that restore merges upto that size of spans when reading from the backup before actually ingesting data. This should reduce the number of ranges created during restore and thereby reduce the merging of ranges that needs to occur post restore.
We're going to re-run the benchmarks to confirm that recent fixes made an impact here. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-storage
Relating to our storage engine (Pebble) on-disk storage.
branch-release-22.2
Used to mark GA and release blockers, technical advisories, and bugs for 22.2
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
GA-blocker
sync-me
T-storage
Storage Team
Backup restoration uses an SST iterator that multiplexes several input SSTs, and then scans across them using
NextKey()
.pebbleIterator
will try to step once, and if it's not on a new key then it will seek:cockroach/pkg/storage/pebble_iterator.go
Lines 487 to 492 in 4cf84e0
However, this has been seen to be very expensive. Pebble itself has an optimization to step rather than seek if the target is nearby, but no such optimization exists for the SST iterator. Furthermore, if the SSTs are exhausted, the SST reader will attempt to reseek the SST, which is very expensive.
We need to improve performance here.
Jira issue: CRDB-19772
Epic CRDB-20465
The text was updated successfully, but these errors were encountered: