kv/kvserver: TestMergeQueue failed #73107

cockroach-teamcity · 2021-11-24T08:26:25Z

kv/kvserver.TestMergeQueue failed with artifacts on master @ a132b15e6c9705a6922c2e476936597c6670e072:

=== RUN   TestMergeQueue
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestMergeQueue870837684
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestMergeQueue
    client_merge_test.go:4453: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestMergeQueue870837684
--- FAIL: TestMergeQueue (2.58s)
=== RUN   TestMergeQueue/load-based-merging/combined-qps-below-threshold
    client_merge_test.go:4397: ranges unexpectedly unmerged expected startKey /Table/Max, but got "\xfa\x00\x00"
        --- FAIL: TestMergeQueue/load-based-merging/combined-qps-below-threshold (0.03s)
=== RUN   TestMergeQueue/load-based-merging
    --- FAIL: TestMergeQueue/load-based-merging (0.14s)

Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)Parameters in this failure:

TAGS=deadlock
GOFLAGS=-parallel=4

/cc @cockroachdb/kv _{This test on roachdash | Improve this report!

Jira issue: CRDB-11429}

The text was updated successfully, but these errors were encountered:

cockroach-teamcity · 2021-12-09T08:31:46Z

kv/kvserver.TestMergeQueue failed with artifacts on master @ a96f5b255fead76a94759231a0ee9b699e810956:

=== RUN   TestMergeQueue
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestMergeQueue3321291283
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestMergeQueue
    client_merge_test.go:4453: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestMergeQueue3321291283
--- FAIL: TestMergeQueue (2.57s)
=== RUN   TestMergeQueue/load-based-merging/combined-qps-below-threshold
    client_merge_test.go:4397: ranges unexpectedly unmerged expected startKey /Table/Max, but got "\xfa\x00\x00"
        --- FAIL: TestMergeQueue/load-based-merging/combined-qps-below-threshold (0.03s)
=== RUN   TestMergeQueue/load-based-merging
    --- FAIL: TestMergeQueue/load-based-merging (0.14s)

Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)Parameters in this failure:

TAGS=deadlock
GOFLAGS=-parallel=4

_{This test on roachdash | Improve this report!}

cockroach-teamcity · 2022-02-09T08:14:31Z

kv/kvserver.TestMergeQueue failed with artifacts on master @ 456ff4c8692512f6d6f2b196f98c4a0691ca4589:

=== RUN   TestMergeQueue
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestMergeQueue390597492
    test_log_scope.go:80: use -show-logs to present logs inline
=== CONT  TestMergeQueue
    client_merge_test.go:4456: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestMergeQueue390597492
--- FAIL: TestMergeQueue (2.15s)
=== RUN   TestMergeQueue/load-based-merging/combined-qps-below-threshold
    client_merge_test.go:4400: ranges unexpectedly unmerged expected startKey /Table/Max, but got "\xfa\x00\x00"
        --- FAIL: TestMergeQueue/load-based-merging/combined-qps-below-threshold (0.02s)
=== RUN   TestMergeQueue/load-based-merging
    --- FAIL: TestMergeQueue/load-based-merging (0.04s)

Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

GOFLAGS=-json

_{This test on roachdash | Improve this report!}

tbg · 2022-03-23T09:24:21Z

Letting CI stress this test here edit: actually here (hopefully fewer quoting issues)
Ok these didn't work but here's the winner: https://teamcity.cockroachdb.com/viewQueued.html?itemId=4663767

tbg · 2022-03-28T09:17:25Z

9163 runs completed, 0 failures, over 24h0m0s

erikgrinaker · 2022-03-28T09:22:54Z

What's your sense here? Real bug, or non-deterministic test?

tbg · 2022-03-28T12:03:00Z

I think a test problem:

cockroach/pkg/kv/kvserver/client_merge_test.go

Lines 4406 to 4407 in 160e9d2

    
           store.MustForceMergeScanAndProcess() 
        
           verifyMerged(t, store, lhsStartKey, rhsStartKey)

It seems reasonable that under certain conditions the merge queue wouldn't carry out the merge (some gossip-related thing not having arrived yet, etc).

github-actions · 2022-04-28T10:11:22Z

We have marked this test failure issue as stale because it has been
inactive for 1 month. If this failure is still relevant, removing the
stale label or adding a comment will keep it active. Otherwise,
we'll close it in 5 days to keep the test failure queue tidy.

erikgrinaker · 2022-07-19T11:00:52Z

Still seen to flake on master, see e.g. #79109 (comment).

86325: ui: add workload insight details page v1 r=ericharmeling a=ericharmeling This commit adds the v1 Workload Insight Details page to the DB Console, via the `cluster-ui` package. The v1 Workload Insight Details page only includes details about a specific High Wait Time transaction insight event, populated with information served a new "endpoint" built on top of the SQL-over-HTTP API. Note that this v1 page only includes information available on contention for individual transaction executions recorded in the `crdb_internal.transaction_contention_events` table. As most transaction statistics are aggregated across multiple transaction executions, there are some data missing in this v1, most notably: end time, rows processed, priority, full scan, retry count, last retry reason, session id. Fixes #83775. https://www.loom.com/share/3f7dbd1326954c4ca7949ed8539ae4e9 Release note (ui change): Added new Workload Insight Details page to DB Console Release justification: low-risk updates to new functionality 86412: kvserver: deflake TestMergeQueue{...} tests r=irfansharif a=irfansharif Fixes #73107. Fixes #79109. Fixes #85372. It's possible that under certain conditions the merge queue doesn't immediately merge what these tests are expecting. One possible reasons is span config asynchrony; the merge queue is actually disabled until each store learns about at least some span config state: #78122 (if we don't have any span configs available, enabling range merges would be extremely dangerous -- we could collapse everything into a single range). Release justification: stability work Release note: None 86416: bazel: remove `devdarwinx86_64` toolchain r=rail,dt,jlinder a=rickystewart We haven't come up with a `clang` 10 toolchain for macOS/arm64 because LLVM doesn't publish `clang` archives for that architecture. Instead of letting the two platforms differ to this extent, just remove the `dev` toolchain for x86_64. Closes #72828. Release justification: non-production code changes Release note: None 86421: kvevent: implement chunked blocking buffer r=jayshrivastava a=jayshrivastava ### **kvevent: refactor blocking buffer benchmark** This change updates the blocking buffer micro benchmark in several ways: - it uses different types of events - it uses more producers than consumers to keep the buffer full - it makes b.N correspond to the total number of events, so the benchmark can analyze allocs per event Release note: None Release justification: This change updates a benchmark only. ### **kvevent: implement chunked buffer event queue** This change implements a simple chunked event queue. The purpose of this queue is to be used by kvevent.blockingBuffer in subsequent commits. Release note: None Release justification: This change does not affect any production code. It adds files which are not called by any packages. ### **kvevent: refactor memory buffer to chunked linked list** This change refactors kvevent/blocking_buffer.go to use a chunked linked list instead of a regular linked list to reduce pointer usage. Note that the underlying sync.Pool, which is also a linked list, will use less pointers due to us pooling chunks instead of events. Release note: None Release justification: This change significantly improves performance by significantly reducing pressure on GC. Consequently, this significantly improves foreground SQL p99 latency. GC has been causing severe issues in production changefeeds. Merging this change in this release is worth it for its potential to reduce incidents. ### **Results (micro)** These are the results of running the microbenchmark. `./dev bench pkg/ccl/changefeedccl/kvevent --filter=BenchmarkMemBuffer --count=10 --bench-mem --stream-output --test-args="--test.benchtime=45s" -- --nocache_test_results --test_verbose_timeout_warnings |& tee bench.txt` ``` name old time/op new time/op delta MemBuffer-10 1.22µs ± 2% 0.85µs ± 3% -30.04% (p=0.000 n=8+10) name old alloc/op new alloc/op delta MemBuffer-10 0.00B 0.00B ~ (all equal) name old allocs/op new allocs/op delta MemBuffer-10 0.00 0.00 ~ (all equal) ``` - Memory usage is 0 due to pooling in both implementations. - We can achieve a higher throughput with the chunked implementation - about 50-60M events in 45 seconds as opposed to ~40M with the old implementation. ### **Results (Macro)** Full results are published [here](https://docs.google.com/document/d/10-Pc_cnkCUhkRFpLyBAId39q1aqsjuqTb2k4ehTmQCY/edit?usp=sharing). In summary: I analyzed performing by running TPC-C for 30 mins on a 15 node cluster with 10k warehouses. Before starting the workload, I started a changefeed on the `order_line` table (~200GB). I also set the following cluster settings to stress the buffer and pressure GC: `changefeed.backfill.concurrent_scan_requests = 100;` `changefeed.memory.per_changefeed_limit = '1073741824';` (~1GB) Then, I analyzed SQL latency from admin UI and GC performance using the output of `GODEBUG=gctrace=1.` These are the outcomes: - The p99 SQL latency during the workload was reduced from approx. 1.75s -> 0.150s (91%) - CPU time spent doing GC was reduced from 37.86 mins -> 20.75 mins (45%) - The p99 spike at the beginning of the workload was reduced from approx. 15s -> 12s (20%) ### Relevant Issues Addresses: #84582 (for now...) 86426: sql: bump memory limits in a couple of tests r=yuzefovich a=yuzefovich This commit is similar to dd82d13 where we bumped --max-sql-memory limit. We have recently started hitting the memory limit flakes again (because we do more accounting and have more internal queries), so this commit bumps up the limits significantly while still preserving the tests' spirit. Fixes: #86372. Release justification: test-only change. Release note: None 86458: insights: surface suboptimal plans and failed executions r=matthewtodd a=matthewtodd Fixes #85827. Release justification: Category 2: Bug fixes and low-risk updates to new functionality. Release note: None 86471: vendor: bump Pebble to 406c1dce54c9 r=itsbilal,nicktrav a=jbowens ``` 406c1dce *: add simpleLevelIterator, reduce merging levels in external iter cd7f076e metamorphic: reposition iterator after SetOptions 5f6b4325 db: make RangeKeyMasking.Filter a constructor function ``` In addition to the vendor bump, this commit makes some small code changes in response to interface changes within Pebbe. I kept these changes minimal for the Pebble bump PR, and we can follow up with fully adapting call sites to take advantage of the new ability to pass 'levels' of sstables into the sstable iterator. Release note: None Release justification: Non-production code changes, low-risk updates to new functionality. 86481: Makefile: fix logpb generation r=nicktrav a=ajwerner We needed to pass the flag. This will hit problems if any of the deps of the file change. Release justification: fixes the build Release note: None Co-authored-by: Eric Harmeling <[email protected]> Co-authored-by: irfan sharif <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Yevgeniy Miretskiy <[email protected]> Co-authored-by: Jayant Shrivastava <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Matthew Todd <[email protected]> Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: Andrew Werner <[email protected]>

cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Nov 24, 2021

This comment has been minimized.

Sign in to view

tbg added the S-1 High impact: many users impacted, serious risk of high unavailability or data loss label Feb 1, 2022

AlexTalks added the T-kv KV Team label Feb 18, 2022

erikgrinaker mentioned this issue Mar 31, 2022

kv/kvserver: TestMergeQueueWithSlowNonVoterSnaps failed #79109

Closed

github-actions bot added no-test-failure-activity X-stale labels Apr 28, 2022

github-actions bot closed this as completed May 3, 2022

jlinder added the sync-me-3 label May 24, 2022

erikgrinaker reopened this Jul 19, 2022

github-actions bot removed X-stale no-test-failure-activity labels Jul 19, 2022

irfansharif self-assigned this Aug 18, 2022

irfansharif mentioned this issue Aug 18, 2022

kvserver: deflake TestMergeQueue{...} tests #86412

Merged

irfansharif removed the S-1 High impact: many users impacted, serious risk of high unavailability or data loss label Aug 19, 2022

craig bot closed this as completed in 8f335cb Aug 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv/kvserver: TestMergeQueue failed #73107

kv/kvserver: TestMergeQueue failed #73107

cockroach-teamcity commented Nov 24, 2021 •

edited by cockroach-jira-scripts

Loading

cockroach-teamcity commented Dec 9, 2021

This comment has been minimized.

This comment has been minimized.

cockroach-teamcity commented Feb 9, 2022

tbg commented Mar 23, 2022 •

edited

Loading

tbg commented Mar 28, 2022

erikgrinaker commented Mar 28, 2022

tbg commented Mar 28, 2022

github-actions bot commented Apr 28, 2022

erikgrinaker commented Jul 19, 2022

kv/kvserver: TestMergeQueue failed #73107

kv/kvserver: TestMergeQueue failed #73107

Comments

cockroach-teamcity commented Nov 24, 2021 • edited by cockroach-jira-scripts Loading

cockroach-teamcity commented Dec 9, 2021

This comment has been minimized.

This comment has been minimized.

cockroach-teamcity commented Feb 9, 2022

tbg commented Mar 23, 2022 • edited Loading

tbg commented Mar 28, 2022

erikgrinaker commented Mar 28, 2022

tbg commented Mar 28, 2022

github-actions bot commented Apr 28, 2022

erikgrinaker commented Jul 19, 2022

cockroach-teamcity commented Nov 24, 2021 •

edited by cockroach-jira-scripts

Loading

tbg commented Mar 23, 2022 •

edited

Loading