kv/kvserver: TestMergeQueue failed #97000
kv/kvserver.TestMergeQueue failed with artifacts on master @ 2a7edbeb0737b1309064c25c641a309c2980d9ba:
Parameters: |
Will look into later today. #96128
kv/kvserver.TestMergeQueue failed with artifacts on master @ 3e26d85118ef73133c00b04b17449c27c31b8bc4:
Parameters: |
Previously, it was possible for ranges to spuriously merge in `TestMergeQueue` when they were not expected to. The unexpected merges occurred because the merge delay interval was shorter than the time between test statements, so a merge could sneak in when it shouldn't have. This was only realistically possible when running with `--stress` and a slower `deadlock` build.

This commit updates the merge delay to be 1000 seconds, to avoid this situation occurring.

Informs: cockroachdb#97000
Release note: None
kv/kvserver.TestMergeQueue failed with artifacts on master @ f3ff41774a902d6005dbfad504135e64d9434daf:
Parameters: |
97086: kvserver: deflake test merge queue r=andrewbaptist a=kvoli

Previously, it was possible for ranges to spuriously merge in `TestMergeQueue` when they were not expected to. The unexpected merges occurred because the merge delay interval was shorter than the time between test statements, so a merge could sneak in when it shouldn't have. This was only realistically possible when running with `--stress` and a slower `deadlock` build.

This commit updates the merge delay to be 1000 seconds, to avoid this situation occurring.

Informs: #97000
Release note: None

Co-authored-by: Austen McClernon <[email protected]>
kv/kvserver.TestMergeQueue failed with artifacts on master @ 51be9f048a59be0d4353498b447d162145384d13:
Parameters: |
kv/kvserver.TestMergeQueue failed with artifacts on master @ 5f85453c39c3fe74e96c3f004181a26a7220aa3c:
Parameters: |
kv/kvserver.TestMergeQueue failed with artifacts on master @ 7e2df35a2f6bf7a859bb0539c8ca43c4e72ed260:
Parameters: |
kv/kvserver.TestMergeQueue failed with artifacts on master @ c95bef097bd4c213c6b5c0c125a9a846c4479d73:
Parameters: |
kv/kvserver.TestMergeQueue failed with artifacts on master @ 3d054f37c7c87f53cb56fac4e5500f0d1130d09a:
Parameters: |
kv/kvserver.TestMergeQueue failed with artifacts on master @ a9d4e7040c538aeaa0e0e049e5525e2569eb364b:
Parameters: |
@kvoli Are you looking into this?
I looked into the non-deadlock flake and opened a patch. I'm looking into the deadlock issue atm.
kv/kvserver.TestMergeQueue failed with artifacts on master @ dd2749ae4ab61eed2f99238acb74e8d3c6b4cb1d:
Parameters: |
kv/kvserver.TestMergeQueue failed with artifacts on master @ 286b3e235171a39b8f9910555affcc7ce310741a:
Parameters: |
Previously, changing the rebalance objective could lead to inconsistent locking order between the load based splitter and rebalance objective. The split config is created per replica, rather than per store as it was previously. The split config and split decider are bundled underneath a new mutex which ensures consistent access.

Resolves: cockroachdb#97000
Release note: None
Previously, changing the rebalance objective could lead to inconsistent locking order between the load based splitter and rebalance objective. When the objective was updated, the previous method also blocked batch requests from completing until every replica lb splitter was reset.

This commit moves the split objective to be a variable owned by the decider, rather than inferred on each decider operation. The split objective is updated on a rebalance objective change atomically over each replica but not atomically over a store. This removes the need for blocking batch requests until every replica is updated.

Resolves: cockroachdb#97000
Resolves: cockroachdb#97445
Resolves: cockroachdb#97450
Resolves: cockroachdb#97452
Resolves: cockroachdb#97457
Release note: None
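A minimal sketch of the ownership change described in this commit message, assuming a much simplified model. `SplitObjective`, `Decider`, `Store`, and every method here are hypothetical names for illustration and do not reflect the real kvserver types. The point is that each decider guards its own objective with its own mutex, so an objective change takes one replica-level lock at a time and never holds a store-wide lock while requests record load. Resetting the sample count on a change is also an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// SplitObjective is a hypothetical enum for the load signal driving splits.
type SplitObjective int

const (
	SplitQPS SplitObjective = iota
	SplitCPU
)

// Decider owns its split objective (hypothetical, simplified).
type Decider struct {
	mu        sync.Mutex
	objective SplitObjective
	samples   int
}

// SetObjective updates one decider atomically. Assumed behavior: accumulated
// load state is reset when the objective actually changes.
func (d *Decider) SetObjective(obj SplitObjective) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.objective != obj {
		d.objective = obj
		d.samples = 0
	}
}

// Record reads the stored objective instead of inferring it from store state.
func (d *Decider) Record() SplitObjective {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.samples++
	return d.objective
}

// Samples returns the number of recorded samples since the last reset.
func (d *Decider) Samples() int {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.samples
}

// Store holds one decider per replica (hypothetical).
type Store struct {
	replicas []*Decider
}

// OnObjectiveChange is atomic per replica but not across the store:
// no lock is held between iterations, so requests are never blocked
// until every replica has been updated.
func (s *Store) OnObjectiveChange(obj SplitObjective) {
	for _, d := range s.replicas {
		d.SetObjective(obj)
	}
}

func main() {
	s := &Store{replicas: []*Decider{&Decider{}, &Decider{}}}
	s.replicas[0].Record()
	s.OnObjectiveChange(SplitCPU)
	fmt.Println(s.replicas[0].Record() == SplitCPU, s.replicas[0].Samples())
}
```

The trade-off in this sketch is that, during an objective change, some replicas briefly use the old objective while others already use the new one, which matches the "not atomically over a store" wording above.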
kv/kvserver.TestMergeQueue failed with artifacts on master @ 821ffce6292895f1b43e89ea7cc65a5703cf1506:
Parameters: |
97148: changefeedccl: Expire protected timestamps r=miretskiy a=miretskiy

Changefeeds use the protected timestamp system (PTS) to ensure that the data targeted by a changefeed is not garbage collected prematurely. A running changefeed manages its PTS record by periodically updating the record's timestamp, so that data older than that timestamp may be GCed.

However, if the changefeed is paused (either due to operator action, or due to the `on_error=pause` option), the PTS record remains so that the changefeed can be resumed at a later time. It is also possible that an operator may not notice that the job has been paused for too long, causing a buildup of garbage data.

Excessive buildup of GC work is not great since it impacts overall cluster performance, and, once GC can resume, its cost is proportional to how much GC work needs to be done.

This PR introduces a new changefeed option `gc_protect_expires_after` to automatically expire PTS records that are too old. This automatic expiration is a safety mechanism in case a changefeed job gets paused by an operator or due to an error, while holding onto its PTS record due to the `protect_gc_on_pause` option.

The operator is still expected to monitor changefeed jobs, and to restart paused changefeeds expediently. If the changefeed job remains paused, and the underlying PTS record expires, then the changefeed job will be canceled to prevent buildup of GC data.

Epic: [CRDB-21953](https://cockroachlabs.atlassian.net/browse/CRDB-21953)
Informs #84598
Release note (enterprise change): Changefeed will automatically expire PTS records for paused jobs if changefeed is configured with `gc_protect_expires_after` option.

97539: kvserver: fix deadlock on rebalance obj change r=kvoli a=kvoli

Previously, changing the rebalance objective could lead to inconsistent locking order between the load based splitter and rebalance objective. When the objective was updated, the previous method also blocked batch requests from completing until every replica lb splitter was reset.

This commit moves the split objective to be a variable owned by the decider, rather than inferred on each decider operation. The split objective is updated on a rebalance objective change atomically over each replica but not atomically over a store. This removes the need for blocking batch requests until every replica is updated.

Resolves: #97000
Resolves: #97445
Resolves: #97450
Resolves: #97452
Resolves: #97457
Release note: None

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
kv/kvserver.TestMergeQueue failed with artifacts on master @ 2a7edbeb0737b1309064c25c641a309c2980d9ba:
Parameters:
TAGS=bazel,gss,deadlock
Jira issue: CRDB-24450