@irfansharif reported this log from a WIP test [1]. In the test, r64 lost quorum and the circuit breaker tripped. But then we see this on r1, which should be healthy (it's a single-voter group):
This almost certainly has to do with the first range having a start key of KeyMin (the empty key), while the addressable keyspace only starts at keys.LocalMax (\x02). Both replicaGC and constraint reports scan meta2, which is contained in r1 in this test. In doing so they seem to acquire a latch that spans range-local replicated keys for r64 (and likely all other ranges). This is incorrect.
I don't understand how that latch gets into r1's latch manager. Each replica has its own latch manager, so even if r64 has a bunch of poisoned latches, why do they show up in r1?
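To make the KeyMin versus keys.LocalMax point concrete, here is a minimal standalone sketch. It is not CockroachDB code: the `span` type, the `overlaps` and `pointSpan` helpers, and the sample local key are hypothetical, and the byte values (\x01 as the local-key prefix, \x04 as the end of meta2) are assumptions chosen to match the \x02 value of keys.LocalMax mentioned above. It only shows that a span starting at the empty key covers keys below \x02, whereas the same span clamped to start at \x02 does not.

```go
package main

import (
	"bytes"
	"fmt"
)

// span is a half-open key interval [start, end). Hypothetical type,
// used only for this illustration.
type span struct {
	start, end []byte
}

// overlaps reports whether two half-open spans share at least one key.
func overlaps(a, b span) bool {
	return bytes.Compare(a.start, b.end) < 0 && bytes.Compare(b.start, a.end) < 0
}

// pointSpan returns the smallest half-open span that contains only k.
func pointSpan(k []byte) span {
	end := append(append([]byte{}, k...), 0x00)
	return span{start: k, end: end}
}

var (
	keyMin   = []byte{}     // empty key: start key of the first range
	localMax = []byte{0x02} // start of the addressable keyspace
	meta2End = []byte{0x04} // assumed end of the meta2 keyspace

	// Assumed example of a local key belonging to some other range;
	// the only property that matters is that it sorts below \x02.
	localKey = []byte{0x01, 'k', 'x'}
)

func main() {
	fromKeyMin := span{start: keyMin, end: meta2End}
	fromLocalMax := span{start: localMax, end: meta2End}

	fmt.Println("scan from KeyMin overlaps local key:", overlaps(fromKeyMin, pointSpan(localKey)))
	fmt.Println("scan from LocalMax overlaps local key:", overlaps(fromLocalMax, pointSpan(localKey)))
}
```

Under these assumptions the first check reports an overlap and the second does not, which is consistent with the suspicion that a meta2 scan declared from r1's start key would latch over local keys it has no business touching. It does not explain how a latch from r64 ends up in r1's latch manager.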
To reproduce, I used the following from #98308. One oddity was that I was suppressing time-based election timeouts, so I'd use RaftElectionTimeoutTicks: 10000.
$ dev test pkg/kv/kvserver -f TestFlowControlRaftMembershipRemoveSelf/transfer-lease-first=true \
-v --show-logs --stream-output --timeout 10m --stress --ignore-cache
Jira issue: CRDB-27742
Footnotes
[1] https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1683495028679899