roachtest: fix replicagc-changed-peers
The test ends up in the following situation:

- n1: down, no replicas
- n2: down, no replicas
- n3: alive, with a constraint that wants all replicas to move, and there may be a few ranges still on n3
- n4-n6: alive, where the ranges are predominantly 3x-replicated

The test then verifies that the replica count on n3 (as in, replicas physically on n3, in contrast to replicas assigned via the meta ranges) drops to zero.

However, system ranges cannot move in this configuration. The number of cluster nodes is six (decommission{ing,ed} nodes would be excluded, but no nodes are decommission{ing,ed} here), so the system ranges operate at a replication factor of five. There are only four live nodes here, so if n3 is still a member of any system ranges, those replicas will stay there and the test fails.

This commit attempts to rectify that by making sure that all replicas are moved off n3 while it is down earlier in the test. That was always the intent of the test, which is concerned with n3 realizing that replicas have moved elsewhere and initiating replicaGC; however, prior to this commit it was left to chance whether n3 would still have replicas assigned to it by the time the test reached the stage above.

The reason the test wasn't previously waiting for all replicas to be moved off n3 while it was down is that doing so required checking the meta ranges, which wasn't necessary for the other two nodes.

This commit passed all five runs of replicagc-changed-peers/restart=false, so I think it reliably addresses the problem.

There is still the lingering question of why this is failing only now (note that both flavors of the test failed on master last night, so I doubt it is rare). We just merged cockroachdb#67319, which is likely somehow related.

Fixes cockroachdb#67910.
Fixes cockroachdb#67914.

Release note: None