Commit
kvserver: kill TestSystemZoneConfigs
Fixes #98200. This test was written in the pre-spanconfig days, and when
spanconfigs were enabled by default over a year ago, it opted out of
using them. It's a real chore to bring this old test back up to spec
(#100210 is an earlier attempt). It has been skipped for a while after
flaking (for test-only reasons that are understood, see #100210) and is
notoriously slow, taking 30+s given it waits for actual upreplication
and replica movement, which makes it not --stress friendly.

In our earlier attempt to upgrade this to use spanconfigs, we learnt two
new things:

- There was a latent bug, previously thought to have been fixed in
  #75939. In very rare cases, right during cluster bootstrap before the
  span config reconciler has ever had a chance to run (i.e.
  system.span_configurations is empty), it was possible that the
  subscriber had subscribed to an empty span config state (we've only
  seen this happen in unit tests with 50ms scan intervals). So it had
  not been meaningfully "updated" in any sense of the word, but we still
  set a non-empty last-updated timestamp, something various components
  in KV rely on as proof that we have span configs as of some timestamp.
  As a result, we saw KV incorrectly merge away the liveness range into
  adjacent ranges, and then later split it off. We don't think we've
  ever seen this happen outside of tests as it instantly triggers the
  following fatal in the raftScheduler, which wants to prioritize the
  liveness range above all else:

      panic: priority range ID already set: old=2, new=61, first set at:

  This bug continues to exist. We've filed #104195 to track fixing it.

- Fixing the bug above (by erroring out until a span config snapshot is
  available) made it so that tests now needed to actively wait for a
  span config snapshot before relocating ranges manually or using
  certain kv queues. Adding that synchronization made lots of tests a
  whole lot slower (by 3+s each) despite reducing the closed timestamp
  interval, etc. These tests weren't really being harmed by the bug
  (== empty span config snapshot). So it's not clear that the bug is
  worth fixing. But that can be litigated in #104195.

We don't really need this test in its current form (end-to-end
spanconfig tests exist elsewhere and are more comprehensive without
suffering the issues above).

Release note: None