storage: Decommissioning can get stuck by dormant replicas never getting GC'ed #17288
Comments
Processing such a Replica requires a consistent RangeLookup, and we wanted to avoid hammering the metadata ranges, but with quiescence it looks like we're not going to try to GC the replica until after 10 days, which clearly isn't going to cut it. I think it's fine (at least for now) to wake up dormant replicas in
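To make the timing concrete, here's a rough Go sketch of the kind of decision described above: a dormant replica isn't considered for GC until a long inactivity window has passed, unless something hints that it was removed. The constant names and values are illustrative only, not the actual replica GC queue code.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative thresholds only; the real names and values in the
// replica GC queue may differ.
const (
	// A replica suspected of having been removed is rechecked quickly.
	candidateTimeout = 1 * time.Second
	// A quiesced/dormant replica is only reconsidered after a long
	// inactivity window -- the ~10 day delay discussed above.
	inactivityThreshold = 10 * 24 * time.Hour
)

// shouldQueueForGC sketches the decision being described: a dormant
// replica is not a GC candidate until it has been idle for the full
// inactivity threshold, because checking it requires a consistent
// RangeLookup against the meta ranges.
func shouldQueueForGC(lastActivity time.Time, suspectedRemoval bool, now time.Time) bool {
	idle := now.Sub(lastActivity)
	if suspectedRemoval {
		// With a hint that the replica was removed, check soon.
		return idle > candidateTimeout
	}
	return idle > inactivityThreshold
}

func main() {
	now := time.Now()
	dormantSince := now.Add(-3 * 24 * time.Hour) // idle for 3 days
	fmt.Println(shouldQueueForGC(dormantSince, false, now)) // false: waits ~10 days
	fmt.Println(shouldQueueForGC(dormantSince, true, now))  // true: the hint makes it eager
}
```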
Yeah, although with how fast the scanner runs on nodes that don't have many ranges, we definitely shouldn't do a consistent lookup or wake dormant replicas every time. And to correct my initial post, even restarting the node doesn't wake them up, it turns out.
It's not that expensive to wake them up though, is it? The group will go dormant again after ~1 round. A perhaps better solution could be to signal to replicas that are about to be removed that this is happening. I'd have to page the details back in, but iirc the new configuration in a replica change goes into effect pretty early, and that's why a removed replica often doesn't learn about it until later. We could just commit a Raft command (we could do a direct RPC to the node too, but that doesn't seem less onerous) that triggers "eager GC" for a while on the replica that is supposedly getting removed. Then the scanner would only do eager work for replicas with that flag (as long as the flag is reasonably fresh, say 5min).
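A minimal sketch of that "eager GC" hint, assuming a hypothetical pendingRemoval tracker rather than any real CockroachDB API: the store records when a replica was flagged for removal, and the scanner only does the expensive consistent lookup while the hint is still fresh (~5 minutes here).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pendingRemoval sketches the hint described above: when a replica
// change is about to remove a replica, the affected store records a
// timestamp, and the scanner treats the replica as a GC candidate
// while the hint is fresh. Names are illustrative, not the actual API.
type pendingRemoval struct {
	mu      sync.Mutex
	flagged map[int64]time.Time // rangeID -> when the hint was set
}

const hintFreshness = 5 * time.Minute

func (p *pendingRemoval) markPending(rangeID int64, now time.Time) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.flagged[rangeID] = now
}

// eagerGC reports whether the scanner should do the (relatively
// expensive) consistent RangeLookup for this replica now, instead of
// waiting out the usual inactivity threshold.
func (p *pendingRemoval) eagerGC(rangeID int64, now time.Time) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	t, ok := p.flagged[rangeID]
	return ok && now.Sub(t) < hintFreshness
}

func main() {
	p := &pendingRemoval{flagged: map[int64]time.Time{}}
	now := time.Now()
	p.markPending(42, now)
	fmt.Println(p.eagerGC(42, now))                     // true: hint is fresh
	fmt.Println(p.eagerGC(42, now.Add(10*time.Minute))) // false: hint expired
	fmt.Println(p.eagerGC(7, now))                      // false: never flagged
}
```

Whether the hint travels as a Raft command or a direct RPC only changes how it reaches the store; the freshness window is what keeps the scanner's extra work bounded.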
Yeah, but they'll be getting woken up every
Do we need to worry about how fast GC happens in situations other than decommissioning? If not, we can just GC more eagerly when in a decommissioning state.
That's a good idea.
Not really, though it's one of those things that's often annoying. You're debugging something, and there are these old replicas lying around -- I'd say it'd be nice to smooth out this process, but it shouldn't be crucial.
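For illustration, a small sketch of the "GC more eagerly while decommissioning" idea from this exchange; anyNodeDecommissioning and the interval values are hypothetical placeholders, not existing functions or settings.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical: something the GC queue could consult to learn whether
// any node in the cluster is currently decommissioning.
func anyNodeDecommissioning() bool { return true }

// gcCheckInterval keeps the usual long inactivity window in steady
// state, but drops to a short one while a decommission is in flight,
// so dormant replicas on the draining node are looked at promptly.
func gcCheckInterval() time.Duration {
	const (
		normalInterval       = 10 * 24 * time.Hour // illustrative, matches the ~10 day delay above
		decommissionInterval = 1 * time.Minute     // illustrative eager interval
	)
	if anyNodeDecommissioning() {
		return decommissionInterval
	}
	return normalInterval
}

func main() {
	fmt.Println(gcCheckInterval()) // 1m0s while a decommission is running
}
```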
Tests the fix in cockroachdb#17304 for issue cockroachdb#17288
While playing around with replica decommissioning, I was able to get the process stuck. It's stuck because even though all replicas have been officially replicated away from the node, it still has two dormant, non-GC'ed replicas on it, and thus it still shows up as not being empty:
To be honest, I'm not quite sure how the two replicas left on the node never received the Raft commands that removed them from the range, but now that they're in this state they're stuck forever (or until restarting the node, presumably) because their dormant state keeps them from trying to send traffic to the other replicas and learning that they were removed.
@tschottdorf @garvitjuniwal
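For context, the check a GC pass would ultimately perform is roughly this: fetch the authoritative range descriptor via a consistent RangeLookup and see whether the local store is still a member. The types and field names below are simplified stand-ins, not the real descriptors; the point is that a dormant replica never initiates this check on its own, which is how it gets stuck.

```go
package main

import "fmt"

// Simplified stand-ins for the descriptors a consistent RangeLookup
// would return; field names are illustrative.
type replicaDescriptor struct {
	NodeID  int
	StoreID int
}

type rangeDescriptor struct {
	RangeID  int64
	Replicas []replicaDescriptor
}

// isStillMember is the core of the GC decision: once the authoritative
// descriptor has been fetched from the meta ranges, a local replica
// whose store no longer appears in it can be garbage-collected.
func isStillMember(desc rangeDescriptor, storeID int) bool {
	for _, r := range desc.Replicas {
		if r.StoreID == storeID {
			return true
		}
	}
	return false
}

func main() {
	desc := rangeDescriptor{
		RangeID: 12,
		Replicas: []replicaDescriptor{
			{NodeID: 2, StoreID: 2},
			{NodeID: 3, StoreID: 3},
			{NodeID: 4, StoreID: 4},
		},
	}
	// Store 1 was removed from the range but still holds a dormant replica.
	fmt.Println(isStillMember(desc, 1)) // false: safe to GC the local replica
}
```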