-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gossip: demonstrate problem with loopback info propagation #34472
gossip: demonstrate problem with loopback info propagation #34472
Conversation
Glad the problem is where we hoped it was going to be. Is the "fix" as simple as making sure that the first time a node receives Gossip, it receives everything that isn't expired? More generally, infinite TTLs are problematic in themselves because we can't delete from Gossip (and I'd rather not introduce it), but at the same time this isn't a problem with infinite TTLs, right? It would affect any TTL that was longer-lived. In the case of node liveness, we could Gossip liveness record X with a TTL that makes the gossip record expire at the liveness entry's expiration + one week, meaning that a week after decommissioning, the node would disappear from Gossip. |
Is that sufficient? What happens if a node starts up and connects to another node that is partitioned? It might only only receive the the non-expired entries from that node. Or were you suggesting that we send all non-expired entries every time a gossip connection is made, disregarding the highwater stamps?
The problem here is bad for any gossip record that is gossiped by more than one node and that isn't gossiped on a regular interval. I think the liveness records might be the most problematic case, but I need to do an audit of the other keys. I'm hesitant to replace infinite TTLs with ones that expire after a week. Figuring out that a gossip record expired after a week and that is causing problems in a cluster isn't something I want to debug. Super subtle problems in gossip like we're continuing to discover make me want to replace it, not keep on applying bandaids. |
This might also affect the table-stat-added infos and the system-config info. I don't think the table-stat-added infos is problematic, because that key simply causes the table stat cache to invalidate an entry. If a node restarts, its cache will be empty. The system-config may also not be problematic, but I think that is only because there is careful logic to re-gossip the system-config if the leaseholder for the ranges changes. See:
I also just discovered that we're aware that a node's gossiped infos won't necessarily come back to it on restart. See 1b43a93:
|
Yeah, a week is long enough to be very hard to debug. I'd suggest that we have a limit of say an hour for any non-infinite TTL (so anything that needs to live longer than that needs to be re-gossiped periodically).
I think what we probably need is to start identifying gossip connections by a tuple like (node id, start time) so that a restarted node is not seen as a continuation of its previous incarnation. |
Release note: None
Heh, I just filed a bug where I mention that as being an ideal solution. I think there are complexities in actually implementing this and handling the backwards compatibility, though. I need to think about this more. |
Similar to cockroachdb#17285. We need to always gossip node liveness infos after the node-liveness range lease changes hands due to the way gossip handles propagation of infos which "loopback" to the sender. Due to the highwater stamp mechanism, such infos are never sent back to the sender. If the gossip info is subsequently never re-gossiped, a node in the cluster may never see the info. This is only a problem for gossip keys which can be gossiped from multiple nodes, such as the node liveness keys and the system config key. The other keys which can be gossiped from multiple nodes are the table-stats-added key, and the table-disable-merges key, neither of which appear to have problematic scenarios associated with the failure to propagate them back to the sender after a restart. See cockroachdb#34494 Release note: None
cf99eef
to
4034f80
Compare
I added a 2nd commit which works around the problem with loopback propagation of infos. This is similar to what was done in #17285. I consider this a temporary bandaid that is being utilized because it is much simpler and more easily backported than a fix inside gossip itself. PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell)
bors r=tbg |
34472: gossip: demonstrate problem with loopback info propagation r=tbg a=petermattis Release note: None Co-authored-by: Peter Mattis <[email protected]>
Build succeeded |
Release note: None