gossip: demonstrate problem with loopback info propagation #34472

Merged

Conversation

petermattis
Collaborator

Release note: None

@petermattis petermattis requested a review from a team January 31, 2019 23:08
@cockroach-teamcity
Member

This change is Reviewable

@petermattis petermattis requested review from tbg and bdarnell January 31, 2019 23:08
@tbg
Member

tbg commented Feb 1, 2019

Glad the problem is where we hoped it was going to be. Is the "fix" as simple as making sure that the first time a node receives Gossip, it receives everything that isn't expired?

More generally, infinite TTLs are problematic in themselves because we can't delete from Gossip (and I'd rather not introduce it), but at the same time this isn't a problem with infinite TTLs, right? It would affect any TTL that was longer-lived. In the case of node liveness, we could Gossip liveness record X with a TTL that makes the gossip record expire at the liveness entry's expiration + one week, meaning that a week after decommissioning, the node would disappear from Gossip.
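To make the proposal above concrete, here is a tiny sketch of the TTL arithmetic (plain Go with made-up values, not an existing gossip API): the gossip record's TTL is the time until the liveness entry itself expires, plus one week.

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        now := time.Now()
        // Hypothetical example: the liveness entry expires 9s from now.
        livenessExpiration := now.Add(9 * time.Second)

        // Gossip TTL measured from now: liveness expiration plus a week, so a
        // decommissioned node drops out of gossip roughly a week later.
        gossipTTL := livenessExpiration.Sub(now) + 7*24*time.Hour
        fmt.Println(gossipTTL) // ~168h0m9s
    }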

@petermattis
Collaborator Author

Glad the problem is where we hoped it was going to be. Is the "fix" as simple as making sure that the first time a node receives Gossip, it receives everything that isn't expired?

Is that sufficient? What happens if a node starts up and connects to another node that is partitioned? It might only receive the non-expired entries from that node. Or were you suggesting that we send all non-expired entries every time a gossip connection is made, disregarding the highwater stamps?

More generally, infinite TTLs are problematic in themselves because we can't delete from Gossip (and I'd rather not introduce it), but at the same time this isn't a problem with infinite TTLs, right? It would affect any TTL that was longer-lived. In the case of node liveness, we could Gossip liveness record X with a TTL that makes the gossip record expire at the liveness entry's expiration + one week, meaning that a week after decommissioning, the node would disappear from Gossip.

The problem here is bad for any gossip record that is gossiped by more than one node and that isn't gossiped on a regular interval. I think the liveness records might be the most problematic case, but I need to do an audit of the other keys.

I'm hesitant to replace infinite TTLs with ones that expire after a week. Figuring out that a gossip record expired after a week and that the expiration is causing problems in a cluster isn't something I want to debug. Super subtle problems in gossip like the ones we're continuing to discover make me want to replace it, not keep applying bandaids.
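To illustrate the loopback problem under discussion, here is a minimal, self-contained sketch (hypothetical types, not the real gossip package): a restarted node immediately originates fresh infos of its own, which pushes its reported highwater stamp for itself past the timestamps of infos it originated before the restart, so peers filter those older infos out of every delta and they never loop back.

    package main

    import "fmt"

    // Info is a simplified gossip info: a value originated by a node at some
    // logical timestamp.
    type Info struct {
        OrigNodeID int
        Timestamp  int64
        Value      string
    }

    // Peer holds infos plus a highwater stamp per originating node: the
    // largest timestamp of any info it has seen from that node.
    type Peer struct {
        infos     map[string]Info
        highwater map[int]int64
    }

    func NewPeer() *Peer {
        return &Peer{infos: map[string]Info{}, highwater: map[int]int64{}}
    }

    func (p *Peer) Add(key string, info Info) {
        p.infos[key] = info
        if info.Timestamp > p.highwater[info.OrigNodeID] {
            p.highwater[info.OrigNodeID] = info.Timestamp
        }
    }

    // DeltaFor returns only infos newer than the requester's highwater stamps
    // for their originating nodes.
    func (p *Peer) DeltaFor(reqHighwater map[int]int64) map[string]Info {
        delta := map[string]Info{}
        for k, info := range p.infos {
            if info.Timestamp > reqHighwater[info.OrigNodeID] {
                delta[k] = info
            }
        }
        return delta
    }

    func main() {
        // n2 still holds an info that n1 originated before restarting, at ts=10.
        n2 := NewPeer()
        n2.Add("system-db", Info{OrigNodeID: 1, Timestamp: 10, Value: "cfg"})

        // After restarting, n1 gossips fresh infos of its own (e.g. its node
        // descriptor) at ts=50, so the highwater stamp it reports for node 1
        // is already past the old info's timestamp.
        n1 := NewPeer()
        n1.Add("node:1", Info{OrigNodeID: 1, Timestamp: 50, Value: "descriptor"})

        // n2 computes the delta for n1 and filters out the ts=10 info: it
        // never loops back to the restarted n1.
        fmt.Println(n2.DeltaFor(n1.highwater)) // map[]
    }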

@petermattis
Collaborator Author

I think the liveness records might be the most problematic case, but I need to do an audit of the other keys.

This might also affect the table-stat-added infos and the system-config info. I don't think the table-stat-added infos are problematic, because that key simply causes the table stat cache to invalidate an entry, and if a node restarts its cache will be empty anyway. The system-config info may also not be problematic, but I think that is only because there is careful logic to re-gossip the system config if the leaseholder for the range changes. See:

	// Skip re-gossiping when the already-gossiped config matches the one we
	// just loaded and this node was the original source of that gossip.
	if gossipedCfg := r.store.Gossip().GetSystemConfig(); gossipedCfg != nil && gossipedCfg.Equal(loadedCfg) &&
		r.store.Gossip().InfoOriginatedHere(gossip.KeySystemConfig) {
		log.VEventf(ctx, 2, "not gossiping unchanged system config")
		return nil
	}

I also just discovered that we're aware that a node's gossiped infos won't necessarily come back to it on restart. See 1b43a93:

storage: Always gossip system config after lease changes hands

Without doing this, it's possible for n1 to gossip the system config,
lose its lease, and restart with the source node of the system config
gossip still being n1. If this happens, other nodes will not provide the
system config gossip back to n1 because its watermark for itself will be
higher than the system config's timestamp.

@bdarnell
Contributor

bdarnell commented Feb 1, 2019

I'm hesitant to replace infinite TTLs with ones that expire after a week.

Yeah, a week is long enough to be very hard to debug. I'd suggest that we have a limit of say an hour for any non-infinite TTL (so anything that needs to live longer than that needs to be re-gossiped periodically).

Is that sufficient? What happens if a node starts up and connects to another node that is partitioned? It might only receive the non-expired entries from that node. Or were you suggesting that we send all non-expired entries every time a gossip connection is made, disregarding the highwater stamps?

I think what we probably need is to start identifying gossip connections by a tuple like (node id, start time) so that a restarted node is not seen as a continuation of its previous incarnation.
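As an illustration of the (node id, start time) idea, here is a minimal sketch (hypothetical types, not a concrete proto change): if highwater stamps are keyed by incarnation rather than by node ID alone, a restarted node reports no stamp for its previous incarnation, so peers still ship back the infos that the old incarnation originated.

    package main

    import (
        "fmt"
        "time"
    )

    // Incarnation identifies one run of a node's gossip instance.
    type Incarnation struct {
        NodeID    int
        StartTime int64 // e.g. process start time in unix nanos
    }

    // Highwater maps each incarnation to the largest timestamp seen from it.
    type Highwater map[Incarnation]int64

    func main() {
        oldInc := Incarnation{NodeID: 1, StartTime: 1000}
        newInc := Incarnation{NodeID: 1, StartTime: time.Now().UnixNano()}

        // The restarted n1 only has stamps for its new incarnation; it
        // reports nothing for the old one.
        reported := Highwater{newInc: 50}

        // A peer deciding whether to send an info originated by the old
        // incarnation at ts=10 compares against that incarnation's stamp,
        // which is 0, so the info is included in the delta after all.
        infoTS := int64(10)
        fmt.Println(infoTS > reported[oldInc]) // true — the info loops back
    }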

@petermattis
Collaborator Author

I think what we probably need is to start identifying gossip connections by a tuple like (node id, start time) so that a restarted node is not seen as a continuation of its previous incarnation.

Heh, I just filed a bug where I mention that as being an ideal solution. I think there are complexities in actually implementing this and handling the backwards compatibility, though. I need to think about this more.

Similar to cockroachdb#17285. We need to always gossip node liveness infos after
the node-liveness range lease changes hands due to the way gossip
handles propagation of infos which "loopback" to the sender. Due to the
highwater stamp mechanism, such infos are never sent back to the
sender. If the gossip info is subsequently never re-gossiped, a node in
the cluster may never see the info. This is only a problem for gossip
keys which can be gossiped from multiple nodes, such as the node
liveness keys and the system config key. The other keys which can be
gossiped from multiple nodes are the table-stats-added key and the
table-disable-merges key, neither of which appears to have problematic
scenarios associated with the failure to propagate them back to the
sender after a restart.

See cockroachdb#34494

Release note: None
@petermattis petermattis force-pushed the pmattis/gossip-loopback-info-propagation branch from cf99eef to 4034f80 on February 1, 2019 19:09
@petermattis
Collaborator Author

I added a 2nd commit which works around the problem with loopback propagation of infos. This is similar to what was done in #17285. I consider it a temporary bandaid; I'm going with it because it is much simpler and more easily backported than a fix inside gossip itself.

PTAL
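For context, a rough, self-contained sketch of the shape of that workaround (hypothetical names, not the actual diff in this PR): when the node-liveness range lease changes hands, the new leaseholder unconditionally re-gossips the liveness records, which changes the infos' originating node and lets them reach peers whose highwater stamps would otherwise filter them, such as a restarted originator.

    package main

    import "fmt"

    // livenessRangeLeaseholder is a stand-in for the replica that holds the
    // lease on the node-liveness range.
    type livenessRangeLeaseholder struct {
        nodeID int
        gossip func(key, value string)
    }

    // onLeaseAcquired re-gossips every liveness record, even if unchanged.
    // Re-gossiping from the new leaseholder changes each info's originating
    // node, so peers whose highwater stamps filtered the old copy (e.g. a
    // restarted originator) receive it again.
    func (l *livenessRangeLeaseholder) onLeaseAcquired(records map[string]string) {
        for key, val := range records {
            l.gossip(key, val)
        }
    }

    func main() {
        lh := &livenessRangeLeaseholder{
            nodeID: 2,
            gossip: func(key, value string) { fmt.Println("gossip", key, "=", value) },
        }
        lh.onLeaseAcquired(map[string]string{"liveness:1": "live", "liveness:2": "live"})
    }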

@petermattis
Collaborator Author

@tbg or @bdarnell I'd like to get this merged and backported. Any concerns with the hackiness of this "fix"?

Member

@tbg tbg left a comment


Reviewed 1 of 1 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell)

@petermattis
Collaborator Author

bors r=tbg

craig bot pushed a commit that referenced this pull request Feb 5, 2019
34472: gossip: demonstrate problem with loopback info propagation r=tbg a=petermattis

Release note: None

Co-authored-by: Peter Mattis <[email protected]>
@craig
Contributor

craig bot commented Feb 5, 2019

Build succeeded

@craig craig bot merged commit 4034f80 into cockroachdb:master Feb 5, 2019
@petermattis petermattis deleted the pmattis/gossip-loopback-info-propagation branch February 5, 2019 17:25