Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Always gossip system config after lease changes hands #17285

Merged
merged 1 commit into from
Jul 28, 2017

Conversation

a-robinson
Copy link
Contributor

Without doing this, it's possible for n1 to gossip the system config,
lose its lease, and restart with the source node of the system config
gossip still being n1. If this happens, other nodes will not provide the
system config gossip back to n1 because its watermark for itself will be
higher than the system config's timestamp.

I think it's better to add this "last-gossiped-from-here" check to the
universal code path for gossiping the system config than to change only
the leaseChangingHands code path, since it's possible that one of the
other checks or an error preventing gossip could trigger from there and
then we wouldn't know the next time through that we should gossip it.

This fixes the main problem seen in #17272

Without doing this, it's possible for n1 to gossip the system config,
lose its lease, and restart with the source node of the system config
gossip still being n1. If this happens, other nodes will not provide the
system config gossip back to n1 because its watermark for itself will be
higher than the system config's timestamp.

I think it's better to add this "last-gossiped-from-here" check to the
universal code path for gossiping the system config than to change only
the leaseChangingHands code path, since it's possible that one of the
other checks or an error preventing gossip could trigger from there and
then we wouldn't know the next time through that we should gossip it.
@a-robinson a-robinson requested a review from tbg July 28, 2017 17:10
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@a-robinson a-robinson merged commit f7c24e6 into cockroachdb:master Jul 28, 2017
@a-robinson a-robinson deleted the gossipsyscfg branch May 18, 2018 20:34
petermattis added a commit to petermattis/cockroach that referenced this pull request Feb 1, 2019
Similar to cockroachdb#17285. We need to always gossip node liveness infos after
the node-liveness range lease changes hands due to the way gossip
handles propagation of infos which "loopback" to the sender. Due to the
highwater stamp mechanism, such infos are never sent back to the
sender. If the gossip info is subsequently never re-gossiped, a node in
the cluster may never see the info. This is only a problem for gossip
keys which can be gossiped from multiple nodes, such as the node
liveness keys and the system config key. The other keys which can be
gossiped from multiple nodes are the table-stats-added key, and the
table-disable-merges key, neither of which appear to have problematic
scenarios associated with the failure to propagate them back to the
sender after a restart.

See cockroachdb#34494

Release note: None
petermattis added a commit to petermattis/cockroach that referenced this pull request Feb 5, 2019
Similar to cockroachdb#17285. We need to always gossip node liveness infos after
the node-liveness range lease changes hands due to the way gossip
handles propagation of infos which "loopback" to the sender. Due to the
highwater stamp mechanism, such infos are never sent back to the
sender. If the gossip info is subsequently never re-gossiped, a node in
the cluster may never see the info. This is only a problem for gossip
keys which can be gossiped from multiple nodes, such as the node
liveness keys and the system config key. The other keys which can be
gossiped from multiple nodes are the table-stats-added key, and the
table-disable-merges key, neither of which appear to have problematic
scenarios associated with the failure to propagate them back to the
sender after a restart.

See cockroachdb#34494

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants