Deal better with network partitions in leaders #2749

Merged: 8 commits merged into master from mrjn/zero-partition on Nov 13, 2018

Conversation

manishrjain (Contributor) commented on Nov 13, 2018

Currently, if the Zero leader ends up in a network partition, the Alpha nodes get stuck indefinitely waiting to hear updates from it, making the Zero leader a single point of failure. Even after the network heals, it takes a while for the partitioned nodes to recover.

This PR fixes these issues:

  • The Zero leader now sends membership updates every second.
  • If an Alpha does not receive a membership update for over 10s, it disconnects from the leader and tries to connect to any other Zero server. This way, every Alpha correctly picks up the new membership state and, hence, the new Zero leader (see the watchdog sketch after this list).
  • Oracle Delta Stream: If the Zero leader changes or the Zero connection becomes unhealthy, the Alpha leader disconnects from the current Zero leader and reconnects to the new one, so it keeps receiving oracle delta updates correctly.
  • Connection Pool: It used to poll every 10s, with no timeout. It now polls every 1s, with a timeout of 1s, so connection health issues surface much sooner. This creates more network traffic (one Echo per second per connection pair, i.e. O(N^2) echoes, where N is the number of servers in the Dgraph cluster), but if and when that becomes a problem, we'll fix it (see the health-check sketch below).
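To make the second point concrete, here is a minimal Go sketch of the Alpha-side watchdog, using a plain channel in place of Dgraph's actual gRPC membership stream; membershipState and watchMembership are hypothetical names for illustration, not Dgraph's API:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// membershipState stands in for the membership proto that Zero streams out.
type membershipState struct{}

// watchMembership consumes updates from one Zero connection. If no update
// arrives within timeout (10s in this PR), it returns an error so the caller
// can drop this connection and dial any other Zero server.
func watchMembership(updates <-chan membershipState, timeout time.Duration) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	for {
		select {
		case _, ok := <-updates:
			if !ok {
				return errors.New("membership stream closed")
			}
			// Update received in time: reset the watchdog.
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(timeout)
		case <-timer.C:
			return fmt.Errorf("no membership update for %v; reconnecting to another Zero", timeout)
		}
	}
}

func main() {
	updates := make(chan membershipState)
	go func() {
		// Simulate a Zero leader that sends two updates, then gets partitioned.
		for i := 0; i < 2; i++ {
			updates <- membershipState{}
			time.Sleep(100 * time.Millisecond)
		}
	}()
	// 500ms here only so the demo exits quickly; the PR uses 10s.
	fmt.Println(watchMembership(updates, 500*time.Millisecond))
}
```

On timeout, the caller would simply dial any other known Zero address and restart the loop, which is how each Alpha converges on the new leader.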

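Similarly, here is a sketch of the tightened connection-pool health check: one probe per second, each bounded by its own 1s deadline via context.WithTimeout. The echo function below is a stand-in for the real Echo RPC, which a real pool would invoke through its gRPC stub:

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// echo fakes a unary Echo RPC with variable latency; a real pool would call
// the peer's gRPC stub here instead.
func echo(ctx context.Context) error {
	select {
	case <-time.After(time.Duration(rand.Intn(1500)) * time.Millisecond):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// monitorHealth probes the peer every interval, bounding each probe with its
// own deadline. With interval = timeout = 1s, a dead peer is noticed within
// about two seconds, versus the old 10s poll that had no deadline at all.
func monitorHealth(interval, timeout time.Duration, report func(healthy bool)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		err := echo(ctx)
		cancel()
		report(err == nil)
	}
}

func main() {
	go monitorHealth(time.Second, time.Second, func(healthy bool) {
		fmt.Println("peer healthy:", healthy)
	})
	time.Sleep(5 * time.Second)
}
```

With every server probing every other server this way, the cluster sends on the order of N^2 echoes per second, which is the traffic cost noted in the last bullet above.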

@manishrjain merged commit e7170c3 into master on Nov 13, 2018
@manishrjain deleted the mrjn/zero-partition branch on November 13, 2018 at 18:58
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019

Commits:
* Added some code to cancel recv from Zero if no update for x seconds. Need to work on ensuring that Zero is sending an update every second or so.
* Alpha leader can reconnect to the new Zero leader after the existing Zero leader is partitioned away from the cluster.
* Fixed various partition-related issues. After partitioning off both the Zero and Alpha leaders, increments converge quickly to the new ones. And when the partition heals, both of them heal quickly.