Peers are pinged even after dead #5

webglider opened this issue Jul 17, 2015 · 4 comments

@webglider

I created 3 peers on different ports (using gossipmonger-tcp-transport listeners as transport). Let's call them A, B, C.

A was given an empty seed set
B was given a seed set of [A]
C was given a seed set of [B]
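
Roughly, peer B looked like this (a minimal sketch from memory of the README's `new Gossipmonger(peerInfo, options)` constructor, `seeds` option and default TCP transport, so the exact option names may differ):

```javascript
// Peer B: listens on its own port and is seeded with peer A
var Gossipmonger = require('gossipmonger');

var gossipmongerB = new Gossipmonger(
    { // local peer info
        id: "B",
        transport: { host: "localhost", port: 9001 }
    },
    {
        seeds: [
            { id: "A", transport: { host: "localhost", port: 9000 } }
        ]
    });

gossipmongerB.on('error', function (error) {
    console.log("connection error:", error); // these keep firing for C
});

gossipmongerB.on('peer dead', function (peer) {
    console.log("peer dead:", peer.id); // fires for C after a while
});

// start listening, then start gossiping
gossipmongerB.transport.listen(function () {
    gossipmongerB.gossip();
});
```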

Everything runs fine when all the peers are live.
Then I close peer C by terminating its process (Ctrl-C)
A and B start reporting connection errors as expected
After some time they recognize peer C is dead (the 'peer dead' event is fired on both)
But even after they have recognized that the peer is dead, I continue to see periodic connection error reports.

Shouldn't peers stop trying to ping peers which they think are dead?
(MIN_LIVE_PEERS is set to the default 1, so that should not be a problem)

If they keep trying to ping even peers which are dead, what is the point of calling them dead? That means a peer can never leave the cluster.

@webglider (Author)

Ok, I checked the source code.
There is a probability with which dead nodes are tried.
Is this necessary to ensure consistency, or would it be okay to get rid of this?
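
For reference, the logic I'm looking at is roughly this (paraphrasing from memory, so the exact condition in index.js may differ; `gossipWith` here is just a placeholder for the actual send):

```javascript
// with probability deadPeers / (livePeers + 1), also gossip with a
// randomly chosen dead peer, so local assumptions about it get retested
var livePeers = storage.livePeers();
var deadPeers = storage.deadPeers();

if (deadPeers.length > 0
        && Math.random() < deadPeers.length / (livePeers.length + 1)) {
    var deadPeer = deadPeers[Math.floor(Math.random() * deadPeers.length)];
    gossipWith(deadPeer); // placeholder: send our digest to the dead peer anyway
}
```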

@tristanls (Owner)

Hi @webglider,

First, thank you for a great description of what you're seeing. It was easy to understand what you described.

You're right, the current code does not allow a peer to leave a cluster. I think you've found a missing feature. The retrying of dead peers was implemented because that's what was in the algorithm I read in the paper. I never got around to implementing peers cleanly leaving the cluster.

I don't think we should get rid of retrying dead nodes, though. It allows the cluster to recover from network partitions, partial failures, etc.

I think it would be possible to implement leaving the cluster by adding some sort of delete method to the storage API that would get rid of the desired peer.
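
Something like this, sketching a minimal in-memory storage with the proposed remove(id) (get/put/livePeers/deadPeers follow the storage API section of the README; the `live` flag is just how this sketch tracks liveness):

```javascript
// minimal in-memory storage sketch with the proposed remove(id)
var MemoryStorage = function () {
    this.peers = {}; // id -> peer
};

MemoryStorage.prototype.get = function (id) {
    return this.peers[id] || null;
};

MemoryStorage.prototype.put = function (id, peer) {
    this.peers[id] = peer;
};

MemoryStorage.prototype.livePeers = function () {
    var self = this;
    return Object.keys(self.peers)
        .map(function (id) { return self.peers[id]; })
        .filter(function (peer) { return peer.live; });
};

MemoryStorage.prototype.deadPeers = function () {
    var self = this;
    return Object.keys(self.peers)
        .map(function (id) { return self.peers[id]; })
        .filter(function (peer) { return !peer.live; });
};

// proposed addition: once removed, the peer no longer shows up in
// livePeers() or deadPeers(), so gossipmonger stops contacting it
MemoryStorage.prototype.remove = function (id) {
    delete this.peers[id];
};
```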

@webglider (Author)

@tristanls Yeah, I guess a method to allow peers to deliberately leave the network could be implemented. Right now I guess that can be done with a small hack: we could program each peer to recognize a special tombstone key/value pair, maybe ('i am', 'dead') :P, so that whenever it receives such an update from another peer, it forcefully deletes that peer from storage. So if a peer wants to leave intentionally, it updates this tombstone key/value pair locally, waits for a few seconds and then disconnects. I guess that would be sufficient to ensure that the update propagates through gossip to all other peers.
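
Something like this (assuming the `gossipmonger.update(key, value)` method and the `'update'` event from the README, plus a `storage.remove(id)` like the one you mentioned):

```javascript
var TOMBSTONE_KEY = "i am";
var TOMBSTONE_VALUE = "dead";

// every peer watches for the tombstone and drops the sender from storage
gossipmonger.on('update', function (peerId, key, value) {
    if (key === TOMBSTONE_KEY && value === TOMBSTONE_VALUE) {
        storage.remove(peerId); // hypothetical, not in the current storage API
    }
});

// a peer that wants to leave announces the tombstone, waits for the gossip
// to spread, then shuts down
gossipmonger.update(TOMBSTONE_KEY, TOMBSTONE_VALUE);
setTimeout(function () {
    process.exit(0);
}, 10000); // arbitrary wait for the update to propagate
```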

But what if a rogue peer doesn't follow the standard protocol for leaving the cluster and simply terminates its process and shuts down its computer? In such cases the stale entry will remain. Is there a way to find such instances and delete them from the cluster? (maybe if a peer cannot be reached by many peers it is considered to have left)

That leads me to another question regarding the phi accrual failure detection. I haven't read the details of how exactly the phi value is computed, but looking at the source code I could see that it was being computed using last-contact time and interval-related data. If I am not wrong, this data is updated only when a peer receives a digest or a delta from another peer.

So now let's imagine an idle network (i.e. no updates are happening right now), where peers are randomly exchanging digests. This means peer A's phi value for peer B will be updated only when A receives a digest from B. If the network is large, isn't there a chance that this could take considerably long?

This also means the failure detection information is not being gossiped, i.e. the peers are not working together in collaboration to decide the failure of a node; each node independently decides failure of all the other nodes. Am I right about this, or have I missed something in the implementation? Shouldn't peers be gossiping about their phi values of other peers, and then deciding failure using this collective information?

@tristanls (Owner)

Hey,

I've thought about this for a bit. I think a place to introduce leaving the cluster (at least at first) is through the storage abstraction https://github.com/tristanls/gossipmonger#gossipmonger-storage. If storage afforded something like storage.remove(id), which would remove the peer from storage, then the next time gossipmonger calls storage.livePeers() or storage.deadPeers() the removed peer would not be included in the results.

With that in place, the protocol for leaving the cluster would be:

  1. shut down the peer (no need to be clean) permanently.
  2. wait until everyone declares the peer dead (if we were to remove the peer prior to it being considered dead, we could enter a state where we advertise a dead peer in the middle of removal)
  3. communicate removal of that particular id (this can be some 'admin' gossip key)
  4. when peers receive the admin message, they remove the peer using storage.remove(id)

The above protocol is very much a type of "after-the-fact cleanup" protocol. In essence, you can shut down the peers and let them go dead... and at some point, you can send the admin message that cleans up the ones you want to remove.

Regarding phi accrual failure detection. If you take a look at https://github.com/tristanls/gossipmonger/blob/master/index.js#L327, you'll see that we calculate phi when we locally attempt to connect to a peer (gossipmonger.gossip()), and not when the peer connects to us. Since we constantly attempt to gossipmonger.gossip() (https://github.com/tristanls/gossipmonger/blob/master/index.js#L349), we continuously update phi locally. So, an idle network would result in our local node thinking that all the peers are dead. You are correct that each node independently decides failure of all the other nodes. If the network is large, we might not connect to everyone, and therefore declare them dead, but that's why we continue trying dead peers, in order to test our local assumptions about the remote state of the peer.
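
For reference, the phi calculation itself is the accrual failure detector idea from the Hayashibara et al. paper: keep the recent inter-arrival intervals for a peer and compute how improbable the current silence is. A simplified sketch (using an exponential approximation; the real code in gossipmonger-peer may differ in details):

```javascript
// simplified phi accrual sketch: phi = -log10(P(no contact for this long)),
// approximating the inter-arrival distribution as exponential
function phi(peer, now) {
    var sum = peer.intervals.reduce(function (a, b) { return a + b; }, 0);
    var meanInterval = sum / peer.intervals.length;
    var elapsed = now - peer.lastTime; // time since we last heard from the peer

    // P(interval > elapsed) = e^(-elapsed / meanInterval) for an exponential distribution
    var probabilityLater = Math.exp(-elapsed / meanInterval);
    return -Math.log(probabilityLater) / Math.LN10; // convert to -log10
}

// each node evaluates this locally during gossip(), for example:
// if (phi(peer, Date.now()) > PHI_THRESHOLD) { /* mark the peer dead */ }
```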

As to peers gossiping their phi values to other peers, it sounds interesting, but I have not explored that path.
