What will happen if 4 of the 7 elected consensus nodes are offline? #2203
Comments
Maybe we should revert to the original candidates once consensus goes past view X |
Please check #2205 |
Hmm, not sure if that will solve the issue. I also think 2 days is too much. If the network is offline for 2 days, people will hunt us with stones and sticks. Maybe we need committee members as backup nodes. I like your solution because it is pretty straightforward, but at the same time I'm not confident that it will solve the issue 'forever'. Maybe it will; we need more feedback. |
@shargon can't the new CNs fork the network if they are not in sync? Can't they do this on purpose, as an attack? If so, it looks like a very profitable attack, meaning we need to add more security measures. If you think they can't fork it, I think that your solution may work. |
I think the network will be stuck until enough nodes come back online. |
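For reference, here is a minimal sketch of the arithmetic behind "stuck until enough nodes come back online". It assumes the usual dBFT thresholds, f = (n - 1) / 3 tolerated faulty nodes and n - f signatures per block; the function names are illustrative and not taken from the Neo codebase.

```python
def max_faulty(n: int) -> int:
    """Largest number of faulty/offline nodes a set of n consensus nodes tolerates."""
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Number of signatures needed to accept a block (n - f)."""
    return n - max_faulty(n)


def can_produce_blocks(n: int, offline: int) -> bool:
    """True if the nodes still online can reach the signature quorum."""
    return n - offline >= quorum(n)


if __name__ == "__main__":
    n = 7
    print(f"n={n}: tolerates {max_faulty(n)} faulty, needs {quorum(n)} signatures")
    for offline in range(5):
        print(f"offline={offline}: blocks possible = {can_produce_blocks(n, offline)}")
```

With 7 consensus nodes this gives f = 2 and a quorum of 5, so the chain already halts at 3 offline nodes, and with 4 offline it cannot even process the votes that would replace them.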
It's really hard to do that. What is an online/healthy candidate? We're on a P2P network; even if we imagine some ping-address mechanism there, it remains P2P, and the chain can only see the results of this interaction via an oracle.
Of course, if 3 out of 7 nodes go down the chain will just stop, and we've seen that happen. At the same time we have quite a number of blocks on Neo 2 mainnet/testnet, so it seems those cases were handled somehow. And I'd split this question into two cases:
I think we've only seen the first case happen, and rarely with three nodes at the same time. When it does happen, we assume node maintainers are responsible people and ask them to fix the node. That usually works fine, and with nodes properly distributed between various parties it's hard to expect 3 out of 7 to be unreachable/unresponsive simultaneously. Still, there is some probability of that.
The second case is more interesting in that it could be an attack on the network, with bad guys voting for a random key with no node behind it and moving it into the list of CNs. This attack technically requires a lot of NEO (outweighing the votes of some three other proper nodes), and it's really hard to imagine any holder of substantial amounts of NEO doing that (ruining the network and making NEO worthless). But we can of course consider this scenario too.
In both cases we have some nodes not working and we can't get them back online. PR #2205 was dismissed already, so it's not a solution; let's look at #2226 now. I think there are several problems there:
So I'm not sure #2226 is the best solution. What we can do first is try minimizing the chance of this happening:
And then, if we're still in this situation, I think that instead of going back to the standby list it's much more appropriate to use the current committee. We can add a possibility for the committee to sign a block (probably with some candidate-deactivating transactions inside), so that blocks could be signed either by CNs (normally) or by the committee (if CNs can't do that). We trust the committee anyway, it is the current committee as of when this happens, it shouldn't be a problem to collect the proper number of signatures, and it doesn't require any configuration. The mechanism can be optional, as it only makes sense when the committee is at least twice as large as the set of validators. In any event, this can be done post-preview5 or even later. |
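To make the committee-fallback idea concrete, here is a rough sketch of what the block acceptance rule could look like. Everything here is hypothetical: the function names, the use of a simple majority as the committee threshold (the comment above only says "proper number of signatures"), and the plain string keys standing in for public keys. It is not how Neo verifies blocks today.

```python
from typing import Iterable, Optional, Sequence


def bft_threshold(n: int) -> int:
    """Signatures required from n consensus nodes (n - f, with f = (n - 1) // 3)."""
    return n - (n - 1) // 3


def majority_threshold(n: int) -> int:
    """Assumption: the committee fallback requires a simple majority."""
    return n // 2 + 1


def count_members(signers: Iterable[str], members: Sequence[str]) -> int:
    """Number of distinct signers that belong to the given member list."""
    return len(set(signers) & set(members))


def accepted_by(signers: Iterable[str],
                validators: Sequence[str],
                committee: Sequence[str]) -> Optional[str]:
    """Return which body validated the block, or None if neither reached its threshold."""
    signers = list(signers)
    # Normal path: enough consensus nodes signed.
    if count_members(signers, validators) >= bft_threshold(len(validators)):
        return "validators"
    # Fallback path, enabled only when the committee is at least twice
    # the size of the validator set, as suggested above.
    fallback_enabled = len(committee) >= 2 * len(validators)
    if fallback_enabled and count_members(signers, committee) >= majority_threshold(len(committee)):
        return "committee"
    return None


if __name__ == "__main__":
    validators = [f"v{i}" for i in range(7)]
    committee = validators + [f"c{i}" for i in range(14)]  # 21 members in total
    print(accepted_by(validators[:5], validators, committee))
    print(accepted_by(validators[:3] + committee[7:15], validators, committee))
    print(accepted_by(validators[:3], validators, committee))
```

The validator path is checked first so that normal operation is unaffected; the committee path only matters once the CNs cannot reach their own quorum.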
We voted for 21 nodes to become the committee, and the 7 nodes with the highest votes became consensus nodes. There is a process for a node to gradually become a consensus node from 0 votes. If it is offline, maybe we will realize it and vote it out when it becomes a committee member. |
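As an illustration of the selection described above, here is a small sketch that ranks candidates by votes, takes the top 21 as the committee and the top 7 of those as consensus nodes. The tie-break by key and the candidate names are assumptions made for the example; the real ordering rules live in the NEO native contract.

```python
from typing import Dict, List, Tuple


def elect(votes: Dict[str, int],
          committee_size: int = 21,
          validator_count: int = 7) -> Tuple[List[str], List[str]]:
    """Return (committee, validators) chosen by descending vote count."""
    # Sort by votes (highest first); break ties by key so the result is deterministic.
    ranked = sorted(votes, key=lambda key: (-votes[key], key))
    committee = ranked[:committee_size]
    validators = committee[:validator_count]
    return committee, validators


if __name__ == "__main__":
    # Hypothetical candidates: node00 has the most votes, node29 the fewest.
    votes = {f"node{i:02d}": 1000 - 10 * i for i in range(30)}
    committee, validators = elect(votes)
    print("committee size:", len(committee))
    print("validators:", validators)
```

A candidate climbing from 0 votes passes through committee ranks 21 down to 8 before it enters the top 7, which is the window in which an offline node could be noticed and voted out.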
It is impossible, because users can only vote for candidates, and to become a candidate the public key has to be verified.
Agree. |
Proactive monitoring and voting is probably still the best way to prevent this. The question, though, is how we monitor a non-CN node. But if we have confirmation that committee nodes are alive, then it becomes very easy to quickly replace (vote out) non-functional CNs, just because we'll always know that there are other nodes that can immediately replace them.
Right, but this verification only means that the key in question existed when the registration transaction was created. It doesn't mean that there is a node on the network with this key, and it doesn't prevent throwing the key away after registration. This is very theoretical, but one can still register a key, never run a node, and organize some "vote for X" campaign to gather enough votes. |
Also not a fan of #2226 in its current form... who knows what state the default nodes are in. Maybe they are still trusted, maybe their keys are lost, maybe they were sold on the black market a long time ago and are now in the hands of a single malicious actor. We would turn a liveness failure into a potential safety failure. Moving back in the direction of the lightning voting proposal... Why not use a PoW-based fallback to facilitate voting and carry any other critical messages until the CN error is resolved? Committee nodes can be your miners, with short block times (1 minute target?). It would allow us to keep moving forward until dBFT is restored, and it would also serve as a check to see which CNs or candidates are online and pulling their weight. I imagine it could have other uses in the future too, e.g. providing entropy for a PRNG (#2019). |
I'm studying the voting mechanism for Neo 3, and some issues came to my mind. I didn't see any validation to verify if the candidates are actually online/healthy. What will happen if the elected nodes are not online? If there is no consensus, how can we 'roll back' election results? If there are no consensus nodes, it is not possible to change the elected nodes.
Is this true? What am I missing?
Edit: If this is in fact true, then we need to add some transition phase to ensure that the elected CNs are actually producing blocks.