Consensus: "enterPrevote: ProposalBlock is invalid" - Error: "wrong signature" #4926
Comments
Yes, at least it's supposed to.
This error basically means the validator considered the proposed block invalid, so it prevoted nil instead of the block's hash.
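For context, here is a rough sketch of the consensus step that produces this log line (toy types of my own, not the actual Tendermint code): when a node enters the prevote step it validates the proposed block, and if validation fails it logs this error and prevotes nil.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is a simplified stand-in for Tendermint's block type (hypothetical,
// for illustration only).
type Block struct{ Hash string }

// validateBlock stands in for the real validation, which checks the header,
// the LastCommit signatures, and so on.
func validateBlock(b *Block) error {
	if b == nil || b.Hash == "" {
		return errors.New("wrong signature")
	}
	return nil
}

// enterPrevote mirrors the decision that produces the log line in question:
// an invalid proposal block makes the validator prevote nil.
func enterPrevote(proposal *Block) string {
	if err := validateBlock(proposal); err != nil {
		fmt.Printf("enterPrevote: ProposalBlock is invalid, err=%v module=consensus\n", err)
		return "" // prevote nil
	}
	return proposal.Hash // prevote for the block
}

func main() {
	enterPrevote(&Block{}) // an invalid proposal -> prevote nil
}
```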
It's possible in theory, but it's not desired behavior. We haven't seen this before.
@njmurarka Could you please share more logs? We'll need them to understand precisely what happened.
@melekes I would love to. I have the log file of output from one of my daemons; it is attached here. Thank you for clarifying what that "enterPrevote: ProposalBlock is invalid module=consensus" error means. But it raises the question that worries me: even if someone had an old genesis file, how could they have caused the network to go into this state?

The chain gets started with my three validators, and they have such an enormous stake that it is not possible for anybody else to get even 10% of the power, or even 1%. I did this intentionally for my testnet. So with this in mind, how can a single "rogue" node with a "bad setup" accomplish this? I am pretty confident now that somebody simply had "old data" of some sort, and that caused this. Maybe the old genesis. Maybe it was the old KV store? I have not attempted to replicate this, but I can tell you that it happened twice to me. Really scary if this were to happen on a mainnet.
Do you happen to have the stale node(s) still available, or any of their data?
Also, it would've been great to have logs at DEBUG level.
Unfortunately, @melekes, I do not have these nodes running at this point. In fact, to be honest, I do not even know which nodes were "stale", since I am not yet clear what happened, who was stale, and what stale means here. Is this a node that has a new validator private key? An old one? Or merely a validator that has the old genesis file?

It is actually pretty scary, but I might be wrong. The reason it is scary is that I am pretty confident my 3 validators (who hold 99% of the voting power) did everything correctly, and it was one of the other validators that joined (I have 100+ other validators being run by members of the public) that might somehow be responsible. But if this is in fact true, that means that someone with < 1% voting power had a means to crash the network. I sure hope I am wrong. I think we really need to hunt down and find an explanation for this. It puts me ill at ease knowing this happened twice and that, right now, it is not happening. The best theory so far is a bad validator, yet it really puts the whole point of a decentralized network (with the ensuing security) in a bad light.

I am attaching a log of the console output from "blzd" for one of the nodes. I am unclear how helpful this is, but it is better than nothing. Please scroll down to line 4,666 to see where the issue manifested.
No worries! Looks like the only path here is to try and replicate the issue + reason through the code.
We are running a "Game of Stakes" type competition right now, so I am a bit heads down. But I want to quash/explain this issue, as it is worrisome and should be to others (if it is legitimate and not something dumb on my side). I will try to make it happen intentionally in the next few weeks. I have the ability to launch new testnets pretty quickly. |
Haven't been able to find a reason so far.
Hello @melekes. So, as a bit of fortunate news (well, not that fortunate), this "crash" has occurred again. I reached out to you on Discord but we can continue here too. Here is some output that might look familiar:
It appears to be the exact same problem. When I restart a node that has "crashed" (presumably with the output above), I get the following:
I have kept my nodes running so we can resolve this matter. It is quite worrisome.

I don't want to make assumptions. It is possible this is a result of the code we have in the COSMOS layer. That having been said, it seems like this is a genuine Tendermint issue. I don't see how any sort of code in the COSMOS layer should be able to cause this, even if we intentionally wanted it to be so.

It is worrisome because we want to run a Mainnet soon. But having a crash like this makes that an unwise choice till we positively REPRODUCE this issue, and then resolve it. I obviously cannot sign off on a mainnet with an issue like this that is outstanding.

Thoughts? Please advise what you need from me. Thanks.
Thanks for the reports on this. Can you please provide output from the `/consensus_state` and `/dump_consensus_state` RPC endpoints? Would it be possible for us to get remote access to the RPC endpoints of running nodes so we can dig in a bit ourselves?
Also, to clarify - do these livelocks only happen after a genesis upgrade? And are you changing the chain-id after the genesis upgrade? Technically, once a chain-id has been used, it shouldn't be used again in a genesis restart, because validator signatures from the previous chain could be considered misbehaviour (e.g. a signature for the old block 5 could conflict with a signature for the new block 5 from the same validator and could be seen as evidence, causing that validator to be slashed!). I'm not sure that's relevant in this case to the infinite "invalid proposal block", but it's something to keep in mind in general.
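To illustrate the evidence concern with a toy example (these are not Tendermint's actual evidence types; the names are made up for this sketch): two votes signed by the same validator for the same height and round but different block IDs are indistinguishable from equivocation.

```go
package main

import "fmt"

// Vote is a minimal stand-in for a signed precommit (hypothetical type,
// for illustration only).
type Vote struct {
	ValidatorAddr string
	Height        int64
	Round         int32
	BlockID       string
}

// looksLikeEquivocation is true when the same validator signed two different
// block IDs at the same height and round. Reusing a chain-id can produce
// exactly this: an old-chain signature for height H conflicts with the
// new-chain signature for height H and looks like slashable misbehaviour.
func looksLikeEquivocation(a, b Vote) bool {
	return a.ValidatorAddr == b.ValidatorAddr &&
		a.Height == b.Height &&
		a.Round == b.Round &&
		a.BlockID != b.BlockID
}

func main() {
	oldChainVote := Vote{"VAL1", 5, 0, "BLOCKID_FROM_OLD_CHAIN"}
	newChainVote := Vote{"VAL1", 5, 0, "BLOCKID_FROM_NEW_CHAIN"}
	fmt.Println(looksLikeEquivocation(oldChainVote, newChainVote)) // true
}
```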
Real quick replies to keep the momentum... we did not change or "upgrade" the genesis. I am not 100% sure what a "genesis upgrade" actually is, although I am 70% confident of my guess. We did nothing to the status quo. The network was running for 3+ weeks and then boom -- this occurred! No change to the chain-id. No upgrade. No changes other than validators getting jailed, getting unbonded, and new validators trying to join. All natural, "organic" actions that are part of any Tendermint/COSMOS chain.

Can you clarify what you mean when you say that a chain-id should not be re-used? This is potentially a critical guideline. Indeed, we have reset our network a few times and have re-used the same chain-id.

That having been said, it scares the **** out of me that a network can crash like this. It contradicts the whole "high availability" value proposition of a blockchain to have a "bug" like this. Even if a single validator has "bad DB data" (validator signatures from the previous chain), it makes no sense that this alone could cause the entire network to crash this way, does it? I mean really... if 99.9% of the voting power is OK with things and some validator shows up with bogus/antiquated data, how can that alone cause the whole network to crash like this? I sincerely hope I am wrong about this guess. And even if I am not, it would be great to resolve this. As it stands, our testnet is on its knees... stalled. I don't know how to recover it. Some nodes are running but no new blocks are forthcoming.

I will get on providing answers to your request for output from those two RPC endpoints immediately. That having been said... I and some members of the community are very alarmed this has happened a few times... just to be clear, Bluzelle runs the three validators that collectively hold 97%+ of the voting power. So it is disturbing to think that some arbitrary validator out there with "bad data" could cause this. It is a pretty good DoS attack vector, at a minimum.

Thanks so very much, btw. Appreciate the quick feedback. I think that if this is truly an issue in Tendermint (I am 60% confident it is), it behooves us to resolve it quickly.
Consensus State (link and file attached): http://dev-backup.testnet.public.bluzelle.com:26657/consensus_state

Dump Consensus State (link and file attached): http://dev-backup.testnet.public.bluzelle.com:26657/dump_consensus_state

Can you please confirm you have access at http://dev-backup.testnet.public.bluzelle.com:26657 to the endpoint you are asking for? What else can I provide to assist? Thanks.
Thanks, this is super helpful. Looks like we've identified the problem and managed to replicate this issue. Will publish a fix ASAP.
Thanks. That's great! Please do share here what the problem was, how you replicated it, and how it was resolved. I'd love to see the discussion on this, if it was public. Will we be able to "save" our existing testnet with this fix?
Yes, we will share details on all of that once the fix is released. And we will try to provide a script that should allow you to save your existing testnet. Thanks for your patience!
Closes #4926

The dump consensus state had this:

```
"last_commit": {
  "votes": [
    "Vote{0:04CBBF43CA3E 385085/00/2(Precommit) 1B73DA9FC4C8 42C97B86D89D @ 2020-05-27T06:46:51.042392895Z}",
    "Vote{1:055799E028FA 385085/00/2(Precommit) 652B08AD61EA 0D507D7FA3AB @ 2020-06-28T04:57:29.20793209Z}",
    "Vote{2:056024CFA910 385085/00/2(Precommit) 652B08AD61EA C8E95532A4C3 @ 2020-06-28T04:57:29.452696998Z}",
    "Vote{3:0741C95814DA 385085/00/2(Precommit) 652B08AD61EA 36D567615F7C @ 2020-06-28T04:57:29.279788593Z}",
```

Note there's a precommit in there from the first val from May (2020-05-27), while the rest are from today (2020-06-28). It suggests there's a validator from an old instance of the network at this height (they're using the same chain-id!).

Obviously a single bad validator shouldn't be an issue. But the Commit refactor work introduced a bug. When we propose a block, we get the block.LastCommit by calling MakeCommit on the set of precommits we saw for the last height. This set may include precommits for a different block, and hence the block.LastCommit we propose may include precommits that aren't actually for the last block (but of course +2/3 will be). Before v0.33, we just skipped over these precommits during verification. But in v0.33, we expect all signatures for a blockID to be for the same block ID! Thus we end up proposing a block that we can't verify.
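For readers following along, a simplified sketch of the verification change described above (toy types of my own, not the actual Tendermint implementation, counting signatures rather than voting power for brevity):

```go
package main

import "fmt"

// CommitSig is a toy stand-in for a commit signature (hypothetical type):
// BlockID records which block the precommit was actually for, Valid whether
// the signature itself checks out.
type CommitSig struct {
	BlockID string
	Valid   bool
}

// verifyPreV033 mirrors the old behaviour: precommits for a different block
// ID are simply skipped, and only +2/3 of the set needs to match the
// commit's block.
func verifyPreV033(commitBlockID string, sigs []CommitSig) bool {
	matching := 0
	for _, s := range sigs {
		if s.BlockID != commitBlockID {
			continue // skipped -- a stray precommit is harmless here
		}
		if !s.Valid {
			return false
		}
		matching++
	}
	return matching*3 > len(sigs)*2
}

// verifyV033 mirrors the stricter v0.33 expectation: every included
// signature must be for the commit's block ID, so a single stray precommit
// makes the whole LastCommit unverifiable ("wrong signature").
func verifyV033(commitBlockID string, sigs []CommitSig) bool {
	for _, s := range sigs {
		if s.BlockID != commitBlockID || !s.Valid {
			return false
		}
	}
	return true
}

func main() {
	// Three precommits for the real last block plus one from a validator
	// still running an old instance of the network (different block ID).
	sigs := []CommitSig{
		{"652B08AD61EA", true},
		{"652B08AD61EA", true},
		{"652B08AD61EA", true},
		{"1B73DA9FC4C8", true}, // the stale validator's precommit
	}
	fmt.Println(verifyPreV033("652B08AD61EA", sigs)) // true
	fmt.Println(verifyV033("652B08AD61EA", sigs))    // false
}
```

The sketch only captures the shape of the change: one stale precommit in the assembled LastCommit is harmless under the old check and fatal under the strict one.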
Clarify how to get the `/dump_consensus_state` data. Eg. #4926 indicated: "Not sure how to do this."
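For reference, a minimal way to pull that data, assuming the node exposes the default Tendermint RPC port 26657 (the host here is a placeholder; a plain HTTP GET to the endpoint works just as well):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Assumes the default Tendermint RPC listen address; substitute the
	// node's actual host and port as needed.
	resp, err := http.Get("http://localhost:26657/dump_consensus_state")
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// The endpoint returns a JSON document describing the node's current
	// consensus state, including the round state and last_commit votes.
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		os.Exit(1)
	}
	fmt.Println(string(body))
}
```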
Tendermint version:
Tendermint Core Semantic Version: 0.33.3
P2P Protocol Version: 7
Block Protocol Version: 10
ABCI app:
Cosmos SDK Version: v0.38.3
Big Dipper Explorer URL:
Instructions to set up a similar node (I'd suggest just setting up a sentry):
Access to genesis file for chain:
Sample command to get node info:
Discord channel invite (in case you want to live chat with me... I am Neeraj one of the admins):
Environment:
Using COSMOS SDK v0.38.3. Otherwise, not sure what else to say here.
We are running a testnet chain with our CRUD database as one of the application modules, in COSMOS.
We currently (as of filing this issue) have 5 "sentries" and 3 validators. To be clear, the sentries have no voting power and are the only external peers that the validators talk to (the validators also talk to each other). Furthermore, the validators are IP-firewalled so that they can only talk to the sentries and the other validators. The sentries themselves keep the validator node IDs private.
Sentry hostnames:
I am not listing the validator hostnames, since they are inaccessible (due to the firewall) anyway.
The validators listen only on 26656, and only to validators and sentries. The sentries listen on 26656 and 26657, and each also runs the Cosmos REST server, listening on 1317.
We have opened our testnet to the public. Members of the public have set up sentries and validators of their own, and are expected to use our five sentries as their P2P peers in config.toml.
What happened:
For weeks, things on our testnet had been running fine. I had dozens of members of the public running validators on it, just so these people could learn the process of setting up a validator, etc.
I needed to increase the max # of allowed validators (to something much higher than the default value of 100) in the "app_state/staking/params/max_validators" value in genesis.json. I think that this particular value is a COSMOS thing, but I wanted to mention it for context. We are not using COSMOS governance yet, so we decided to do a hard reset (ie: generate a new genesis.json and start the chain all over).
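For concreteness, a minimal sketch of that genesis edit, assuming a genesis.json laid out the way Cosmos SDK v0.38 produces it (the only path taken from the report above is app_state/staking/params/max_validators; the file path and the target value of 300 are just examples):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

func main() {
	// Path is an example; point this at the genesis file to be edited.
	raw, err := os.ReadFile("genesis.json")
	if err != nil {
		panic(err)
	}

	var genesis map[string]interface{}
	if err := json.Unmarshal(raw, &genesis); err != nil {
		panic(err)
	}

	// Walk app_state -> staking -> params and raise max_validators.
	// These type assertions will panic if the layout differs from this guess.
	appState := genesis["app_state"].(map[string]interface{})
	staking := appState["staking"].(map[string]interface{})
	params := staking["params"].(map[string]interface{})
	params["max_validators"] = 300 // example: a cap well above the default of 100

	out, err := json.MarshalIndent(genesis, "", "  ")
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("genesis.json", out, 0o644); err != nil {
		panic(err)
	}
	fmt.Println("max_validators updated")
}
```

An edit like this only makes sense as part of the hard reset described here, i.e. before the new chain is started from the regenerated genesis.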
First, here is what I did on my OWN 5 sentries and 3 validators:
Next, here is what I asked the people in the community to do with their validator and/or sentries:
The community slowly started up their daemons.
At some point (within an hour or so, about 2300 blocks in), I started to get the error below. I was getting this on all my sentries and validators. Basically, the chain had completely crashed. I tried to restart my validators and sentries, but this was unrecoverable.
I had no choice but to "reset" the whole chain again. I stopped all my validators and sentries and this time ONLY ran "unsafe-reset-all" on all my daemons. Of course, I also had to do some COSMOS setup again (staking, etc.), but I started everything again and asked the community to again do the same steps listed above with yet another new genesis.json, etc.
Within an hour, the whole network went down again. Effectively the same error (different block, signature HASH this time):
What you expected to happen:
I expect "clean" output, as so:
(The fact that invalidTxs is non-zero is the subject of another investigation)
Have you tried the latest version:
Not sure. I think so, although, looking at the Tendermint GitHub, I see there are two minor versions available that are newer than what we have.
How to reproduce it:
I more or less explained how it came about above in the "what happened" section.
Looking at #2720, I see a similar, though not identical, error message. In that issue, it was suggested that perhaps all the nodes did not start from the same "genesis" state, and that some node(s) may have a stale "home folder" (.blzd, I presume?).
Does "unsafe-reset-all" actually clear out all state including the COSMOS KV stores, app state, etc? I assume this command is sufficient to accomplish a clean slate?
Is it possible that, in "resetting" my chain as I did above, some members of the public forgot to run the "blzd unsafe-reset-all" command, so that when they started their nodes, data from the previous chain was left over, and this somehow brought the whole network down? If so, it is a bit scary that a single node (or even a bunch of them) could do this. It seems like an excellent DoS attack vector, if so.
Logs:
Listed above.
Config:
No specific changes made to Tendermint.
node command runtime flags:
This is all running from within our daemon that was built with the COSMOS SDK.
`/dump_consensus_state` output for consensus bugs:
Not sure how to do this.
Anything else we need to know:
Most details given above.
I did some searching ahead of time to see if I could resolve this myself. I saw some issues related to it but they are already closed.