Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we implement the rewind feature in tendermint? #3196

Closed
bradyjoestar opened this issue Jan 23, 2019 · 17 comments
Closed

Should we implement the rewind feature in tendermint? #3196

bradyjoestar opened this issue Jan 23, 2019 · 17 comments
Labels
stale for use by stalebot T:question Type: Question

Comments

@bradyjoestar
Copy link
Contributor

bradyjoestar commented Jan 23, 2019

I was reading the code of go-ethereum and found that it had supported the rewind feature.

// SetHead rewinds the local chain to a new head. In the case of headers, everything
// above the new head will be deleted and the new one set. In the case of blocks
// though, the head may be further rewound if block bodies are missing (non-archive
// nodes after a fast sync).
func (bc *BlockChain) SetHead(head uint64) error {
	log.Warn("Rewinding blockchain", "target", head)

	bc.mu.Lock()
	defer bc.mu.Unlock()

	// Rewind the header chain, deleting all block bodies until then
	delFn := func(hash common.Hash, num uint64) {
		DeleteBody(bc.db, hash, num)
	}
	bc.hc.SetHead(head, delFn)
	currentHeader := bc.hc.CurrentHeader()

	// Clear out any stale content from the caches
	bc.bodyCache.Purge()
	bc.bodyRLPCache.Purge()
	bc.blockCache.Purge()
	bc.futureBlocks.Purge()

	// Rewind the block chain, ensuring we don't end up with a stateless head block
	if currentBlock := bc.CurrentBlock(); currentBlock != nil && currentHeader.Number.Uint64() < currentBlock.NumberU64() {
		bc.currentBlock.Store(bc.GetBlock(currentHeader.Hash(), currentHeader.Number.Uint64()))
	}
	if currentBlock := bc.CurrentBlock(); currentBlock != nil {
		if _, err := state.New(currentBlock.Root(), bc.stateCache); err != nil {
			// Rewound state missing, rolled back to before pivot, reset to genesis
			bc.currentBlock.Store(bc.genesisBlock)
		}
	}
	// Rewind the fast block in a simpleton way to the target head
	if currentFastBlock := bc.CurrentFastBlock(); currentFastBlock != nil && currentHeader.Number.Uint64() < currentFastBlock.NumberU64() {
		bc.currentFastBlock.Store(bc.GetBlock(currentHeader.Hash(), currentHeader.Number.Uint64()))
	}
	// If either blocks reached nil, reset to the genesis state
	if currentBlock := bc.CurrentBlock(); currentBlock == nil {
		bc.currentBlock.Store(bc.genesisBlock)
	}
	if currentFastBlock := bc.CurrentFastBlock(); currentFastBlock == nil {
		bc.currentFastBlock.Store(bc.genesisBlock)
	}
	currentBlock := bc.CurrentBlock()
	currentFastBlock := bc.CurrentFastBlock()
	if err := WriteHeadBlockHash(bc.db, currentBlock.Hash()); err != nil {
		log.Crit("Failed to reset head full block", "err", err)
	}
	if err := WriteHeadFastBlockHash(bc.db, currentFastBlock.Hash()); err != nil {
		log.Crit("Failed to reset head fast block", "err", err)
	}
	return bc.loadLastState()
}

Should we implement a similar feature in the tendermint?

If needed, I will write an ADR Documents based my prior work on it. Maybe it will take some time. :)

@melekes melekes added the T:question Type: Question label Jan 23, 2019
@melekes melekes changed the title Should we implement the rewind feature in tendermint Should we implement the rewind feature in tendermint? Jan 23, 2019
@melekes
Copy link
Contributor

melekes commented Jan 23, 2019

Why we'd want that?

@bradyjoestar
Copy link
Contributor Author

Why we'd want that?

Once a block is committed in Tendermint, it's final. Unlike PoW chains where you have many branches and you pick the one with the most "work" (which sometimes requires you to rewind and apply another branch), in Tendermint there's a single chain of blocks.

If we are running in a production enviroment, at a moment there is a thorny problems that happened and restart the node didnt work.

Maybe its better to rewind to previous blocks to keep system up and running. Then we have enough time to fix this bug.

@mattkanwisher
Copy link

Biggest reason, is we have seen the chain halt for consensus problems with App Hash. Once a node has a bad app hash, there is no simple way for it to rejoin the chain without a complete resync. This can create large downtimes. Also since tendermint isn't maintaining backwards compatibility, checkpoints of the chain have to periodically stored off chain, to even restore a node from scratch, cause you can't possible replay the chain after a tendermint upgrade.

@ebuchman
Copy link
Contributor

ebuchman commented Jan 24, 2019

Interesting. I could see us supporting this from a debuggability/maintenance perspective. How would it work? tendermint unsafe_revert_block <height> ? Might be tricky to ensure it works with the app too

@bradyjoestar
Copy link
Contributor Author

I add two flag into tendermint node

RollbackFlag bool `mapstructure:"rollback_data"` //false

RollbackHeight int `mapstructure:"rollback_height"`

And if RollabackFlag = true it will revert.

It is a good idea to create a new command unsafe_revert_block and run tendermint unsafe_revert_block <height>. We should access the blockstore.db and state.db in it.

I use etheremint(0.5.3) as ABCI app and it passed. As @melekes mentioned, it is easy for those PoW chains to rewind like ethereum, but it need some work on data processing for another APP.

@bradyjoestar
Copy link
Contributor Author

In testcase, I will use kvstore as ABCIAPP to judge whether revert works.

If we revert a node from height 1200 to height 1000, it will replay to 1000 when it restart and then use blockpool to get blocks from other nodes to catch up them.

@ebuchman
Copy link
Contributor

What about the priv_validator_state.json? it could cause double signing.

@bradyjoestar
Copy link
Contributor Author

reset it to the following state:

filePv := privval.LoadOrGenFilePV(config.PrivValidatorFile())
filePv.LastHeight = rollbackHeight
filePv.LastRound = 0
filePv.LastStep = 0
filePv.LastSignature = nil
filePv.LastSignBytes = nil
filePv.Save()

@ebuchman
Copy link
Contributor

This could still cause double signing though. I think we might want two cases.

Assuming rollbackHeight is the height we roll back to, we may want options to set filePv.LastHeight = rollbackHeight, so we can double sign if we're cool with that, and filePv.LastHeight = rollbackHeight + 1 so we can't.

Or maybe we can make the rollback command ensure that we don't process evidence from the next height? It's a bit weird. If your app handles evidence, this could get kind of tricky.

@bradyjoestar
Copy link
Contributor Author

I'm reading the consensus code to try to solve this problem.
There may be three situations in rollback, take a cluster of 7 nodes as a example and they are all in the height 1200:

  • rollback a node to 1000
  • rollback 3 nodes to 1000
  • rollback 7 nodes to 1000

I only considered the situation 1&3 previously. But if we rollback 3 nodes it may casue double signing.

@bradyjoestar
Copy link
Contributor Author

bradyjoestar commented Jan 29, 2019

It's a good idea to support two different case.

Take the example above:

  • if we rollback 7 nodes from 1200 to 1000
    we should reset the filePv.LastHeight = 1000 and all votes and datas above height 1000 will be discarded include walMessages in the walFile. All nodes are at the same line.
  • If we rollback a node from 1200 to 1000
    It may doesnt mather whether we change priv_validator_state.json, when the rollbacking node started, the cluster may have gone to height 1300 , so double signing doesnt exist.
  • rollback 3 nodes to from 1200 to 1000:
    we'd better not change priv_validator_state.json, I thought. If we rollbacked node, the cluster will stay height 1200 unless we restart the rollbacking node. This is similar as that three node crashed at height 1200. If the rollbacking node catch up with the current height, the priv_validator_state.json will used to prevent double signing?

@mattkanwisher
Copy link

Any sample code to try out? Trying to handle this situation right now

@bradyjoestar
Copy link
Contributor Author

@mattkanwisher I had wrote some code for test and it worked well with tendermint0.27.4

Usage: tendermint node --proxy_app=kvstore --rollback_data=true --rollback_height=$rollbackHeight

master...bradyjoestar:revert

Catuion: it wil directly modify data so be careful!!

Wal file error could be skipped for sample.

I think it will take some time to create a pull request because it is really a dangerous operation.

@bradyjoestar
Copy link
Contributor Author

@mattkanwisher
I created a pull request and there are three scripts in it as test case.
#3239

@ebuchman
Copy link
Contributor

ebuchman commented Feb 1, 2019

What if we do this as an independent tool for now rather than making it available in the tendermint command?

@ebuchman
Copy link
Contributor

ebuchman commented Feb 1, 2019

We shouldn't clutter the tendermint node command line options with things like this. It should be uncommon enough, and certainly dangerous enough, to warrant installing a separate binary.

@bradyjoestar
Copy link
Contributor Author

you're right. Considering of security, its better to create a independent tool.
I would change it.

@github-actions github-actions bot added the stale for use by stalebot label Sep 16, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 20, 2023
iKapitonau pushed a commit to scrtlabs/tendermint that referenced this issue Jul 16, 2024
… (tendermint#3308)

have different type. Adds `ValidatorSet#AllKeysHaveSameType`, which
checks exactly that.

Closes tendermint#3195



---

#### PR checklist

- [ ] ~~Tests written/updated~~
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] ~~Updated relevant documentation (`docs/` or `spec/`) and code
comments~~
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request tendermint#3196 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Anton Kaliaev <[email protected]>
iKapitonau pushed a commit to scrtlabs/tendermint that referenced this issue Jul 16, 2024
…endermint#3310) (tendermint#3345)

Refs tendermint#3196



---

#### PR checklist

- [x] Tests written/updated
- [ ] ~~Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)~~
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request tendermint#3310 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Anton Kaliaev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale for use by stalebot T:question Type: Question
Projects
None yet
Development

No branches or pull requests

4 participants