(using Namada version v0.22.0 and CometBFT version 0.37.2)
All validators and full nodes on our local testnet would panic with the following error shortly after block height 100000, which corresponds to the default value of consensus_params.evidence.max_age_num_blocks:
I[2023-09-01|10:07:45.614] finalizing commit of block module=consensus height=100421 hash=F87B325287D0864A6B1D5F964B173137D801EBB9BE39C2D26A3C845B4F1396CB root=735053B51C4A798D319906F46C05351CD135F5078602277381606FE089161B0B num_txs=0
2023-09-01T10:07:45.618931Z INFO namada_core::ledger::storage::wl_storage: Began a new epoch 457
2023-09-01T10:07:45.618952Z INFO namada_apps::node::ledger::shell::finalize_block: Block height: 100421, epoch: 457, is new epoch: true.
The application panicked (crashed).
Message: index out of bounds: the len is 456 but the index is 456
Location: apps/src/lib/node/ledger/shell/finalize_block.rs:695
Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
2023-09-01T10:07:47.281671Z INFO namada_apps::node::ledger::shims::abcipp_shim: ABCI response channel didn't respond
E[2023-09-01|10:07:47.282] Stopping abci.socketClient for error: read message: EOF module=abci-client connection=consensus
The application panicked (crashed).
Message: called `Result::unwrap()` on an `Err` value: RecvError(())
Location: /usr/local/cargo/git/checkouts/tower-abci-0d01b039e0b7a0c9/cf9573d/src/server.rs:163
Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
I[2023-09-01|10:07:47.283] service stop module=abci-client connection=consensus msg="Stopping socketClient service" impl=socketClient
E[2023-09-01|10:07:47.283] error in proxyAppConn.EndBlock module=state err="read message: EOF"
E[2023-09-01|10:07:47.284] consensus connection terminated. Did the application crash? Please restart CometBFT module=proxy err="read message: EOF"
E[2023-09-01|10:07:47.284] CONSENSUS FAILURE!!! module=consensus err="failed to apply block; error read message: EOF" stack="goroutine 262 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x65\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\tgithub.com/cometbft/cometbft/consensus/state.go:732 +0x4c\npanic({0xe92440, 0xc001a434a0})\n\truntime/panic.go:838 +0x207\ngithub.com/cometbft/cometbft/consensus.(*State).finalizeCommit(0xc0000ae000, 0x18845)\n\tgithub.com/cometbft/cometbft/consensus/state.go:1709 +0xf05\ngithub.com/cometbft/cometbft/consensus.(*State).tryFinalizeCommit(0xc0000ae000, 0x18845)\n\tgithub.com/cometbft/cometbft/consensus/state.go:1609 +0x2ff\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit.func1()\n\tgithub.com/cometbft/cometbft/consensus/state.go:1544 +0xa5\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit(0xc0000ae000, 0x18845, 0x0)\n\tgithub.com/cometbft/cometbft/consensus/state.go:1582 +0xcb7\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0xc0000ae000, 0xc0017345a0, {0xc000108810, 0x28})\n\tgithub.com/cometbft/cometbft/consensus/state.go:2212 +0xcbf\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0xc0000ae000, 0xc0017345a0, {0xc000108810?, 0xc00027ff00?})\n\tgithub.com/cometbft/cometbft/consensus/state.go:2001 +0x2c\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc0000ae000, {{0x127bfa0?, 0xc0012b6820?}, {0xc000108810?, 0x0?}})\n\tgithub.com/cometbft/cometbft/consensus/state.go:861 +0x44b\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc0000ae000, 0x0)\n\tgithub.com/cometbft/cometbft/consensus/state.go:768 +0x419\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart\n\tgithub.com/cometbft/cometbft/consensus/state.go:379 +0x12d\n"
I[2023-09-01|10:07:47.284] service stop module=consensus wal=/root/.local/share/namada/luminara.79474f00ace3ef7ca2712/cometbft/data/cs.wal/wal msg="Stopping baseWAL service" impl=baseWAL
I[2023-09-01|10:07:47.284] signal trapped module=main msg="captured terminated, exiting..."
I[2023-09-01|10:07:47.285] service stop module=consensus wal=/root/.local/share/namada/luminara.79474f00ace3ef7ca2712/cometbft/data/cs.wal/wal msg="Stopping Group service" impl=Group
I[2023-09-01|10:07:47.285] service stop module=main msg="Stopping Node service" impl=Node
I[2023-09-01|10:07:47.285] Stopping Node module=main
At the start of a new epoch, PoS recalculates inflation with respect to the previous epoch by looking up pred_epochs.first_block_heights at the index equal to the previous epoch's number:
let first_block_of_last_epoch = self
.wl_storage
.storage
.block
.pred_epochs
.first_block_heights[last_epoch.0 as usize]
.0;
but this doesn't take into account that pred_epochs is trimmed to keep only epochs that began less than consensus_params.evidence.max_age_num_blocks blocks ago. Once that height is reached, indices no longer correspond directly to epoch numbers: the first epoch change after trimming starts incorrectly references the current epoch instead of the previous one, and the next epoch change results in an out-of-bounds panic.
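To make the index drift concrete, here is a minimal standalone sketch. The struct and its pruning rule are simplified stand-ins for the real pred_epochs, not Namada's actual Epochs type, and the numbers (240-block epochs, max age 1000) are only chosen to make the pattern show up quickly. The epoch-number-as-index lookup first lands on the current epoch's entry and, one epoch later, goes past the end of the vector:
struct PredEpochs {
    // Epoch number of the oldest entry still kept in first_block_heights
    first_known_epoch: u64,
    // Start height of each known epoch, oldest first
    first_block_heights: Vec<u64>,
}

impl PredEpochs {
    // Record the start height of a new epoch and drop entries that began more
    // than max_age_num_blocks blocks ago (always keeping at least one entry).
    fn new_epoch(&mut self, start_height: u64, max_age_num_blocks: u64) {
        self.first_block_heights.push(start_height);
        let oldest_to_keep = start_height.saturating_sub(max_age_num_blocks);
        while self.first_block_heights.len() > 1 && self.first_block_heights[0] < oldest_to_keep {
            self.first_block_heights.remove(0);
            self.first_known_epoch += 1;
        }
    }
}

fn main() {
    let mut pred = PredEpochs {
        first_known_epoch: 0,
        first_block_heights: vec![0],
    };
    let (epoch_len, max_age) = (240u64, 1_000u64);

    // Epoch 5 begins: pruning has started, so indexing by epoch number lands
    // on the entry for the *current* epoch instead of the previous one.
    for e in 1..=5 {
        pred.new_epoch(e * epoch_len, max_age);
    }
    let last_epoch = 4usize;
    println!(
        "epoch 5 starts: heights[{last_epoch}] = {:?}, but epoch 4 began at {}",
        pred.first_block_heights.get(last_epoch), // Some(1200): start of epoch 5
        epoch_len * last_epoch as u64,            // 960
    );

    // Epoch 6 begins: the same lookup is now past the end of the Vec, i.e.
    // "index out of bounds: the len is N but the index is N".
    pred.new_epoch(6 * epoch_len, max_age);
    let last_epoch = 5usize;
    println!(
        "epoch 6 starts: heights.get({last_epoch}) = {:?}, len = {}",
        pred.first_block_heights.get(last_epoch), // None: direct indexing would panic
        pred.first_block_heights.len(),           // 5
    );

    // Indexing relative to the first epoch still kept after trimming is fine.
    let fixed_index = last_epoch - pred.first_known_epoch as usize;
    println!(
        "fixed lookup: heights[{fixed_index}] = {}",
        pred.first_block_heights[fixed_index] // 1200: start of epoch 5
    );
}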
Steps:
Network nodes consistently panic at the start of a new epoch around block height ~100200.
Halve the epoch duration; the network produces roughly double the number of epochs but still panics at about the same block height.
Change the constant evidence_max_age_num_blocks in core/src/ledger/storage/wl_storage.rs from 100000 to 1000 (sketched below); nodes then panic on the second new epoch after height 1000 instead.
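For reference, the tweak in the last step looks roughly like the line below. The binding name, file path, and values are taken from the steps above; it is shown here as a local binding, and the surrounding code in wl_storage.rs is assumed rather than quoted:
// In core/src/ledger/storage/wl_storage.rs (exact surrounding code assumed):
// a lower evidence age makes the pruning, and therefore the panic,
// reproducible within the first few epochs of a localnet.
let evidence_max_age_num_blocks: u64 = 1000; // default is 100000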
After making the following change and relaunching the network, nodes no longer panic at heights > consensus_params.evidence.max_age_num_blocks. Our localnet is currently at block 111000 and counting.
// Index of the last epoch relative to the first epoch still kept in pred_epochs
let last_epoch_index: usize = last_epoch.0 as usize
    - self.wl_storage.storage.block.pred_epochs.first_known_epoch.0 as usize;
let first_block_of_last_epoch = self
.wl_storage
.storage
.block
.pred_epochs
.first_block_heights[last_epoch_index]
.0;
(I'm not sure if this is the 'right' way to modify the code, but it at least seems to confirm the cause and the fix.)
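As a follow-up thought, a slightly more defensive variant of the same fix could use a checked lookup, so a missing entry fails with a descriptive message instead of a raw indexing panic. This is only a sketch in the same fragment style as the patch above; the field and variable names are copied from it, everything else is assumed:
// Same offset computation as above, but with a checked lookup into the Vec.
let pred_epochs = &self.wl_storage.storage.block.pred_epochs;
let last_epoch_index = (last_epoch.0 - pred_epochs.first_known_epoch.0) as usize;
let first_block_of_last_epoch = pred_epochs
    .first_block_heights
    .get(last_epoch_index)
    .map(|height| height.0)
    .expect("the start height of the previous epoch should still be in pred_epochs");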