Node should not corrupt its state on exit #5050

Closed
frol opened this issue Oct 20, 2021 · 3 comments
Labels
A-chain Area: Chain, client & related C-incident Category: issues that are related to or have caused some incident C-partner-request Category: feature requests from partners Node Node team P-high Priority: High T-node Team: issues relevant to the node experience team

@frol
Collaborator

frol commented Oct 20, 2021

Describe the bug

We received three reports today that our partners' Indexer nodes were corrupted after the node aborted with the following panic:

if next_epoch_protocol_version > PROTOCOL_VERSION {
    panic!("The client protocol version is older than the protocol version of the network. Please update nearcore");
}

After they recompiled the node with a newer nearcore, the node failed to start again:

thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', /home/ubuntu/.cargo/git/checkouts/nearcore-e3f14b9758bedfa0/fb1e621/chain/epoch_manager/src/lib.rs:1193:26
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'The lock was poisoned.: PoisonError { .. }', /home/ubuntu/.cargo/git/checkouts/nearcore-e3f14b9758bedfa0/fb1e621/nearcore/src/runtime/mod.rs:993:69
thread '<unnamed>' panicked at 'The lock was poisoned.: PoisonError { .. }', /home/ubuntu/.cargo/git/checkouts/nearcore-e3f14b9758bedfa0/fb1e621/nearcore/src/runtime/mod.rs:1042:69
thread '<unnamed>' panicked at 'The lock was poisoned.: PoisonError { .. }', /home/ubuntu/.cargo/git/checkouts/nearcore-e3f14b9758bedfa0/fb1e621/nearcore/src/runtime/mod.rs:1042:69
thread '<unnamed>' panicked at 'The lock was poisoned.: PoisonError { .. }', /home/ubuntu/.cargo/git/checkouts/nearcore-e3f14b9758bedfa0/fb1e621/nearcore/src/runtime/mod.rs:1042:69
thread 'main' panicked at 'assertion failed: queue.send(msg).is_ok()', /home/ubuntu/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/actix-0.11.0-beta.2/src/sync.rs:161:25

To Reproduce

Probably: run a node that does not support the newer protocol version through a protocol upgrade, so that it hits the condition above.

Expected behavior

nearcore should not corrupt its database and should be able to boot fine after this abort.

Version (please complete the following information):

  • nearcore commit/branch: 4dbc9f8
  • testnet
@frol frol added A-chain Area: Chain, client & related P-high Priority: High C-partner-request Category: feature requests from partners C-incident Category: issues that are related to or have caused some incident T-node Team: issues relevant to the node experience team labels Oct 20, 2021
@bowenwang1996
Collaborator

I believe this is the same issue as #3266. @mina86 please take a look

@mina86
Contributor

mina86 commented Oct 20, 2021

> I believe this is the same issue as #3266.

This is something different. #3266 is about shutting down after getting a signal. Here we have a panic which aborts the process.

@mina86
Contributor

mina86 commented Nov 25, 2021

Closing in favour of #5340 as it’s the same underlying cause.

@mina86 mina86 closed this as completed Nov 25, 2021
@gmilescu gmilescu added the Node Node team label Oct 19, 2023