Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rpc): Shut down the RPC server properly when Zebra shuts down #5591

Merged
merged 8 commits into from
Nov 10, 2022

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Nov 9, 2022

Motivation

Zebra's RPC server sometimes crashes with memory errors during tests.

Closes #5532.

The shutdown is a bit slow (2 minutes?), but it's better than crashing.

Solution

  • Return a shutdown handle from the RPC spawn() method
    • Shut down RPC tasks in a separate std::thread, to avoid tokio panics due to blocking executor shutdowns
  • Make the sendrawtransaction queue task only have a channel receiver, so we can shut it down when the sender is dropped

Related cleanups:

  • Remove a redundant Option<_> from the sendrawtransaction channel type

This PR also adds extra CHANGELOG info for the security fix.

Review

This is a high-priority potential security issue.

Reviewer Checklist

  • Will the PR name make sense to users?
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
  • How do you know it works? Does it have tests?

@teor2345 teor2345 added C-bug Category: This is a bug P-High 🔥 C-security Category: Security issues I-crash Zebra crashes (without a panic) A-rpc Area: Remote Procedure Call interfaces I-memory-safety Vulnerable code in Zebra or dependencies labels Nov 9, 2022
@teor2345 teor2345 requested a review from a team as a code owner November 9, 2022 00:37
@teor2345 teor2345 self-assigned this Nov 9, 2022
@teor2345 teor2345 requested review from oxarbitrage and removed request for a team November 9, 2022 00:37
@github-actions github-actions bot added the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Nov 9, 2022
@codecov
Copy link

codecov bot commented Nov 9, 2022

Codecov Report

Merging #5591 (9dadc21) into main (7e13677) will increase coverage by 0.05%.
The diff coverage is 80.17%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5591      +/-   ##
==========================================
+ Coverage   78.68%   78.74%   +0.05%     
==========================================
  Files         305      305              
  Lines       38211    38328     +117     
==========================================
+ Hits        30068    30182     +114     
- Misses       8143     8146       +3     

@oxarbitrage
Copy link
Contributor

oxarbitrage commented Nov 9, 2022

When i shut down zebra using this pull request (with and without getblocktemplate-rpcs feature and with rpc listen_addr configured) i can't see any of the Stopping RPC server info messages. How i can know this is actually working ?

...
^C2022-11-09T18:34:24.207264Z  INFO {zebrad="a14d263" net="Main"}:sig{kind=SignalKind(2) name="SIGINT"}: zebrad::signal: received SIGINT, starting shutdown
2022-11-09T18:34:24.207625Z  INFO {zebrad="a14d263" net="Main"}: zebrad::components::tokio: waiting for async tokio tasks to shut down TOKIO_SHUTDOWN_TIMEOUT=20s
2022-11-09T18:34:24.212775Z  INFO {zebrad="a14d263" net="Main"}: zebra_network::address_book_updater: stopping address book updater error=Err(AllAddressBookUpdaterSendersClosed)
2022-11-09T18:34:25.686165Z  INFO write_blocks_from_channels{finalized_state=FinalizedState { network: Mainnet, debug_stop_at_height: None, db: ZebraDb { db: DiskDb { ephemeral: false, db: RocksDB { path: "/home/oxarbitrage/.cache/zebra/state/v25/mainnet" } } } } non_finalized_state=NonFinalizedState { chain_set: {}, network: Mainnet }}: zebra_state::service::write: StateService closed the block reset channel. Is Zebra shutting down?
2022-11-09T18:34:44.208394Z  INFO {zebrad="a14d263" net="Main"}: zebrad::components::tokio: shutting down Zebra
2022-11-09T18:34:44.208885Z  INFO {zebrad="a14d263" net="Main"}: zebrad::commands::start: stopping zebrad
$ 

@teor2345
Copy link
Contributor Author

teor2345 commented Nov 9, 2022

i can't see any of the Stopping RPC server info messages. How i can know this is actually working ?

Good question!

When you shut down zebrad using Ctrl-C, it calls exit(), which bypasses shutdown and ends the process immediately. That's ok, and it doesn't seem to cause any memory errors.

But in the zebra-rpc tests, we've seen memory errors when the test process ends (#5532). You can see the shutdown logs in the tests by doing:

RUST_LOG=info cargo test --package zebra-rpc -- --nocapture | grep Stop

When I do this I see lots of:

   Compiling zebra-rpc v1.0.0-beta.17 (/home/dev/zebra/zebra-rpc)
    Finished test [optimized + debuginfo] target(s) in 18.02s
     Running unittests src/lib.rs (/home/dev/zebra/target/debug/deps/zebra_rpc-e3fe29890f0e5ac6)
2022-11-09T20:31:23.745893Z  INFO Stopping RPC server
2022-11-09T20:31:23.746004Z  INFO Stopped RPC server
2022-11-09T20:31:23.746086Z  INFO Stopped RPC endpoint
2022-11-09T20:31:23.777702Z  INFO rpc_server_spawn_unallocated_port{parallel_cpu_threads=true do_shutdown=true}: Stopping RPC server
2022-11-09T20:31:23.779355Z  INFO rpc_server_spawn_unallocated_port{parallel_cpu_threads=true do_shutdown=true}: Stopped RPC server
2022-11-09T20:31:23.779361Z  INFO rpc_server_spawn_unallocated_port{parallel_cpu_threads=true do_shutdown=true}: Stopped RPC endpoint
2022-11-09T20:31:23.779585Z  INFO rpc_server_spawn_unallocated_port{parallel_cpu_threads=true do_shutdown=true}: Stopping RPC server
2022-11-09T20:31:23.779598Z  INFO rpc_server_spawn_unallocated_port{parallel_cpu_threads=true do_shutdown=true}: Stopped RPC server
...

(You will also see panics, that's ok, some tests expect a panic.)

The shutdown code is also triggered when a Zebra task has an error or finishes unexpectedly. You can see this happen by replacing any start() task with a future::ready(_).

When I did that to:

rpc_tx_queue_task_handle,

I saw these Zebra logs:

$ zebrad start

2022-11-09T20:34:30.545827Z  INFO zebrad::application: Diagnostic Metadata:
version: 1.0.0-rc.1+36.ga6bbebf.modified
Zcash network: Testnet
state version: 25
branch: rpc-shutdown
git commit: a6bbebf
commit timestamp: 2022-11-09T20:33:25Z
target triple: x86_64-unknown-linux-gnu
build profile: release
...
2022-11-09T20:34:30.547436Z  INFO {zebrad="a6bbebf" net="Test"}: zebrad::commands::start: Starting zebrad
...
2022-11-09T20:34:51.660370Z  INFO {zebrad="a6bbebf" net="Test"}: zebrad::commands::start: initializing mempool
2022-11-09T20:34:51.660385Z  INFO {zebrad="a6bbebf" net="Test"}: zebra_rpc::server: Trying to open RPC endpoint at 0.0.0.0:38232...
2022-11-09T20:34:51.660631Z  INFO {zebrad="a6bbebf" net="Test"}: zebra_rpc::server: Opened RPC endpoint at 0.0.0.0:38232
2022-11-09T20:34:51.660669Z  INFO {zebrad="a6bbebf" net="Test"}: zebrad::components::sync::gossip: initializing block gossip task
...
2022-11-09T20:34:51.660692Z  INFO {zebrad="a6bbebf" net="Test"}: zebrad::commands::start: spawned initial Zebra tasks
...
2022-11-09T20:34:51.660699Z  INFO {zebrad="a6bbebf" net="Test"}: zebrad::commands::start: rpc task exited
2022-11-09T20:34:51.660701Z  INFO {zebrad="a6bbebf" net="Test"}: zebrad::commands::start: exiting Zebra because an ongoing task exited: stopping other tasks
2022-11-09T20:34:51.660868Z  INFO {zebrad="a6bbebf" net="Test"}: zebra_rpc::server: Stopping RPC server
2022-11-09T20:34:51.660955Z  INFO {zebrad="a6bbebf" net="Test"}: zebra_rpc::server: Stopped RPC server
2022-11-09T20:34:51.660989Z  INFO {zebrad="a6bbebf" net="Test"}: zebra_rpc::server: Stopped RPC endpoint
2022-11-09T20:34:51.661026Z  INFO {zebrad="a6bbebf" net="Test"}: zebra_rpc::server: Stopping RPC server
2022-11-09T20:34:51.661034Z  INFO {zebrad="a6bbebf" net="Test"}: zebra_rpc::server: Stopped RPC server
2022-11-09T20:34:51.661107Z  INFO write_blocks_from_channels{finalized_state=FinalizedState { network: Testnet, debug_stop_at_height: None, db: ZebraDb { db: DiskDb { ephemeral: false, db: Ro
cksDB { path: "/home/dev/.cache/zebra-custom/state/v25/testnet" } } } } non_finalized_state=NonFinalizedState { chain_set: {}, network: Testnet }}: zebra_state::service::write: StateService c
losed the block reset channel. Is Zebra shutting down?
2022-11-09T20:34:51.661132Z  INFO {zebrad="a6bbebf" net="Test"}: zebra_network::address_book_updater: stopping address book updater error=Err(AllAddressBookUpdaterSendersClosed)
2022-11-09T20:34:51.661793Z  INFO {zebrad="a6bbebf" net="Test"}:sync:read_state{req=Depth(block::Hash("05a60a92d99d85997cce3b87616c089f6124d7342af37106edc76126334a2c38"))}: zebra_state::servi
ce::finalized_state::disk_db: flushing database to disk path="/home/dev/.cache/zebra-custom/state/v25/testnet"
2022-11-09T20:34:51.661856Z  INFO {zebrad="a6bbebf" net="Test"}:sync:read_state{req=Depth(block::Hash("05a60a92d99d85997cce3b87616c089f6124d7342af37106edc76126334a2c38"))}: zebra_state::servi
ce: waiting for the block write task to finish
2022-11-09T20:34:51.661862Z  INFO {zebrad="a6bbebf" net="Test"}:sync:read_state{req=Depth(block::Hash("05a60a92d99d85997cce3b87616c089f6124d7342af37106edc76126334a2c38"))}: zebra_state::servi
ce: shutting down the state without waiting for the block write task
2022-11-09T20:34:51.661867Z  INFO {zebrad="a6bbebf" net="Test"}:sync:read_state{req=Depth(block::Hash("05a60a92d99d85997cce3b87616c089f6124d7342af37106edc76126334a2c38"))}: zebra_state::servi
ce::finalized_state::disk_db: flushing database to disk path="/home/dev/.cache/zebra-custom/state/v25/testnet"
2022-11-09T20:34:51.663182Z  INFO {zebrad="a6bbebf" net="Test"}: zebrad::components::sync::progress: estimated progress to chain tip sync_percent=99.995% current_height=Height(2112317) networ
k_upgrade=Nu5 remaining_sync_blocks=108 time_since_last_state_block=0s
2022-11-09T20:34:52.553086Z  INFO {zebrad="a6bbebf" net="Test"}:init{config=Config { checkpoint_sync: true, debug_skip_parameter_preload: false } network=Testnet debug_skip_parameter_preload=
false}: zebra_consensus::chain: Groth16 pre-download and check task finished
2022-11-09T20:35:03.688089Z  INFO {zebrad="a6bbebf" net="Test"}: zebrad::components::tokio: shutting down Zebra
2022-11-09T20:35:03.688102Z  INFO {zebrad="a6bbebf" net="Test"}: zebrad::commands::start: stopping zebrad

I also made some changes to the code so we call shutdown_blocking() on RpcServer::drop() and in zebrad's start().

@oxarbitrage
Copy link
Contributor

make sense, thank you for the details. i tried all that and it is working for me now.

oxarbitrage
oxarbitrage previously approved these changes Nov 9, 2022
Copy link
Contributor

@oxarbitrage oxarbitrage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks good to me, i made some tests locally and seems to work ok.

mergify bot added a commit that referenced this pull request Nov 10, 2022
@mergify mergify bot merged commit 074733d into main Nov 10, 2022
@mergify mergify bot deleted the rpc-shutdown branch November 10, 2022 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rpc Area: Remote Procedure Call interfaces C-bug Category: This is a bug C-security Category: Security issues C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG I-crash Zebra crashes (without a panic) I-memory-safety Vulnerable code in Zebra or dependencies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RPC test crashes: signal: 10, SIGBUS: access to undefined memory
2 participants