Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(DO NOT MERGE) Add cometbft service cancellable context #2496

Draft
wants to merge 26 commits into
base: engine-api-retry
Choose a base branch
from

Conversation

fridrik01
Copy link
Contributor

@fridrik01 fridrik01 commented Feb 8, 2025

Currently, beacond hangs on SIGTERM if its inside a retry timeout. This is because the context used in the retry is not cancelled in our signal hander.

To address this, this PR adds a cancellable context to the cometbft service which it inherits from the context passed in to Start(). Now, when signal handler catches a signal it will call Stop() on the service which in turn cancels the context. This causes us to properly handle signal cancellation in the retry timeout.

HOWEVER, it seems that comet does not support cancellation, as I always get a panic from comet during shutdown:

C2025-02-08T16:02:02Z INFO Stopping services num=8
2025-02-08T16:02:02Z INFO Stopping service type=beacond
2025-02-08T16:02:02Z INFO Stopping CometBFT Node
2025-02-08T16:02:02Z INFO Stopping Node service impl=Node
2025-02-08T16:02:02Z INFO Stopping Node
2025-02-08T16:02:02Z ERRR Not stopping Pruner service -- has not been started yet module=state impl=Pruner
2025-02-08T16:02:02Z ERRR Error stopping the pruning service err=not started
2025-02-08T16:02:02Z INFO Stopping EventBus service module=events impl=EventBus
2025-02-08T16:02:02Z INFO Stopping PubSub service module=pubsub impl=PubSub
2025-02-08T16:02:02Z INFO Stopping P2P Switch service module=p2p impl=P2P Switch
2025-02-08T16:02:02Z INFO Stopping Reactor service module=blocksync impl=Reactor
2025-02-08T16:02:02Z INFO Stopping Consensus service module=consensus impl=ConsensusReactor
2025-02-08T16:02:02Z INFO Stopping State service module=consensus impl=ConsensusState
2025-02-08T16:02:02Z ERRR Failed trying to stop eventSwitch module=consensus error=not started
2025-02-08T16:02:02Z INFO Stopping TimeoutTicker service module=consensus impl=TimeoutTicker
2025-02-08T16:02:02Z INFO caught exit signal signal=interrupt
2025-02-08T16:02:02Z WARN prepare_proposal: s.ctx.Done
panic: context canceled

goroutine 4547 [running]:
github.com/cometbft/cometbft/internal/consensus.(*State).createProposalBlock(0xc0008d3808, {0x3485800, 0x4e106c0})
        github.com/cometbft/[email protected]/internal/consensus/state.go:1362 +0x3ee
github.com/cometbft/cometbft/internal/consensus.(*State).defaultDecideProposal(0xc0008d3808, 0x1, 0x0)
        github.com/cometbft/[email protected]/internal/consensus/state.go:1248 +0x65
github.com/cometbft/cometbft/internal/consensus.(*State).enterPropose(0xc0008d3808, 0x1, 0x0)
        github.com/cometbft/[email protected]/internal/consensus/state.go:1227 +0x923
github.com/cometbft/cometbft/internal/consensus.(*State).enterNewRound(0xc0008d3808, 0x1, 0x0)
        github.com/cometbft/[email protected]/internal/consensus/state.go:1137 +0xb25
github.com/cometbft/cometbft/internal/consensus.(*State).handleTimeout(0xc0008d3808, {0x10000000001?, 0x7f084d0d7538?, _, _}, {0x1, 0x0, 0x1, {0x4579bc1, 0xedf3976fa, ...}, ...})
        github.com/cometbft/[email protected]/internal/consensus/state.go:1005 +0x8c9
github.com/cometbft/cometbft/internal/consensus.(*State).readReplayMessage(0xc0008d3808, 0x0?, {0x0?, 0x0?})
        github.com/cometbft/[email protected]/internal/consensus/replay.go:88 +0x2e5
github.com/cometbft/cometbft/internal/consensus.(*State).catchupReplay(0xc0008d3808, 0x1)
        github.com/cometbft/[email protected]/internal/consensus/replay.go:163 +0x692
github.com/cometbft/cometbft/internal/consensus.(*State).OnStart(0xc0008d3808)
        github.com/cometbft/[email protected]/internal/consensus/state.go:347 +0x158
github.com/cometbft/cometbft/libs/service.(*BaseService).Start(0xc0008d3808)
        github.com/cometbft/[email protected]/libs/service/service.go:146 +0x2f5
github.com/cometbft/cometbft/internal/consensus.(*Reactor).OnStart(0xc000e0a900)
        github.com/cometbft/[email protected]/internal/consensus/reactor.go:93 +0xdd
github.com/cometbft/cometbft/libs/service.(*BaseService).Start(0xc000e0a900)
        github.com/cometbft/[email protected]/libs/service/service.go:146 +0x2f5
github.com/cometbft/cometbft/p2p.(*Switch).OnStart(0xc000dc3e60)
        github.com/cometbft/[email protected]/p2p/switch.go:234 +0x9c
github.com/cometbft/cometbft/libs/service.(*BaseService).Start(0xc000dc3e60)
        github.com/cometbft/[email protected]/libs/service/service.go:146 +0x2f5
github.com/cometbft/cometbft/node.(*Node).OnStart(0xc000db8c08)
        github.com/cometbft/[email protected]/node/node.go:619 +0x385
github.com/cometbft/cometbft/libs/service.(*BaseService).Start(0xc000db8c08)
        github.com/cometbft/[email protected]/libs/service/service.go:146 +0x2f5
github.com/berachain/beacon-kit/consensus/cometbft/service.(*Service).Start.func1()
        github.com/berachain/beacon-kit/consensus/cometbft/service/service.go:196 +0x2e
created by github.com/berachain/beacon-kit/consensus/cometbft/service.(*Service).Start in goroutine 1
        github.com/berachain/beacon-kit/consensus/cometbft/service/service.go:195 +0x511
make: *** [scripts/build/testing.mk:36: start] Interrupt

@fridrik01 fridrik01 self-assigned this Feb 8, 2025
@fridrik01 fridrik01 marked this pull request as ready for review February 8, 2025 15:55
@fridrik01 fridrik01 requested a review from a team as a code owner February 8, 2025 15:55
@fridrik01 fridrik01 force-pushed the add-cometbft-service-cancellable-context branch from 451ffef to c417507 Compare February 8, 2025 16:02
Copy link

codecov bot commented Feb 8, 2025

Codecov Report

Attention: Patch coverage is 0% with 101 lines in your changes missing coverage. Please review.

Project coverage is 32.10%. Comparing base (5e45224) to head (31fde13).

Files with missing lines Patch % Lines
consensus/cometbft/service/finalize_block.go 0.00% 32 Missing ⚠️
consensus/cometbft/service/process_proposal.go 0.00% 31 Missing ⚠️
consensus/cometbft/service/prepare_proposal.go 0.00% 28 Missing ⚠️
consensus/cometbft/service/service.go 0.00% 10 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                  @@
##           engine-api-retry    #2496      +/-   ##
====================================================
- Coverage             32.20%   32.10%   -0.11%     
====================================================
  Files                   350      350              
  Lines                 15665    15714      +49     
  Branches                 20       20              
====================================================
  Hits                   5045     5045              
- Misses                10257    10306      +49     
  Partials                363      363              
Files with missing lines Coverage Δ
consensus/cometbft/service/service.go 0.00% <0.00%> (ø)
consensus/cometbft/service/prepare_proposal.go 0.00% <0.00%> (ø)
consensus/cometbft/service/process_proposal.go 0.00% <0.00%> (ø)
consensus/cometbft/service/finalize_block.go 0.00% <0.00%> (ø)

@fridrik01 fridrik01 force-pushed the add-cometbft-service-cancellable-context branch from c417507 to 7062906 Compare February 8, 2025 16:04
@fridrik01 fridrik01 force-pushed the add-cometbft-service-cancellable-context branch from 992a862 to 31fde13 Compare February 12, 2025 20:07
@calbera calbera marked this pull request as draft February 20, 2025 23:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants