Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transaction stuck on IsBroadcast using fork-aware pool #174

Open
valentinfernandez1 opened this issue Jul 19, 2024 · 9 comments
Open

Transaction stuck on IsBroadcast using fork-aware pool #174

valentinfernandez1 opened this issue Jul 19, 2024 · 9 comments
Assignees

Comments

@valentinfernandez1
Copy link
Collaborator

valentinfernandez1 commented Jul 19, 2024

Mythical team has been testing the fork-aware transaction pool on Rococo Muse and have encountered an error while using the submitAndWatch rpc method.

Issue is the following:

  • A transaction is sent to the chain with submitAndWatch() to track the status of the transaction.
  • Normal flow when a transaction is included would be Ready -> Broadcast -> InBlock -> Finalized (Occationally it can get a retracted but that is expected if there is a fork).
  • In this case flow is the following Ready -> Broadcast. And no new updates are received after the Broadcast status, even though the transaction did get included in a finalized block.
@michalkucharczyk
Copy link

This will help with debugging #175
I will need timestamp (more or less) and tx hash.

@michalkucharczyk
Copy link

michalkucharczyk commented Jul 22, 2024

I reviewed a log.

The situation there actually looks like one where transaction events are stuck. Actually the InBlock event was not stuck forever, it would be delayed by about 60seconds.

Here is the excerpt from the log, and also explanation why it happend
Note: tx hash 0xe888ae9d185e209117b8d662c0e470ed4b66075f3d2da4a58fb9055e7ed9a042 - aliased to RECK):

2024-07-19 15:12:54.100  INFO tokio-runtime-worker substrate: [Parachain] 🏆 Imported #46053 (BLOCK46052 → BLOCK46053)    
2024-07-19 15:12:54.100  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 0) views:[1;[(46053, 0, 0)]] event:NewBestBlock { hash: BLOCK46053, tree_route: None }  took:415.01µs    
2024-07-19 15:12:58.486  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 0) views:[1;[(46053, 0, 0)]] event:Finalized { hash: BLOCK46048, tree_route: [] }  took:146.02µs    
2024-07-19 15:13:00.186  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46051 (BLOCK46050 → BLOCK46051f02)    
2024-07-19 15:13:00.318  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46052 (BLOCK46051f02 → BLOCK46052f01)    
2024-07-19 15:13:02.384 DEBUG tokio-runtime-worker txpool: [RECK] fatp::submit_and_watch views:1    
2024-07-19 15:13:02.385 DEBUG tokio-runtime-worker txpool: [RECK] view::submit_and_watch at:BLOCK46053    
2024-07-19 15:13:02.386 DEBUG tokio-runtime-worker txpool: [Parachain] [RECK] validate_transaction_blocking: at:BLOCK46053 took:1.1429ms    
2024-07-19 15:13:02.386 DEBUG tokio-runtime-worker txpool: [RECK] ValidatedPool::submit_one    
2024-07-19 15:13:02.386 DEBUG tokio-runtime-worker txpool: [RECK] Importing WaitingTransaction { imported_at: Instant { tv_sec: 3802733, tv_nsec: 979753932 }, transaction: Transaction { hash: RECK, priority: 200, valid_till: 18446744073709551614, bytes: 15096, propagate: true, source: TransactionSource::External, requires: [], provides: [7e5c1f19e8828c87483a8b09c3c230560923d1ff00000000], data: 847e5c1f19e8828c87483a8b09c3c230560923d1ffd9740fdb24ea29fc3886029b18d3abeacc0758e012d8b0afa048a602397972f602503f47ba53e441916facbac7b72e6b5f3f1abd6b459cb4164b2e265ac86df90000008d010402d1070a03d748ae6a868c97f402ef771642945a457354b61d0f0000c16ff286230a038df69534aadd135f21bdabb9d26bae825a70579d0f0000c16ff286230a03e454c7d3053c1a5a79640c32464045964a1f473f0f0000c16ff286230a03e48af90109ebce9f3e4a06e268b9111c0123c70c0f0000c16ff286230a03b1e22a13c86b4efd00467e233d5f09f931b332300f0000c16ff286230a03ae2a73dba61b776536e886f2c634a8feb64f5cca0f0000c16ff286230a03ea9a155c6cf9665fd9fca7a6d8195cac5af4995e0f0000c16ff286230a030edf8fcaf05ca7903825496ba482a8fb00193b5c0f0000c16ff286230a03414186137978a91cc660329f60b9ab3d38e7e2e30f0000c16ff286230a038a2653d2e05a05cfa15ce956100b0fd5bf61a5030f0000c16ff286230a03b116943d00f8d645a9aefe6d5cb5c333d943ded10f0000c16ff286230a0382e96f96428d0fac9fdcb4cd94354cdf81cc6d090f0000c16ff286230a037ae15ae99195bac51d155466ebf995f554013d290f0000c16ff286230a039c861717b04dc94641e5458303a606d3b8b678170f0000c16ff2...86230a03ceba6d3ee916cfb3b3df1fb6b3d20f088b7047b40f0000c16ff286230a03eb7e000465c2a0acbd94fe50fafd844022a644bb0f0000c16ff286230a0309768cc3310bb7614ca40289b60756bed332e25a0f0000c16ff286230a03091225edc4745931ae8c47e005f409997023aab70f0000c16ff286230a03b34073ba31fbea76ca5809bb57008ea3ce8617430f0000c16ff286230a03980ffade88fc08bd5df4c171f1317d12596e499d0f0000c16ff286230a030d32e1056ab7e75e4e4db350fd39232af12d79e60f0000c16ff286230a03c92cbf73714421b232a3f8bc6338276726a7bb140f0000c16ff286230a0328c86545f2fafccc54e30d5d667d65c2521c72180f0000c16ff286230a03c74237fcf62a71b25365c23a4fee4701a2b7b02a0f0000c16ff286230a03b5bbe1acdf846d79fa545fdfa8b513b5aa541cf00f0000c16ff286230a03c22d6d2ef556b9442215281e2d6527835901096a0f0000c16ff286230a03d0409222b006530d942ed1f1a0a8953587cdde0d0f0000c16ff286230a0344650f5c5d105fb76c6f72d380f8f130064eeb0c0f0000c16ff286230a031191b1f678b78947678444e0626e0a003ef00c1d0f0000c16ff286230a03f23aa7aa6ee99453f12ae4296f5e36ed72d486580f0000c16ff286230a030d1cb731cbd65eb520c0e722fc4b895a46edc6800f0000c16ff28623}, missing_tags: {}} to ready    
2024-07-19 15:13:02.386 DEBUG tokio-runtime-worker txpool: [RECK] sending out: Ready    
2024-07-19 15:13:02.386 DEBUG tokio-runtime-worker txpool: [Parachain] [RECK] import_sink_worker sending out imported    
2024-07-19 15:13:02.387 DEBUG tokio-runtime-worker txpool: [RECK] sending out: Broadcasted

The transaction is included in the view (for block number 46053):

2024-07-19 15:13:02.449  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 2) views:[1;[(46053, 2, 0)]] event:Finalized { hash: BLOCK46049, tree_route: [] }  took:142.58µs    

At this point the parachain starts a heavy re-org, which takes ~1 minute. During this period no NewBestBlock is notified to the transaction pool, and txpool performs a light-maintain.[note: light-maintain is intended to provide ready transaction as soon as possible w/o re-validating transaction. This means that we only take txs from the views to which tree route does not contain retracted blocks (as txs from retracted requires re-validation)].

2024-07-19 15:13:02.451 DEBUG tokio-runtime-worker txpool: [Parachain] [RECK] validate_transaction_blocking: at:BLOCK46049 took:1.44011ms    
2024-07-19 15:13:06.220  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46053 (BLOCK46052f01 → BLOCK46053f01)    
2024-07-19 15:13:06.356  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46053 (BLOCK46052f01 → BLOCK46053f02)    
2024-07-19 15:13:10.569  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[1;[(46053, 20, 0)]] event:Finalized { hash: BLOCK46050, tree_route: [] }  took:461.51µs    
2024-07-19 15:13:18.109  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46051 (BLOCK46050 → BLOCK46051f03)    
2024-07-19 15:13:18.242  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46052 (BLOCK46051f03 → BLOCK46052f02)    
2024-07-19 15:13:24.243  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46053 (BLOCK46052f02 → BLOCK46053f03)    
2024-07-19 15:13:36.179  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46051 (BLOCK46050 → BLOCK46051f04)    
2024-07-19 15:13:36.313  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46052 (BLOCK46051f04 → BLOCK46052f03)    
2024-07-19 15:13:42.205  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46053 (BLOCK46052f03 → BLOCK46053f04)    
2024-07-19 15:13:54.150  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46051 (BLOCK46050 → BLOCK46051f05)    
2024-07-19 15:13:54.284  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46052 (BLOCK46051f05 → BLOCK46052f04)    
2024-07-19 15:14:00.195  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46053 (BLOCK46052f04 → BLOCK46053f05)    
2024-07-19 15:14:06.314  INFO tokio-runtime-worker substrate: [Parachain] 🏆 Imported #46054 (BLOCK46053f05 → BLOCK46054)    

Here is how it looks like:
image

Once this re-org is done, transaction is included, but the listener is already disconnected:

2024-07-19 15:14:06.314 DEBUG tokio-runtime-worker txpool: [Parachain] [RECK] add_view_watcher_for_tx: send message failed: Closed(..)    
2024-07-19 15:14:06.317  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[2;[(46054, 20, 0), (46053, 20, 0)]] event:NewBestBlock { hash: BLOCK46054, tree_route: Some(TreeRoute { route: [HashAndNumber { number: 46053, hash: BLOCK46053 }, HashAndNumber { number: 46052, hash: BLOCK46052 }, HashAndNumber { number: 46051, hash: BLOCK46051 }, HashAndNumber { number: 46050, hash: BLOCK46050 }, HashAndNumber { number: 46051, hash: BLOCK46051f05 }, HashAndNumber { number: 46052, hash: BLOCK46052f04 }, HashAndNumber { number: 46053, hash: BLOCK46053f05 }], pivot: 3 }) }  took:3.31243ms    
2024-07-19 15:14:06.318 DEBUG tokio-runtime-worker txpool: [Parachain] [RECK] validate_transaction_blocking: at:BLOCK46054 took:700.52µs    
2024-07-19 15:14:06.334 DEBUG tokio-runtime-worker txpool: [Parachain] [RECK] view::revalidate_later result: Ok(Ok(ValidTransaction { priority: 200, requires: [], provides: [[126, 92, 31, 25, 232, 130, 140, 135, 72, 58, 139, 9, 195, 194, 48, 86, 9, 35, 209, 255, 0, 0, 0, 0]], longevity: 18446744073709505560, propagate: true }))    
2024-07-19 15:14:14.103 DEBUG tokio-runtime-worker txpool: [Parachain] [RECK] Importing WaitingTransaction { imported_at: Instant { tv_sec: 3802805, tv_nsec: 696396749 }, transaction: Transaction { hash: RECK, priority: 200, valid_till: 18446744073709551614, bytes: 15096, propagate: true, source: TransactionSource::External, requires: [], provides: [7e5c1f19e8828c87483a8b09c3c230560923d1ff00000000], data: 847e5c1f19e8828c87483a8b09c3c230560923d1ffd9740fdb24ea29fc3886029b18d3abeacc0758e012d8b0afa048a602397972f602503f47ba53e441916facbac7b72e6b5f3f1abd6b459cb4164b2e265ac86df90000008d010402d1070a03d748ae6a868c97f402ef771642945a457354b61d0f0000c16ff286230a038df69534aadd135f21bdabb9d26bae825a70579d0f0000c16ff286230a03e454c7d3053c1a5a79640c32464045964a1f473f0f0000c16ff286230a03e48af90109ebce9f3e4a06e268b9111c0123c70c0f0000c16ff286230a03b1e22a13c86b4efd00467e233d5f09f931b332300f0000c16ff286230a03ae2a73dba61b776536e886f2c634a8feb64f5cca0f0000c16ff286230a03ea9a155c6cf9665fd9fca7a6d8195cac5af4995e0f0000c16ff286230a030edf8fcaf05ca7903825496ba482a8fb00193b5c0f0000c16ff286230a03414186137978a91cc660329f60b9ab3d38e7e2e30f0000c16ff286230a038a2653d2e05a05cfa15ce956100b0fd5bf61a5030f0000c16ff286230a03b116943d00f8d645a9aefe6d5cb5c333d943ded10f0000c16ff286230a0382e96f96428d0fac9fdcb4cd94354cdf81cc6d090f0000c16ff286230a037ae15ae99195bac51d155466ebf995f554013d290f0000c16ff286230a039c861717b04dc94641e5458303a606d3b8b678170f0000c16ff2...86230a03ceba6d3ee916cfb3b3df1fb6b3d20f088b7047b40f0000c16ff286230a03eb7e000465c2a0acbd94fe50fafd844022a644bb0f0000c16ff286230a0309768cc3310bb7614ca40289b60756bed332e25a0f0000c16ff286230a03091225edc4745931ae8c47e005f409997023aab70f0000c16ff286230a03b34073ba31fbea76ca5809bb57008ea3ce8617430f0000c16ff286230a03980ffade88fc08bd5df4c171f1317d12596e499d0f0000c16ff286230a030d32e1056ab7e75e4e4db350fd39232af12d79e60f0000c16ff286230a03c92cbf73714421b232a3f8bc6338276726a7bb140f0000c16ff286230a0328c86545f2fafccc54e30d5d667d65c2521c72180f0000c16ff286230a03c74237fcf62a71b25365c23a4fee4701a2b7b02a0f0000c16ff286230a03b5bbe1acdf846d79fa545fdfa8b513b5aa541cf00f0000c16ff286230a03c22d6d2ef556b9442215281e2d6527835901096a0f0000c16ff286230a03d0409222b006530d942ed1f1a0a8953587cdde0d0f0000c16ff286230a0344650f5c5d105fb76c6f72d380f8f130064eeb0c0f0000c16ff286230a031191b1f678b78947678444e0626e0a003ef00c1d0f0000c16ff286230a03f23aa7aa6ee99453f12ae4296f5e36ed72d486580f0000c16ff286230a030d1cb731cbd65eb520c0e722fc4b895a46edc6800f0000c16ff28623}, missing_tags: {}} to ready    
2024-07-19 15:14:14.103  INFO tokio-runtime-worker substrate: [Parachain] 🏆 Imported #46055 (BLOCK46054 → BLOCK46055)    
2024-07-19 15:14:14.103 DEBUG tokio-runtime-worker txpool: [Parachain] [RECK] Pruning transaction.    
2024-07-19 15:14:14.104 DEBUG tokio-runtime-worker txpool: [Parachain] [RECK] Resubmitting transaction: Invalid(RECK, Pool(TemporarilyBanned))    
2024-07-19 15:14:14.104 DEBUG tokio-runtime-worker txpool: [Parachain] [RECK] ValidatedPool::submit_one invalid: Pool(TemporarilyBanned)    
2024-07-19 15:14:14.104 DEBUG tokio-runtime-worker txpool::watcher: [Parachain] [RECK] Pruned at BLOCK46055    

After some more blocks tx is finalized:

2024-07-19 15:14:14.105  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[2;[(46053, 20, 0), (46055, 18, 0)]] event:NewBestBlock { hash: BLOCK46055, tree_route: None }  took:2.986251ms    
2024-07-19 15:14:19.796  INFO tokio-runtime-worker substrate: [Parachain] 🏆 Imported #46056 (BLOCK46055 → BLOCK46056)    
2024-07-19 15:14:19.800  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[2;[(46053, 20, 0), (46056, 16, 0)]] event:NewBestBlock { hash: BLOCK46056, tree_route: None }  took:4.37966ms    
2024-07-19 15:14:22.950  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[2;[(46053, 20, 0), (46056, 16, 0)]] event:Finalized { hash: BLOCK46051f05, tree_route: [] }  took:2.23764ms    
2024-07-19 15:14:26.257  INFO tokio-runtime-worker substrate: [Parachain] 🏆 Imported #46057 (BLOCK46056 → BLOCK46057)    
2024-07-19 15:14:26.260  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[2;[(46053, 20, 0), (46057, 13, 0)]] event:NewBestBlock { hash: BLOCK46057, tree_route: None }  took:2.98852ms    
2024-07-19 15:14:26.984  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[2;[(46053, 20, 0), (46057, 13, 0)]] event:Finalized { hash: BLOCK46052f04, tree_route: [] }  took:1.74605ms    
2024-07-19 15:14:27.628  INFO tokio-runtime-worker substrate: [Parachain] 🆕 Imported #46057 (BLOCK46056 → BLOCK46057f01)    
2024-07-19 15:14:31.918  INFO tokio-runtime-worker substrate: [Parachain] 🏆 Imported #46058 (BLOCK46057f01 → BLOCK46058)    
2024-07-19 15:14:31.925  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[3;[(46053, 20, 0), (46058, 7, 0), (46057, 13, 0)]] event:NewBestBlock { hash: BLOCK46058, tree_route: Some(TreeRoute { route: [HashAndNumber { number: 46057, hash: BLOCK46057 }, HashAndNumber { number: 46056, hash: BLOCK46056 }, HashAndNumber { number: 46057, hash: BLOCK46057f01 }], pivot: 1 }) }  took:7.45975ms    
2024-07-19 15:14:35.063  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[2;[(46058, 7, 0), (46057, 13, 0)]] event:Finalized { hash: BLOCK46053f05, tree_route: [] }  took:961.571µs    
2024-07-19 15:14:39.029  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[2;[(46058, 7, 0), (46057, 13, 0)]] event:Finalized { hash: BLOCK46054, tree_route: [] }  took:162.78µs    
2024-07-19 15:14:39.952  INFO tokio-runtime-worker substrate: [Parachain] 🏆 Imported #46059 (BLOCK46058 → BLOCK46059)    
2024-07-19 15:14:39.954  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[2;[(46059, 3, 0), (46057, 13, 0)]] event:NewBestBlock { hash: BLOCK46059, tree_route: None }  took:3.587849ms    
2024-07-19 15:14:43.441  INFO tokio-runtime-worker substrate: [Parachain] 🏆 Imported #46060 (BLOCK46059 → BLOCK46060)    
2024-07-19 15:14:43.444  INFO tokio-runtime-worker txpool: [Parachain] maintain: txs:(0, 20) views:[2;[(46060, 0, 0), (46057, 13, 0)]] event:NewBestBlock { hash: BLOCK46060, tree_route: None }  took:3.34758ms    
2024-07-19 15:14:47.087 DEBUG tokio-runtime-worker txpool: [Parachain] [RECK] purged finalized transactions    

I had an implemetnation where txpool is processing all the blocks (not only best or finalized) - see this commit.
It is hard to judge to me if this is actually required, as this re-org with significant number of forks should be quite rare. Also light-maintain will work fine when the txpool is constantly filled with the transactions.

If the requirement is to have immediate InBlock event, even for blocks which are not recognized as best, we may include this commit.

If occasional delays in InBlock are fine then we may keep current implementation (and this would be my humble recommendation).

Please let me know if above analysis makes sense for the case you observed. Especially I'd like to confirm the period of time after which you concluded the event as stuck.

@bkchr
Copy link
Member

bkchr commented Jul 22, 2024

the listener is already disconnected:

This sounds like the problem to me? Did the listener disconnect on its own? If yes, this is the problem.

@michalkucharczyk
Copy link

michalkucharczyk commented Jul 22, 2024

the listener is already disconnected:

This sounds like the problem to me? Did the listener disconnect on its own? If yes, this is the problem.

From the discussion on element, I understood the listener was terminated by the client.

Quoting element:

It seems that now the transactions get stuck until client that is subscribed to their statuses disconnects?

@michalkucharczyk
Copy link

Please let me know if above analysis makes sense for the case you observed. Especially I'd like to confirm the period of time after which you concluded the event as stuck.

Confirming the period of time after the events were concluded as stuck is important. Was it ~1 minute or more?
Also hash of stuck transaction would be helpful.

@bkchr
Copy link
Member

bkchr commented Jul 22, 2024

By client you mean the rpc client?

@michalkucharczyk
Copy link

By client you mean the rpc client?

Yes

@bkchr
Copy link
Member

bkchr commented Jul 22, 2024

Confirming the period of time after the events were concluded as stuck is important. Was it ~1 minute or more?
Also hash of stuck transaction would be helpful.

@valentinfernandez1 @gautamdhameja this would be really good to know. Otherwise we can just guess.

@bkchr
Copy link
Member

bkchr commented Jul 22, 2024

If the requirement is to have immediate InBlock event, even for blocks which are not recognized as best, we may include this commit.

If occasional delays in InBlock are fine then we may keep current implementation (and this would be my humble recommendation).

paritytech/polkadot-sdk#1202 (comment) as said here. I would just send Included events for every block. Otherwise we again need the retracted event, if the new fork doesn't contain the transaction anymore.

We need at least something to report that the block is not part of the best chain anymore. The new json-rpc-spec handles it this way.

However, even there I would like to switch it to just report Inclusion: paritytech/json-rpc-interface-spec#87
But I also see the point on having this simple for Dapp developers to know if the tx is part of the best chain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants