-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix OurViewChange small race #5356
Conversation
Signed-off-by: Alexandru Gheorghe <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
So the message traveled faster via TCP to a different peer and this answered us faster with a new message that got processed before the ourviewchange? And wouldn't the rx side also be first required to process these messages before they are propagated to the subsystem? |
* master: (35 commits) Fix OurViewChange small race (#5356) Make ticket non-optional and add ensure_successful method to Consideration trait (#5359) [tests] dedup test code, add more tests, improve naming and docs (#5338) Stop running the wishlist workflow on forks (#5297) Migrate foreign assets v3::Location to v4::Location (#4129) Minor clean up (#5284) [Pools] Ensure members can always exit the pool gracefully (#4998) StorageWeightReclaim: set to node pov size if higher (#5281) [Bot] Add prdoc generation (#5331) Small nits found accidentally along the way (#5341) Create subsystem-benchmarks.yml (#5325) Bump libp2p-identity from 0.2.8 to 0.2.9 (#5232) Bump authoring duration for async backing to 2s. (#5195) Fix spelling issues (#5206) Bump the known_good_semver group across 1 directory with 3 updates (#5315) `polkadot-node-core-pvf-common`: Fix test compilation error (#5310) ci: Paused `cmd-action` commenter (#5287) Remove unnecessary mut (#5318) chain-spec: minor clarification on the genesis config patch (#5324) fix av-distribution Jaeger spans mem leak (#5321) ...
* master: (35 commits) Fix OurViewChange small race (#5356) Make ticket non-optional and add ensure_successful method to Consideration trait (#5359) [tests] dedup test code, add more tests, improve naming and docs (#5338) Stop running the wishlist workflow on forks (#5297) Migrate foreign assets v3::Location to v4::Location (#4129) Minor clean up (#5284) [Pools] Ensure members can always exit the pool gracefully (#4998) StorageWeightReclaim: set to node pov size if higher (#5281) [Bot] Add prdoc generation (#5331) Small nits found accidentally along the way (#5341) Create subsystem-benchmarks.yml (#5325) Bump libp2p-identity from 0.2.8 to 0.2.9 (#5232) Bump authoring duration for async backing to 2s. (#5195) Fix spelling issues (#5206) Bump the known_good_semver group across 1 directory with 3 updates (#5315) `polkadot-node-core-pvf-common`: Fix test compilation error (#5310) ci: Paused `cmd-action` commenter (#5287) Remove unnecessary mut (#5318) chain-spec: minor clarification on the genesis config patch (#5324) fix av-distribution Jaeger spans mem leak (#5321) ...
Yes, that was the chronology from timestamps on versi where the CPU is 8 times over-committed.
Yes they would come through rx, I guess your question is where is the context switch between this task and the rx one, right ? |
* master: (167 commits) Upgrade accidentally downgraded deps (#5365) [Pools] Fix issues with member migration to `DelegateStake` (#4822) Unify `no_genesis` check (#5360) [CI] Fix prdoc command (#5358) Beefy: add benchmarks for `report_fork_voting()` (#5188) Fix OurViewChange small race (#5356) Make ticket non-optional and add ensure_successful method to Consideration trait (#5359) [tests] dedup test code, add more tests, improve naming and docs (#5338) Stop running the wishlist workflow on forks (#5297) Migrate foreign assets v3::Location to v4::Location (#4129) Minor clean up (#5284) [Pools] Ensure members can always exit the pool gracefully (#4998) StorageWeightReclaim: set to node pov size if higher (#5281) [Bot] Add prdoc generation (#5331) Small nits found accidentally along the way (#5341) Create subsystem-benchmarks.yml (#5325) Bump libp2p-identity from 0.2.8 to 0.2.9 (#5232) Bump authoring duration for async backing to 2s. (#5195) Fix spelling issues (#5206) Bump the known_good_semver group across 1 directory with 3 updates (#5315) ...
Just seen that the orchestra and network event handler are two different tasks. Still crazy that this worked, but explains how it could work. Ty! |
Always queue OurViewChange event before we send view changes to our peers, because otherwise we risk the peers sending us a message that can be processed by our subsystems before OurViewChange.
Normally, this is not really a problem because the latency of the ViewChange we send to our peers is way higher that our subsystem processing OurViewChange, however on testnets like versi where CPU is sometimes overcommitted this race gets triggered occasionally, so let's fix it by sending the messages in the right order.