fix: deflake: use 2 miners for flaky tests #10764
The two affected tests passed once, rerunning!
I don't love it because it makes flakes less likely but still possible, unless I'm missing something.
I won't stop you giving it a try to see if we get lots of practical improvement. It might be a quick way to make things manageable in the short term.
@ZenGround0 My theory is that the miner being actively tested has a lot going on (sector imports, commits, etc.) that it needs to do with little hardware while also PoSting. I think it's reasonable to say "we'll give you a pass on needing to PoSt for this test".
Attempt 2 was good too, let's do a third!
I agree with all this, but it doesn't mean the second miner can't fail for similar reasons, even if that is less likely. As far as I understand, this miner usually runs in the same process as the actively tested miner and can also be affected by tests on the first miner. In some tests it may actually be impossible for the second miner to fail PoSt, but there is no good general reason to expect that. To truly remove flakiness we need rigorous guarantees on failure.
Attempt #3 also good. I'd like to merge this, and continue to monitor. Separately, batch deals failed every single time, so that's going to be my next area of focus :P
It seems unlikely to be a resource issue given the shared process. Maybe a message pool issue? If adding a second miner helps, it sounds like we have some kind of heavily contended lock somewhere. However, I think "less flaky" is better than nothing. If this doesn't help, we can always revert. I would add some code comments explaining why we're starting two (in case this code gets refactored and git history gets muddy).
This is a "can't hurt to try it" kind of fix.
8280a98 to e91bb64
Related Issues
These tests fail quite often, and it's always because the PoSt process doesn't kick off, causing the network to halt eventually (because no blocks can be produced).
Proposed Changes
Run the network with two miners, in the hope that the second miner (that isn't being actively tested) will keep the blockchain going. If that doesn't work, we can try using the BeginMiningMustPoSt mode.
Additional Info
Checklist
Before you mark the PR ready for review, please make sure that:
<PR type>: <area>: <change being made>
fix: mempool: Introduce a cache for valid signatures
PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps