fix: deflake: use 2 miners for flaky tests #10764

arajasek · 2023-04-26T14:34:22Z

Related Issues

These tests fail quite often, and it's always because the PoSt process doesn't kick off, causing the network to halt eventually (because no blocks can be produced).

Proposed Changes

Run the network with two miners, in the hope that the second miner (that isn't being actively tested) will keep the blockchain going. If that doesn't work, we can try using the BeginMiningMustPoSt mode.
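The idea can be sketched with a toy model. This is not the Lotus itests kit: `Miner`, `ChainAdvances`, and `RunEpochs` are hypothetical names made up for illustration, and `CanMine` simply stands in for "this miner's PoSt kicked off". It only shows why a second, idle miner keeps the chain moving when the tested miner stalls.

```go
package main

import "fmt"

// Miner is a hypothetical stand-in for a test miner (not a Lotus type).
type Miner struct {
	Name    string
	CanMine bool // false models a miner whose PoSt never kicked off
}

// ChainAdvances reports whether at least one miner can produce a block.
func ChainAdvances(miners []Miner) bool {
	for _, m := range miners {
		if m.CanMine {
			return true
		}
	}
	return false
}

// RunEpochs returns the height reached after at most n epochs.
// The chain halts as soon as no miner can produce a block.
func RunEpochs(miners []Miner, n int) int {
	height := 0
	for i := 0; i < n; i++ {
		if !ChainAdvances(miners) {
			break
		}
		height++
	}
	return height
}

func main() {
	one := []Miner{{Name: "tested", CanMine: false}}
	two := []Miner{{Name: "tested", CanMine: false}, {Name: "backup", CanMine: true}}
	fmt.Println("one miner height:", RunEpochs(one, 10))
	fmt.Println("two miners height:", RunEpochs(two, 10))
}
```

If both miners stall, the model halts just as it does with one, which is the remaining failure case.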

Additional Info

Checklist

Before you mark the PR ready for review, please make sure that:

- Commits have a clear commit message.
- PR title is in the form of <PR type>: <area>: <change being made>
  - example: fix: mempool: Introduce a cache for valid signatures
  - PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
  - area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
- New features have usage guidelines and / or documentation updates in
  - Lotus Documentation
  - Discussion Tutorials
- Tests exist for new functionality or change in behavior
- CI is green

arajasek · 2023-04-26T15:26:28Z

The two affected tests passed once, rerunning!

ZenGround0

I don't love it because it makes flakes less likely but still possible, unless I'm missing something.

I won't stop you giving it a try to see if we get lots of practical improvement. It might be a quick way to make things manageable in the short term.

arajasek · 2023-04-26T17:27:04Z

@ZenGround0 My theory is that the miner being actively tested has a lot going on (sector imports, commits, etc.) that it needs to do with little hardware while also PoSting. I think it's reasonable to say "we'll give you a pass on needing to PoSt for this test".

arajasek · 2023-04-26T17:33:04Z

Attempt 2 was good too, let's do a third!

ZenGround0 · 2023-04-26T17:53:43Z

> @ZenGround0 My theory is that the miner being actively tested has a lot going on (sector imports, commits, etc.) that it needs to do with little hardware while also PoSting. I think it's reasonable to say "we'll give you a pass on needing to PoSt for this test".

I agree with all this, but it doesn't mean the second miner can't fail for similar reasons, even if that is less likely. As far as I understand, this miner usually runs in the same process as the actively tested miner and can also be affected by tests on the first miner. In some tests it may actually be impossible for the second miner to fail PoSt, but there is no good general reason to expect that.

To truly remove flakiness we need rigorous guarantees on failure.

arajasek · 2023-04-26T18:47:33Z

Attempt #3 also good. I'd like to merge this, and continue to monitor.

Separately, batch deals failed every single time, so that's going to be my next area of focus :P

Stebalien · 2023-04-27T00:07:05Z

It seems unlikely to be a resource issue given the shared process. Maybe a message pool issue? If adding a second miner helps, it sounds like we have some kind of heavily contended lock somewhere.

However, I think "less flaky" is better than nothing. If this doesn't help, we can always revert.

I would add some code comments explaining why we're starting two (in case this code gets refactored and git history gets muddy).

Stebalien

This is a "can't hurt to try it" kind of fix.

arajasek requested a review from a team as a code owner April 26, 2023 14:34

ZenGround0 reviewed Apr 26, 2023


Stebalien approved these changes Apr 27, 2023


fix: deflake: use 2 miners for flaky tests

e91bb64

arajasek force-pushed the asr/flaky-tests branch from 8280a98 to e91bb64 Compare April 27, 2023 12:43

arajasek merged commit 727a711 into master Apr 27, 2023

arajasek deleted the asr/flaky-tests branch April 27, 2023 15:14

arajasek mentioned this pull request May 11, 2023

feat: deflake sector_import_simple #10858

Merged

7 tasks
