fix: sealing: Fix RetryCommitWait loop when sector cron activation fails #11046

Merged
magik6k merged 3 commits into master from fix/fsm-commfail-loop
Aug 7, 2023

@magik6k magik6k commented Jul 4, 2023

Related Issues

Fixes #9841

Proposed Changes

In the CommitFailed state handler, if we see a successful commit message, still check whether the sector is visible on chain. If it is not, run the full checks, which will catch things like expired deals and will most likely remove the sector.
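
A minimal sketch of the decision described above, written as a standalone Go function. Every name here (`decide`, `commitStatus`, the `action` values, `errDealExpired`) is hypothetical and does not mirror the actual Lotus sealing FSM code. The branch taken when the commit message has not landed, and the branch taken when the full checks pass, are assumptions made only to keep the example complete.

```go
package main

import "errors"

// action is a hypothetical label for what the handler decides to do next.
type action string

const (
	waitForMessage  action = "wait-for-commit-message" // assumed: message not landed yet
	continueSealing action = "continue-sealing"        // sector is visible on chain
	resubmitCommit  action = "resubmit-commit"         // assumed: checks passed, try again
	removeSector    action = "remove-sector"           // checks failed, no hope left
)

// errDealExpired stands in for one failure the full checks might report.
var errDealExpired = errors.New("deal expired")

// commitStatus is a hypothetical summary of what the handler observed.
type commitStatus struct {
	MessageLanded bool // the commit message executed successfully on chain
	SectorOnChain bool // the sector is visible in the miner's on-chain sector set
}

// decide picks the next step for a sector in the CommitFailed state.
// fullChecks is only called when the message landed but the sector is
// missing on chain, which is the case that previously looped.
func decide(st commitStatus, fullChecks func() error) action {
	if !st.MessageLanded {
		return waitForMessage
	}
	if st.SectorOnChain {
		return continueSealing
	}
	// The message succeeded but the sector was dropped (for example during
	// cron activation). Waiting again cannot help, so run the full checks.
	if err := fullChecks(); err != nil {
		return removeSector
	}
	return resubmitCommit
}
```

The point of the sketch is that a landed message with a missing sector never maps back to a waiting state, so the CommitFailed / RetryCommitWait cycle cannot repeat indefinitely.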

Testing:

  • ITest
  • Real setup: check that a sector which is stuck in CommitWait (e.g. due to low feecap) beyond its start epoch is correctly removed, and doesn't spin through CommitWaitFailed

Additional Info

Checklist

Before you mark the PR ready for review, please make sure that:

  • Commits have a clear commit message.
  • PR title is in the form of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
    • area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
  • New features have usage guidelines and / or documentation updates in
  • Tests exist for new functionality or change in behavior
  • CI is green

@magik6k magik6k force-pushed the fix/fsm-commfail-loop branch from c3dd512 to 402895f Compare July 6, 2023 07:27
@magik6k magik6k marked this pull request as ready for review July 6, 2023 09:26
@magik6k magik6k requested a review from a team as a code owner July 6, 2023 09:26
@jennijuju
Member

Can you update the description with a testing plan?

Contributor

@ZenGround0 ZenGround0 left a comment

Per our new PR approval strategy I'll wait to approve until:

Real setup: check that a sector which is stuck in CommitWait (e.g. due to low feecap) beyond its start epoch is correctly removed, and doesn't spin through CommitWaitFailed

@gitaspick
lotus-miner error message:

14. 2023-07-31 14:10:42 +0800 CST: [event;sealing.SectorCommitSubmitted] {"User":{"Message":{"/":"bafy2bzacearnj3ze6dqc5z5jnpagi3k5bzj7vvggqqxk3anz3jbvbqkh7zl4m"}}}
15. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorCommitFailed] {"User":{}}
 proof validation failed, sector not found in sector set after cron
16. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorRetryCommitWait] {"User":{}}
17. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorCommitFailed] {"User":{}}
 proof validation failed, sector not found in sector set after cron
18. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorRetryCommitWait] {"User":{}}
19. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorCommitFailed] {"User":{}}
 proof validation failed, sector not found in sector set after cron
20. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorRetryCommitWait] {"User":{}}
21. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorCommitFailed] {"User":{}}
 proof validation failed, sector not found in sector set after cron
22. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorRetryCommitWait] {"User":{}}
23. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorCommitFailed] {"User":{}}
 proof validation failed, sector not found in sector set after cron
24. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorRetryCommitWait] {"User":{}}
25. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorCommitFailed] {"User":{}}
 proof validation failed, sector not found in sector set after cron
26. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorRetryCommitWait] {"User":{}}
27. 2023-07-31 14:14:00 +0800 CST: [event;sealing.SectorCommitFailed] {"User":{}}

The contract error message is as follows:

[INFO]<StorageMiner::2245628> error activating deals on sector 17303: send aborted with code 16
[INFO]<StorageMiner::2245628> failed to activate deals on sector 17303, dropping from prove commit set
[ERROR]<StoragePower::4> failed to confirm sector proof validity to f02245628, error code ActorError(exit_code: ExitCode { value: 16 }, msg: send aborted with code 16)

Searching the contract code, the error appears to come from the 'activate_deals' method in the market contract, specifically at this location:

                let s: Option<DealState> = st.find_deal_state(rt.store(), deal_id)?;

                if s.is_some() {
                    return Err(actor_error!(
                        illegal_argument,
                        "deal {} already activated",
                        deal_id
                    ));
                }


But I have not executed 'activate_deals' before.
Is there any new progress here?
I would like more help, @jennijuju @magik6k

@magik6k
Contributor Author

magik6k commented Aug 4, 2023

@gitaspick Your issue is not really related to this PR (other than that in this PR we'll correctly stop retrying when there is no hope left for the sector). Please ask your question in the Filecoin Slack, in #fil-lotus-help.

@magik6k magik6k merged commit 8842466 into master Aug 7, 2023
@magik6k magik6k deleted the fix/fsm-commfail-loop branch August 7, 2023 19:11
Development

Successfully merging this pull request may close these issues.

proof validation failed, sector not found in sector set after cron