Flaky F3 itests #12519

Closed
rvagg opened this issue Sep 27, 2024 · 7 comments · Fixed by #12597

@rvagg
Member

rvagg commented Sep 27, 2024

This is happening with some frequency; here are some recent examples:

@rvagg
Member Author

rvagg commented Oct 1, 2024

🤞

@rvagg rvagg closed this as completed Oct 1, 2024
@github-project-automation github-project-automation bot moved this from 📌 Triage to 🎉 Done in FilOz Oct 1, 2024
@masih
Member

masih commented Oct 1, 2024

Reopening as it seems to continue to fail intermittently, though at a much lower failure rate. Example

@masih masih reopened this Oct 1, 2024
@github-project-automation github-project-automation bot moved this from 🎉 Done to 📌 Triage in FilOz Oct 1, 2024
@Stebalien
Member

Ok, I think this is caused by #12557 and should be "fixed" on the latest master because we now forcibly align "catchup" instances to 1/2 block time.

@masih
Member

masih commented Oct 8, 2024

Still flakes here

@rjan90 rjan90 moved this from 📌 Triage to 🐱 Todo in FilOz Oct 8, 2024
@Stebalien
Member

That one looks like a timeout that's just slightly too short.

@Stebalien
Member

Hm. No, it just never made progress after bootstrapping.

@Stebalien
Member

Ok, I did some digging and I'm not sure F3 is to blame here (or perhaps the Lotus integration side of it is?). See https://github.com/filecoin-project/lotus/actions/runs/11306342327/job/31446846079?pr=12591

I'm seeing "badger blockstore closed" errors 6 seconds into the test when the timeout is 20s. Then I see the test failures 15 seconds later.

I had originally assumed that the "badger blockstore closed" errors were due to shutting down the node because the test failed, but they're happening way too early for that.

Ideas:

  1. The blockstore is closing itself (unlikely?).
  2. Something is failing on start (e.g., some part of F3) causing the node to start shutting down.
  3. The splitstore is doing something.

@github-project-automation github-project-automation bot moved this from 🐱 Todo to 🎉 Done in FilOz Oct 17, 2024
@rjan90 rjan90 moved this from 🎉 Done to ☑️ Done (Archive) in FilOz Oct 18, 2024