Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore AlreadyInChain error in the syncer #2890

Merged
merged 6 commits into from
Oct 20, 2021
Merged

Conversation

conradoplg
Copy link
Collaborator

Motivation

Everytime there is an error from the block downloader and verifier, the sync is restarted. There is an exception for already verified blocks, but a "already in chain" error can also occur that should not cause a restart. This makes sync slower.

Specifications

Designs

Solution

  • Change the block downloader and verifier to return a specific error type
  • Change the sync to match the error type to check for exceptions
  • Move check to a separate function since it's used twice

I've attempted improve the code by checking the error type instead of the error string. However there's a bit of risk involved, see comment in code.

Closes #2883

Review

Anyone can review.

To see if it works, you can start a not-fully-synced node with trace level for zebrad and search the logs. It should not contain something like this:

tail -f log2.txt | grep 'error downloading and verifying block'
Oct 15 12:18:14.366  WARN {zebrad="d88e44f" net="Main"}:sync: zebrad::components::sync: error downloading and verifying block e=Invalid(Block(Commit("block is already committed to the state")))
Oct 15 12:20:26.865  WARN {zebrad="d88e44f" net="Main"}:sync: zebrad::components::sync: error downloading and verifying block e=Invalid(Block(Commit("block is already committed to the state")))
Oct 15 12:23:44.745  WARN {zebrad="d88e44f" net="Main"}:sync: zebrad::components::sync: error downloading and verifying block e=Invalid(Block(Commit("block is already committed to the state")))

If the log contains block is already in chain then this positively confirm this works. However I couldn't make it happen again (it seems easier if the state it's a bit more out of sync). I'm currently waiting some hours to start the node and check again.

Reviewer Checklist

  • Code implements Specs and Designs
  • Tests for Expected Behaviour
  • Tests for Errors

Follow Up Work

Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - and thanks for using explicit error types.

I have some suggestions to improve the reliability of this fix.
(And a suggestion to improve diagnostics.)

I've also added the new missing_docs to ticket #453.

zebrad/src/components/sync/downloads.rs Show resolved Hide resolved
zebrad/src/components/sync.rs Show resolved Hide resolved
zebrad/src/components/sync.rs Show resolved Hide resolved
zebrad/src/components/sync/downloads.rs Outdated Show resolved Hide resolved
@conradoplg
Copy link
Collaborator Author

I've had a single instance of this error

Oct 18 14:46:54.444  WARN {zebrad="23b0360" net="Main"}:sync: zebrad::components::sync: error downloading and verifying block e=Invalid(Block(Commit("block is already committed to the state")))

This will be a bit tricky to filter since I'd need to enumerate the Commit errors (currently it's just a BoxError) in order to match this specific one, unless it's fine to ignore all Commit errors (I guess not). Let me know if we should leave that to a separate PR or not.

@teor2345
Copy link
Contributor

teor2345 commented Oct 19, 2021

I've had a single instance of this error

Oct 18 14:46:54.444  WARN {zebrad="23b0360" net="Main"}:sync: zebrad::components::sync: error downloading and verifying block e=Invalid(Block(Commit("block is already committed to the state")))

This will be a bit tricky to filter since I'd need to enumerate the Commit errors (currently it's just a BoxError) in order to match this specific one, unless it's fine to ignore all Commit errors (I guess not). Let me know if we should leave that to a separate PR or not.

I suggest you convert the BoxError to a string and use string matching.
(And open a ticket for a proper fix that converts all the commit errors to enum variants.)

Copy link
Collaborator Author

@conradoplg conradoplg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is useful for everyone so it would be good to merge it soon, so I created tickets for the remaining issues (as suggested)

I suggest you convert the BoxError to a string and use string matching. (And open a ticket for a proper fix that converts all the commit errors to enum variants.)

Done in c01c0d9 and #2908

zebrad/src/components/sync/downloads.rs Show resolved Hide resolved
Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is better than the previous code, and we have tickets to follow up the remaining issues.

Thanks for diagnosing and fixing this!

@teor2345 teor2345 merged commit 84f2c07 into main Oct 20, 2021
@teor2345 teor2345 deleted the ignore-duplicate-block-error branch October 20, 2021 01:07
@teor2345 teor2345 mentioned this pull request Oct 24, 2021
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ignore duplicate block AlreadyInChain errors in the syncer
2 participants