Automatic sync fails after transient error syncing to Freezer DB with "missing parent" error #22112
Comments
Maybe this is instigated by containerization/k8s? Are you using kubectl proxy by any chance?
No, we are not. In general, there is no failure to sync blocks when adding new nodes to an existing chain (even one with more than 90k blocks), so direct and ongoing pushes to the freezer DB work just fine. The
Well, if you're hosting a node, you have to sync (which requires downloading data from the Ethereum blockchain), so a failure in the underlying filesystem could instigate an error in syncing. If the filesystem is broken or has issues, a subsequent recovery would probably fail as well, right?
I mentioned in my initial summary that we are running a private chain.
This is the root of your problems:
Apparently, this failure used the I suspect that the
Btw, you're running a fairly old release (20 releases old). I think it's probably time to upgrade before trying again :)
Oh, I just saw that you reported this against Whatever happened on I'll close this, recommend that you use a more recent version of geth, and if you experience a similar issue again, please file a new ticket.
Oh, and I just saw that @karalabe already commented just that :/
System information
Geth version: 1.9.7
OS & Version: Linux (Kubernetes on Azure), running clique/POA
Commit hash: stable
Expected behaviour
After a transient failure to write blocks to the freezer, the node restarts (as expected) and should be able to sync with its peers and catch up with the rest of the chain.
Actual behaviour
The node truncates the freezer to a very old block (due to a missing full block), and when it receives a batch of blocks from a peer, it is unable to sync those blocks, failing with a `missing parent` error. As seen in the logs below, the node experiences a temporary failure to flush entries to the freezer. On a restart, it finds that:
150571
240572
33605
This is a non-archive node, but even so it is strange that the most recent full block should be so far behind. As a result, the freezer gets truncated and the node attempts to sync the most recent blocks. However, that fails with a `missing parent` error on a batch of the 26 most recent blocks. The same error keeps repeating across a rolling window of blocks. Eventually, we were able to recover the node only by performing a fresh sync (wiping all chain data).
Steps to reproduce the behaviour
We do not know what causes the original failure to flush contents to the freezer DB. However, if such a failure could be re-created in a unit test, the rest of the failure should be straightforward to observe.
Backtrace
None