-
Notifications
You must be signed in to change notification settings - Fork 606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wallet: don't panic, retry instead #870
Conversation
Unexpected panics are a problem for consumers. Remove them. If the birthday does not look correct, log the error and continue with a nil birthday. If there is a problem with initial sync, log and try again later. Add a configurable wait interval duration for the new sync loop.
@@ -106,17 +141,23 @@ func (w *Wallet) handleChainNotifications() { | |||
birthdayBlock, err := birthdaySanityCheck( | |||
chainClient, birthdayStore, | |||
) | |||
if err != nil && !waddrmgr.IsError(err, waddrmgr.ErrBirthdayBlockNotSet) { | |||
panic(fmt.Errorf("unable to sanity "+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we able to reliably produce this error? I think it was left unresolved for so long as it seemed to just fix itself on restart.
I think the common case was something like moving from a full node backend to neutrino or the other way around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to address things like lightningnetwork/lnd#7449 and lightningnetwork/lnd#4818. Yes, those are usually resolved with a restart of lnd
, but re-trying instead of panicking does sound like the better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, also lightningnetwork/lnd#6292.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I understand what we're trying to fix. I'm curious if we have a way to reliably reproduce it in a test environment, so we can directly test this patch. At a glance should do what it says on the tin, but whenever I tried I wasn't able to hit the issue reliably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I misread.
Yes, I was able to reproduce this quite easily: Sync lnd
to a local regtest network, mine a couple of blocks. Then stop everything and wipe test regtest block chain but keep the lnd
state (forcing the wallet to be synced to a different chain). On next startup lnd
will panic with panic: unable to synchronize wallet to chain: -8: Block height out of range
.
So I can see how this might also happen on mainnet depending on the error bitcoind
returns.
With the patch it doesn't panic anymore but re-try (which in this example won't ever succeed, but on mainnet if it's just a temporary startup error might help):
2023-05-30 13:49:51.466 [INF] LTND: Waiting for chain backend to finish sync, start_height=119
2023-05-30 13:49:56.462 [ERR] LNWL: Unable to synchronize wallet to chain, trying again in 5s: -8: Block height out of range
2023-05-30 13:50:01.463 [ERR] LNWL: Unable to synchronize wallet to chain, trying again in 5s: -8: Block height out of range
2023-05-30 13:50:06.464 [ERR] LNWL: Unable to synchronize wallet to chain, trying again in 5s: -8: Block height out of range
2023-05-30 13:50:11.466 [ERR] LNWL: Unable to synchronize wallet to chain, trying again in 5s: -8: Block height out of range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about the birthday block error, but there were a few ways to hit the syncWithChain
panic listed in #827 (comment).
I often hit the couldn't retrieve block <hash> from network
panic if I try to rescan/reseed a wallet using SPV. Briefly interrupting your machine's network to force a peer drop when a block is requested can do it, depending on timing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👑
From btcwallet btcsuite#870 from guggero
Taken from btcwallet btcsuite#870 by guggero
From btcwallet btcsuite#870 from guggero
wallet: don't panic, retry instead
wallet: don't panic, retry instead
Replaces #840.
Fixes #827.
Main diff vs. #840:
OpenWithRetry()
function to wallet, add functional options to loader)cc @JoeGruffins, @chappjc, @Roasbeef