Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wallet: don't panic, retry instead #870

Merged
merged 2 commits into from
May 30, 2023
Merged

Conversation

guggero
Copy link
Collaborator

@guggero guggero commented May 25, 2023

Replaces #840.

Fixes #827.

Main diff vs. #840:

  • Don't expose the sync retry interval to the user level config, choose a sane default instead (5 seconds sounds reasonable to me for most cases)
  • Make all changes non-breaking to existing API users (add new OpenWithRetry() function to wallet, add functional options to loader)

cc @JoeGruffins, @chappjc, @Roasbeef

JoeGruffins and others added 2 commits May 25, 2023 12:58
Unexpected panics are a problem for consumers. Remove them. If the
birthday does not look correct, log the error and continue with a nil
birthday. If there is a problem with initial sync, log and try again
later. Add a configurable wait interval duration for the new sync loop.
@guggero guggero mentioned this pull request May 25, 2023
@@ -106,17 +141,23 @@ func (w *Wallet) handleChainNotifications() {
birthdayBlock, err := birthdaySanityCheck(
chainClient, birthdayStore,
)
if err != nil && !waddrmgr.IsError(err, waddrmgr.ErrBirthdayBlockNotSet) {
panic(fmt.Errorf("unable to sanity "+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to reliably produce this error? I think it was left unresolved for so long as it seemed to just fix itself on restart.

I think the common case was something like moving from a full node backend to neutrino or the other way around?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to address things like lightningnetwork/lnd#7449 and lightningnetwork/lnd#4818. Yes, those are usually resolved with a restart of lnd, but re-trying instead of panicking does sound like the better approach.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I understand what we're trying to fix. I'm curious if we have a way to reliably reproduce it in a test environment, so we can directly test this patch. At a glance should do what it says on the tin, but whenever I tried I wasn't able to hit the issue reliably.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I misread.

Yes, I was able to reproduce this quite easily: Sync lnd to a local regtest network, mine a couple of blocks. Then stop everything and wipe test regtest block chain but keep the lnd state (forcing the wallet to be synced to a different chain). On next startup lnd will panic with panic: unable to synchronize wallet to chain: -8: Block height out of range.

So I can see how this might also happen on mainnet depending on the error bitcoind returns.

With the patch it doesn't panic anymore but re-try (which in this example won't ever succeed, but on mainnet if it's just a temporary startup error might help):

2023-05-30 13:49:51.466 [INF] LTND: Waiting for chain backend to finish sync, start_height=119
2023-05-30 13:49:56.462 [ERR] LNWL: Unable to synchronize wallet to chain, trying again in 5s: -8: Block height out of range
2023-05-30 13:50:01.463 [ERR] LNWL: Unable to synchronize wallet to chain, trying again in 5s: -8: Block height out of range
2023-05-30 13:50:06.464 [ERR] LNWL: Unable to synchronize wallet to chain, trying again in 5s: -8: Block height out of range
2023-05-30 13:50:11.466 [ERR] LNWL: Unable to synchronize wallet to chain, trying again in 5s: -8: Block height out of range

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about the birthday block error, but there were a few ways to hit the syncWithChain panic listed in #827 (comment).

I often hit the couldn't retrieve block <hash> from network panic if I try to rescan/reseed a wallet using SPV. Briefly interrupting your machine's network to force a peer drop when a block is requested can do it, depending on timing.

@guggero guggero requested a review from Roasbeef May 26, 2023 08:01
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👑

@Roasbeef Roasbeef merged commit 2f8238d into btcsuite:master May 30, 2023
@guggero guggero deleted the no-panic-retry branch May 30, 2023 21:29
buck54321 pushed a commit to buck54321/btcwallet that referenced this pull request Nov 27, 2023
From btcwallet btcsuite#870 from guggero
buck54321 pushed a commit to buck54321/btcwallet that referenced this pull request Nov 27, 2023
Taken from btcwallet btcsuite#870 by guggero
ukane-philemon pushed a commit to ukane-philemon/ltcwallet that referenced this pull request Feb 27, 2024
From btcwallet btcsuite#870 from guggero
buck54321 pushed a commit to buck54321/btcwallet that referenced this pull request Apr 21, 2024
buck54321 pushed a commit to buck54321/btcwallet that referenced this pull request Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chainntfns: Panics during initial sync blow up consumers.
4 participants