Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get latest block from zcashd #348

Closed
wants to merge 1 commit into from

Conversation

adityapk00
Copy link
Contributor

I propose that we get the latest block from zcashd instead of from the cache.

This fixes an issue where in some rare cases, the value returned from GetLightDInfo doesn't match what GetLatestBlock returns if the cache is slightly behind.

Also, If the cache is behind or if the server is just starting, we should ask zcashd for the latest block height, because when the clients request for a block and we don't have it, we go to zcashd anyway.

@LarryRuane
Copy link
Collaborator

I'd like to understand better the motivation for this change. Is this strictly a performance improvement? Or does this fix incorrect behavior? I would guess this isn't a performance-only improvement, because, as you mentioned, it happens only rarely. If it fixes a behavioral problem, then I don't think it will actually fix anything, because even with this change, the result of GetLightdInfo can differ from the result of GetLatestBlock anyway. It will just be rarer still. But wallets (lightwalletd clients) must be tolerant of these two gRPCs returning different results.

I'm not aware if this was intentional, but GetLightdInfo always queries zcashd to generate its results, whereas GetLatestBlock does not, it should have been named GetLatestCachedBlock (or more accurately yet, GetLatestCachedHeight since it returns only a height, not a block).

It may be helpful to understand that lightwalletd does something perhaps unexpected: Every few seconds, it forgets the latest block. If it knows about block 10 as the latest block, then it polls (every couple of seconds) for a block at height 11. Usually, the reply is "no such block" (until block 11 is mined). What I was concerned about while coding this is: What happens if a one-block reorg occurs, replacing block 10? Simply polling for block 11 will continue to return "no such block" -- with no indication that a reorg has occurred. Only when a new block 11 is mined (which takes, on average, 75 seconds but can be much longer) will lightwalletd notice (after fetching this version of block 11) that its prev-hash doesn't refer to block 10, so it will begin to back up, looking for the fork point, and then move forward along the new chain.

This delay (in recognizing that a reorg has occurred) seemed to me to be much too long.

The simplest, most bullet-proof way to solve this problem is to completely forget about the tip every few seconds. If block 10 is the tip, lightwalletd suddenly thinks 9 is the tip, and immediately request 10, and, if there's been a reorg, it will get the new version of 10. (If no reorg, it will get the version of 10 that it forgot, no harm done.) One side-effect of this approach is that every few seconds, there's a small window during which GetLatestBlock will return 9 instead of 10 -- because that really is the latest block in the cache.

At first I thought, that's very bad! But then I realized that wallets have to be able to deal with this situation anyway. There could be a real reorg that makes 9 the tip, at least for an instant, while we're fetching the new version of 10 from zcashd.

And here's a situation that's even more subtle: Suppose 10 is the tip, and a reorg happens with 5 as the fork point, and there's a new 6 and a new 7, and that's the new tip. It's true that the better branch will almost always be at least as long as the old (abandoned) branch, but "best chain" isn't based on length, it's based on total work. So it's possible that the new best chain has 7 as its tip. If we simply keep asking for 11, we'll have to wait for 8, 9, 10, and 11 to be mined, before we realize there's been a reorg.

The way lightwalletd works today in this (admittedly unlikely) scenario is that (as I said) it will forget about 10 every few seconds (think 9 is the tip), so it will re-request 10, get "no such block", so immediately request 9, then 8, getting "no such block" to those requests, then 7, and get the new block 7, but notice its prev-hash doesn't refer to the 6 we know about, so it will request 6, and will see that its prev-hash does indeed refer to the 5 we know about, so it will keep that 6, then move ahead in the normal way, requesting 7, and then finally 8 but will get a "no such block" and now we're back to our normal synced state. So, during this time, GetLatestBlock could return a height as low as 6! The wallets must be able to handle this.

@gmale @pacu

@LarryRuane
Copy link
Collaborator

By the way, it seems to me that the wallets must implement an approach equivalent to this for the same reason. If the wallet polls for new blocks by requesting the latest height, or requesting the block at what they think is the tip height plus one, then it won't know there's been a reorg until quite some time has passed.

I'm sorry I didn't think of this before now, but it seems like we could improve this by adding a new gRPC that takes as its argument the hash of the block that the wallet considers to be the tip, and the gRPC will return the child of that hash, or "no such block" if it hasn't been mined yet. Both of those are sunny-day cases. But, let's take the previous example of a one-block reorg. If the wallet thinks the tip is height 10, the wallet would make a gRPC call passing its hash, and lightwalletd would notice that has doesn't match the hash of what it considers to be block 10 (or that block doesn't exist at all), so it would return an error that says, in effect, "the hash you sent me doesn't correspond to a block I know about". So then the wallet can immediately request block 9 by sending its hash, and lightwalletd would reply with the new (post-reorg) version of block 10.

So, to summarize, there should be a form of getblock gRPC that takes a block hash, and returns one of three ways:

  • the block whose prev-hash matches the argument (that is, the child of the block with that hash, we're still syncing)
  • the error "this is the tip" (or "no such child") -- this would mean the wallet is synced
  • the error "unknown hash" -- this would mean there's been a reorg, wallet should begin looking backwards for the branch point

This same change could be implemented in zcashd (new RPC) so that lightwalletd could also use this algorithm, which would be more efficient. But it works as is, so it's probably not worth it. It is worth it in the interface between the wallet and lightwalletd, because bandwidth is at a premium.

@pacu
Copy link
Contributor

pacu commented Jul 16, 2021

Thanks for this great explanation @LarryRuane. I think it's very important to have it surfaced somewhere. As you mentioned this is probably the same logic than the one needed for chain reorgs, but probably people don't expect the lightclient server to do this 'forget' logic on its own.

@LarryRuane
Copy link
Collaborator

Having worked on #358, I think I now see the motivation for this PR better. As I was testing #358, I noticed that the GetMempoolStream() gRPC returned early (when there wasn't a new block), far more often than I would have expected. This doesn't break anything but definitely is inefficient.

But also, I noticed there's a way to fix this problem without changing the zcashd RPC interface at all. There already exists a getbestblockhash RPC, so what the ingestor loop should do is, instead of requesting the block at the next height every few seconds, it should do something like this (Golang-ish pseudo-code, may not be quite right):

loop {
    hash = zcashd_getbestblockhash()
    latest = blocks[-1]
    if latest.hash == hash {
        // synced
        sleep(2 seconds)
        continue
    }
    // may be a new block
    block, error = zcashd_getblock(latest.height + 1)
    if error == ok {
        // have a later block (we may be syncing)
        if block.prevhash == latest.hash {
            // extend the chain ("push" or append to the list of blocks)
            blocks = append(blocks, block)
            continue
        }
    }
    // this is a reorg, back up
    blocks = blocks[0:-1] // "pop" off the latest block, back up one
}

This way we'll detect a reorg within a few seconds even if a one-block (or any length) reorg changes the tip. It won't be necessary to "forget" the latest block as is currently done, and we won't have this problem of GetLatestBlock backing up one block for a small window every few seconds.

I don't know exactly how the various wallets detect a reorg, but we could implement the same functionality in the gRPC interface -- either add a GetBestBlockHash gRPC, or what I mentioned above, a form of GetBlock gRPC that takes a block hash and returns the next (child) block (or one of two errors, and correction on that, the "unknown hash" should really be "not part of the best chain").

@adityapk00
Copy link
Contributor Author

This is very cool. I think detecting 1-block reorgs is very important, and I think Zecwallet has not been doing that properly. I'm 100% in favor of @LarryRuane 's comment above - LightwalletD should detect the 1-block reorg, and clients should too.

Thanks for looking into this and fixing it!

@defuse
Copy link
Collaborator

defuse commented Jul 27, 2021

@LarryRuane the pseudocode above looks good to me. The "this is a reorg, back up" part should probably go inside the if error == ok { ... } so that if there's some kind of transient rpc connection error it won't delete all of the blocks.

@LarryRuane
Copy link
Collaborator

Closing in favor of #363 (see #348 (comment) above).

@LarryRuane LarryRuane closed this Sep 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants