Possible error re withdrawals empty or missing in rlp-decoded body #26647
Comments
It seems to be OK. The rlp "optional" handling in Geth behaves as I had expected: a block body decodes as either
two empty lists (txs, uncles) for blocks that lack withdrawals (pre-Shanghai), or
three empty lists if the withdrawals are present but empty. There is no ambiguity at the RLP level.
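To make the two shapes concrete, here is a minimal sketch that hand-encodes a body of two empty lists and a body of three empty lists using the RLP short-list rule (prefix 0xc0 plus payload length). It does not use Geth's rlp package; encodeShortList and bodyHex are hypothetical helpers written only for this illustration.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// encodeShortList RLP-encodes a list whose total payload is under 56 bytes.
// The items must already be RLP-encoded. Longer payloads need the long-form
// prefix, which this sketch deliberately leaves out.
func encodeShortList(items ...[]byte) []byte {
	var payload []byte
	for _, it := range items {
		payload = append(payload, it...)
	}
	if len(payload) > 55 {
		panic("payload too long for short-form list")
	}
	return append([]byte{0xc0 + byte(len(payload))}, payload...)
}

// bodyHex returns the hex RLP of a body made of n empty lists.
func bodyHex(n int) string {
	items := make([][]byte, n)
	for i := range items {
		items[i] = encodeShortList() // an empty list encodes as 0xc0
	}
	return hex.EncodeToString(encodeShortList(items...))
}

func main() {
	fmt.Println("two empty lists:  ", bodyHex(2)) // c2c0c0
	fmt.Println("three empty lists:", bodyHex(3)) // c3c0c0c0
}
```

The two encodings differ in both length and prefix byte, so a decoder can always tell an absent trailing list from a present-but-empty one.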
A Go unit test for the decoder also confirms that the RLP decoding in Geth looks okay:

// Lives in package types (core/types), since it reads unexported Block fields.
// Needs the "reflect" and "testing" imports plus Geth's rlp package.
func TestWithdrawalEncoding(t *testing.T) {
	check := func(f string, got, want interface{}) {
		if !reflect.DeepEqual(got, want) {
			t.Errorf("%s mismatch: got %v, want %v", f, got, want)
		}
	}
	header := &Header{
		WithdrawalsHash: &EmptyRootHash,
	}
	// Present but empty withdrawals list (non-nil).
	withdrawals := make([]*Withdrawal, 0)
	block := NewBlock(header, nil, nil, nil, nil).WithWithdrawals(withdrawals)
	encRLP, err := rlp.EncodeToBytes(block)
	if err != nil {
		t.Fatal(err)
	}
	var decoded Block
	if err := rlp.DecodeBytes(encRLP, &decoded); err != nil {
		t.Fatal("decode error: ", err)
	}
	check("withdrawalsHash", decoded.header.WithdrawalsHash, header.WithdrawalsHash)
	// The empty list must survive the round trip as non-nil.
	if decoded.withdrawals == nil {
		t.Fatal("decoded withdrawals are nil, want an empty list")
	}
}
I added some logging, and indeed it seems to be a header with the correct hash and a block body with nil withdrawals. The code change:

+++ b/core/block_validator.go
@@ -71,7 +71,7 @@ func (v *BlockValidator) ValidateBody(block *types.Block) error {
 	if header.WithdrawalsHash != nil {
 		// Withdrawals list must be present in body after Shanghai.
 		if block.Withdrawals() == nil {
-			return fmt.Errorf("missing withdrawals in block body")
+			return fmt.Errorf("missing withdrawals in block body. withdrawals hash is %x, header hash is %x", header.WithdrawalsHash, header.Hash())
 		}
 		if hash := types.DeriveSha(block.Withdrawals(), trie.NewStackTrie(nil)); hash != *header.WithdrawalsHash {
 			return fmt.Errorf("withdrawals root hash mismatch (header value %x, calculated %x)", *header.WithdrawalsHash, hash)

I also enabled debug logs to see which client was sending the faulty block body. Just after the error occurred I had only 1 connected peer:
And just prior to that I see a request to peer
And right back at node start up the peer identification is
Maybe the issue isn't actually a bug in Geth's serialisation/deserialisation but in the handling of dodgy data from misconfigured/malicious peers. I tried to poke around to find what validation is applied to block bodies after fetching headers, and AFAICT it only happens in the go-ethereum/core/blockchain_insert.go Line 124 in 77380b9
At that point, it seems the downloader expects bodies to be valid and won't recover if go-ethereum/eth/downloader/downloader.go Lines 1553 to 1555 in 77380b9
I went looking for other body validation and found the logic in go-ethereum/eth/downloader/queue.go Lines 783 to 788 in 77380b9
All in all I'm thoroughly confused about what's going on, but I can consistently reproduce this issue syncing Lighthouse + Geth on Zhejiang. It just takes about 20-30 minutes each time after nuking Geth's DB and re-syncing it. Someone more experienced with Geth would probably have an easier time debugging it.
@michaelsproul I think Mario has found the issue to be in EthereumJS. One problem is that both [] and nil hash to the same emptyRootHash, so we compute the header hash correctly even if the withdrawals are wrong. One fix we can make is to manually set the withdrawals list to [] if we have an emptyRootHash in the header (and the withdrawals are nil). This would fix it on our side. Either way, EthereumJS needs to fix their encoding.
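The workaround described above could look roughly like the sketch below. It is not the code from the actual fix: Hash, Header, Withdrawal, mustHash and normalizeWithdrawals are hypothetical stand-ins rather than Geth's types, and the constant is assumed to be the usual empty trie root.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Hypothetical stand-ins for the real Geth types.
type Hash [32]byte
type Withdrawal struct{ Index uint64 }
type Header struct{ WithdrawalsHash *Hash }

// emptyRootHash is assumed to be the root of an empty trie.
var emptyRootHash = mustHash("56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421")

// mustHash parses a 32-byte hex literal and panics on malformed input.
func mustHash(s string) Hash {
	b, err := hex.DecodeString(s)
	if err != nil || len(b) != 32 {
		panic("bad hash literal")
	}
	var h Hash
	copy(h[:], b)
	return h
}

// normalizeWithdrawals turns a nil list into an empty one, but only when the
// header commits to the empty root. Every other case is returned untouched,
// so a genuinely missing list still fails later validation.
func normalizeWithdrawals(h *Header, ws []*Withdrawal) []*Withdrawal {
	if ws == nil && h.WithdrawalsHash != nil && *h.WithdrawalsHash == emptyRootHash {
		return []*Withdrawal{}
	}
	return ws
}

func main() {
	root := emptyRootHash
	post := &Header{WithdrawalsHash: &root} // header with a withdrawals hash
	pre := &Header{}                        // header without one
	fmt.Println(normalizeWithdrawals(post, nil) == nil) // false
	fmt.Println(normalizeWithdrawals(pre, nil) == nil)  // true
}
```

Restricting the rewrite to the empty-root case keeps the check meaningful: a nil list under a non-empty withdrawals hash is still reported as missing.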
Fixed by #26675
Received via discord from @michaelsproul. CC @fjl
I think I may have identified a small Geth bug. I'm not sure though because my Go-fu is terrible so I haven't even tried to look at the code.
Out of the box, you can't sync Lighthouse+Geth on Zhejiang at the moment. Several users reported this, and I reproduced it just now.
The error that Geth reports is
(full error: https://gist.github.com/michaelsproul/9082b5763f7cb90044a646f10aefeb38)
My current guess is that Geth is deserializing withdrawals: nil instead of withdrawals: [] when decoding blocks from devp2p. The block hash check passes (because nil and [] are RLP-equivalent, right?) but then Geth realises that there are no withdrawals when there should be based on the timestamp. So a valid block hash gets marked invalid and the chain breaks.
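For readers less familiar with Go, the nil versus [] distinction in this guess is real at the language level. A minimal sketch, using a hypothetical Withdrawal type rather than Geth's:

```go
package main

import (
	"fmt"
	"reflect"
)

// Withdrawal is a hypothetical stand-in for Geth's withdrawal type.
type Withdrawal struct{ Index uint64 }

// describe reports how Go sees a withdrawals slice.
func describe(ws []*Withdrawal) string {
	return fmt.Sprintf("nil=%t len=%d", ws == nil, len(ws))
}

func main() {
	var missing []*Withdrawal       // withdrawals: nil
	empty := make([]*Withdrawal, 0) // withdrawals: []

	fmt.Println(describe(missing)) // nil=true len=0
	fmt.Println(describe(empty))   // nil=false len=0

	// Both have length zero, yet they are not deeply equal.
	fmt.Println(reflect.DeepEqual(missing, empty)) // false
}
```

Any check of the form "withdrawals == nil" therefore treats the two cases differently, even though both contain zero withdrawals.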
Nodes following the chain since genesis won't have had this issue because they'll have got a JSON execution payload straight from the CL. The reason it happens more often with Lighthouse is that we have an optimisation that skips the newPayload message while syncing, so we don't drip feed every payload to the EL (forcing the EL to download its own payloads). This feature can be turned off with --disable-optimistic-finalized-sync, and indeed syncing Lighthouse-Geth with this flag doesn't trigger the issue.