Bankhash mismatch running #34623 against mainnet #34876
Comments
e2c2029ac4974687f38ec711fa67512fbd710386 against mainnet
https://github.com/solana-labs/solana/pull/34623 against mainnet
Slot 243107999 matches; slot 243108000 mismatches.
And you have confirmed it with and without the commit you linked above, 6a9f729?
bad
good
It seems that on slot 243108000 the epoch_accounts_hash mismatched, which could cause the bank hash mismatch?
Gotcha, maybe we need to bisect then. We seemingly don't have a last known good commit, but I also would have expected canaries to hit this if the issue had been introduced long ago. And just making sure, you were running 6a9f729 with no other modifications?
@brooksprumo My understanding is that if the epoch_accounts_hash mismatched, then at the bank where we look for the epoch_accounts_hash (i.e. 3/4 of the epoch?), we will get a bank hash mismatch. Is that right?
The epoch accounts hash is a (full) accounts hash calculation taken at the rooted slot 1/4 into the epoch. That value is saved into accounts-db and then hashed into the bank (when freezing) that's 3/4 into the epoch. If the EAH values are different, that would imply an account hash calculation mismatch. And yes, if the EAH values are different, that will cause the bank hashes to be different.
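As a rough illustration of the point above, here is a toy model of the mix-in step. It is not the validator's code: `freeze_bank_hash`, the use of SHA-256, and treating the stop slot as exactly the 3/4 offset are all assumptions made for the sketch. It only shows why two nodes that agree on everything else still diverge once a different EAH is mixed in.

```python
import hashlib
from typing import Optional

SLOTS_PER_EPOCH = 432_000  # assumption: a normal (non-warmup) epoch length


def sha256(*parts: bytes) -> bytes:
    """Hash the concatenation of all parts."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
    return h.digest()


def freeze_bank_hash(base_hash: bytes, slot_index: int, eah: Optional[bytes]) -> bytes:
    """Hypothetical helper: mix the EAH into the bank hash only at the 3/4 slot.

    base_hash stands in for the bank hash computed from everything else
    (parent hash, accounts delta hash, and so on).
    """
    stop_slot_index = SLOTS_PER_EPOCH * 3 // 4
    if slot_index == stop_slot_index and eah is not None:
        return sha256(base_hash, eah)
    return base_hash


base = sha256(b"same state on both nodes")
stop = SLOTS_PER_EPOCH * 3 // 4
good_eah = sha256(b"accounts as seen by the cluster")
bad_eah = sha256(b"accounts plus one unexpected account")

# One slot earlier the EAH is not mixed in, so the nodes still agree.
print(freeze_bank_hash(base, stop - 1, good_eah) == freeze_bank_hash(base, stop - 1, bad_eah))
# At the 3/4 slot a different EAH yields a different bank hash.
print(freeze_bank_hash(base, stop, good_eah) == freeze_bank_hash(base, stop, bad_eah))
```

In this model a bad EAH stays invisible for the whole stretch between the 1/4 and 3/4 points and only surfaces in the one bank that includes it.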
Here is the snippet where the EAH gets mixed into the bank hash, which I looked up for my own understanding as well: Lines 6963 to 7004 in 9db4e84
Yeah. Slot 243108000 is at 75% of the epoch.
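The 75% figure can be checked with a little arithmetic. This is a minimal sketch that assumes a fixed epoch length of 432,000 slots with no warmup epochs; `epoch_position` is a hypothetical helper, not a validator API.

```python
def epoch_position(slot: int, slots_per_epoch: int = 432_000):
    """Return (epoch, slot index within the epoch, fraction of the epoch elapsed).

    Assumes every epoch has the same length (no warmup epochs).
    """
    epoch, slot_index = divmod(slot, slots_per_epoch)
    return epoch, slot_index, slot_index / slots_per_epoch


print(epoch_position(243_108_000))  # (562, 324000, 0.75)
print(epoch_position(243_107_999))  # last slot before the 3/4 mark
```

Under that assumption, slot 243107999 is the last slot before the 3/4 point, which is consistent with it matching while 243108000 mismatches.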
Hmm yeah, it looks like the only things that differ here are the epoch_accounts_hash and capitalization. The fact that your node did not diverge previously would suggest that the account that caused the EAH to diverge did NOT appear as part of any bank hashes recently. So, I don't think replaying the slot will give us any useful information. Rather, I think we would have to examine each account to find the offending one.
Also, the EAH is only part of the "bank frozen" log line for the one bank that includes the EAH.
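One way to picture "examine each account" is a plain diff of two account dumps, one from a node that stayed with the cluster and one from the diverged node. The sketch below assumes each dump is already available as a mapping from pubkey to `(lamports, account_hash)`; that format, the helper names, and the sample data are all hypothetical.

```python
def diff_accounts(good: dict, bad: dict) -> dict:
    """Return {pubkey: (good_entry, bad_entry)} for every account that differs.

    An entry is None when the account is missing on that side.
    """
    report = {}
    for pubkey in sorted(good.keys() | bad.keys()):
        g, b = good.get(pubkey), bad.get(pubkey)
        if g != b:
            report[pubkey] = (g, b)
    return report


def capitalization(accounts: dict) -> int:
    """Sum of lamports across all accounts in a dump."""
    return sum(lamports for lamports, _hash in accounts.values())


# Made-up data: the diverged node holds one extra account.
good = {"Alice": (10, "h1"), "Bob": (5, "h2")}
bad = {"Alice": (10, "h1"), "Bob": (5, "h2"), "ExtraAccount": (1, "h3")}

print(diff_accounts(good, bad))
print(capitalization(bad) - capitalization(good))
```

An extra or altered account shows up both in the diff and as a capitalization delta, matching the two fields reported as different in this thread.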
Yeah, that's what I would think too. The accounts hash calculation happens way after the bank hash is calculated (for the EAH start slot), so replaying the slot won't include anything about the EAH. We occasionally see accounts hash mismatches on snapshots, and they haven't been reproducible before. They are often theorized to be a HW issue, especially disk related.
Note that I see this at startup when unpacking the snapshot:
Yeah, I see that too.
I am working on a fix for this.
I think this is because the reward PDA account was created in the previous epoch when I ran my node with partitioned rewards enabled (#34809). That PR is incomplete: it only patches the bank hash to ignore the PDA accounts, but it doesn't take care of the epoch_hash and the bank lamport adjustment. Therefore, we fail at the slot where the epoch hash is included in the bank hash. I have just pushed fixes for this.
So to confirm, you believe that an account from a PR that has not landed yet altered your account state and caused your node to diverge? And the fixes were pushed to your PR?
Yes, that's correct. All these are specific to my node, which was running with |
Closing the issue since this is specific to my node.
Problem
My node running a recent commit (6a9f729) from master against mainnet crashed with a bank hash mismatch on slot 243108000 (https://explorer.solana.com/block/243108000). Not sure if any of our canary testing nodes caught this error too?
Proposed Solution
Not sure yet, but I will investigate.