-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exit with an error if mappings can’t be created for the chosen block #3242
Conversation
// If the user chooses an object mapping start block we don't have the data for, we can't | ||
// create mappings for it, so the node must exit with an error. | ||
let Some(last_archived_block) = client.block(last_archived_block_hash)? else { | ||
let mapping = if create_object_mappings { | ||
"mapping " | ||
} else { | ||
"" | ||
}; | ||
let error = format!( | ||
"Missing data for {mapping}block {last_archived_block_number} hash {last_archived_block_hash}" | ||
); | ||
return Err(sp_blockchain::Error::Application(error.into())); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about checking this in initialize_archiver
instead? It should be sufficient to query best_block_to_archive
there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn’t possible, because find_last_archived_block searches back through the segments until it finds a segment with a last archived block which is less than or equal to best_block_to_archive.
initialize_archiver doesn’t know about this logic, so it can’t check the actual last_archived_block_number found by find_last_archived_block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Achiving has the invariant that from the tip of the chain (best block) we can traverse back to some block that allows us to initialize archiver. By overriding best_block_to_archive
with block for which to create object mappings we constrain where to start.
Looking carefully we can see (best_block_to_archive..best_block_number)
iteration that ensures to step up closer to the tip to find the blocks that actually exist (right now tip-100 may not actually exist in the database if we use Snap sync and with future Snap sync we may have holes in the database).
One edge case that was problematic before was genesis block, which you handled in #3247, but that was actually not necessary strictly speaking since it is a special case of a more general problem. The same exact issue can happen if we prune blocks since Substrate still never prunes headers (Shamil is working on fixing this in paritytech/polkadot-sdk#6451), so quick check for block existence using header in above mentioned loop will succeed, while block body is already gone.
So to handle both genesis block case after Snap sync and this, all you need to do is to add additional check after the loop that block body exists. Since the only reason for it to not exist is incorrect user input for object mapping creation CLI option, we can return a corresponding error back. In all other cases node sync maintains mentioned invariants already, hence no corresponding error handling in find_last_archived_block
and why I'm insisting no changes need to be done there.
Moving override of best_block_to_archive
to after the loop breaks these invariants and simply moves its handling to a different place unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, but I'm still concerned there's an edge case where the data for the chosen mapping block is present, but the data for the block at the end of the segment before the chosen block doesn't have data (which is what find_last_archived_block
checks and can panic on).
But maybe that will never happen, and if it does we'll find it in testing.
And the user can fix that by only choosing blocks at the end of segments, or by wiping their node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is actually one edge case here. We constrain pruning of state to 140k blocks because state pruning is decoupled from finalization (in contrast to pruning of block bodies), which means it is possible for block body to be present and state not be present anymore.
In practice we chose 140k blocks as something that is the most conservative case for 1 segment worth of empty blocks, in practice blocks are typically not empty (contain votes at least), so this will span more than 1 segment.
To handle this as well you may add another check for state by doing a runtime API call (might query object mapping with an empty list of extrinsics, just to see if it fails altogether), that will cover 100% of necessary checks to the best of my knowledge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in PR #3252
I’ve been working through the different cases here, it’s a bit trickier than I thought. Ideally, if we’re doing what the user said on the command line, we should handle it this way
In those two cases, we’d never get to the new error case in find_last_archived_block(), because the empty archiver case follows a different code path. And a block number in any other segment should just work, unless the block data has been pruned. In that case, we can either:
I’m not sure if we can do the check for the new error outside find_last_archived_block(), because we’re not starting with the data of the exact block number the user asked for on the command line. We’re starting with the last block in the last segment before that (and a few other conditions). |
This is already happening, the only exception if node is started with default
The same as above, it should already work correctly with
We certainly don't want to do any networking for the purposes of returning object mappings, just return an error. The only exception is if we have a fresh node, then we can take advantage of target block support that Shamil introduced for domain snap sync, but I wouldn't go there unless we have an explicit request for such a feature.
|
cc29d2b
to
87521ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here’s where I got to today after checking the code in detail.
// If the user chooses an object mapping start block we don't have the data for, we can't | ||
// create mappings for it, so the node must exit with an error. | ||
let Some(last_archived_block) = client.block(last_archived_block_hash)? else { | ||
let mapping = if create_object_mappings { | ||
"mapping " | ||
} else { | ||
"" | ||
}; | ||
let error = format!( | ||
"Missing data for {mapping}block {last_archived_block_number} hash {last_archived_block_hash}" | ||
); | ||
return Err(sp_blockchain::Error::Application(error.into())); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn’t possible, because find_last_archived_block searches back through the segments until it finds a segment with a last archived block which is less than or equal to best_block_to_archive.
initialize_archiver doesn’t know about this logic, so it can’t check the actual last_archived_block_number found by find_last_archived_block.
87521ee
to
7ae4ff8
Compare
7ae4ff8
to
47c241b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a state check to handle, but overall this looks good to me, thanks for the patience
// There aren't any mappings in the genesis block, so starting there is pointless. | ||
// (And causes errors on restart, because genesis block data is never stored during snap sync.) | ||
best_block_to_archive = best_block_to_archive.min(block_number).max(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Block is already constrained to be non-zero in create_object_mappings
, so not sure how much this is actually needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it’s still possible due to the saturating subtraction, see my comments in this thread:
#3242 (comment)
I'll follow up with a PR for the state check.
// If the user chooses an object mapping start block we don't have the data for, we can't | ||
// create mappings for it, so the node must exit with an error. | ||
let Some(last_archived_block) = client.block(last_archived_block_hash)? else { | ||
let mapping = if create_object_mappings { | ||
"mapping " | ||
} else { | ||
"" | ||
}; | ||
let error = format!( | ||
"Missing data for {mapping}block {last_archived_block_number} hash {last_archived_block_hash}" | ||
); | ||
return Err(sp_blockchain::Error::Application(error.into())); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is actually one edge case here. We constrain pruning of state to 140k blocks because state pruning is decoupled from finalization (in contrast to pruning of block bodies), which means it is possible for block body to be present and state not be present anymore.
In practice we chose 140k blocks as something that is the most conservative case for 1 segment worth of empty blocks, in practice blocks are typically not empty (contain votes at least), so this will span more than 1 segment.
To handle this as well you may add another check for state by doing a runtime API call (might query object mapping with an empty list of extrinsics, just to see if it fails altogether), that will cover 100% of necessary checks to the best of my knowledge.
Replaces #3241.
If we can’t create mappings for the block chosen on the command line, we need to exit with an error.
Choosing the genesis block is useless because it has no mappings, and we often don’t have data for it in our store. So this PR changes mapping creation at the genesis block to start at block 1 instead.
Code contributor checklist: