Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Fix the runtime-version choice. #11209

Conversation

RGafiyatullin
Copy link
Contributor

Adresses: paritytech/polkadot-sdk#64

When called in context of a block, the runtime-version, defined before that block's computation, is used; with a little exception for the cases when the context is the genesis-block. (example for such a call)

When running in a constrained pruning mode (i.e. when a limited quantity of state-db snapshots is available) such an invocation will be available for all the non-pruned blocks except for the last one.

@RGafiyatullin RGafiyatullin added A0-please_review Pull request needs code review. B3-apinoteworthy C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit labels Apr 12, 2022
@skunert
Copy link
Contributor

skunert commented Apr 13, 2022

Basically this is Solution #3 from the issue. I am not sure I understand this comment:

When running in a constrained pruning mode (i.e. when a limited quantity of state-db snapshots is available) such an invocation will be available for all the non-pruned blocks except for the last one.

If I call the runtime on some non-pruned block, why are we guaranteed to have the runtime from the previous block available and not pruned?

@wigy-opensource-developer
Copy link
Contributor

If I call the runtime on some non-pruned block, why are we guaranteed to have the runtime from the previous block available and not pruned?

There is no guarantee for that at all.

@skunert
Copy link
Contributor

skunert commented Apr 13, 2022

If I call the runtime on some non-pruned block, why are we guaranteed to have the runtime from the previous block available and not pruned?

There is no guarantee for that at all.

I think I misinterpreted the comment. I read "last one" as "best block", but it is the opposite, the first available non-pruned block. Got it 👍 . So we would accept that for these blocks you just can not call.

client/state-db/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@andresilva andresilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. In general there are two things that we need to think about the current approach:

  • We will need to make the distinction between block construction and general runtime calls, to make sure we take the :code from the correct block
  • We might need to make some changes to pruning, of the top of my head I was thinking that we might want to make sure we keep around the state of the parent of the last finalized block (currently we'll also prune on finality regardless of window iirc)

Solution 4 from the original will also require that we do the first point above (i.e. distinguish between block execution and other calls), since we need to pick the code from different storage entries, but it would not require any changes to pruning.

fn state_with_runtime_code(
&self,
block_id: BlockId<Block>,
) -> sp_blockchain::Result<<B as backend::Backend<Block>>::State> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed earlier I think here we should fetch the header first (either by block number or block hash, depending on the block id that's passed), and then we fetch the parent hash from it. The assumption we make is that the caller of this method will make the right decision on whether to use a block number or a hash.

Comment on lines +174 to +177
let state_data = self.backend.state_at(*at)?;
let state_code = self.state_with_runtime_code(*at)?;

let state_runtime_code = sp_state_machine::backend::BackendRuntimeCode::new(&state_code);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This executor will be used in both the context of general runtime calls as well as block construction (correct me if I'm wrong here @bkchr). We will need to somehow make that distinction because in the case of block construction we don't want to use the logic of picking the code from the parent, otherwise we'd end up in a situation where: we want to build block N, therefore we use the state from N-1 and the code from N-2. In the case of block construction we basically want to keep the logic as is, i.e. to create block N we already use the code from block N-1 (since N doesn't even exist in the first place).

@RGafiyatullin
Copy link
Contributor Author

Disclaimer

I might have slightly oversimplified the things, but nonetheless the following is how I understand it.

@skunert , @wigy-opensource-developer , please feel free to correct me if I am criminally wrong.

Preface

When a block is computed — it yields:

  • its hash;
  • the State (conceptually — the State-trie, technically — the State-trie's root node's hash).

Once a block is canonicalised, the StateDb's "Pruning RefWindow" is notified.
The "Pruning RefWindow" appends that Block to the queue (aptly named death_rows).
The overall length of death_rows is kept under the specified max_blocks.

That as I understand it gives us "sort of guarantee" (a bit oxymoronic, hence the quotes) that

  • the states produced by the computation of non-canonicalised blocks are not pruned;
  • the states produced by the computation of the N last canonicalised blocks are not pruned.

The corner case here is when the death_rows queue is in fact shorter than the max_blocks:

  • the whole history is shorter (the whole chain has just been started);
  • the node has just restarted with the value for max_blocks larger than it used to run before the restart.

Option # 3

There is an existing API sc-client-api::Backend<B>::state_at(&self, block_id: BlockId<B>), it returns the State that is available as the result of the computation of block block_id.

So in order to
a) compute that block again (CallExecutor<B>::prove_execution(...));
b) query the data of that state (CallExecutor<B>::call(...));
c) ... erm... (CallExecutor::contextual_call(...) — please help me out with this one, what is that?)

we would need to execute the version of the runtime available just before that block was computed, i.e. state_at(block_id - 1).get(WellKnownKeys::CODE).

What states can be queried?

In essence all states to which both of the following is true:
a) the state with the data is not pruned;
b) the state with the runtime is not pruned.

That is (for all max_blocks > 0):

  • for every non-canonicalised block;
  • for almost every non-pruned block (except for the oldest one).

@bkchr
Copy link
Member

bkchr commented Apr 13, 2022

Disclaimer, I have written this before @RGafiyatullin and @andresilva commented, so this may contains duplicate information. Just posting it for posterity.

I'm personally not convinced that using the wasm blob from the parent block is the best way to move forward (yes I know that I proposed it initially). When we do it that way we will need to change the pruning to keep one more block in a hidden way inside the unpruned blocks to always have the parent wasm blob around.
We also will need to "tag" the block production and block import to use the wasm blob from block X and not X - 1. Otherwise we would use the wrong wasm blob. Imagine we applied a runtime upgrade at block X with a migration. This migration needs to run while building/importing block X + 1. Building/importing a block is always done on the state of the parent block from which we currently take the runtime blob as well. With your changes we would take the runtime blob from the parent block of the parent of the block we are trying to build/import. If we would always have done it this way, it would have been no problem. However, now we need to stay backwards compatible and this means when you applied a runtime upgrade at block X, it needs to be used in block X + 1. (I hope you get what I want to say, if not please ask)

If we go with proposal 4, we would also need to annotate block production/import to switch to the new wasm file.

@wigy-opensource-developer
Copy link
Contributor

Correct me if I am mistaken, but we cannot stay completely backward compatible for each of the following cases:

  1. block importing
  2. RPC runtime calls
  3. block authoring

I really like the simplicity of this solution opposed to the really overengineered idea I had that also would allow multi-block migrations. If the dead_row keeps the state of a few finalized blocks alive in storage, I really do not see the issue with this direction.

@wigy-opensource-developer
Copy link
Contributor

wigy-opensource-developer commented Apr 13, 2022

let block_num_code = if block_num.is_zero() { block_num } else { block_num - One::one() };
let block_id_code = BlockId::<Block>::Number(block_num_code);
let state_code = self.backend.state_at(block_id_code)?;

Also, @RGafiyatullin, I think André tried to get you navigate here using the previous block hash instead of just decreasing the block number. That implementation would be fork-aware.

@bkchr
Copy link
Member

bkchr commented Apr 13, 2022

Correct me if I am mistaken, but we cannot stay completely backward compatible for each of the following cases:

1. block importing

2. RPC runtime calls

3. block authoring

These are basically all kind of interactions with the runtime ;)

We will need to stay backwards compatible anyway, because otherwise we can not import all the old blocks.

@RGafiyatullin
Copy link
Contributor Author

@bkchr

My changes are based upon the assumption that the following is true:

pub trait Backend<Block: BlockT>: AuxStore + Send + Sync {
	// >8 >8 >8 >8 >8 >8 >8 >8 >8 >8 >8 >8

         /// Returns state backend with post-state of given block.
	fn state_at(&self, block: BlockId<Block>) -> sp_blockchain::Result<Self::State>;

Which gave me the following:

  • block N is calculated based on the following inputs:
    • code taken from state_at(N - 1).get(CODE) ;
    • state yielded by the execution of block N - 1, i.e. state_at(N - 1);
  • the State resulting from the execution of Block N contains the data compatible with the code kept in state_at(N - 1).get(CODE).

With your changes we would take the runtime blob from the parent block of the parent of the block we are trying to build/import.

Sorry, I do not get how that is true.

@RGafiyatullin
Copy link
Contributor Author

In that if block N - 1 invoked system::set_code (or has set the state-db value under the key ":code" directly), then this block's post-state (i.e. backend.state_at(N - 1)) contains the runtime to compute block N.

@RGafiyatullin
Copy link
Contributor Author

@bkchr please disregard the above... I traced a little bit more, I misunderstood what the input argument at: BlockId<B> for CallExecutor's method meant.

@RGafiyatullin
Copy link
Contributor Author

I found a critical mistake in the very base of the reasoning that stands behind that PR-draft, therefore I will close it.

I will draw several doodles reflecting my up-to-date vision of the problem and post it in paritytech/polkadot-sdk#64 .

@bkchr
Copy link
Member

bkchr commented Apr 13, 2022

Ty @RGafiyatullin

@RGafiyatullin RGafiyatullin deleted the rgafiyatullin-substrate-9997 branch April 13, 2022 17:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants