-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Tracking issue: Pre-checking for PVF Compilation time #3211
Comments
It's tricky.. I've changed my tune somewhat.. In my mind, approval checking had three advantages: First, we avoid all validators checking parathread code upgrades, but an alternative would simply be making parathread code upgrades expensive. Second we're already building a approvals and disputes procedure, so this seems simplest, but yes important differences exist. We'll also want new validators to join, and for some validators to churn between relay chains when we move towards multiple relay chains, which both require examining our assumptions about the build artifacts caching, so lazy builds sounded nice, but maybe this sucks. Approval checking would require disputes and slashing since parachains could repeatedly register invalid code until one passed approval checks, which then enables attacks. Is this worth the risk if everyone needs to build the code eventually anyways and validators all have the storage to cache the artifact? It's trick but maybe no, so an approval checking design might start parachain code upgrade blocks already escalated. As for parathreads, we'll maybe have more confidence in build times but the time we build parathreads, but maybe we just make them pay more for upgrades. We've accepted a model where parachain code gets referenced by hash, so multiple code blobs exist simultaneously already. If parachain code upgrades are special blocks that do not otherwise interact with XCMP, etc., then we need not roll back invalid code upgrades and could merely invalidate the code upgrade. Yes, this sounds like a big difference from the usual inclusion pipeline. At some level I think this boils down primarily to slashing: Are we happy with slashing and forensics for build failures? If no, then we need escalated build checking, which no longer looks much like approvals. |
We might leverage the approval's logic for grandpa though: If a parachain posts a code block then we transport the code to all validators, perhaps via availability and reconstructing, and then all validators build the code. We then block finality upon build results similarly to blocking finality upon approval votes. We do not currently know about finality on the relay chain though, so I guess this buys us nothing really. |
Right now, most of operations that sign stuff in polkadot protocol are handled by a very convenient tool - `Signed`. However `Signed` assumes that whatever is signed is anchored to some `parent_hash` which works for most cases, but does not work for others. One instance of such a case is pre-checking (#3211). There validators submit signed votes on-chain. A vote is valid for the entire session. If we were to use `Signed` we would have to root a vote in some block of that session and during vote verification check that this block is indeed within the session. This is especially annoying since we agreed to use unsigned extrinsics to submit votes and we need to make the unsigned extrinsic validation as slim as possible. (FWIW, the definition of a pre-checking vote can be seen in the next diff in the stack) That's the reason why we opted-out from using `Signed` for pre-checking and decided to go with the manual signing approach. Almost every piece of machinery is in place except for signing which is presented in this PR.
This is an insubstantial PR that just unlocks PRs down the line. This PR is a part of #3211. Regarding the `PvfCheckStatement` struct itself: this is a structure that will be used to convert from/into the binary representation and ultimately will be used to sign and submit votes onto chain.
Right now, most of operations that sign stuff in polkadot protocol are handled by a very convenient tool - `Signed`. However `Signed` assumes that whatever is signed is anchored to some `parent_hash` which works for most cases, but does not work for others. One instance of such a case is pre-checking (#3211). There validators submit signed votes on-chain. A vote is valid for the entire session. If we were to use `Signed` we would have to root a vote in some block of that session and during vote verification check that this block is indeed within the session. This is especially annoying since we agreed to use unsigned extrinsics to submit votes and we need to make the unsigned extrinsic validation as slim as possible. (FWIW, the definition of a pre-checking vote can be seen in the next diff in the stack) That's the reason why we opted-out from using `Signed` for pre-checking and decided to go with the manual signing approach. Almost every piece of machinery is in place except for signing which is presented in this PR.
This is an insubstantial PR that just unlocks PRs down the line. This PR is a part of #3211. Regarding the `PvfCheckStatement` struct itself: this is a structure that will be used to convert from/into the binary representation and ultimately will be used to sign and submit votes onto chain.
This PR is a part of #3211. This PR prepares ground for the following runtime changes required for PVF pre-checking. Specifically, we do several changes here: 1. We remove `validation_code_at` and `validation_code_hash_at`. Those functions are not used. They were added in the early days with intent to use it later but turned out that we do not need them. 2. We replace `validation_code_hash_at` with just `current_code_hash` for the case of inclusion and candidate checking. 3. We also replace `last_code_upgrade` with a direct query into `FutureCodeHash` and `UpgradeRestrictionSignal`. Those in conjunction should replace the logic that was used for allowing/disallowing upgrades. This requires special attention of the reviewers. 4. Then we remove the machinery required to support those queries. Specifically the code related to `UseCodeAt`. We do not need it since we do not answer the historical queries. However, we still leave all the data on-chain. At some point we may clean it up, but that would be needed to be done with a dedicated migration which can be done as follow-up. 5. Some now irrelevant tests were removed and/or adapted.
Right now, most of operations that sign stuff in polkadot protocol are handled by a very convenient tool - `Signed`. However `Signed` assumes that whatever is signed is anchored to some `parent_hash` which works for most cases, but does not work for others. One instance of such a case is pre-checking (#3211). There validators submit signed votes on-chain. A vote is valid for the entire session. If we were to use `Signed` we would have to root a vote in some block of that session and during vote verification check that this block is indeed within the session. This is especially annoying since we agreed to use unsigned extrinsics to submit votes and we need to make the unsigned extrinsic validation as slim as possible. (FWIW, the definition of a pre-checking vote can be seen in the next diff in the stack) That's the reason why we opted-out from using `Signed` for pre-checking and decided to go with the manual signing approach. Almost every piece of machinery is in place except for signing which is presented in this PR.
* pvf-precheck: Add `sign` in subsystem-util Right now, most of operations that sign stuff in polkadot protocol are handled by a very convenient tool - `Signed`. However `Signed` assumes that whatever is signed is anchored to some `parent_hash` which works for most cases, but does not work for others. One instance of such a case is pre-checking (#3211). There validators submit signed votes on-chain. A vote is valid for the entire session. If we were to use `Signed` we would have to root a vote in some block of that session and during vote verification check that this block is indeed within the session. This is especially annoying since we agreed to use unsigned extrinsics to submit votes and we need to make the unsigned extrinsic validation as slim as possible. (FWIW, the definition of a pre-checking vote can be seen in the next diff in the stack) That's the reason why we opted-out from using `Signed` for pre-checking and decided to go with the manual signing approach. Almost every piece of machinery is in place except for signing which is presented in this PR. * pvf-precheck: Add `PvfCheckStatement` to polkadot-primitives This is an insubstantial PR that just unlocks PRs down the line. This PR is a part of #3211. Regarding the `PvfCheckStatement` struct itself: this is a structure that will be used to convert from/into the binary representation and ultimately will be used to sign and submit votes onto chain.
This commit implements the last major piece of #3211: the subsystem that tracks PVFs that require voting, issues pre-check requests to candidate-validation and makes sure that the votes are submitted to the chain. TODO: - [ ] Run once more with polkadot-launch
This commit implements the last major piece of #3211: the subsystem that tracks PVFs that require voting, issues pre-check requests to candidate-validation and makes sure that the votes are submitted to the chain. TODO: - [ ] Run once more with polkadot-launch
This commit implements the last major piece of #3211: the subsystem that tracks PVFs that require voting, issues pre-check requests to candidate-validation and makes sure that the votes are submitted to the chain. TODO: - [ ] Run once more with polkadot-launch
This commit implements the last major piece of #3211: the subsystem that tracks PVFs that require voting, issues pre-check requests to candidate-validation and makes sure that the votes are submitted to the chain. TODO: - [ ] Run once more with polkadot-launch
This commit implements the last major piece of #3211: the subsystem that tracks PVFs that require voting, issues pre-check requests to candidate-validation and makes sure that the votes are submitted to the chain. TODO: - [ ] Run once more with polkadot-launch
* pvf-precheck: Integrate PVF pre-checking into paras module Closes #4009 This is the most of the runtime-side change needed for #3211. Here is how it works. The PVF pre-checking can be triggered either by an upgrade or by onboarding (i.e. calling `schedule_para_initialize`). The PVF pre-checking process is identified by the PVF code hash that is being voted on. If there is already PVF pre-checking process running, then no new PVF pre-checking process will be started. Instead, we just subscribe to the existing one. If there is no PVF pre-checking process running but the PVF code hash was already saved in the storage, that necessarily means (I invite the reviewers to double-check this invariant) that the PVF already passed pre-checking. This is equivalent to instant approving of the PVF. The pre-checking process can be concluded either by obtaining a supermajority or if it expires. Each validator checks the list of PVFs available for voting. The vote is binary, i.e. accept or reject a given PVF. As soon as the supermajority of votes are collected for one of the sides of the vote, the voting is concluded in that direction and the effects of the voting are enacted. Only validators from the active set can participate in the vote. The set of active validators can change each session. That's why we reset the votes each session. A voting that observed a certain number of sessions will be rejected. The effects of the PVF accepting depend on the operations requested it: 1. All onboardings subscribed to the approved PVF pre-checking process will get scheduled and after passing 2 session boundaries they will be onboarded. 2. All upgrades subscribed to the approved PVF pre-checking process will get scheduled very similarly to the existing process. Upgrades with pre-checking are really the same process that is just delayed by the time required for pre-checking voting. In case of instant approval the mechanism is exactly the same. This is important from parachains compatibility standpoint since following the delayed upgrade requires the parachain to implement paritytech/cumulus#517. In case, PVF pre-checking process was concluded with rejection, then all the requesting operations get cancelled. For onboarding it means it gets without movement: the lifecycle of such parachain is terminated on the `Onboarding` state and after rejection the lifecycle is none. That in turn means that the caller can attempt registering the parachain once more. For upgrading it means that the upgrade process is aborted: that flashes go-ahead signal with `Abort` flag. Rejection leads to removing the allegedly bad validation code from the chain storage. Among other things, this implies that the operation can be re-requested. That allows for retrying an operation in case there was some bug. At the same time it does not look as a DoS vector due to the caching performed by the nodes. PVF pre-checking can be enabled and disabled. Initially, according to the changes in #4420, this mechanism is disabled. Triggering the PVF pre-checking when it is disabled just means that we insta approve the requesting operation. This should lead to the behavior being unchanged. Follow-ups: - expose runtime APIs * cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=polkadot-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/polkadot/src/weights/runtime_parachains_paras.rs * cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=westend-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/westend/src/weights/runtime_parachains_paras.rs * cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=kusama-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/kusama/src/weights/runtime_parachains_paras.rs * cargo run --quiet --release --features runtime-benchmarks -- benchmark --chain=rococo-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/rococo/src/weights/runtime_parachains_paras.rs * Review fixes Co-authored-by: Parity Bot <[email protected]>
This commit implements the last major piece of #3211: the subsystem that tracks PVFs that require voting, issues pre-check requests to candidate-validation and makes sure that the votes are submitted to the chain. TODO: - [ ] Run once more with polkadot-launch
This commit implements the last major piece of #3211: the subsystem that tracks PVFs that require voting, issues pre-check requests to candidate-validation and makes sure that the votes are submitted to the chain.
This commit implements the last major piece of #3211: the subsystem that tracks PVFs that require voting, issues pre-check requests to candidate-validation and makes sure that the votes are submitted to the chain.
This commit implements the last major piece of #3211: the subsystem that tracks PVFs that require voting, issues pre-check requests to candidate-validation and makes sure that the votes are submitted to the chain.
This commit implements the last major piece of #3211: the subsystem that tracks PVFs that require voting, issues pre-check requests to candidate-validation and makes sure that the votes are submitted to the chain.
This commit implements the last major piece of paritytech#3211: the subsystem that tracks PVFs that require voting, issues pre-check requests to candidate-validation and makes sure that the votes are submitted to the chain.
This PR is a part of paritytech/polkadot#3211. This PR prepares ground for the following runtime changes required for PVF pre-checking. Specifically, we do several changes here: 1. We remove `validation_code_at` and `validation_code_hash_at`. Those functions are not used. They were added in the early days with intent to use it later but turned out that we do not need them. 2. We replace `validation_code_hash_at` with just `current_code_hash` for the case of inclusion and candidate checking. 3. We also replace `last_code_upgrade` with a direct query into `FutureCodeHash` and `UpgradeRestrictionSignal`. Those in conjunction should replace the logic that was used for allowing/disallowing upgrades. This requires special attention of the reviewers. 4. Then we remove the machinery required to support those queries. Specifically the code related to `UseCodeAt`. We do not need it since we do not answer the historical queries. However, we still leave all the data on-chain. At some point we may clean it up, but that would be needed to be done with a dedicated migration which can be done as follow-up. 5. Some now irrelevant tests were removed and/or adapted.
@slumber There are a few check boxes left, can you make sure they are taken care of please. Thank you! |
This was my intention right after async backing work on the cumulus side, no worries. |
We are going to use this issue for tracking progress on the PVF compilation time pre-checking.
pvf_checking_enabled
configuration #7138)pvf_checking_enabled
configuration #7138)pvf_checking_enabled
parameter #7396)expected_at
from the inclusion #4601 (paras: count upgrade delay from inclusion #7486)Original Description
Problem
If compilation of a Parachain Validation Function (PVF) takes too long or uses too much memory, this can leave a node in limbo as to whether a candidate of that parachain is valid or not.
The amount of time that a PVF takes to compile is a subjective resource limit and as such PVFs may be maliciously crafted so that there is e.g. a 50/50 split of validators which can and cannot compile and execute the PVF.
This has the following implications:
As a result of this issue we need a fairly hard guarantee that the PVFs of registered parachains/threads can be compiled within a reasonable amount of time.
Solution
To have all validators pre-check that the amount of time the validation code takes to compile is under some reasonable limit. Before a parathread can be registered by users, and before a PVF upgrade can be applied by any parathread or chain, the validation code must be signed off on by 2/3 of the current validator set as taking less than some fixed timeout, specified on-chain.
Validators should constantly observe the chain to learn about PVFs which are pending approval, and should acquire and compile these PVFs. They then submit an unsigned transaction in either the affirmative or negative about the PVF. PVFs which don't receive a supermajority of validators attesting to low time taken to compile, within a certain amount of blocks, are discarded.
When compiling PVFs outside of this check, a much more permissive timeout should be used, and as such the artifact will eventually be compiled and approval checks, dispute votes, etc. will be issued, leading to no stall of the relay chain or of finality.
As a downside, this does mean that every change to the PVF requires additional work to be done by every validator at the time of submission, which means that parathread registration and upgrades should be more expensive / more limited.
Alternative Solution: Approval Checking
Alternatively, we could have approval checking check that:
a) the Wasm blob produced by any candidate compiles within a reasonable amount of time
b) any new Wasm blob registered on-chain by a transaction compiles within a reasonable amount of time
However, there are issues with this solution.
For these reasons, the initial pre-check should be used.
The text was updated successfully, but these errors were encountered: