Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Add PVF module documentation #6293

Merged
merged 9 commits into from
Nov 23, 2022
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion node/core/pvf/src/executor_intf.rs
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,7 @@ pub fn prevalidate(code: &[u8]) -> Result<RuntimeBlob, sc_executor_common::error
}

/// Runs preparation on the given runtime blob. If successful, it returns a serialized compiled
/// artifact which can then be used to pass into [`execute`] after writing it to the disk.
/// artifact which can then be used to pass into `Executor::execute` after writing it to the disk.
pub fn prepare(blob: RuntimeBlob) -> Result<Vec<u8>, sc_executor_common::error::WasmError> {
sc_executor_wasmtime::prepare_runtime_artifact(blob, &CONFIG.semantics)
}
Expand Down
57 changes: 47 additions & 10 deletions node/core/pvf/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,16 +18,21 @@

//! A crate that implements PVF validation host.
//!
//! # Entrypoint
//!
//! This crate provides a simple API. You first [`start`] the validation host, which gives you the
//! [handle][`ValidationHost`] and the future you need to poll.
//!
//! Then using the handle the client can send two types of requests:
//! Then using the handle the client can send three types of requests:
//!
//! (a) PVF pre-checking. This takes the PVF [code][`Pvf`] and tries to prepare it (verify and
bkontur marked this conversation as resolved.
Show resolved Hide resolved
//! compile) in order to pre-check its validity.
//!
//! (a) PVF execution. This accepts the PVF [`params`][`polkadot_parachain::primitives::ValidationParams`]
//! (b) PVF execution. This accepts the PVF [`params`][`polkadot_parachain::primitives::ValidationParams`]
//! and the PVF [code][`Pvf`], prepares (verifies and compiles) the code, and then executes PVF
//! with the `params`.
//!
//! (b) Heads up. This request allows to signal that the given PVF may be needed soon and that it
//! (c) Heads up. This request allows to signal that the given PVF may be needed soon and that it
//! should be prepared for execution.
//!
//! The preparation results are cached for some time after they either used or was signaled in heads up.
Expand All @@ -39,27 +44,57 @@
//! PVF execution requests can specify the [priority][`Priority`] with which the given request should
//! be handled. Different priority levels have different effects. This is discussed below.
//!
//! Preparation started by a heads up signal always starts in with the background priority. If there
//! Preparation started by a heads up signal always starts with the background priority. If there
//! is already a request for that PVF preparation under way the priority is inherited. If after heads
//! up, a new PVF execution request comes in with a higher priority, then the original task's priority
//! will be adjusted to match the new one if it's larger.
//!
//! Priority can never go down, only up.
//!
//! # Mitigating disputes
//!
//! ## Retrying execution requests
//!
//! If the execution request fails during **preparation**, we will retry if it is possible that the
//! preparation error was transient (i.e. it was of type [`PrepareError::Panic`],
//! [`PrepareError::TimedOut`], or [`PrepareError::DidNotMakeIt`]). We will only retry preparation
//! if another requests comes in after 15 minutes, to ensure any potential transient conditions had
//! time to be resolved. We will retry up to 5 times. See `can_retry_prepare_after_failure`.
//!
//! If the actual **execution** of the artifact fails, we will retry once if it was an
//! [`InvalidCandidate::AmbiguousWorkerDeath`] error, after a 1 second delay to allow any potential
//! transient conditions to clear. This occurs outside this module, in the Candidate Validation
//! subsystem.
//!
//! ## Preparation timeouts
//!
//! We use a timeout for preparation to limit the amount of time it can take. As the time for
//! preparation can vary depending on the machine and load on the machine, this can potentially lead
//! to disputes where some validators are able to execute a PVF and others aren't.
//!
//! One mitigation we have in place is a more lenient timeout for preparation during execution than
//! during pre-checking. The rationale is that the PVF has already passed pre-checking, so we know
//! it should be valid, and we allow it to take longer than expected as this is likely due to an
//! issue with the machine and not the PVF.
//!
//! # Under the hood
//!
//! ## The flow
//!
//! Under the hood, the validation host is built using a bunch of communicating processes, not
//! dissimilar to actors. Each of such "processes" is a future task that contains an event loop that
//! processes incoming messages, potentially delegating sub-tasks to other "processes".
//!
//! Two of these processes are queues. The first one is for preparation jobs and the second one is for
//! execution. Both of the queues are backed by separate pools of workers of different kind.
//!
//! Preparation workers handle preparation requests by preverifying and instrumenting PVF wasm code,
//! Preparation workers handle preparation requests by prevalidating and instrumenting PVF wasm code,
//! and then passing it into the compiler, to prepare the artifact.
//!
//! Artifact is a final product of preparation. If the preparation succeeded, then the artifact will
//! contain the compiled code usable for quick execution by a worker later on.
//! ## Artifacts
//!
//! An artifact is the final product of preparation. If the preparation succeeded, then the artifact
//! will contain the compiled code usable for quick execution by a worker later on.
//!
//! If the preparation failed, then the worker will still write the artifact with the error message.
//! We save the artifact with the error so that we don't try to prepare the artifacts that are broken
Expand All @@ -68,12 +103,14 @@
//! The artifact is saved on disk and is also tracked by an in memory table. This in memory table
//! doesn't contain the artifact contents though, only a flag that the given artifact is compiled.
//!
//! A pruning task will run at a fixed interval of time. This task will remove all artifacts that
//! weren't used or received a heads up signal for a while.
//!
//! ## Execution
//!
//! The execute workers will be fed by the requests from the execution queue, which is basically a
//! combination of a path to the compiled artifact and the
//! [`params`][`polkadot_parachain::primitives::ValidationParams`].
//!
//! Each fixed interval of time a pruning task will run. This task will remove all artifacts that
//! weren't used or received a heads up signal for a while.

mod artifacts;
mod error;
Expand Down
2 changes: 1 addition & 1 deletion node/core/pvf/src/priority.rs
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ pub enum Priority {
Normal,
/// This priority is used for requests that are required to be processed as soon as possible.
///
/// For example, backing is on critical path and require execution as soon as possible.
/// For example, backing is on a critical path and requires execution as soon as possible.
Critical,
}

Expand Down
1 change: 0 additions & 1 deletion roadmap/implementers-guide/src/glossary.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,4 +48,3 @@ exactly one downward message queue.
Also of use is the [Substrate Glossary](https://substrate.dev/docs/en/knowledgebase/getting-started/glossary).

[0]: https://wiki.polkadot.network/docs/learn-consensus
[1]: #pvf