add transcript events: init, snapshot save/load, shutdown #7484

Merged (7 commits) on Apr 27, 2023
45 changes: 45 additions & 0 deletions packages/SwingSet/docs/transcript.md
@@ -0,0 +1,45 @@
# Vat Transcripts

SwingSet records a "transcript" of each vat's activity to support orthogonal persistence of the vat image. The transcript contains a record of each delivery to the vat, all syscalls the vat made during that delivery, and the overall results/consequences of the delivery.

The transcript is stored in the swing-store, in a component named `transcriptStore`.

Vat transcripts are broken up into segments which correspond to the various stages of the vat's lifecycle.

Vats are long-lived entities. Each vat has a specific creation point: static vats are created from `config.vats` during `initializeSwingSet()`, while dynamic vats are created when some existing ("parent") vat calls `E(vatAdminService).createVat()`. Vats survive until they suffer a vat-fatal error, request self-termination (`vatPowers.exitVat()`), or are terminated externally (`E(adminFacet).terminateWithFailure()`).

If the vat is upgraded at any point, this create-to-terminate lifespan is broken up into numbered "incarnations". A newly created vat runs as "incarnation 0". The first upgrade causes incarnation 0 to finish and incarnation 1 to start. The vat retains its durable storage between incarnations, but not its virtual or ephemeral (RAM-hosted) objects. Each new incarnation starts a fresh JS engine from a (potentially) new vat bundle, and calls `buildRootObject`.

Within a single incarnation, we break up the transcript into "spans", which are bounded by heap snapshot saves and reloads. Only the XS/xsnap worker supports heap snapshots, so other kinds of vats have a single span per incarnation. For an xsnap worker, however, the kernel periodically instructs the worker to write its JS engine state into a snapshot blob, which is saved in the swing-store DB (in the `snapStore` component). At that point the current transcript span is ended and a new one is begun.

In parallel with spans, the kernel itself has a per-execution lifetime, which bounds the execution lifetimes of the workers. When the kernel is shut down (e.g. the host computer is rebooted, or the kernel process is terminated in anticipation of a software upgrade), all worker processes also shut down. These events are not part of consensus, and are not recorded in the transcript.

To support orthogonal persistence, each new execution of the kernel must be able to bring each worker process back up to its former state, maintaining the illusion of immortal worker processes. In the absence of heap snapshots, it does this by initializing a new worker, then replaying the entire contents of the current span of transcript entries. Since vat execution is deterministic, replaying the same deliveries will result in the same syscalls, and the same JS engine state.

If the worker *does* support snapshots, then we can bypass most of the replay process by starting from a heap snapshot instead of a blank slate. In this mode we still replay the entire current span, but that span is much shorter: all the deliveries made before the heap snapshot was taken belong to some earlier span, and do not need to be replayed.

As a result, every span starts with either an `initialize-worker` event or a `load-snapshot` event, and every non-current span ends with either a `save-snapshot` event or a `shutdown-worker` event.
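Restarting a worker therefore amounts to dispatching on the first entry of the current span and then re-delivering the rest. The following is a minimal sketch, assuming a hypothetical `worker` object with `initialize()`, `loadSnapshot(snapshotID)` and `deliver(d)` methods (the last returning the syscalls it made); these are illustrative stand-ins, not SwingSet's `VatManager` API. The `load-snapshot` entry shape follows `makeLoadSnapshotItem()` from the `vatKeeper.js` change in this PR, and `['message', 'baz']` is a simplified placeholder for a real delivery.

```javascript
// Hypothetical sketch: bring a fresh worker up to date from the current span.
// worker.initialize / loadSnapshot / deliver are illustrative stand-ins.
function replayCurrentSpan(worker, currentSpan) {
  if (currentSpan.length === 0) {
    throw new Error('current span is empty');
  }
  const [first, ...rest] = currentSpan;
  const [startType, startArg] = first.d;
  if (startType === 'load-snapshot') {
    // resume from the saved heap image instead of a blank slate
    worker.loadSnapshot(startArg.snapshotID);
  } else if (startType === 'initialize-worker') {
    // no snapshot available: start a brand new worker
    worker.initialize();
  } else {
    throw new Error(`span must start with a worker-start event, not ${startType}`);
  }
  // re-deliver everything else; since vat execution is deterministic, the
  // worker should reproduce exactly the syscalls that were recorded
  for (const te of rest) {
    const syscalls = worker.deliver(te.d);
    if (JSON.stringify(syscalls) !== JSON.stringify(te.sc)) {
      throw new Error(`replay diverged at delivery ${JSON.stringify(te.d)}`);
    }
  }
  return rest.length; // number of deliveries replayed
}
```

Comparing replayed syscalls against the recorded `sc` list is one way to detect nondeterminism early; the sketch simply throws on the first mismatch.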

## One Incarnation

Setting vat upgrade aside for a moment, a vat that has experienced two snapshot events will have three transcript spans, like this:

![Single Incarnation Transcript](./images/transcript/image-1.jpg "Single Incarnation Transcript")

The current span is just the last two entries: the `load-snapshot` and the `baz` message. The kerneldb remembers that the most recent snapshot for this vat was recorded as `snapPos = 9`.

Note that each span's `startPos` bound is inclusive, whereas its `endPos` is exclusive. So when the first span has `{ startPos: 0, endPos: 7 }`, that means the half-open interval `[0,7)` (in math notation), which contains exactly items `0,1,2,3,4,5,6`. The `endPos` of the current span could also be named `nextPosition`: it holds the position at which the next item will be added.
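The half-open convention keeps span arithmetic simple: a span's entry count is just `endPos - startPos`. The sketch below also assumes (consistent with `endPos` doubling as `nextPosition`) that consecutive spans abut, so each span's `endPos` equals the next span's `startPos`. The function names are illustrative only.

```javascript
// Span bounds are half-open: startPos is inclusive, endPos is exclusive.
function spanPositions({ startPos, endPos }) {
  const positions = [];
  for (let pos = startPos; pos < endPos; pos += 1) {
    positions.push(pos);
  }
  return positions;
}

// Entry count is the plain difference of the bounds.
const spanSize = ({ startPos, endPos }) => endPos - startPos;

// Assumed invariant for this sketch: spans tile the transcript with no gaps
// or overlaps, so each span's endPos is the following span's startPos.
function spansAreContiguous(boundsList) {
  return boundsList.every(
    (b, i) => i === 0 || boundsList[i - 1].endPos === b.startPos,
  );
}

spanPositions({ startPos: 0, endPos: 7 }); // returns [0, 1, 2, 3, 4, 5, 6]
```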

## Vat Upgrade

If we immediately upgrade the vat in the previous example, we'll start a new incarnation. This requires us to shut down the old worker (appending a final `shutdown-worker` entry to the old incarnation's last span). Then we start both a new span and a new incarnation. The `initialize-worker` entry represents the creation of a brand new worker, using the new version's source code bundle. Then we send a `startVat` message into the worker, which runs the new `buildRootObject`: this is where all Kinds are redefined and where the vat's upgrade code runs.

![Vat Upgrade Transcript](./images/transcript/image-2.jpg "Vat Upgrade Transcript")

At this point, the vat has no latest snapshot. If we restart here, the new worker will be created from scratch (the first event in the current span is `initialize-worker`).
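The order of events during an upgrade can be modeled with a toy in-memory transcript. Everything below (`makeToyTranscript`, `rolloverIncarnation`, the bare `startVat` placeholder entry) is hypothetical and only mirrors the sequence described above; it is not the swing-store `transcriptStore` API.

```javascript
// Toy in-memory transcript with spans and incarnations (hypothetical names).
function makeToyTranscript() {
  const entries = []; // all entries, indexed by transcript position
  const spans = [{ incarnation: 0, startPos: 0 }];
  const latest = () => spans[spans.length - 1];
  return {
    append(te) {
      entries.push(te);
      return entries.length - 1; // position of the entry just added
    },
    // new span, same incarnation (e.g. after a heap snapshot)
    rolloverSpan() {
      spans.push({ incarnation: latest().incarnation, startPos: entries.length });
    },
    // new span that also begins the next incarnation (vat upgrade)
    rolloverIncarnation() {
      spans.push({ incarnation: latest().incarnation + 1, startPos: entries.length });
    },
    currentSpan() {
      const { incarnation, startPos } = latest();
      const endPos = entries.length;
      return { incarnation, startPos, endPos, items: entries.slice(startPos) };
    },
  };
}

// Build a placeholder transcript entry whose delivery is the given array.
const item = (...d) => ({ d, sc: [], r: { status: 'ok' } });

// The upgrade steps, in order.
function upgradeVat(transcript) {
  // final entry of the old incarnation's last span
  transcript.append(item('shutdown-worker'));
  // start both a new span and a new incarnation
  transcript.rolloverIncarnation();
  // brand new worker built from the new bundle
  transcript.append(item('initialize-worker'));
  // startVat runs the new buildRootObject (placeholder delivery shape)
  transcript.append(item('startVat'));
}
```

After `upgradeVat()`, the current span begins with `initialize-worker`, matching the "no latest snapshot" situation: a restart at this point would build the worker from scratch.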

## Post-Upgrade Snapshot

The vat will evolve further, and eventually enough deliveries will be made to provoke the creation of a heap snapshot. That will establish a "latest snapshot" for this vat, and start a new span. Just after a snapshot, the current span will be very short: just the single `load-snapshot` entry.

![Vat Upgrade Plus One Snapshot Transcript](./images/transcript/image-3.jpg "Vat Upgrade Plus One Snapshot Transcript")
8 changes: 4 additions & 4 deletions packages/SwingSet/misc-tools/extract-xs-snapshot.js
@@ -34,13 +34,13 @@ if (!vatIDToExtract) {
  const h = `all snapshots: pos hash compressed raw`;
  console.warn(h);
  for (const info of snapStore.listAllSnapshots()) {
- const { vatID, inUse, endPos, hash } = info;
+ const { vatID, inUse, snapPos, hash } = info;
  const name = namedVats.get(vatID) || '?';
  const used = inUse ? 'used' : 'old';
  const sVatID = vatID.padEnd(3);
  const sName = name.padEnd(15);
  const sUsed = used.padStart(4);
- const sPos = endPos.toString().padStart(6);
+ const sPos = snapPos.toString().padStart(6);
  const sHash = `${hash.slice(0, 10)}..`;
  const sCompressed = info.compressedSize.toString().padStart(7);
  const sRaw = info.uncompressedSize.toString().padStart(8);
@@ -50,10 +50,10 @@ if (!vatIDToExtract) {
  }
  } else {
  const info = snapStore.getSnapshotInfo(vatIDToExtract);
- const { endPos, hash } = info;
+ const { snapPos, hash } = info;
  const write = async tmpFilePath => {
  const snapshot = fs.readFileSync(tmpFilePath);
- const fn = `${vatIDToExtract}-${endPos}-${hash}.xss`;
+ const fn = `${vatIDToExtract}-${snapPos}-${hash}.xss`;
  fs.writeFileSync(fn, snapshot);
  console.log(`wrote snapshot to ${fn}`);
  };
4 changes: 2 additions & 2 deletions packages/SwingSet/misc-tools/replay-transcript.js
@@ -232,8 +232,8 @@ async function replay(transcriptFile) {

  if (argv.useCustomSnapStore) {
  snapStore = /** @type {SnapStore} */ ({
- async saveSnapshot(_vatID, endPos, saveRaw) {
- const snapFile = `${vatID}-${endPos}-${
+ async saveSnapshot(_vatID, snapPos, saveRaw) {
+ const snapFile = `${vatID}-${snapPos}-${
  saveSnapshotID || 'unknown'
  }.xss`;
  const { duration: rawSaveSeconds } = await measureSeconds(() =>
13 changes: 8 additions & 5 deletions packages/SwingSet/src/kernel/kernel.js
@@ -373,7 +373,7 @@ export default function buildKernel(
  * @typedef { {
  * abort?: boolean, // changes should be discarded, not committed
  * consumeMessage?: boolean, // discard the aborted delivery
- * didDelivery?: boolean, // we made a delivery to a vat, for run policy
+ * didDelivery?: VatID, // we made a delivery to a vat, for run policy and save-snapshot
  * computrons?: BigInt, // computron count for run policy
  * meterID?: string, // deduct those computrons from a meter
  * decrementReapCount?: { vatID: VatID }, // the reap counter should decrement
@@ -462,7 +462,7 @@ export default function buildKernel(
  // TODO metering.allocate, some day

  /** @type {CrankResults} */
- const results = { didDelivery: true, computrons };
+ const results = { didDelivery: vatID, computrons };

  if (meterID && computrons) {
  results.meterID = meterID; // decrement meter when we're done
@@ -702,7 +702,7 @@ export default function buildKernel(
  console.log('error during createDynamicVat', err);
  const info = makeError(`${err}`);
  const results = {
- didDelivery: true, // ok, it failed, but we did spend the time
+ didDelivery: vatID, // ok, it failed, but we did spend the time
  abort: true, // delete partial vat state
  consumeMessage: true, // don't repeat createVat
  terminate: { vatID, reject: true, info },
@@ -1266,8 +1266,11 @@ export default function buildKernel(
  crankResults.consumeMessage ? 'deliver' : 'start',
  );
  } else {
- // eslint-disable-next-line @jessie.js/no-nested-await
- await vatWarehouse.maybeSaveSnapshot();
+ const vatID = crankResults.didDelivery;
+ if (vatID) {
+ // eslint-disable-next-line @jessie.js/no-nested-await
+ await vatWarehouse.maybeSaveSnapshot(vatID);
+ }
  }
  const { computrons, meterID } = crankResults;
  if (computrons) {
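Because `didDelivery` now carries the `VatID` rather than `true`, the post-crank code can offer exactly one vat, the one that received a delivery, to `maybeSaveSnapshot()`. A minimal sketch of that logic, assuming a stand-in `vatWarehouse` whose `maybeSaveSnapshot(vatID)` returns a promise; the helper names `vatToSnapshot` and `maybeSnapshotAfterCrank` are hypothetical.

```javascript
// Hypothetical helper: which vat, if any, should be considered for a heap
// snapshot after this crank? didDelivery now holds a VatID, not a boolean.
function vatToSnapshot(crankResults) {
  return crankResults.didDelivery || undefined;
}

// Sketch of the post-crank step: only the vat that received a delivery is
// offered to the (stand-in) vat warehouse for a possible snapshot.
async function maybeSnapshotAfterCrank(crankResults, vatWarehouse) {
  const vatID = vatToSnapshot(crankResults);
  if (vatID) {
    await vatWarehouse.maybeSaveSnapshot(vatID);
  }
}
```

A non-empty VatID string is truthy, so any code that only tests `didDelivery` for truthiness behaves the same as it did with the old boolean.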
82 changes: 67 additions & 15 deletions packages/SwingSet/src/kernel/state/vatKeeper.js
@@ -20,7 +20,9 @@ import { enumeratePrefixedKeys } from './storageHelper.js';
  * @typedef { import('../../types-external.js').TranscriptStore } TranscriptStore
  * @typedef { import('../../types-internal.js').VatManager } VatManager
  * @typedef { import('../../types-internal.js').RecordedVatOptions } RecordedVatOptions
- * @typedef { import('../../types-external.js').TranscriptEntry } TranscriptEntry
+ * @typedef { import('../../types-internal.js').TranscriptEntry } TranscriptEntry
+ * @typedef {import('../../types-internal.js').TranscriptDeliverySaveSnapshot} TDSaveSnapshot
+ * @typedef {import('../../types-internal.js').TranscriptDeliveryLoadSnapshot} TDLoadSnapshot
  */

  // makeVatKeeper is a pure function: all state is kept in the argument object
@@ -463,16 +465,28 @@ export function makeVatKeeper(
  }
  }

+ function transcriptSize() {
+ const bounds = transcriptStore.getCurrentSpanBounds(vatID);
+ const { startPos, endPos } = bounds;
+ return endPos - startPos;
+ }
+
  /**
- * Generator function to return the vat's transcript, one entry at a time.
- *
- * @param {number} [startPos] Optional position to begin reading from
+ * Generator function to return the vat's current-span transcript,
+ * one entry at a time.
  *
- * @yields { TranscriptEntry } a stream of transcript entries
+ * @yields { [number, TranscriptEntry] } a stream of deliveryNum and transcript entries
  */
- function* getTranscript(startPos) {
- for (const entry of transcriptStore.readSpan(vatID, startPos)) {
- yield /** @type { TranscriptEntry } */ (JSON.parse(entry));
+ function* getTranscript() {
+ const bounds = transcriptStore.getCurrentSpanBounds(vatID);
+ let deliveryNum = bounds.startPos;
+ // readSpan() starts at startPos and ends just before endPos
+ for (const entry of transcriptStore.readSpan(vatID)) {
+ const te = /** @type { TranscriptEntry } */ (JSON.parse(entry));
+ /** @type { [number, TranscriptEntry]} */
+ const retval = [deliveryNum, te];
+ yield retval;
+ deliveryNum += 1;
  }
  }

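The rewritten `getTranscript()` no longer takes a start position; it reads the current span and pairs each entry with its absolute position. The pattern can be sketched against a toy store, assuming only that the store exposes `getCurrentSpanBounds()` and a `readSpan()` generator of JSON strings in position order. The toy signatures drop the `vatID` argument the real calls take, and `makeToyStore` and `readCurrentSpanWithPositions` are hypothetical names.

```javascript
// Toy stand-in for transcriptStore: one current span starting at startPos,
// holding JSON-encoded entries (shapes are assumptions for illustration).
function makeToyStore(startPos, items) {
  return {
    getCurrentSpanBounds: () => ({ startPos, endPos: startPos + items.length }),
    readSpan: function* () {
      yield* items.map(i => JSON.stringify(i));
    },
  };
}

// Pair each parsed entry with its absolute position (deliveryNum): the
// starting number comes from the span bounds, then increments per entry.
function* readCurrentSpanWithPositions(store) {
  let deliveryNum = store.getCurrentSpanBounds().startPos;
  for (const encoded of store.readSpan()) {
    yield [deliveryNum, JSON.parse(encoded)];
    deliveryNum += 1;
  }
}
```

This relies on `readSpan()` yielding exactly the entries from `startPos` up to (but not including) `endPos`, as the comment in the real `getTranscript()` states.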
@@ -498,42 +512,79 @@ export function makeVatKeeper(
  function transcriptSnapshotStats() {
  const totalEntries = getTranscriptEndPosition();
  const snapshotInfo = getSnapshotInfo();
- const snapshottedEntries = snapshotInfo ? snapshotInfo.endPos : 0;
+ const snapshottedEntries = snapshotInfo ? snapshotInfo.snapPos : 0;
  return { totalEntries, snapshottedEntries };
  }

+ /**
+ * @param {string} snapshotID
+ * @returns {TranscriptEntry}
+ */
+ function makeSaveSnapshotItem(snapshotID) {
+ return {
+ d: /** @type {TDSaveSnapshot} */ ['save-snapshot'],
+ sc: [],
+ r: { status: 'ok', snapshotID },
+ };
+ }
+
+ /**
+ * @param {string} snapshotID
+ * @returns {TranscriptEntry}
+ */
+ function makeLoadSnapshotItem(snapshotID) {
+ const loadConfig = { snapshotID };
+ return {
+ d: /** @type {TDLoadSnapshot} */ ['load-snapshot', loadConfig],
+ sc: [],
+ r: { status: 'ok' },
+ };
+ }
+
  /**
  * Store a snapshot, if given a snapStore.
  *
  * @param {VatManager} manager
- * @returns {Promise<boolean>}
+ * @returns {Promise<void>}
  */
  async function saveSnapshot(manager) {
  if (!snapStore || !manager.makeSnapshot) {
- return false;
+ return;
  }

  // tell the manager to save a heap snapshot to the snapStore
  const endPosition = getTranscriptEndPosition();
  const info = await manager.makeSnapshot(endPosition, snapStore);
- transcriptStore.rolloverSpan(vatID);

  const {
- hash,
+ hash: snapshotID,
  uncompressedSize,
  rawSaveSeconds,
  compressedSize,
  compressSeconds,
  } = info;

+ // push a save-snapshot transcript entry
+ addToTranscript(makeSaveSnapshotItem(snapshotID));
+
+ // then start a new transcript span
+ transcriptStore.rolloverSpan(vatID);
+
+ // then push a load-snapshot entry, so that the current span
+ // always starts with an initialize-worker or load-snapshot
+ // pseudo-delivery
+ addToTranscript(makeLoadSnapshotItem(snapshotID));
+
  kernelSlog.write({
  type: 'heap-snapshot-save',
  vatID,
- hash,
+ snapshotID,
  uncompressedSize,
  rawSaveSeconds,
  compressedSize,
  compressSeconds,
  endPosition,
  });
- return true;
  }

  function deleteSnapshotsAndTranscript() {
@@ -611,6 +662,7 @@ export function makeVatKeeper(
  hasCListEntry,
  deleteCListEntry,
  deleteCListEntriesForKernelSlots,
+ transcriptSize,
  getTranscript,
  transcriptSnapshotStats,
  addToTranscript,
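The ordering inside the new `saveSnapshot()` is what guarantees that every span begins with a worker-start pseudo-delivery: close the old span with `save-snapshot`, roll over, then open the new span with `load-snapshot`. The sketch below models only that ordering with a hypothetical in-memory store (`makeToySpanStore`, `recordSnapshot`, `saveItem` and `loadItem` are illustrative names; the real code goes through `addToTranscript()` and `transcriptStore.rolloverSpan(vatID)`). It assumes, as the doc's `endPos`/`nextPosition` description suggests, that `getTranscriptEndPosition()` returns the next free position.

```javascript
// Toy transcript store tracking only entries and the current span start.
function makeToySpanStore() {
  const items = [];
  let spanStart = 0;
  return {
    add: item => {
      items.push(item);
    },
    endPosition: () => items.length, // next free position
    rolloverSpan: () => {
      spanStart = items.length;
    },
    currentSpan: () => items.slice(spanStart),
  };
}

// Entry shapes mirror makeSaveSnapshotItem() / makeLoadSnapshotItem().
const saveItem = snapshotID => ({ d: ['save-snapshot'], sc: [], r: { status: 'ok', snapshotID } });
const loadItem = snapshotID => ({ d: ['load-snapshot', { snapshotID }], sc: [], r: { status: 'ok' } });

// Record a snapshot in the same order as saveSnapshot() above.
function recordSnapshot(store, snapshotID) {
  const snapPos = store.endPosition(); // position the save-snapshot entry will get
  store.add(saveItem(snapshotID)); // last entry of the old span
  store.rolloverSpan();
  store.add(loadItem(snapshotID)); // first entry of the new span
  return snapPos;
}
```

Right after `recordSnapshot()`, the current span holds only the `load-snapshot` entry, which is the "very short" current span described in the post-upgrade snapshot section of the doc.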
4 changes: 2 additions & 2 deletions packages/SwingSet/src/kernel/vat-loader/manager-helper.js
@@ -57,7 +57,7 @@ import {
  /**
  *
  * @typedef { { getManager: (shutdown: () => Promise<void>,
- * makeSnapshot?: (endPos: number, ss: SnapStore) => Promise<SnapshotResult>) => VatManager,
+ * makeSnapshot?: (snapPos: number, ss: SnapStore) => Promise<SnapshotResult>) => VatManager,
  * syscallFromWorker: (vso: VatSyscallObject) => VatSyscallResult,
  * setDeliverToWorker: (dtw: unknown) => void,
  * } } ManagerKit
@@ -170,7 +170,7 @@ function makeManagerKit(retainSyscall = false) {
  /**
  *
  * @param { () => Promise<void>} shutdown
- * @param {(endPos: number, ss: SnapStore) => Promise<SnapshotResult>} [makeSnapshot]
+ * @param {(snapPos: number, ss: SnapStore) => Promise<SnapshotResult>} [makeSnapshot]
  * @returns {VatManager}
  */
  function getManager(shutdown, makeSnapshot) {
@@ -218,12 +218,12 @@ export function makeXsSubprocessFactory({
  return worker.close().then(_ => undefined);
  }
  /**
- * @param {number} endPos
+ * @param {number} snapPos
  * @param {SnapStore} snapStore
  * @returns {Promise<SnapshotResult>}
  */
- function makeSnapshot(endPos, snapStore) {
- return snapStore.saveSnapshot(vatID, endPos, fn => worker.snapshot(fn));
+ function makeSnapshot(snapPos, snapStore) {
+ return snapStore.saveSnapshot(vatID, snapPos, fn => worker.snapshot(fn));
  }

  return mk.getManager(shutdown, makeSnapshot);