add transcript events: init, snapshot save/load, shutdown
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However, many of them are implemented with commands
to the worker (just not `deliver()` commands). The vat-warehouse
records these events in the transcript to help
subsequent (manual/external) replay tools know what happened. Without
them, we'd need to deduce e.g. the heap-snapshot writing schedule by
counting deliveries and comparing them against
snapshotInitial/snapshotInterval.
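
As a rough illustration of how an external tool might use these entries, the sketch below classifies transcript entries and reads the snapshot-writing positions directly from the transcript. The entry shape `{ d, sc, r }` follows the records added in this commit; the helper names, the placeholder hash, and the use of `bringOutYourDead` as a stand-in real delivery are assumptions made for the example.

```javascript
// Sketch only: these helper names are hypothetical, not part of SwingSet.
const PSEUDO_DELIVERY_TAGS = new Set([
  'initialize-worker',
  'load-snapshot',
  'save-snapshot',
  'shutdown-worker',
]);

// True for entries that are recorded in the transcript but never
// passed to the worker as a deliver() command.
function isPseudoDelivery(entry) {
  return PSEUDO_DELIVERY_TAGS.has(entry.d[0]);
}

// Positions at which a heap snapshot was written, read straight from
// the transcript instead of being deduced from delivery counts.
function snapshotPositions(entries) {
  const positions = [];
  entries.forEach((entry, pos) => {
    if (entry.d[0] === 'save-snapshot') {
      positions.push(pos);
    }
  });
  return positions;
}

const exampleEntries = [
  { d: ['initialize-worker'], sc: [], r: { status: 'ok' } },
  // placeholder for a real delivery
  { d: ['bringOutYourDead'], sc: [], r: { status: 'ok' } },
  // the hash value here is invented for the example
  { d: ['save-snapshot'], sc: [], r: { status: 'ok', hash: 'placeholder-hash' } },
];

console.log(exampleEntries.map(isPseudoDelivery));
console.log(snapshotPositions(exampleEntries));
```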

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.

The transcript is broken up into "spans", delimited by heap snapshots
or upgrade-related shutdowns. To bring a worker up to date, we want to
start a worker (either a blank one, or from a snapshot), and then
replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. The current span will never include a 'save-snapshot' or
'shutdown-worker': the span is closed immediately after those events
are added, so replay will never see them. But a tool which replays a
historical span will see them at the end.
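
The span rules above can be sketched with an in-memory model. This is not the transcriptStore API: `splitIntoSpans` and `deliveriesToReplay` are hypothetical helpers, and the flat array of entries stands in for a vat's full transcript history.

```javascript
// Sketch only: an in-memory model of spans, not the real transcriptStore.

// Split a full history into spans. A span is closed immediately after
// a 'save-snapshot' or 'shutdown-worker' entry; the last element of
// the result is the current (still open) span, which may be empty.
function splitIntoSpans(entries) {
  const spans = [];
  let current = [];
  for (const entry of entries) {
    current.push(entry);
    const dtype = entry.d[0];
    if (dtype === 'save-snapshot' || dtype === 'shutdown-worker') {
      spans.push(current);
      current = [];
    }
  }
  spans.push(current);
  return spans;
}

// Given the current span, check the first entry and return the real
// deliveries that a replay would have to perform.
function deliveriesToReplay(currentSpan) {
  const [first, ...rest] = currentSpan;
  const startType = first && first.d[0];
  if (startType !== 'initialize-worker' && startType !== 'load-snapshot') {
    throw Error(`span starts with ${startType}, not init/load`);
  }
  return rest.map(entry => {
    const dtype = entry.d[0];
    if (dtype === 'save-snapshot' || dtype === 'shutdown-worker') {
      throw Error(`current span should not contain ${dtype}`);
    }
    return entry.d;
  });
}

const ok = { status: 'ok' };
const history = [
  { d: ['initialize-worker'], sc: [], r: ok },
  { d: ['bringOutYourDead'], sc: [], r: ok }, // placeholder real delivery
  { d: ['save-snapshot'], sc: [], r: { status: 'ok', hash: 'h1' } },
  { d: ['load-snapshot', 'h1'], sc: [], r: ok },
  { d: ['bringOutYourDead'], sc: [], r: ok },
];

const spans = splitIntoSpans(history);
const currentSpan = spans[spans.length - 1];
console.log(spans.length, deliveriesToReplay(currentSpan));
```

In this model the historical span ends with 'save-snapshot', so passing it to `deliveriesToReplay` throws; a tool that replays historical spans would need to handle that trailing entry itself.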

The types were improved to make `TranscriptDelivery` be a superset of
`VatDeliveryObject`. We also record `TranscriptDeliveryResults`, which is
currently a stripped-down subset of VatDeliveryResult (just the "ok"
status), except that save-snapshot includes the snapshot hash in its
results. In the future, we'll probably record the deterministic subset
of metering results (computrons, maybe something about memory
allocation).
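
Because the snapshot hash appears both in the 'save-snapshot' results and as the argument of the following 'load-snapshot', a replay tool could cross-check the two. The sketch below assumes the ordering used by `saveSnapshot()` in this commit (a 'save-snapshot' entry immediately followed by a 'load-snapshot' entry); `findUnpairedSnapshots` is a hypothetical helper and the hashes are invented.

```javascript
// Sketch only: hypothetical consistency check over a flat list of entries.
// Returns the positions of 'save-snapshot' entries that are not
// immediately followed by a 'load-snapshot' carrying the same hash.
function findUnpairedSnapshots(entries) {
  const problems = [];
  entries.forEach((entry, pos) => {
    if (entry.d[0] !== 'save-snapshot') {
      return;
    }
    const next = entries[pos + 1];
    const paired =
      next !== undefined &&
      next.d[0] === 'load-snapshot' &&
      next.d[1] === entry.r.hash;
    if (!paired) {
      problems.push(pos);
    }
  });
  return problems;
}

const goodHistory = [
  { d: ['initialize-worker'], sc: [], r: { status: 'ok' } },
  { d: ['save-snapshot'], sc: [], r: { status: 'ok', hash: 'h1' } },
  { d: ['load-snapshot', 'h1'], sc: [], r: { status: 'ok' } },
];
const badHistory = [
  { d: ['initialize-worker'], sc: [], r: { status: 'ok' } },
  { d: ['save-snapshot'], sc: [], r: { status: 'ok', hash: 'h1' } },
  { d: ['load-snapshot', 'h2'], sc: [], r: { status: 'ok' } },
];

console.log(findUnpairedSnapshots(goodHistory));
console.log(findUnpairedSnapshots(badHistory));
```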

refs #7199
refs #6770
warner committed Apr 23, 2023
1 parent 18a0516 commit 6a9b91d
Showing 10 changed files with 533 additions and 85 deletions.
13 changes: 8 additions & 5 deletions packages/SwingSet/src/kernel/kernel.js
@@ -372,7 +372,7 @@ export default function buildKernel(
* @typedef { {
* abort?: boolean, // changes should be discarded, not committed
* consumeMessage?: boolean, // discard the aborted delivery
* didDelivery?: boolean, // we made a delivery to a vat, for run policy
* didDelivery?: VatID, // we made a delivery to a vat, for run policy and save-snapshot
* computrons?: BigInt, // computron count for run policy
* meterID?: string, // deduct those computrons from a meter
* decrementReapCount?: { vatID: VatID }, // the reap counter should decrement
@@ -461,7 +461,7 @@ export default function buildKernel(
// TODO metering.allocate, some day

/** @type {CrankResults} */
const results = { didDelivery: true, computrons };
const results = { didDelivery: vatID, computrons };

if (meterID && computrons) {
results.meterID = meterID; // decrement meter when we're done
@@ -701,7 +701,7 @@ export default function buildKernel(
console.log('error during createDynamicVat', err);
const info = makeError(`${err}`);
const results = {
didDelivery: true, // ok, it failed, but we did spend the time
didDelivery: vatID, // ok, it failed, but we did spend the time
abort: true, // delete partial vat state
consumeMessage: true, // don't repeat createVat
terminate: { vatID, reject: true, info },
@@ -1267,8 +1267,11 @@ export default function buildKernel(
crankResults.consumeMessage ? 'deliver' : 'start',
);
} else {
// eslint-disable-next-line @jessie.js/no-nested-await
await vatWarehouse.maybeSaveSnapshot();
const vatID = crankResults.didDelivery;
if (vatID) {
// eslint-disable-next-line @jessie.js/no-nested-await
await vatWarehouse.maybeSaveSnapshot(vatID);
}
}
const { computrons, meterID } = crankResults;
if (computrons) {
58 changes: 47 additions & 11 deletions packages/SwingSet/src/kernel/state/vatKeeper.js
@@ -21,6 +21,9 @@ import { enumeratePrefixedKeys } from './storageHelper.js';
* @typedef { import('../../types-internal.js').VatManager } VatManager
* @typedef { import('../../types-internal.js').RecordedVatOptions } RecordedVatOptions
* @typedef { import('../../types-external.js').TranscriptEntry } TranscriptEntry
* @typedef {import('../../types-external.js').TranscriptDeliverySaveSnapshot} TDSaveSnapshot
* @typedef {import('../../types-external.js').TranscriptDeliverySaveSnapshotResults} TDSaveSnapshotResults
* @typedef {import('../../types-external.js').TranscriptDeliveryLoadSnapshot} TDLoadSnapshot
*/

// makeVatKeeper is a pure function: all state is kept in the argument object
@@ -469,16 +472,28 @@ export function makeVatKeeper(
}
}

function transcriptSize() {
const bounds = transcriptStore.getCurrentSpanBounds(vatID);
const { startPos, endPos } = bounds;
return endPos - startPos;
}

/**
* Generator function to return the vat's transcript, one entry at a time.
* Generator function to return the vat's current-span transcript,
* one entry at a time.
*
* @param {number} [startPos] Optional position to begin reading from
*
* @yields { TranscriptEntry } a stream of transcript entries
* @yields { [number, TranscriptEntry] } a stream of deliveryNum and transcript entries
*/
function* getTranscript(startPos) {
for (const entry of transcriptStore.readSpan(vatID, startPos)) {
yield /** @type { TranscriptEntry } */ (JSON.parse(entry));
function* getTranscript() {
const bounds = transcriptStore.getCurrentSpanBounds(vatID);
let deliveryNum = bounds.startPos;
// readSpan() starts at startPos and ends just before endPos
for (const entry of transcriptStore.readSpan(vatID)) {
const te = /** @type { TranscriptEntry } */ (JSON.parse(entry));
/** @type { [number, TranscriptEntry]} */
const retval = [deliveryNum, te];
yield retval;
deliveryNum += 1;
}
}

@@ -512,23 +527,44 @@ export function makeVatKeeper(
* Store a snapshot, if given a snapStore.
*
* @param {VatManager} manager
* @returns {Promise<boolean>}
* @returns {Promise<void>}
*/
async function saveSnapshot(manager) {
if (!snapStore || !manager.makeSnapshot) {
return false;
return undefined;
}

// tell the manager to save a heap snapshot to the snapStore
const endPosition = getTranscriptEndPosition();
const info = await manager.makeSnapshot(endPosition, snapStore);
transcriptStore.rolloverSpan(vatID);

const {
hash,
uncompressedSize,
rawSaveSeconds,
compressedSize,
compressSeconds,
} = info;

// push a save-snapshot transcript entry
addToTranscript({
d: /** @type {TDSaveSnapshot} */ ['save-snapshot'],
sc: [],
r: /** @type {TDSaveSnapshotResults} */ { status: 'ok', hash },
});

// then start a new transcript span
transcriptStore.rolloverSpan(vatID);

// then push a load-snapshot entry, so that the current span
// always starts with an initialize-worker or load-snapshot
// pseudo-delivery
addToTranscript({
d: /** @type {TDLoadSnapshot} */ ['load-snapshot', hash],
sc: [],
r: { status: 'ok' },
});

kernelSlog.write({
type: 'heap-snapshot-save',
vatID,
@@ -539,7 +575,6 @@ export function makeVatKeeper(
compressSeconds,
endPosition,
});
return true;
}

function deleteSnapshotsAndTranscript() {
@@ -618,6 +653,7 @@ export function makeVatKeeper(
hasCListEntry,
deleteCListEntry,
deleteCListEntriesForKernelSlots,
transcriptSize,
getTranscript,
transcriptSnapshotStats,
addToTranscript,
98 changes: 69 additions & 29 deletions packages/SwingSet/src/kernel/vat-warehouse.js
@@ -11,6 +11,9 @@ import djson from '../lib/djson.js';
* @typedef {import('@agoric/swingset-liveslots').VatSyscallResult} VatSyscallResult
* @typedef {import('@agoric/swingset-liveslots').VatSyscallHandler} VatSyscallHandler
* @typedef {import('../types-internal.js').VatManager} VatManager
* @typedef {import('../types-internal.js').VatID} VatID
* @typedef {import('../types-external.js').TranscriptDeliveryInitializeWorker} TDInitializeWorker
* @typedef {import('../types-external.js').TranscriptDeliveryShutdownWorker} TDShutdownWorker
* @typedef {{ body: string, slots: unknown[] }} Capdata
* @typedef { [unknown, ...unknown[]] } Tagged
* @typedef { { moduleFormat: string }} Bundle
@@ -40,6 +43,23 @@ function recordSyscalls(origHandler) {
return { syscallHandler, getTranscriptSyscalls };
}

/**
* @param {TranscriptEntry} transcriptEntry
* @returns {VatDeliveryObject}
*/
function onlyRealDelivery(transcriptEntry) {
const dtype = transcriptEntry.d[0];
if (
dtype === 'save-snapshot' ||
dtype === 'load-snapshot' ||
dtype === 'initialize-worker' ||
dtype === 'shutdown-worker'
) {
throw Fail`replay should not see ${dtype}`;
}
return transcriptEntry.d;
}

/**
* Make a syscallHandler that returns results from a
* previously-recorded transcript, instead of executing them for
@@ -251,28 +271,33 @@ export function makeVatWarehouse({
* @returns {Promise<void>}
*/
async function replayTranscript(vatID, vatKeeper, manager) {
const snapshotInfo = vatKeeper.getSnapshotInfo();
const startPos = snapshotInfo ? snapshotInfo.endPos : undefined;
// console.log('replay from', { vatID, startPos });

const total = vatKeeper.vatStats().transcriptCount;
const total = vatKeeper.transcriptSize();
kernelSlog.write({ type: 'start-replay', vatID, deliveries: total });
// TODO glean deliveryNum better, make sure we get the post-snapshot
// transcript starting point right. getTranscript() should probably
// return [deliveryNum, t] pairs.
let deliveryNum = startPos || 0;
for await (const te of vatKeeper.getTranscript(startPos)) {
let first = true;
for await (const [deliveryNum, te] of vatKeeper.getTranscript()) {
// if (deliveryNum % 100 === 0) {
// console.debug(`replay vatID:${vatID} deliveryNum:${deliveryNum} / ${total}`);
// }
//
if (first) {
// the first entry should always be initialize-worker or
// load-snapshot
first = false;
const dtype = te.d[0];
if (dtype === 'initialize-worker' || dtype === 'load-snapshot') {
continue; // TODO: use this to launch the worker
} else {
console.log(`transcript for ${vatID} starts with ${te.d[0]}`);
throw Fail`transcript for ${vatID} doesn't start with init/load`;
}
}
// we slog the replay just like the original, but some fields are missing
const finishSlog = slogReplay(kernelSlog, vatID, deliveryNum, te);
const delivery = onlyRealDelivery(te);
const sim = makeSyscallSimulator(kernelSlog, vatID, deliveryNum, te);
const status = await manager.deliver(te.d, sim.syscallHandler);
const status = await manager.deliver(delivery, sim.syscallHandler);
finishSlog(status);
sim.finishSimulation(); // will throw if syscalls did not match
deliveryNum += 1;
}
kernelSlog.write({ type: 'finish-replay', vatID });
}
@@ -296,6 +321,17 @@ export function makeVatWarehouse({
const translators = provideTranslators(vatID);
const syscallHandler = buildVatSyscallHandler(vatID, translators);

// if we use transcripts, but don't have one, create one with an
// initialize-worker event, to represent the vatLoader.create()
// we're about to do
if (options.useTranscript && vatKeeper.transcriptSize() === 0) {
vatKeeper.addToTranscript({
d: /** @type {TDInitializeWorker} */ ['initialize-worker'],
sc: [],
r: { status: 'ok' },
});
}

const isDynamic = kernelKeeper.getDynamicVats().includes(vatID);
const managerP = vatLoader.create(vatID, {
isDynamic,
@@ -437,7 +473,7 @@ export function makeVatWarehouse({
* options: pay $/block to keep in RAM - advisory; not consensus
* creation arg: # of vats to keep in RAM (LRU 10~50~100)
*
* @param {string} currentVatID
* @param {VatID} currentVatID
*/
async function applyAvailabilityPolicy(currentVatID) {
const lru = recent.add(currentVatID);
@@ -451,13 +487,9 @@ export function makeVatWarehouse({
await evict(lru);
}

/** @type { string | undefined } */
let lastVatID;

/** @type {(vatID: string, kd: KernelDeliveryObject, d: VatDeliveryObject, vs: VatSlog) => Promise<VatDeliveryResult> } */
async function deliverToVat(vatID, kd, vd, vs) {
await applyAvailabilityPolicy(vatID);
lastVatID = vatID;

const recreate = true; // PANIC in the failure case
// create the worker and replay the transcript, if necessary
@@ -502,21 +534,19 @@ export function makeVatWarehouse({
}

/**
* Save a snapshot of most recently used vat,
* depending on snapshotInterval.
* Save a heap snapshot for the given vatID, if the snapshotInterval
* is satisfied
*
* @param {VatID} vatID
*/
async function maybeSaveSnapshot() {
if (!lastVatID || !lookup(lastVatID)) {
return false;
}

async function maybeSaveSnapshot(vatID) {
const recreate = true; // PANIC in the failure case
const { manager } = await ensureVatOnline(lastVatID, recreate);
const { manager } = await ensureVatOnline(vatID, recreate);
if (!manager.makeSnapshot) {
return false;
return false; // worker cannot make snapshots
}

const vatKeeper = kernelKeeper.provideVatKeeper(lastVatID);
const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
let reason;
const { totalEntries, snapshottedEntries } =
vatKeeper.transcriptSnapshotStats();
@@ -530,10 +560,15 @@ export function makeVatWarehouse({
}
// console.log('maybeSaveSnapshot: reason:', reason);
if (!reason) {
return false;
return false; // not time to make a snapshot
}

// in addition to saving the actual snapshot,
// vatKeeper.saveSnapshot() pushes a save-snapshot transcript
// entry, then starts a new transcript span, then pushes a
// load-snapshot entry, so that the current span always starts
// with an initialize-worker or load-snapshot pseudo-delivery
await vatKeeper.saveSnapshot(manager);
lastVatID = undefined;
return true;
}

@@ -582,6 +617,11 @@ export function makeVatWarehouse({
async function resetWorker(vatID) {
await evict(vatID);
const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
vatKeeper.addToTranscript({
d: /** @type {TDShutdownWorker} */ ['shutdown-worker'],
sc: [],
r: { status: 'ok' },
});
vatKeeper.dropSnapshotAndResetTranscript();
}

16 changes: 14 additions & 2 deletions packages/SwingSet/src/types-external.js
@@ -121,9 +121,21 @@ export {};
* @typedef {[tag: 'error', problem: string]} DeviceInvocationResultError
* @typedef { DeviceInvocationResultOk | DeviceInvocationResultError } DeviceInvocationResult
*
* @typedef { [tag: 'initialize-worker'] } TranscriptDeliveryInitializeWorker
* @typedef { [tag: 'save-snapshot'] } TranscriptDeliverySaveSnapshot
* @typedef { [tag: 'load-snapshot', snapshotID: string] } TranscriptDeliveryLoadSnapshot
* @typedef { [tag: 'shutdown-worker'] } TranscriptDeliveryShutdownWorker
* @typedef { import('@agoric/swingset-liveslots').VatDeliveryObject
* | TranscriptDeliveryInitializeWorker
* | TranscriptDeliverySaveSnapshot
* | TranscriptDeliveryLoadSnapshot
* | TranscriptDeliveryShutdownWorker
* } TranscriptDelivery
* @typedef { { s: VatSyscallObject, r: VatSyscallResult } } TranscriptSyscall
* @typedef { { status: string } } TranscriptDeliveryResults
* @typedef { { d: VatDeliveryObject, sc: TranscriptSyscall[], r: TranscriptDeliveryResults } } TranscriptEntry
* @typedef { { status: string, hash: string } } TranscriptDeliverySaveSnapshotResults
* @typedef { { status: string } } TranscriptDeliveryGenericResults
* @typedef { TranscriptDeliverySaveSnapshotResults | TranscriptDeliveryGenericResults } TranscriptDeliveryResults
* @typedef { { d: TranscriptDelivery, sc: TranscriptSyscall[], r: TranscriptDeliveryResults } } TranscriptEntry
* @typedef { { transcriptCount: number } } VatStats
* @typedef { ReturnType<typeof import('./kernel/state/vatKeeper').makeVatKeeper> } VatKeeper
* @typedef { ReturnType<typeof import('./kernel/state/kernelKeeper').default> } KernelKeeper
13 changes: 8 additions & 5 deletions packages/SwingSet/test/test-controller.js
@@ -55,12 +55,15 @@ async function simpleCall(t) {
const vattpVatID = controller.vatNameToID('vattp');
const timerVatID = controller.vatNameToID('timer');

const transcript = [
[0, { d: ['initialize-worker'], sc: [], r: { status: 'ok' } }],
];
t.deepEqual(data.vatTables, [
{ vatID: adminVatID, state: { transcript: [] } },
{ vatID: commsVatID, state: { transcript: [] } },
{ vatID: vattpVatID, state: { transcript: [] } },
{ vatID: timerVatID, state: { transcript: [] } },
{ vatID: vat1ID, state: { transcript: [] } },
{ vatID: adminVatID, state: { transcript } },
{ vatID: commsVatID, state: { transcript: [] } }, // transcriptless
{ vatID: vattpVatID, state: { transcript } },
{ vatID: timerVatID, state: { transcript } },
{ vatID: vat1ID, state: { transcript } },
]);
// the vatAdmin root is pre-registered
const vatAdminRoot = ['ko20', adminVatID, 'o+0'];