feat(swingset): get a small upgrade to work #4927

Merged
merged 1 commit into from
Mar 28, 2022
71 changes: 53 additions & 18 deletions packages/SwingSet/src/kernel/kernel.js
@@ -640,33 +640,68 @@ export default function buildKernel(
*/
async function processUpgradeVat(message) {
assert(vatAdminRootKref, `initializeKernel did not set vatAdminRootKref`);
// const { bundleID } = message;
const { vatID, upgradeID, vatParameters } = message;
const { vatID, upgradeID, bundleID, vatParameters } = message;
insistCapData(vatParameters);

// eslint-disable-next-line no-use-before-define
assert(vatWarehouse.lookup(vatID));
const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
/** @type { import('../types-external.js').KernelDeliveryStopVat } */
const kd = harden(['stopVat']);
const kd1 = harden(['stopVat']);
// eslint-disable-next-line no-use-before-define
const vd = vatWarehouse.kernelDeliveryToVatDelivery(vatID, kd);
const status = await deliverAndLogToVat(vatID, kd, vd);
const vd1 = vatWarehouse.kernelDeliveryToVatDelivery(vatID, kd1);
const status1 = await deliverAndLogToVat(vatID, kd1, vd1);
if (status1.terminate) {
// TODO: if stopVat fails, stop now, arrange for everything to
// be unwound. TODO: we need to notify caller about the failure
console.log(`-- upgrade-vat stopVat failed: ${status1.terminate}`);
}

// TODO: if status.terminate then abort the crank, discard the
// upgrade event, and arrange to use vatUpgradeCallback to inform
// the caller
// stop the worker, delete the transcript and any snapshot
// eslint-disable-next-line no-use-before-define
await vatWarehouse.destroyWorker(vatID);
const source = { bundleID };
const { options } = vatKeeper.getSourceAndOptions();
vatKeeper.setSourceAndOptions(source, options);
// TODO: decref the bundleID once setSourceAndOptions increfs it

// for now, all attempts to upgrade will fail
// pause, take a deep breath, appreciate this moment of silence
// between the old and the new. this moment will never come again.

// TODO: decref the bundleID and vatParameters.slots
const args = {
body: JSON.stringify([upgradeID, false, { error: `not implemented` }]),
slots: [],
};
queueToKref(vatAdminRootKref, 'vatUpgradeCallback', args, 'logFailure');
// if stopVat fails, we want everything to be unwound. TODO: we
// need to notify caller about the failure
return { ...status, discardFailedDelivery: true };
// deliver a startVat with the new vatParameters
/** @type { import('../types-external.js').KernelDeliveryStartVat } */
const kd2 = harden(['startVat', vatParameters]);
// eslint-disable-next-line no-use-before-define
const vd2 = vatWarehouse.kernelDeliveryToVatDelivery(vatID, kd2);
// decref vatParameters now that translation did incref
for (const kref of vatParameters.slots) {
kernelKeeper.decrementRefCount(kref, 'upgrade-vat-event');
}
const status2 = await deliverAndLogToVat(vatID, kd2, vd2);
Contributor

I'm a little nervous about doing two deliverAndLogToVat calls within a single crank. While there's nothing specific I can point to that seems wrong, up until now cranks and deliveries have been in 1:1 correspondence. In particular, we should be extra careful that nothing in deliverAndLogToVat bakes in the assumption that it runs at the start of a crank on the way in or at the end of a crank on the way out.

Member Author

Yep, this is the first time we're taking advantage of the difference between "crank" and "delivery". I think vatWarehouse.deliverToVat and the slogger are entirely ready for it, but the part that I know needs some polish is the delivery results. The only errors we've dealt with so far are the kinds that terminate the vat. We're now entering territory where e.g. startVat will soon be able to notice that userspace failed to reconnect all of the durable Kinds, and if that happens we want to roll back the entire upgrade. We'll need a response that says "hey caller, the worker is wedged and no longer viable, but if you abortCrank and jettison the worker, you could resume from the earlier snapshot without problems". My plan is to get roll-back-failed-upgrade working after getting the successful upgrade paths done, at which point I'll be looking more closely at these delivery results and their error modes.
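To make the crank/delivery distinction above concrete, here is a minimal sketch of one crank that contains two deliveries and is unwound as a unit if either fails. Everything in it (`makeCrankBuffer`, `runCrank`, the status shape, the keys) is hypothetical and is not the SwingSet kernel API; deliveries are synchronous here only to keep the example short, whereas the real `deliverAndLogToVat` is async.

```javascript
// Hypothetical sketch: none of these names are SwingSet APIs.
// A "crank" is modelled as a buffer of writes that is either committed
// or discarded as a unit, however many deliveries ran inside it.
function makeCrankBuffer(committed) {
  const pending = new Map();
  return {
    get: key => (pending.has(key) ? pending.get(key) : committed.get(key)),
    set: (key, value) => pending.set(key, value),
    commit: () => {
      for (const [key, value] of pending) {
        committed.set(key, value);
      }
      pending.clear();
    },
    abort: () => pending.clear(),
  };
}

// Run every delivery inside one crank. If any delivery reports a
// problem, discard all buffered writes, including those made by
// earlier deliveries that succeeded.
function runCrank(committed, deliveries) {
  const crank = makeCrankBuffer(committed);
  const results = [];
  for (const deliver of deliveries) {
    const status = deliver(crank);
    results.push(status);
    if (status.terminate) {
      crank.abort();
      return { ok: false, results };
    }
  }
  crank.commit();
  return { ok: true, results };
}

// Stand-ins for the stopVat and startVat deliveries of an upgrade.
const stopVat = crank => {
  crank.set('stopped', true);
  return {};
};
const startVatOK = crank => {
  crank.set('bundle', 'v2');
  return {};
};
const startVatFails = crank => {
  crank.set('bundle', 'v2');
  return { terminate: 'durable Kind not reconnected' };
};

const state = new Map([['bundle', 'v1']]);
const failed = runCrank(state, [stopVat, startVatFails]);
const afterFailure = new Map(state); // copy taken for inspection
const succeeded = runCrank(state, [stopVat, startVatOK]);
```

In this model a failed startVat also discards the writes made by the successful stopVat, which is the "roll back the entire upgrade" behaviour described above. It does not address the separate problem noted in the diff, that a failure notification queued inside the crank would be unwound along with it.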

if (status2.terminate) {
console.log(`-- upgrade-vat startVat failed: ${status2.terminate}`);
}

if (status1.terminate || status2.terminate) {
// TODO: if status.terminate then abort the crank, discard the
// upgrade event, and arrange to use vatUpgradeCallback to inform
// the caller
console.log(`-- upgrade-vat delivery failed`);

// TODO: this is the message we want to send on failure, but we
// need to queue it after the crank was unwound, else this
// message will be unwound too
const args = {
body: JSON.stringify([upgradeID, false, { error: `not implemented` }]),
slots: [],
};
queueToKref(vatAdminRootKref, 'vatUpgradeCallback', args, 'logFailure');
} else {
const args = { body: JSON.stringify([upgradeID, true]), slots: [] };
queueToKref(vatAdminRootKref, 'vatUpgradeCallback', args, 'logFailure');
}
// return { ...status1, ...status2, discardFailedDelivery: true };
return {};
}

function legibilizeMessage(message) {
22 changes: 22 additions & 0 deletions packages/SwingSet/src/kernel/state/vatKeeper.js
@@ -567,6 +567,27 @@ export function makeVatKeeper(
return true;
}

function removeSnapshotAndTranscript() {
const skey = `local.${vatID}.lastSnapshot`;
const epkey = `${vatID}.t.endPosition`;
if (snapStore) {
const notation = kvStore.get(skey);
if (notation) {
const { snapshotID } = JSON.parse(notation);
if (removeFromSnapshot(snapshotID) === 0) {
// TODO: if we roll back (because the upgrade failed), we must
// not really delete the snapshot
snapStore.prepareToDelete(snapshotID);
}
kvStore.delete(skey);
}
}
// TODO: same rollback concern
// TODO: streamStore.deleteStream(transcriptStream);
const newStart = streamStore.STREAM_START;
Contributor

Do we really want to reset the stream pointer to the start position or merely remove the prior transcript entries? Seems like for debugging purposes it would be nicer to maintain the continuity of transcript position across the upgrade, so that, for example, if one was looking at a historical record of a transcript there would be no ambiguity about which incarnation of the vat code it was associated with.

Member Author

Yeah, I was wondering about that too. The conclusion I came to was that the transcript is intimately tied to the bundleID being executed: it doesn't make sense to retain a transcript without also retaining the bundle. It's not like you can replay the concatenated transcripts on top of only the original source bundle and get to the current state.

To rebuild the vat from nothing requires a data structure that looks like "load bundle1, execute these N1 transcript entries, then shutdown and load bundle2, then execute these other N2 entries, etc". To do that properly we'd need to put the bundleID in the startVat delivery, and record the stopVat deliveries too. And you'd need to retain the source bundles for all the bundleIDs in all the startVats in the extended transcript.

That's not a bad design, but I haven't stopped to think about how long it would take to make the necessary changes to achieve it. I'll spend a bit of time evaluating that before I land this PR. At the very least I'll sketch out what the complete design would be so I can compare.

Performance-wise, I was hoping to delete as much data as possible, and being able to delete the transcript and the old bundles is appealing. A new validator who's trying to catch up doesn't need to replay the transcripts from bundle-1: it's sufficient for them to replay from just the most recent startVat. That helps their catchup performance a lot, especially if we manage periodic "null upgrades" whose main job is to discard the transcripts. But of course we don't have to delete the historical transcripts to achieve this, as long as new validators can merkle-root-validate the most recent transcript segment without needing to fetch all of the earlier ones too (i.e. we need to be judicious with our merkle tree layout).
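A minimal sketch of the "extended transcript" idea discussed above, assuming a hypothetical data model in which each incarnation gets its own segment tagged with the bundle that ran it. The segment shape, the entry names, and both planner functions are illustrative assumptions, not anything in SwingSet.

```javascript
// Hypothetical data model: one transcript segment per incarnation,
// each tied to the bundleID that executed it.
const extendedTranscript = [
  { bundleID: 'bundle1', entries: ['startVat', 'd1', 'd2', 'stopVat'] },
  { bundleID: 'bundle2', entries: ['startVat', 'd3', 'stopVat'] },
  { bundleID: 'bundle3', entries: ['startVat', 'd4', 'd5'] },
];

// Rebuilding from nothing: load each bundle in turn and replay its
// segment. This requires retaining every historical bundle.
function planFullReplay(segments) {
  return segments.flatMap(({ bundleID, entries }) => [
    `load ${bundleID}`,
    ...entries.map(entry => `replay ${entry}`),
  ]);
}

// Catching up: replay only from the most recent startVat. This sketch
// assumes whatever state survives an upgrade is available separately.
function planCatchUp(segments) {
  return planFullReplay(segments.slice(-1));
}

// The bundles a plan depends on, i.e. what must be retained for it.
function bundlesNeeded(segments) {
  return segments.map(({ bundleID }) => bundleID);
}

const fullPlan = planFullReplay(extendedTranscript);
const catchUpPlan = planCatchUp(extendedTranscript);
const catchUpBundles = bundlesNeeded(extendedTranscript.slice(-1));
```

Under these assumptions the catch-up plan needs only the last segment and its bundle, while the full plan needs every bundle ever installed, which is the retention trade-off weighed in the comment above.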

Contributor

Deleting the old transcript sounds fine to me. I was just thinking that having continuity of the crank numbers in the historical timeline would be helpful when debugging timelines that cross an upgrade boundary.

Member Author

It looks like it's non-trivial to make this change now. We could change startVat to include the bundleID, but the manager would have to snoop that and rewrite the delivery to include the whole bundle (or, better, send a setBundle command with the bundle contents before sending the deliver(startVat) command). But given the way the supervisor is currently written, that's too late, because the supervisor needs the full bundle to define buildVatNamespace before passing it into makeLiveSlots. We'd have to change makeLiveSlots to return a setBuildVatNamespace, sort of like how it used to return a setBuildRootObject.
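The ordering constraint described above can be sketched as follows. This is a hypothetical shape only: `makeLiveSlotsSketch`, `makeSupervisorSketch`, and the command tuples are invented for illustration, the real `makeLiveSlots` takes many more arguments, and real bundle evaluation is asynchronous.

```javascript
// Hypothetical: a liveslots-like object that does not need the bundle
// at construction time, and instead accepts it later through a setter.
function makeLiveSlotsSketch() {
  let buildVatNamespace; // installed once the bundle is known
  let namespace;

  function setBuildVatNamespace(fn) {
    if (buildVatNamespace) {
      throw new Error('buildVatNamespace already set');
    }
    buildVatNamespace = fn;
  }

  function dispatch(delivery) {
    const [type] = delivery;
    if (type === 'startVat') {
      if (!buildVatNamespace) {
        throw new Error('startVat arrived before setBundle');
      }
      namespace = buildVatNamespace();
      return `started ${namespace.name}`;
    }
    if (!namespace) {
      throw new Error('delivery arrived before startVat');
    }
    return `delivered ${type}`;
  }

  return { dispatch, setBuildVatNamespace };
}

// Hypothetical supervisor: the manager sends setBundle first, then
// deliver(startVat), so the bundle is in place when startVat runs.
function makeSupervisorSketch() {
  const { dispatch, setBuildVatNamespace } = makeLiveSlotsSketch();
  return {
    handleCommand([command, payload]) {
      if (command === 'setBundle') {
        // Stand-in for evaluating the bundle into a vat namespace.
        setBuildVatNamespace(() => ({ name: payload.bundleID }));
        return 'ok';
      }
      if (command === 'deliver') {
        return dispatch(payload);
      }
      throw new Error(`unknown command ${command}`);
    },
  };
}

const supervisor = makeSupervisorSketch();
const setResult = supervisor.handleCommand(['setBundle', { bundleID: 'b2' }]);
const startResult = supervisor.handleCommand(['deliver', ['startVat', {}]]);
```

The point of the setter is that construction no longer depends on the bundle, so a startVat delivery could in principle name which bundle to use; a startVat that arrives without a preceding setBundle is rejected.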

I'll open a new ticket to see if we can come back to this later. There's an open policy question as to whether we should retain the historical transcripts from the very beginning of time, or just enough to enable #1691-style replay (which, to be clear, cannot benefit from replay beyond the most recent version, which is a significant downside of the null-upgrade step).

Member Author

I created #4940 to track this idea.

kvStore.set(epkey, `${JSON.stringify(newStart)}`);
}

function vatStats() {
function getCount(key, first) {
const id = Nat(BigInt(getRequired(key)));
@@ -638,5 +659,6 @@ export function makeVatKeeper(
saveSnapshot,
getLastSnapshot,
removeFromSnapshot,
removeSnapshotAndTranscript,
});
}
9 changes: 9 additions & 0 deletions packages/SwingSet/src/kernel/vat-warehouse.js
@@ -361,6 +361,13 @@ export function makeVatWarehouse(kernelKeeper, vatLoader, policyOptions) {
}
}

async function destroyWorker(vatID) {
// stop any existing worker, delete transcript and any snapshot
await evict(vatID);
const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
vatKeeper.removeSnapshotAndTranscript();
}

// mostly used by tests, only needed with thread/process-based workers
function shutdown() {
const work = Array.from(ephemeral.vats.values(), ({ manager }) =>
@@ -378,6 +385,8 @@ export function makeVatWarehouse(kernelKeeper, vatLoader, policyOptions) {
deliverToVat,
maybeSaveSnapshot,

destroyWorker,

// mostly for testing?
activeVatsInfo: () =>
[...ephemeral.vats].map(([id, { options }]) => ({ id, options })),
2 changes: 2 additions & 0 deletions packages/SwingSet/src/liveslots/liveslots.js
@@ -1267,6 +1267,8 @@ function build(
assert(didStartVat);
assert(!didStopVat);
didStopVat = true;
// eslint-disable-next-line no-use-before-define
await bringOutYourDead();
// empty for now
}

10 changes: 3 additions & 7 deletions packages/SwingSet/test/upgrade/test-upgrade.js
@@ -54,13 +54,9 @@ async function testUpgrade(t, defaultManagerType) {

// upgrade should work
const [v2status, v2capdata] = await run('upgradeV2', []);
// t.is(v2status, 'fulfilled');
// t.deepEqual(JSON.parse(v2capdata.body), ['v2', { youAre: 'v2', marker }]);
// t.deepEqual(v2capdata.slots, [markerKref]);

// but for now, upgrade is just a stub
t.is(v2status, 'rejected');
t.deepEqual(JSON.parse(v2capdata.body), { error: 'not implemented' });
t.is(v2status, 'fulfilled');
t.deepEqual(JSON.parse(v2capdata.body), ['v2', { youAre: 'v2', marker }]);
t.deepEqual(v2capdata.slots, [markerKref]);
}

test('vat upgrade - local', async t => {