fix(SwingSet): move promise rejection from stopVat to the kernel #7130

gibson042 · 2023-03-07T23:48:04Z

closes: #6694

Description

Reject orphaned promises in terminateVat
Reject orphaned promises in processUpgradeVat
Remove special-casing in durable promise watchers

Security Considerations

This is an overall increase in robustness against problems with dying vats.

Scaling Considerations

I don't believe there are any changes w.r.t. scaling.

Documentation Considerations

All changes would be internal, but we should still do a scan before merging.

Testing Considerations

No changes.

datadog-full-agoric · 2023-03-07T23:57:55Z

Datadog Report

Branch report: 6694-kernel-disconnect-promises
Commit report: a0ac542

✅ agoric-sdk: 0 Failed, 0 New Flaky, 3664 Passed, 55 Skipped, 15m 28.59s Wall Time

packages/SwingSet/src/kernel/state/kernelKeeper.js

warner

Nice! We'll need to move that new test into swingset/, remove the new dependencies, and write some new narrow-style tests to live in liveslots/ . Grab me tomorrow and we can pair on those tests if you like.

warner · 2023-03-09T22:56:31Z

packages/swingset-liveslots/src/vpid-tracking.md

@@ -2,85 +2,88 @@

 Kernels and vats communicate about promises by referring to their VPIDs: vat(-centric) promise IDs. These are strings like `p+12` and `p-23`. Like VOIDs (object IDs), the plus/minus sign indicates which side of the boundary allocated the number (`p+12` and `o+12` are allocated by the vat, `p-13` and `o-13` are allocated by the kernel). But where the object ID sign also indicates which side "owns" the object (i.e. where the behavior lives), the promise ID sign is generally irrelevant.

-Instead, we care about which side holds the resolution authority. This is not indicated by the VPID sign, and in fact is not necessarily static. Liveslots does not currently have any mechanism to allow one promise to be forwarded to another, but if it acquires this some day, then the decider of a promise could shift from kernel to vat to kernel again before it finally gets resolved. And there *is* a sequence that allows a vat to receive a reference to a promise in method arguments first, then receive a message whose result promise uses that VPID second (`p1 = p2~.foo(); x~.bar(p1)`, then someone resolves `p2` to `x`). In this case, the decider is initially the kernel, then the authority is transferred to the vat later.
+Instead, we care about which side holds the resolution authority for a promise (referred to as being its **decider**). This is not indicated by the VPID sign, and in fact is not necessarily static. Liveslots does not currently have any mechanism to allow one promise to be forwarded to another, but if it acquires this some day, then the decider of a promise could shift from kernel to vat to kernel again before it finally gets resolved. And there *are* sequences that allow a vat to receive a reference to a promise in method arguments before receiving a message for which that same VPID identifies the result promise (e.g., `pk = makePromiseKit(); p2 = E(pk.promise).push('queued'); await E(observer).push(p2); pk.resolve(observer)` with an observer whose `push` returns a prompt response). In such cases, the decider is initially the kernel but later becomes the receiving vat.


I fully agree with updating the example to E() instead of tildot, and your rewording is a lot more clear. I'm not sure I understand the new example.

OTOH I'm not sure I understand my original either. Let's see:

p2 is decided by the kernel

p1 is also decided by the kernel, at least until p2 is resolved

x is an object in some vat

bar is sent into that vat, carrying p1 as an argument, so the vat knows that p1 will be decided by the kernel, later

p2 gets resolved to x, allowing the foo() to be unqueued and delivered to the vat that hosts x

the dispatch.deliver() of foo causes its result promise (p1) to change decidership from the kernel to the vat that hosts x

none of this requires pipelining or promise forwarding

Let's see, in your new example:

pk and p2 are local to the sending vat

E(observer).push(p2) to a remote vat causes p2 to be exported, so the kernel knows the sending vat is the decider of p2

by "push returns a prompt response" I'm guessing there's a crank boundary between the push(p2) and the pk.resolve()?

pk.resolve(observer) resolves the local promise to the remote Presence, allowing the push("queued") to be sent out into the kernel towards "observer"

the remote vat receives push("queued")

I don't think I see which promise is changing decider from kernel to receiving vat.

Here's my understanding of how decidership of p2 changes; please let me know if I've got anything wrong.

step1: pk = makePromiseKit();

pk.promise is local to Alice

step2: p2 = E(pk.promise).push('queued');

syscall.send(target=pk.promise, method='push', args=['queued'], result=p2)

Kernel becomes the decider of p2 and notes that Alice is the decider of pk.promise

step3: await E(observer).push(p2);

syscall.send(target=observer, method='push', args=[p2], result=<anonymous>)

Kernel becomes the decider of <anonymous> and notes subscription by Alice

Alice loses agency

step3a

Kernel handles the step2 syscall.send by pushing onto the queue for pk.promise

step3b

Kernel handles the step3 syscall.send by deliver(type='send', target=observer, method='push', args=[p2], result=<anonymous>) and making observer's vat the decider of <anonymous>

observer does syscall.resolve(subject=<anonymous>, resolution=…)

observer loses agency

step3c

Kernel handles the syscall.resolve, which involves notify(vatID=Alice, promise=<anonymous>) to enqueue a message

step3d

Kernel handles the notify by deliver(type='notify', subject=<anonymous>, resolution=…)

Alice regains agency

step4: pk.resolve(observer);

syscall.resolve(subject=pk.promise, resolution=observer)

Alice loses agency

step4a

Kernel handles the step4 syscall.resolve, which involves noting resolution of pk.promise to observer and requeueKernelPromise(pk.promise) to requeue messages to it

step4b

Kernel handles the requeued step2 syscall.send by deliver(type='send', target=observer, method='push', args=['queued'], result=p2) and making observer's vat the decider of p2 (which it previously only knew from delivery arguments) and [hopefully!] also noting subscription by Alice to p2

step2: p2 = E(pk.promise).push('queued');
syscall.send(target=pk.promise, method='push', args=['queued'], result=p2)

Ah, ok since pk.promise is local, this E() is handled entirely by HandledPromise, and liveslots doesn't see it at all, until/unless pk.promise gets resolved to something that liveslots created.

(I'm betting that invalidates the rest of the sequence, so I didn't study it carefully yet, we can sync up on this in realtime and work out an example that demonstrates the problem, or maybe remove it from the docs.. I'm pretty sure I wrote the original to remind myself in the future, because it wasn't easy to reconstruct the problematic scenario, so it'd be nice to include at least a few hints)

packages/swingset-liveslots/src/vpid-tracking.md

warner · 2023-03-10T01:37:25Z

packages/swingset-liveslots/src/liveslots.js

@@ -1183,7 +1191,6 @@ function build(
    // upgrade, or if we acquire decider authority for a
    // previously-imported promise
    if (pRec) {
-      // TODO: insist that we do not have decider authority for promiseID


hm, I'm not 100% sure this goes away.

Let's see, if we resolve a watched local promise, currently the kernel won't send the notify back into us, because our syscall.resolve will delete the vpid from our c-list, which means the kernel's notify event will find the c-list entry missing, so the kernel can't deliver a dispatch.notify (it has no vpid to translate it into). So we won't observe that case.

(I think we would tolerate it anyways, if liveslots receives a notify for a promise whose vpid it doesn't recognize, it will ignore it. But I also think that case won't happen, at least not for these cancelled-by-upgrade promises)

If we watch a remotely-decided promise, the dispatch.notify will tell us about a promise that we aren't the decider for, so the assertion implied by the comment would be satisfied.

If we're a new incarnation, and we just finished walking our saved WatchedPromises table, and regenerated new Promise objects for everything there, we can pretend that the kernel is the decider for all of them. Some of them were decided by our predecessor, but we don't know that, and the kernel effectively claimed decidership when it shut down our predecessor.

Maybe just change it to "it is unlikely that we have decider authority for these: do some case analysis, and then maybe add an assertion".

Does liveslots track decidership at all? It looks to me like this block handles a notify for any watched promise in the imported set regardless of how it got there, and that being in the imported set is a solid basis for proceeding.

Yeah, you're probably right. Our comment near the importedVPIDs declaration claims that it only holds VPIDs for which the kernel is the decider.. if that's actually true, then this comment is stale (and a tautology).

I remember doing a lot of refactoring on this about a year ago, and maybe I introduced importedVPIDs/exportedVPIDs at that time, to clean up liveslot's approach to keeping track of decidership. I suspect it still isn't precisely correct (in the face of forwardings), but it's clearly trying to be.

So yeah, go ahead and delete that line. There's no better criteria for decidership than membership/not in importedVPIDs, so if there's any cleanup or other cases to inspect, it won't be in this function, it'll be in the dispatch.deliver that reacts to a previously-known result= VPID, or in the code that does syscall.send.

warner · 2023-03-10T01:41:24Z

packages/swingset-liveslots/src/liveslots.js

@@ -1424,7 +1431,7 @@ function build(
      Fail`buildRootObject() for vat ${forVatID} returned ${rootObject} with no interface`;
    // Need to load watched promises *after* buildRootObject() so that handler kindIDs
    // have a chance to be reassociated with their handlers.
-    watchedPromiseManager.loadWatchedPromiseTable();
+    watchedPromiseManager.loadWatchedPromiseTable(unmeteredRevivePromise);


What's our rationale for using unmetered revivePromise? I'm sure there might be one, but each time this comes up I have to think carefully about what it might or might not be sensitive to, and I'd like to have a clear reason for each place where we disable metering. I think it's usually a question of whether the function's behavior might depend upon the WeakRefs in slotToVal. If loadWatchedPromiseTable is strictly always adding things to that table (there's no chance for a vpid to already exist there), then I suspect we don't need/want it to be unmetered.

This preserves existing behavior, but I would like to leave enough breadcrumbs to avoid such doubt in the future.

The first line of revivePromise asserts absence of metering, and then the rest proceeds to call pRec = makePipelinablePromise(vpid), store the vpid–pRec mapping in importedVPIDs, register the vpid–promise mapping for the marshaller with registerValue, and return the promise.
revivePromise is called only by loadWatchedPromiseTable, which initializes an entry for the vpid–promise pair in promiseRegistrations and invokes pseudoThen to register settlement callbacks.

What commentary would you like to see at this invocation of loadWatchedPromiseTable?

Good point, and yeah we don't have the time to dive any deeper than what you've just mapped out. Maybe just add "TODO: why does this need to be unmetered?", and we'll eventually do another pass through and revisit those decisions.

packages/swingset-liveslots/package.json

packages/swingset-liveslots/test/test-vat-upgrade.js

datadog-full-agoric · 2023-03-16T02:41:05Z

Datadog Report

Branch report: 6694-kernel-disconnect-promises
Commit report: 3c7d40b

✅ agoric-sdk: 0 Failed, 0 New Flaky, 3744 Passed, 57 Skipped, 18m 51.48s Wall Time

warner

mostly good, but let's see if we can remove that new vat-data dependency

packages/swingset-liveslots/src/vpid-tracking.md

warner · 2023-03-16T18:55:24Z

packages/vat-data/src/exo-utils.js

I don't see any problems with this refactoring, but if you adopt my request to avoid a liveslots->vat-data dependency, maybe it could go into a separate PR?

warner · 2023-03-16T19:07:31Z

packages/swingset-liveslots/test/test-handled-promises.js

+import { M } from '@agoric/store';
+import { makePromiseKit } from '@endo/promise-kit';
+// import { makeStoreUtils } from '../../vat-data/src/vat-data-bindings.js';
+import { makeExoUtils } from '../../vat-data/src/exo-utils.js';


This import feels problematic. At the very least, let's change it to @agoric/vat-data/src/exo-util.js (using ../.. is kinda cheating / assuming the monorepo layout, and I'm likely to be moving swingset-liveslots out to a separate repository, which will break ../../)

At a deeper level, Exos are not a liveslots concern. Liveslots provides Kinds, and userspace can use @agoric/vat-data to get Exos on top of that. We need to make sure that liveslot's defineKind works when used by makeExo stuff, but I think that should be the responsibility of vat-data rather than liveslots.

And I'm worried about the coupling this induces (especially because of my upcoming #6596 work): currently swingset-liveslots has a devDependency upon vat-data solely to get the types for DefineKindOptions and DurableKindHandle, and I think those types should be exported by liveslots rather than having liveslots import them from vat-data. With that change, we could shed the liveslots -> vat-data devDependency entirely, which is going to help with the #6596 packaging effort (the build process for xsnap-supervisor, which imports liveslots, can happen in a workspace/repo/node_modules that doesn't have swingset at all, which would prune a cycle).

Could you have this test work with a Kind instead of an Exo, and remove the dependency on vat-data?

Huh, I thought I had already done that. Fixed.

warner

Ok everything looks good except for the vpid-tracking.md line 5 "vpid forwarding" example. Maybe just leave that line the way it was (tildot and all), and we come back to it later for a more complete analysis? I'm pretty sure the original wording expressed whatever weird case it was that I thought of years ago (even if I don't remember how that case worked anymore), and I'm not convinced that the new wording accomplishes that, and I don't want to hold this up just for the sake of updating the example.

This makes room inside processUpgradeVat() for the new cleanup steps that we want to add. refs #6650 refs #7001 refs #6694 refs #6696

cleanupAfterTerminatedVat() had a loop which identified all the promises (kpids) which were decided by a vat, so it could reject all of them as the vat was being terminated. We will have a need for this list of kpids during vat upgrade too, so refactor it out into a separate kernelKeeper method, and change terminateVat() to fetch the list before calling cleanupAfterTerminatedVat().

When upgrading a vat, we must reject all promises that were decided by the old incarnation, because none of them are durable (the new incarnation will be unable to resolve them). Previously, the vat's stopVat() delivery would do the rejection. This moves the responsibility to the kernel, to happen in the pause between the old version being shut down and the new version being launched. closes #6694

…tions

Increases clarity and convenience and sets up for future changes.

…he kernel Fixes #6694

…de changes

…ata scope

Ref #6523

…Stuff

gibson042 requested a review from warner March 7, 2023 23:48

gibson042 marked this pull request as draft March 7, 2023 23:48

gibson042 force-pushed the 6694-kernel-disconnect-promises branch from d87c370 to 90e156c Compare March 8, 2023 00:00

gibson042 marked this pull request as ready for review March 9, 2023 02:23

gibson042 commented Mar 9, 2023

View reviewed changes

packages/SwingSet/src/kernel/state/kernelKeeper.js Show resolved Hide resolved

warner requested changes Mar 10, 2023

View reviewed changes

gibson042 requested a review from warner March 14, 2023 01:27

gibson042 mentioned this pull request Mar 16, 2023

fix: Move upgrade-time abandonExports responsibility into the kernel #7170

Merged

1 task

warner requested changes Mar 16, 2023

View reviewed changes

gibson042 requested a review from warner March 16, 2023 20:19

warner approved these changes Mar 16, 2023

View reviewed changes

warner added the SwingSet package: SwingSet label Mar 16, 2023

warner assigned gibson042 Mar 16, 2023

warner and others added 15 commits March 16, 2023 19:02

fix(swingset): add placeholders for upgrade/stopvat work

c02253c

This makes room inside processUpgradeVat() for the new cleanup steps that we want to add. refs #6650 refs #7001 refs #6694 refs #6696

fix(SwingSet): Access getDecidedPromises on the correct object

24016f0

fix(SwingSet): Iterate kpids decided by a vat before removing its data

517ab28

style(SwingSet): Colocate getDecidedPromises with related functions

4189e9a

style(SwingSet): Align promise-by-decider generator with related func…

f0ee093

…tions

test(SwingSet): Minor improvements

f465398

style(SwingSet): Improve variable names and step ordering

82b37a7

test(swingset-liveslots): Verify that vat upgrade rejects local promises

9ce63bc

docs(swingset-liveslots): Clean up VPID managment documentation

a602f2c

refactor(swingset-liveslots): Update some function signatures

5d84157

Increases clarity and convenience and sets up for future changes.

fix(swingset-liveslots): Move promise rejection responsibility into t…

dd29ff3

…he kernel Fixes #6694

docs(swingset-liveslots): Update VPID management documentation for co…

cf2c34d

…de changes

docs(swingset-liveslots): Clarify VPID managment documentation

01729ab

gibson042 added 10 commits March 16, 2023 19:04

test: Move full-environment test from liveslots to SwingSet

be8adc3

refactor(vat-data): Merge kind-utils.js into exo-utils.js

abbf2a8

refactor(vat-data): Generalize store/kind/exo maker factories to VatD…

835b5e5

…ata scope

test(swingset-liveslots): Expand the signature of setupTestLiveslots

261e070

Ref #6523

test(swingset-liveslots): Add a default cache size to makeFakeVirtual…

60a8375

…Stuff

test(swingset-liveslots): Add liveslots-level handled promises tests

9207f44

chore: Minor jsdoc/eslint improvements

8327856

fixup! docs(swingset-liveslots): Clarify VPID managment documentation

5a723d4

test(swingset-liveslots): Remove the dependency on vat-data

3777b89

docs(swingset-liveslots): Restore tildot example that might be important

8dab80f

gibson042 force-pushed the 6694-kernel-disconnect-promises branch from 062317f to 8dab80f Compare March 16, 2023 23:05

gibson042 added the automerge:no-update (expert!) Automatically merge without updates label Mar 16, 2023

mergify bot merged commit 9d47dd0 into master Mar 17, 2023

mergify bot deleted the 6694-kernel-disconnect-promises branch March 17, 2023 00:11

warner mentioned this pull request Mar 20, 2023

Should watchPromise support a remote watcher? #7162

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(SwingSet): move promise rejection from stopVat to the kernel #7130

fix(SwingSet): move promise rejection from stopVat to the kernel #7130

gibson042 commented Mar 7, 2023 •

edited

Loading

datadog-full-agoric bot commented Mar 7, 2023 •

edited

Loading

warner left a comment

warner Mar 9, 2023

gibson042 Mar 10, 2023 •

edited

Loading

warner Mar 16, 2023

warner Mar 10, 2023

gibson042 Mar 10, 2023

warner Mar 16, 2023

warner Mar 10, 2023

gibson042 Mar 10, 2023

warner Mar 16, 2023

datadog-full-agoric bot commented Mar 16, 2023 •

edited

Loading

warner left a comment

warner Mar 16, 2023

warner Mar 16, 2023

gibson042 Mar 16, 2023

warner left a comment

fix(SwingSet): move promise rejection from stopVat to the kernel #7130

fix(SwingSet): move promise rejection from stopVat to the kernel #7130

Conversation

gibson042 commented Mar 7, 2023 • edited Loading

Description

Security Considerations

Scaling Considerations

Documentation Considerations

Testing Considerations

datadog-full-agoric bot commented Mar 7, 2023 • edited Loading

Datadog Report

warner left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gibson042 Mar 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

datadog-full-agoric bot commented Mar 16, 2023 • edited Loading

Datadog Report

warner left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

warner left a comment

Choose a reason for hiding this comment

gibson042 commented Mar 7, 2023 •

edited

Loading

datadog-full-agoric bot commented Mar 7, 2023 •

edited

Loading

gibson042 Mar 10, 2023 •

edited

Loading

datadog-full-agoric bot commented Mar 16, 2023 •

edited

Loading