-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(SwingSet): move promise rejection from stopVat to the kernel #7130
Conversation
Datadog ReportBranch report: ✅ |
d87c370
to
90e156c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! We'll need to move that new test into swingset/, remove the new dependencies, and write some new narrow-style tests to live in liveslots/ . Grab me tomorrow and we can pair on those tests if you like.
@@ -2,85 +2,88 @@ | |||
|
|||
Kernels and vats communicate about promises by referring to their VPIDs: vat(-centric) promise IDs. These are strings like `p+12` and `p-23`. Like VOIDs (object IDs), the plus/minus sign indicates which side of the boundary allocated the number (`p+12` and `o+12` are allocated by the vat, `p-13` and `o-13` are allocated by the kernel). But where the object ID sign also indicates which side "owns" the object (i.e. where the behavior lives), the promise ID sign is generally irrelevant. | |||
|
|||
Instead, we care about which side holds the resolution authority. This is not indicated by the VPID sign, and in fact is not necessarily static. Liveslots does not currently have any mechanism to allow one promise to be forwarded to another, but if it acquires this some day, then the decider of a promise could shift from kernel to vat to kernel again before it finally gets resolved. And there *is* a sequence that allows a vat to receive a reference to a promise in method arguments first, then receive a message whose result promise uses that VPID second (`p1 = p2~.foo(); x~.bar(p1)`, then someone resolves `p2` to `x`). In this case, the decider is initially the kernel, then the authority is transferred to the vat later. | |||
Instead, we care about which side holds the resolution authority for a promise (referred to as being its **decider**). This is not indicated by the VPID sign, and in fact is not necessarily static. Liveslots does not currently have any mechanism to allow one promise to be forwarded to another, but if it acquires this some day, then the decider of a promise could shift from kernel to vat to kernel again before it finally gets resolved. And there *are* sequences that allow a vat to receive a reference to a promise in method arguments before receiving a message for which that same VPID identifies the result promise (e.g., `pk = makePromiseKit(); p2 = E(pk.promise).push('queued'); await E(observer).push(p2); pk.resolve(observer)` with an observer whose `push` returns a prompt response). In such cases, the decider is initially the kernel but later becomes the receiving vat. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully agree with updating the example to E()
instead of tildot, and your rewording is a lot more clear. I'm not sure I understand the new example.
OTOH I'm not sure I understand my original either. Let's see:
- p2 is decided by the kernel
- p1 is also decided by the kernel, at least until p2 is resolved
x
is an object in some vatbar
is sent into that vat, carryingp1
as an argument, so the vat knows thatp1
will be decided by the kernel, laterp2
gets resolved tox
, allowing thefoo()
to be unqueued and delivered to the vat that hosts x- the
dispatch.deliver()
offoo
causes its result promise (p1
) to change decidership from the kernel to the vat that hosts x - none of this requires pipelining or promise forwarding
Let's see, in your new example:
pk
andp2
are local to the sending vatE(observer).push(p2)
to a remote vat causesp2
to be exported, so the kernel knows the sending vat is the decider ofp2
- by "push returns a prompt response" I'm guessing there's a crank boundary between the push(p2) and the pk.resolve()?
pk.resolve(observer)
resolves the local promise to the remote Presence, allowing the push("queued") to be sent out into the kernel towards "observer"- the remote vat receives push("queued")
I don't think I see which promise is changing decider from kernel to receiving vat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my understanding of how decidership of p2 changes; please let me know if I've got anything wrong.
step1: pk = makePromiseKit();
pk.promise
is local to Alice
step2: p2 = E(pk.promise).push('queued');
syscall.send(target=pk.promise, method='push', args=['queued'], result=p2)
- Kernel becomes the decider of
p2
and notes that Alice is the decider ofpk.promise
step3: await E(observer).push(p2);
syscall.send(target=observer, method='push', args=[p2], result=<anonymous>)
- Kernel becomes the decider of
<anonymous>
and notes subscription by Alice - Alice loses agency
step3a
- Kernel handles the step2
syscall.send
by pushing onto the queue forpk.promise
step3b
- Kernel handles the step3
syscall.send
bydeliver(type='send', target=observer, method='push', args=[p2], result=<anonymous>)
and makingobserver
's vat the decider of<anonymous>
- observer does
syscall.resolve(subject=<anonymous>, resolution=…)
- observer loses agency
step3c
- Kernel handles the
syscall.resolve
, which involvesnotify(vatID=Alice, promise=<anonymous>)
to enqueue a message
step3d
- Kernel handles the notify by
deliver(type='notify', subject=<anonymous>, resolution=…)
- Alice regains agency
step4: pk.resolve(observer);
syscall.resolve(subject=pk.promise, resolution=observer)
- Alice loses agency
step4a
- Kernel handles the step4
syscall.resolve
, which involves noting resolution ofpk.promise
toobserver
andrequeueKernelPromise(pk.promise)
to requeue messages to it
step4b
- Kernel handles the requeued step2
syscall.send
bydeliver(type='send', target=observer, method='push', args=['queued'], result=p2)
and makingobserver
's vat the decider ofp2
(which it previously only knew from delivery arguments) and [hopefully!] also noting subscription by Alice top2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
step2: p2 = E(pk.promise).push('queued');
syscall.send(target=pk.promise, method='push', args=['queued'], result=p2)
Ah, ok since pk.promise
is local, this E() is handled entirely by HandledPromise, and liveslots doesn't see it at all, until/unless pk.promise
gets resolved to something that liveslots created.
(I'm betting that invalidates the rest of the sequence, so I didn't study it carefully yet, we can sync up on this in realtime and work out an example that demonstrates the problem, or maybe remove it from the docs.. I'm pretty sure I wrote the original to remind myself in the future, because it wasn't easy to reconstruct the problematic scenario, so it'd be nice to include at least a few hints)
@@ -1183,7 +1191,6 @@ function build( | |||
// upgrade, or if we acquire decider authority for a | |||
// previously-imported promise | |||
if (pRec) { | |||
// TODO: insist that we do not have decider authority for promiseID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, I'm not 100% sure this goes away.
Let's see, if we resolve a watched local promise, currently the kernel won't send the notify back into us, because our syscall.resolve
will delete the vpid from our c-list, which means the kernel's notify
event will find the c-list entry missing, so the kernel can't deliver a dispatch.notify
(it has no vpid to translate it into). So we won't observe that case.
(I think we would tolerate it anyways, if liveslots receives a notify for a promise whose vpid it doesn't recognize, it will ignore it. But I also think that case won't happen, at least not for these cancelled-by-upgrade promises)
If we watch a remotely-decided promise, the dispatch.notify
will tell us about a promise that we aren't the decider for, so the assertion implied by the comment would be satisfied.
If we're a new incarnation, and we just finished walking our saved WatchedPromises
table, and regenerated new Promise objects for everything there, we can pretend that the kernel is the decider for all of them. Some of them were decided by our predecessor, but we don't know that, and the kernel effectively claimed decidership when it shut down our predecessor.
Maybe just change it to "it is unlikely that we have decider authority for these: do some case analysis, and then maybe add an assertion".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does liveslots track decidership at all? It looks to me like this block handles a notify for any watched promise in the imported set regardless of how it got there, and that being in the imported set is a solid basis for proceeding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're probably right. Our comment near the importedVPIDs
declaration claims that it only holds VPIDs for which the kernel is the decider.. if that's actually true, then this comment is stale (and a tautology).
I remember doing a lot of refactoring on this about a year ago, and maybe I introduced importedVPIDs
/exportedVPIDs
at that time, to clean up liveslot's approach to keeping track of decidership. I suspect it still isn't precisely correct (in the face of forwardings), but it's clearly trying to be.
So yeah, go ahead and delete that line. There's no better criteria for decidership than membership/not in importedVPIDs
, so if there's any cleanup or other cases to inspect, it won't be in this function, it'll be in the dispatch.deliver
that reacts to a previously-known result=
VPID, or in the code that does syscall.send
.
@@ -1424,7 +1431,7 @@ function build( | |||
Fail`buildRootObject() for vat ${forVatID} returned ${rootObject} with no interface`; | |||
// Need to load watched promises *after* buildRootObject() so that handler kindIDs | |||
// have a chance to be reassociated with their handlers. | |||
watchedPromiseManager.loadWatchedPromiseTable(); | |||
watchedPromiseManager.loadWatchedPromiseTable(unmeteredRevivePromise); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's our rationale for using unmetered revivePromise
? I'm sure there might be one, but each time this comes up I have to think carefully about what it might or might not be sensitive to, and I'd like to have a clear reason for each place where we disable metering. I think it's usually a question of whether the function's behavior might depend upon the WeakRefs in slotToVal
. If loadWatchedPromiseTable
is strictly always adding things to that table (there's no chance for a vpid to already exist there), then I suspect we don't need/want it to be unmetered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This preserves existing behavior, but I would like to leave enough breadcrumbs to avoid such doubt in the future.
The first line of revivePromise
asserts absence of metering, and then the rest proceeds to call pRec = makePipelinablePromise(vpid)
, store the vpid–pRec mapping in importedVPIDs
, register the vpid–promise mapping for the marshaller with registerValue
, and return the promise.
revivePromise
is called only by loadWatchedPromiseTable
, which initializes an entry for the vpid–promise pair in promiseRegistrations
and invokes pseudoThen
to register settlement callbacks.
What commentary would you like to see at this invocation of loadWatchedPromiseTable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, and yeah we don't have the time to dive any deeper than what you've just mapped out. Maybe just add "TODO: why does this need to be unmetered?", and we'll eventually do another pass through and revisit those decisions.
Datadog ReportBranch report: ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly good, but let's see if we can remove that new vat-data dependency
packages/vat-data/src/exo-utils.js
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any problems with this refactoring, but if you adopt my request to avoid a liveslots->vat-data dependency, maybe it could go into a separate PR?
import { M } from '@agoric/store'; | ||
import { makePromiseKit } from '@endo/promise-kit'; | ||
// import { makeStoreUtils } from '../../vat-data/src/vat-data-bindings.js'; | ||
import { makeExoUtils } from '../../vat-data/src/exo-utils.js'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import feels problematic. At the very least, let's change it to @agoric/vat-data/src/exo-util.js
(using ../..
is kinda cheating / assuming the monorepo layout, and I'm likely to be moving swingset-liveslots out to a separate repository, which will break ../../
)
At a deeper level, Exos are not a liveslots concern. Liveslots provides Kinds, and userspace can use @agoric/vat-data
to get Exos on top of that. We need to make sure that liveslot's defineKind
works when used by makeExo
stuff, but I think that should be the responsibility of vat-data
rather than liveslots.
And I'm worried about the coupling this induces (especially because of my upcoming #6596 work): currently swingset-liveslots has a devDependency
upon vat-data solely to get the types for DefineKindOptions
and DurableKindHandle
, and I think those types should be exported by liveslots rather than having liveslots import them from vat-data. With that change, we could shed the liveslots -> vat-data devDependency
entirely, which is going to help with the #6596 packaging effort (the build process for xsnap-supervisor
, which imports liveslots, can happen in a workspace/repo/node_modules
that doesn't have swingset at all, which would prune a cycle).
Could you have this test work with a Kind instead of an Exo, and remove the dependency on vat-data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I thought I had already done that. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok everything looks good except for the vpid-tracking.md line 5 "vpid forwarding" example. Maybe just leave that line the way it was (tildot and all), and we come back to it later for a more complete analysis? I'm pretty sure the original wording expressed whatever weird case it was that I thought of years ago (even if I don't remember how that case worked anymore), and I'm not convinced that the new wording accomplishes that, and I don't want to hold this up just for the sake of updating the example.
cleanupAfterTerminatedVat() had a loop which identified all the promises (kpids) which were decided by a vat, so it could reject all of them as the vat was being terminated. We will have a need for this list of kpids during vat upgrade too, so refactor it out into a separate kernelKeeper method, and change terminateVat() to fetch the list before calling cleanupAfterTerminatedVat().
When upgrading a vat, we must reject all promises that were decided by the old incarnation, because none of them are durable (the new incarnation will be unable to resolve them). Previously, the vat's stopVat() delivery would do the rejection. This moves the responsibility to the kernel, to happen in the pause between the old version being shut down and the new version being launched. closes #6694
Increases clarity and convenience and sets up for future changes.
062317f
to
8dab80f
Compare
closes: #6694
Description
terminateVat
processUpgradeVat
Security Considerations
This is an overall increase in robustness against problems with dying vats.
Scaling Considerations
I don't believe there are any changes w.r.t. scaling.
Documentation Considerations
All changes would be internal, but we should still do a scan before merging.
Testing Considerations
No changes.