Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

have kernel reject unresolved vat-decided promises of an upgraded vat #6694

Closed
warner opened this issue Dec 19, 2022 · 4 comments · Fixed by #7130
Closed

have kernel reject unresolved vat-decided promises of an upgraded vat #6694

warner opened this issue Dec 19, 2022 · 4 comments · Fixed by #7130
Assignees
Labels
enhancement New feature or request liveslots requires vat-upgrade to deploy changes SwingSet package: SwingSet vaults_triage DO NOT USE
Milestone

Comments

@warner
Copy link
Member

warner commented Dec 19, 2022

What is the Problem Being Solved?

As listed in #6650, when a vat is upgraded, it's stopVat is currently responsible for rejecting ("disconnecting") all of that vat's Promises (i.e. all the Promises for which the upgraded vat was the decider). We do this because we don't have durable pseudo-promises (#3787), so the V2 version of that vat cannot possibly have access to the resolve()/reject() facets on those promises, so there's no way for them to become resolved ever again. We reject these promises during upgrade, with a distinctive value (so clients can distinguish "owner upgraded" vs "owner deliberately rejected"), which allows clients of e.g. durable Notifiers in the upgraded vat to come back and ask for a new Promise from the new version.

As part of #6650, we're moving responsibility for this task from the old vat into the kernel. The kernel will reject these promises just after it shuts down the old worker, and just before it starts the new one.

Description of the Design

The kernel will walk the vat's c-list to find all promises known to the vat. These should all be unresolved, because we retire vpid/vrefs and c-list entries immediately upon resolution. For each kpid in the c-list, the kernel it will examine the kernel promise table to find the subset that are decided by the vat being upgraded. (In the future, perhaps as part of #6677 , we might use a better index to find this set of kpids more efficiently). The code to do this is already part of kernelKeeper.cleanupAfterTerminatedVat, as called by kernel.js terminateVat.

The kernel will reject these promises, using the same disconnectObject that we currently pass as an argument into stopVat (the one that contains upgradeMessage and the old incarnationNumber).

Inside the vat, we'll modify the Durable Promise Watcher:

  • when watching a promise, liveslots must syscall.subscribe() to the vpid, even if the promise is being exported at that time
    • the kernel must now allocate a kpid for the vpid in syscall.subscribe(), if we don't already have one. This might already work fine, but it's a new case that we need to check
    • we need this to ensure that the new vat version receives dispatch.notify() events for self-decided/exported promises from the old version. Previously these were tracked internally in the deadPromises vatstore key
  • remove prepareShutdownRejections from liveslots/watchedPromises.js, alone with the deadPromises table, and deadPromisesDO for the disconnect object
  • in loadWatchedPromiseTable, remove the reads of deadPromises/deadPromiseDO, and the code that handles promises which were in this table. Instead, all promises in the watchedPromiseTable will be revived, and then the kernel will deliver dispatch.notify() for the ones that were disconnected
  • change kernelQueue.js's doResolve() function to enqueue a notify() to all subscribers, not skipping the deciding vat: vats are allowed to subscribe to their own promises, and they'll be notified if/when those promises are resolved
    • liveslots already tolerates (ignores) a dispatch.notify() of a promise that's missing from the slotToVal table (which we added because a notify might be queued at the time of a vat upgrade, and the new version would not remember subscribing)
    • liveslots will not gratuitously subscribe to exported promises: it only subscribes when deserializing a new vpid (imported promise), or when an auxilliary vpid appears in a batched resolution, or (new) when a promise becomes Watched

Compatibility Considerations

In this new approach, self-exported promises will deliver their rejections (within the new vat version) slightly later than before. Previously, startVat sent these rejections, so they'll all be executed during the startVat delivery (after module load, after buildRootObject is called again, but still in the same crank, and before any new messages can be delivered). After this change, the rejections will be pushed onto the run-queue during upgrade, so they won't be delivered until after everything else currently on the run-queue is delivered.

Test Plan

Unit tests

@warner warner added enhancement New feature or request SwingSet package: SwingSet labels Dec 19, 2022
@warner warner self-assigned this Dec 19, 2022
@ivanlei ivanlei added the vaults_triage DO NOT USE label Dec 20, 2022
@warner warner added the liveslots requires vat-upgrade to deploy changes label Jan 24, 2023
warner added a commit that referenced this issue Feb 15, 2023
When upgrading a vat, we must reject all promises that were decided by
the old incarnation, because none of them are durable (the new
incarnation will be unable to resolve them). Previously, the vat's
stopVat() delivery would do the rejection. This moves the
responsibility to the kernel, to happen in the pause between the old
version being shut down and the new version being launched.

closes #6694
@warner
Copy link
Member Author

warner commented Feb 15, 2023

I have an incomplete branch for the work in https://github.com/Agoric/agoric-sdk/tree/6694-kernel-disconnect-promises . I think it's mostly done, but I haven't run the tests yet. We should look at SwingSet/test/upgrade/) and make sure it exercises the case of a promise getting rejected across an upgrade (I'm 90% sure it does, but double-checking is part of this task).

@warner warner assigned gibson042 and unassigned warner Feb 15, 2023
@gibson042
Copy link
Member

The changes of https://github.com/Agoric/agoric-sdk/tree/6694-kernel-disconnect-promises are causing difficult-to-diagnose vat upgrade test failures. I'm adding a new test to packages/SwingSet/test/upgrade/test-upgrade.js that uses bootstrap-relay.js with a new option to skip promise/remotable replacement and is much simpler than testUpgrade, keeping #6523 in mind but not promoting it to a blocker (especially since I wouldn't necessarily trust its mocking for the relevant functionality until we see that working in a real vat environment).

gibson042 pushed a commit that referenced this issue Mar 4, 2023
This makes room inside processUpgradeVat() for the new cleanup steps
that we want to add.

refs #6650
refs #7001
refs #6694
refs #6696
gibson042 pushed a commit that referenced this issue Mar 4, 2023
When upgrading a vat, we must reject all promises that were decided by
the old incarnation, because none of them are durable (the new
incarnation will be unable to resolve them). Previously, the vat's
stopVat() delivery would do the rejection. This moves the
responsibility to the kernel, to happen in the pause between the old
version being shut down and the new version being launched.

closes #6694
@gibson042
Copy link
Member

Down to 19 failing tests, and looking like mostly reordering of promise settlement. I'm going to try moving the the kernel takeover to after stopVat.

gibson042 pushed a commit that referenced this issue Mar 7, 2023
When upgrading a vat, we must reject all promises that were decided by
the old incarnation, because none of them are durable (the new
incarnation will be unable to resolve them). Previously, the vat's
stopVat() delivery would do the rejection. This moves the
responsibility to the kernel, to happen in the pause between the old
version being shut down and the new version being launched.

closes #6694
@gibson042
Copy link
Member

The last remaining gap affects local promises observed by durable promise watchers, which must now be exported to the kernel so it can reject them as part of upgrade. syscall.subscribe(kpid) should suffice, but so far isn't quite enough so I'm working though why. Current estimate stands.

gibson042 pushed a commit that referenced this issue Mar 16, 2023
This makes room inside processUpgradeVat() for the new cleanup steps
that we want to add.

refs #6650
refs #7001
refs #6694
refs #6696
gibson042 pushed a commit that referenced this issue Mar 16, 2023
When upgrading a vat, we must reject all promises that were decided by
the old incarnation, because none of them are durable (the new
incarnation will be unable to resolve them). Previously, the vat's
stopVat() delivery would do the rejection. This moves the
responsibility to the kernel, to happen in the pause between the old
version being shut down and the new version being launched.

closes #6694
@mergify mergify bot closed this as completed in #7130 Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request liveslots requires vat-upgrade to deploy changes SwingSet package: SwingSet vaults_triage DO NOT USE
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants