Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

failed vat upgrades should be rewound #5344

Closed
warner opened this issue May 11, 2022 · 0 comments · Fixed by #5730
Closed

failed vat upgrades should be rewound #5344

warner opened this issue May 11, 2022 · 0 comments · Fixed by #5730
Assignees
Labels
enhancement New feature or request SwingSet package: SwingSet
Milestone

Comments

@warner
Copy link
Member

warner commented May 11, 2022

What is the Problem Being Solved?

Currently, a successful vat upgrade works pretty well.

Upgrades fail if the new version's buildRootObject throws an exception (or returns a Promise that rejects or fails to resolve by the end of the crank), or it fails to defineDurableKind() for all Kinds inherited from its predecessor.

Currently, a failed upgrade is not handled correctly. What's supposed to happen is that the upgrade gets rewound, and we're left with the original version humming along, all its data still in place, ready to receive more messages. The caller of adminFacet~.upgrade() gets a rejected result promise, but all other changes are unwound.

Instead, I think we currently leave the vat in a halfway state, and we don't notify the caller correctly.

Description of the Design

The kernel's processUpgradeVat() function performs two deliveries: stopVat and startVat. We need to do abortCrank if either one fails. And we need to submit the failure message to vat-admin even though abortCrank has rewound the state changes. This should look a lot like when a vat is terminated for a bad syscall or meter underrun. We use a little side table to remember the state changes that need to be made after abortCrank unwinds everything.

Security Considerations

I don't know how confused the kernel can currently get in the face of a failed upgrade, but clearly we must make it impossible to corrupt anything kernel-side by intentionally upgrading to a broken version.

Test Plan

Unit tests.

@warner warner added enhancement New feature or request SwingSet package: SwingSet labels May 11, 2022
@warner warner self-assigned this May 11, 2022
@warner warner added this to the Mainnet 1 milestone May 11, 2022
@Tartuffo Tartuffo modified the milestone: Mainnet 1 May 11, 2022
warner added a commit that referenced this issue Jul 7, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
warner added a commit that referenced this issue Jul 7, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
warner added a commit that referenced this issue Jul 7, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
warner added a commit that referenced this issue Jul 7, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
warner added a commit that referenced this issue Jul 7, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
warner added a commit that referenced this issue Jul 9, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
warner added a commit that referenced this issue Jul 12, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
warner added a commit that referenced this issue Jul 12, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
warner added a commit that referenced this issue Jul 12, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
warner added a commit that referenced this issue Jul 13, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
warner added a commit that referenced this issue Jul 14, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
@mergify mergify bot closed this as completed in #5730 Jul 14, 2022
turadg pushed a commit that referenced this issue Jul 14, 2022
This commit modifies vat-upgrade to respond correctly to failures
during the upgrade process, by rewinding the vat back to it's
pre-upgrade state.

If a failure occurs either during the `stopVat` sent to the old
version, or during the `startVat` sent to the new version, the
consequences of both deliveries will be unwound, the 'upgrade-vat'
event popped from the delivery queue, and the worker abandoned. The
next delivery to this vat will cause a new worker to be created from
the previous state. The only state change that survives is a message
to vat-admin, instructing it to reject the
`E(adminFacet).upgradeVat()` result promise.

From the outside world, it will appear as if upgrade was never
requested, except that `upgradeVat()` rejects.

To implement this, I rewrote the way delivery results are handled in
kernel.js . `deliverAndLogToVat()` still returns `DeliveryStatus`, but
each `processXYZ` function (`processSend`, `processVatUpgrade`, etc)
is now responsible for transforming that `DeliveryStatus` into a new
`CrankResult`. This `CrankResult` tells `processDeliveryMessage` what
to do in a uniform way: abort the crankbuffer changes, terminate a
vat, deduct a meter, etc.

Most processor functions use a shared `deliveryCrankResults` helper
function to build their results, which takes flags to indicate whether
the delivery should deduct a meter, should decrement the
BOYD/reapCount counter, etc.

`processCreateVat` does its conversion in a different way, to merge
the results of the two deliveries (`stopVat`, `startVat`), and to
abandon the process in the middle if `stopVat` failed.

This will also make it trivial to begin charging a meter for the work
done during `startVat`, and to report this work to the
runPolicy. However it does not do so yet.

closes #5344
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants