Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

failed startVat won't fire criticalVat panic #9157

Open
warner opened this issue Mar 27, 2024 · 1 comment
Open

failed startVat won't fire criticalVat panic #9157

warner opened this issue Mar 27, 2024 · 1 comment
Labels
bug Something isn't working SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Mar 27, 2024

While working on #8980, I noticed a problem in the code which is supposed to panic the kernel if a "critical" vat gets terminated. If such a vat gets terminated super-early, during buildRootObject(), then it won't trigger the panic.

This can only happen when a dynamic vat is marked as critical (requiring criticalVatKey) and fails its initial launch, perhaps because its buildRootObject() throws an Error.

The problem is in how we manage state. When a vat triggers a fatal termination-worthy error, it sets both crankResults.abort and crankResults.terminate. The .abort causes us to unwind the state changes back to the point where we've popped the message off the run-queue (in this case a create-vat message):

if (crankResults.abort) {
// Errors unwind any changes the vat made, by rolling back to the
// "deliver" savepoint In addition, the crankResults will either ask for
// the message to be consumed (without redelivery), or they'll ask for it
// to be attempted again (so it can go "splat" against a terminated vat,
// and give the sender the right error). In the latter case, we roll back
// to the "start" savepoint, established by `run()` or `step()` before the
// delivery was pulled off the run-queue, undoing the dequeueing.
kernelKeeper.rollbackCrank(
crankResults.consumeMessage ? 'deliver' : 'start',
);

then a moment later, the .terminate invokes terminateVat:

if (crankResults.abort) {
// Errors unwind any changes the vat made, by rolling back to the
// "deliver" savepoint In addition, the crankResults will either ask for
// the message to be consumed (without redelivery), or they'll ask for it
// to be attempted again (so it can go "splat" against a terminated vat,
// and give the sender the right error). In the latter case, we roll back
// to the "start" savepoint, established by `run()` or `step()` before the
// delivery was pulled off the run-queue, undoing the dequeueing.
kernelKeeper.rollbackCrank(
crankResults.consumeMessage ? 'deliver' : 'start',
);

which needs to retrieve the critical flag:

async function terminateVat(vatID, shouldReject, info) {
const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
const critical = vatKeeper.getOptions().critical;

then deletes state (which only matter for later deliveries, after the vat has had a chance to create any promises or state):

if (kernelKeeper.vatIsAlive(vatID)) {
// Reject all promises decided by the vat, making sure to capture the list
// of kpids before that data is deleted.
const deadPromises = [...kernelKeeper.enumeratePromisesByDecider(vatID)];
kernelKeeper.cleanupAfterTerminatedVat(vatID);
for (const kpid of deadPromises) {
resolveToError(kpid, makeError('vat terminated'), vatID);
}
}

then uses critical to decide whether to panic the kernel or not :

if (critical) {
// The following error construction is a bit awkward, but (1) it saves us
// from doing unmarshaling while in the kernel, while (2) it protects
// against the info not actually encoding an error, but (3) it still
// provides some diagnostic information back to the host, and (4) this
// should happen rarely enough that if you have to do a little bit of
// extra interpretive work on the receiving end to diagnose the problem,
// it's going to be a small cost compared to the trouble you're probably
// already in anyway if this happens.
panic(`critical vat ${vatID} failed`, Error(info.body));
return;
}

The problem is that the critical flag is stored in vNN.options, which is part of the state that got unwound by the .abort flag. So when terminateVat calls vatKeeper.getOptions() to read out the .critical flag, we end up relying upon a fallback inside vatKeeper.getOptions that returns an empty options if the key was missing:

function getOptions() {
/** @type { RecordedVatOptions } */
const options = JSON.parse(kvStore.get(`${vatID}.options`) || '{}');
return harden(options);
}

So, in this case, we always get an empty options, which means .critical is undefined, which is falsy, which inhibits the panic.

Fixes

My vague plan is to add .criticalVat to the crankResults. At the beginning of createVat, we can populate this from the options that are getting stored by optionRecorder.recordDynamic(). At the beginning of other deliveries, we can populated it from a vatKeeper.getOptions().critical. Then the post-delivery code can refer to crankResults.criticalVat instead of querying the vatKeeper (whose backing state might no longer be present).

I'm not entirely happy with this approach, though, because it means an extra DB call for every delivery. We don't usually need to know whether a vat is critical or not, and the critical status doesn't change for the lifetime of the vat. So either a lazy lookup or something that gets cached (maybe in the vatKeeper, maybe in vat-warehouse) would be more satisfying.

@warner warner added bug Something isn't working SwingSet package: SwingSet labels Mar 27, 2024
@warner
Copy link
Member Author

warner commented Mar 27, 2024

Ah, I think I can avoid the unnecessary DB call. Most crankResults are built by the deliveryCrankResults() function: I'll have it only query getOptions() in the case that it is returning a truthy .terminate, and since it is called after delivery (but before anything gets unwound), it will get the right value even if the vat state is being deleted or unwound. Then, for processCreateVat (which builds its own crankResults for the creation phase), it can copy the .critical flag from the dynamicOptions that it recorded.

This means that deliveryCrankResults() will need to call provideVatKeeper(), but that should be cheap because all the callers will have already created a vatkeeper (to perform their deliveries).

warner added a commit that referenced this issue Mar 29, 2024
Previously, vat state initialization (e.g. setting counters to zero)
happened lazily, the first time that `provideVatKeeper()` was
called. When creating a new vat, the pattern was:

  vk = kernelKeeper.provideVatKeeper();
  vk.setSourceAndOptions(source, options);

Now, we initialize both counters and source/options explicitly, in
`recordVatOptions`, when the static/dynamic vat is first defined:

  kernelKeeper.createVatState(vatID, source, options);

In the future, this will make it easier for `provideVatKeeper` to rely
upon the presence of all vat state keys, especially `options`.

Previously, vatKeeper.getOptions() would tolerate a missing key by
returning empty options. The one place where this was
needed (terminating a barely-created vat because startVat() failed,
using getOptions().critical) was changed, so this tolerance is no
longer needed, and was removed. The tolerance caused bug #9157 (kernel
doesn't panic when it should), which continues to be present, but at
least won't cause a crash.

refs #8980
warner added a commit that referenced this issue Apr 9, 2024
Previously, vat state initialization (e.g. setting counters to zero)
happened lazily, the first time that `provideVatKeeper()` was
called. When creating a new vat, the pattern was:

  vk = kernelKeeper.provideVatKeeper();
  vk.setSourceAndOptions(source, options);

Now, we initialize both counters and source/options explicitly, in
`recordVatOptions`, when the static/dynamic vat is first defined:

  kernelKeeper.createVatState(vatID, source, options);

In the future, this will make it easier for `provideVatKeeper` to rely
upon the presence of all vat state keys, especially `options`.

Previously, vatKeeper.getOptions() would tolerate a missing key by
returning empty options. The one place where this was
needed (terminating a barely-created vat because startVat() failed,
using getOptions().critical) was changed, so this tolerance is no
longer needed, and was removed. The tolerance caused bug #9157 (kernel
doesn't panic when it should), which continues to be present, but at
least won't cause a crash.

refs #8980
warner added a commit that referenced this issue Aug 12, 2024
Previously, vat state initialization (e.g. setting counters to zero)
happened lazily, the first time that `provideVatKeeper()` was
called. When creating a new vat, the pattern was:

  vk = kernelKeeper.provideVatKeeper();
  vk.setSourceAndOptions(source, options);

Now, we initialize both counters and source/options explicitly, in
`recordVatOptions`, when the static/dynamic vat is first defined:

  kernelKeeper.createVatState(vatID, source, options);

In the future, this will make it easier for `provideVatKeeper` to rely
upon the presence of all vat state keys, especially `options`.

Previously, vatKeeper.getOptions() would tolerate a missing key by
returning empty options. The one place where this was
needed (terminating a barely-created vat because startVat() failed,
using getOptions().critical) was changed, so this tolerance is no
longer needed, and was removed. The tolerance caused bug #9157 (kernel
doesn't panic when it should), which continues to be present, but at
least won't cause a crash.

refs #8980
kriskowal pushed a commit that referenced this issue Aug 27, 2024
Previously, vat state initialization (e.g. setting counters to zero)
happened lazily, the first time that `provideVatKeeper()` was
called. When creating a new vat, the pattern was:

  vk = kernelKeeper.provideVatKeeper();
  vk.setSourceAndOptions(source, options);

Now, we initialize both counters and source/options explicitly, in
`recordVatOptions`, when the static/dynamic vat is first defined:

  kernelKeeper.createVatState(vatID, source, options);

In the future, this will make it easier for `provideVatKeeper` to rely
upon the presence of all vat state keys, especially `options`.

Previously, vatKeeper.getOptions() would tolerate a missing key by
returning empty options. The one place where this was
needed (terminating a barely-created vat because startVat() failed,
using getOptions().critical) was changed, so this tolerance is no
longer needed, and was removed. The tolerance caused bug #9157 (kernel
doesn't panic when it should), which continues to be present, but at
least won't cause a crash.

refs #8980
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working SwingSet package: SwingSet
Projects
None yet
Development

No branches or pull requests

1 participant