Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add code to upgrade all upgradable vats during agoric-upgrade-12 #8219

Open
1 task done
Tracked by #8291 ...
warner opened this issue Aug 18, 2023 · 10 comments
Open
1 task done
Tracked by #8291 ...

add code to upgrade all upgradable vats during agoric-upgrade-12 #8219

warner opened this issue Aug 18, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@warner
Copy link
Member

warner commented Aug 18, 2023

What is the Problem Being Solved?

Ideally, at any given time, there is a branch in agoric-sdk which is as close as possible to the code running on the chain. We can't keep this up forever (eventually we'll be deploying code to the chain that didn't come from agoric-sdk), and we can't do it perfectly (not all vats are upgradable, so changes we make to e.g. the core-bootstrap mechanism in packages/vats/ cannot be deployed to the chain), and we can't represent non-vat-deployment state changes in our source tree. But the closer we can make those two environments, the more relevant unit tests and CI results against that branch will be.

Since master is where all our main work gets landed, we'd like to minimize the difference between the services available on mainnet, and the code present on master.

To that end, when we perform the agoric-upgrade-12 event, it would be great to upgrade all the vats that we possibly can, to whatever code is currently present on master.

#8215 is about writing a test of this event.

Description of the Design

I'm not sure, but I know it will require the following pieces in the upgrade handler:

  • after creating the kernel, run through a list of entrypoint pathnames (mostly the same as decentral-core-config.json and its bundles.*.sourceSpec properties)
    • for each one, use bundleSource() to create the a bundle from that entry point
    • then use controller.validateAndInstallBundle() to add it to the kernel's bundle table
      • (note that this will create export-data records, which need to be copied into the IAVL tree)
    • record the name/BundleID for later
  • then, after all bundles are installed, execute a CORE_EVAL snippet
    • this must iterate through all upgradable vats, in a specific pre-computed order
    • for each non-contract vat, do a E(adminFacet).upgrade() to the new bundle
    • somehow tell Zoe to use the new ZCF bundle
    • for each contract vat, tell Zoe to upgrade it, using both the new ZCF and the new contract bundle

To avoid interleaving of vat upgrade with work from the previous block, or work from the following block, we should follow @mhofman 's recommendations:

  • before starting on the upgrade, perform a controller.run() without a computron limit, to flush all old work
  • then inject the "do the upgrade" message
  • then perform another controller.run(), without a computron limit, so all the upgrade work can complete
  • only then sample the high/normal -priority actions queues, and timers

We should probably upgrade each vat in a separate controller.run(), so all other (subscribing) vats have a chance to react to the cancelled Promises of the vat being upgraded, without interleaving with their own upgrade.

Security Considerations

Only the usual security concerns about upgrading a huge amount of code at the same time.

Each upgraded vat will cancel (reject) all its outstanding promises. We think we've designed for this, but we haven't used it "in anger" before, so we should be concerned about whether subscribing vats will e.g. resubscribe properly.

Scaling Considerations

The upgrades may take a considerable amount of time. We should find a way to exercise this and provide validator guidance, however a chain-software upgrade is the ideal time for long execution like this: all validators are restarting at their own pace anyways.

Test Plan

#8215 is one part: using our docker-based deployment test to exercise the agoric-upgrade-12 handler, and then interacting with the upgraded system to see if it still works.

Upgrading lots of vats at the same time represents considerable risk: each vat's upgrade is an opportunity to discover something we missed, especially in their collective interaction. I really hope we can find a way to test this kind of upgrade against a clone of mainnet.

We should look carefully at the clone's slogfiles: look for vat termination or unexpected error messages happening during the upgrade process. Any such symptoms could mean serious breakage.

Upgrade Considerations

All of them.

Tasks

Preview Give feedback
  1. performance question wallet
    dckc warner
@dckc
Copy link
Member

dckc commented Aug 31, 2023

We should probably upgrade each vat in a separate controller.run() ...

Does the coreProposals mechanism do that? @michaelfig ?

@mhofman
Copy link
Member

mhofman commented Aug 31, 2023

#8287 arranges for multiple actions that are run to completion within a block.

If the upgrade can be arranged for multiple core proposals, each generating a different action (todo), then it would naturally fall off of this.

@michaelfig
Copy link
Member

The coreProposals mechanism doesn't impose any serial execution. The reason is that I assumed there might be interdependencies between the different proposals, in the same way there is between different behaviours.

I suspect it's most flexible (and less intrusive) to structure this using a coordinating promise space behavior to ensure sequential upgrades.

@warner, what was your reason for needing separate controller.run()s?

@mhofman
Copy link
Member

mhofman commented Sep 1, 2023

@michaelfig how hard would be it to support both? A single core proposal that arranges some sequential work using internal promises, and multiple core proposals inside an upgrade plan?

@mhofman
Copy link
Member

mhofman commented Sep 1, 2023

One motivation I can think of right now is to update the bridge vat first so that cosmic-swingset can get the results for subsequent core eval bridge messages.

@michaelfig
Copy link
Member

A single core proposal that arranges some sequential work using internal promises, and multiple core proposals inside an upgrade plan?

Am I missing something? I thought that's exactly what we have today.

@mhofman
Copy link
Member

mhofman commented Sep 10, 2023

Am I missing something? I thought that's exactly what we have today.

I thought an upgrade plan results in a single core eval. I would like a way to express an upgrade plan that results in multiple core eval. My requirement is being able to upgrade vat-bridge in a first core eval, then handle the subsequent core eval using the new vat bridge logic.

@michaelfig
Copy link
Member

Ah, I was confused because you had first said "multiple core proposals inside an upgrade plan", which we do have. "multiple core evals" is a different thing, and currently we squash the core proposals into a single core eval so that they run in parallel.

What I was trying to say is that if you want to schedule a number of core proposals to run sequentially, you can do that today without any changes to the underlying mechanism. Just create a core proposal that accepts options describing other proposals to schedule in a custom way.

I would rather use the existing machinery, which is flexible enough already, instead of grafting scheduling policy into a lower level than necessary.

@mhofman
Copy link
Member

mhofman commented Sep 10, 2023

The use case I have is unfortunately not compatible with parallel execution. Let's discuss.

@michaelfig
Copy link
Member

The use case I have...

upgrade each vat in a separate controller.run(), so all other (subscribing) vats have a chance to react to the cancelled Promises of the vat being upgraded, without interleaving with their own upgrade

Oh, now I see. Even if you linearise the individual upgrades, you can't be sure that their side-effects have run to completion unless you put them in a separate controller.run(). Makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants