Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add controller.reapAllVats() #8626

Closed
warner opened this issue Dec 6, 2023 · 2 comments · Fixed by #8634
Closed

add controller.reapAllVats() #8626

warner opened this issue Dec 6, 2023 · 2 comments · Fixed by #8634
Assignees
Labels
enhancement New feature or request SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Dec 6, 2023

What is the Problem Being Solved?

We might be able to survive remediating #8400 and #8401 in "one fell swoop", where we do a chain-halting upgrade, which upgrades the vats involved, to code whose buildRootObject deletes or clears the durable Sets and Maps that are holding the 100k+ objects alive. That will free a lot of objects, all of which will get GCed at the next BOYD event.

To ensure that this BOYD happens during the upgrade interval, and not some random number of blocks afterwards, we should have a way for the upgade code to request that the kernel forces a dispatch.bringOutYourDead to all vats. We can then follow this up with a controller.run() and flush out all the garbage. This may take a considerable amount of time (I'm roughly estimating ten minutes), but as long as this completes within the upgrade downtime for all validators, the process will be survivable (merely annoying). If for some reason, the large BOYD didn't happen during the downtime, then it would happen some unknown number of blocks later (depending upon how much activity those vats experienced), causing a surprising multi-minute chain stall. Worse, becomes some validator computers are faster than others, this stall would probably knock the slower ones out of the validator set.

Description of the Design

Add a controller.reapAllVats() method. This will enumerate all vats (static and dynamic) and perform a dispatch.bringOutYourDead() on each of them, or inject something into the run-queue to perform the same.

The BOYDs do not necessarily need to complete by the time reapAllVats() returns, but all the work should complete during the next full controller.run() call.

Security Considerations

This represents a performance hit, but it is only reachable by the host application (via controller), not anything inside a vat.

Scaling Considerations

This might take a long time, but better to stall at a known time than at a surprising one.

Test Plan

The unit test should create two or three vats, send them different numbers of messages (to get their deliveries-until-BOYD counters to different states), then perform the controller.reapAllVats() and a controller.run(). The test should somehow assert that the BOYDs happened (either snooping on the slogSender for type=='deliver' && kd[0]=='bringOutYourDead', or examining the swing-store transcript entries).

Upgrade Considerations

none

@warner warner added enhancement New feature or request SwingSet package: SwingSet labels Dec 6, 2023
FUDCo added a commit that referenced this issue Dec 7, 2023
FUDCo added a commit that referenced this issue Dec 8, 2023
@warner
Copy link
Member Author

warner commented Dec 8, 2023

Note for ourselves later: the kernel services the reapQueue before it looks at the run-queue. So our upgrade-time reaping process needs to do:

await controller.run();
controller.reapAllVats();
await controller.run();

to ensure that no GC-work-adding deliveries occur after the BOYDs are executed. The first controller.run() must be complete: it should not be limited by a runPolicy, otherwise there might still be runQueue items left over that won't happen until after the BOYDs.

@mhofman
Copy link
Member

mhofman commented Dec 8, 2023

Run after reap will likely free more objects. Should we execute reap a few times until nothing is found anymore?

Edit: in that case, we only need to run reap in the vats that have seen deliveries. In some of my early experiment, I plumbed an option for the last know position for each vat, so that reap "all" could reap only vats that had new deliveries.

FUDCo added a commit that referenced this issue Dec 11, 2023
FUDCo added a commit that referenced this issue Dec 13, 2023
@mergify mergify bot closed this as completed in #8634 Dec 13, 2023
anilhelvaci pushed a commit to Jorge-Lopes/agoric-sdk that referenced this issue Feb 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants