The SwingSet GC Plan (save RAM for unreachable objects) #3106

Closed · 8 of 9 tasks
warner opened this issue May 16, 2021 · 1 comment
Labels: enhancement (New feature or request) · SwingSet (package: SwingSet)

warner commented May 16, 2021

This ticket aims to unify the complete set of SwingSet GC implementation stages, up to the point of significant RAM savings.

Stages:

  • Land vref-aware WeakMap/WeakSet
  • Build Reachability Manager
  • VOM tracks Presence reachability in virtualized data, add-only
  • Remove liveslots safety pins, enable syscall.dropImport
  • VOM must retain Remotables used in virtualized data
  • Free Remotables upon dispatch.dropExport
  • Implement kernel-side handling of dropImport, clist reachability flag
  • Implement kernel-side refcounting for objects, handle dropImport, call dropExport

At the end of this sequence, our RAM usage should drop considerably for our most common use cases. More savings are possible, as described at the end of this comment, but those are deferred to a later stage, as are KernelDB (disk) savings.

Descriptions of the stages follow.

Previous Work / Background Reading

We've been designing GC for a long time. The following tickets are worth reading to understand our route to this current plan:

  • #1872 "non-deterministic GC": summarizes our original plan (which presumed identical behavior), then examines how to deal with each member of a consensus-based SwingSet (i.e. a blockchain validator) observing different GC events. We mostly abandoned this plan: we now require each member to see mostly the same events in consensus mode, and we take additional steps to make this more likely (force GC at end-of-crank, avoid processing WeakRefs or FinalizationRegistry callbacks until end-of-crank)
  • #2615 "sufficiently-deterministic GC": this was the new plan, until we realized that "unreachable but still recognizable" was a problem
  • #2724 "vref-aware GC design (was: WeakMap + garbage-collected Presences = non-determinism)": explains the need for, and the solution to, the "reachable" vs "recognizable" distinction

Several other tickets exist for various pieces of this task (but describe older approaches, so they should be updated as we repurpose them): #1126, #1968, #2106, #2243, #2646, #2660, #2664, #2870, #2993.

Land vref-aware WeakMap/WeakSet

When #2993 is landed, it will be safe to allow liveslots to drop a Presence and then introduce a replacement in a later crank, because the WeakMap/WeakSet replacements will still recognize it as the same key. This is both a correctness concern (Zoe unit tests failed with some sort of fundamental amnesia problem when I first added WeakRefs to liveslots, prompting me to add the "safety pins") and a nondeterminism vulnerability (because otherwise userspace could use WeakMap/WeakSet to sense the Presence sometimes being dropped).

@FUDCo and I have talked a lot about whether the #2993 implementation is safe against adversarial code trying to sense the same thing, via counting invocations of the kind constructor, storing the .state object in a WeakSet, storing the initial self object (which does not yet have a vref attached) in a WeakSet, etc. The case analysis also needs to include the LRU cache, and must make sure that the sequence of syscalls depends only upon userspace actions (object and property access) and cannot be influenced by uncontrolled GC activity. I'm not yet convinced the code is safe, but surviving adversarial code is a future milestone, so we'll track that in a separate ticket.

This vref-aware WeakMap/WeakSet needs to track Presences and Representatives by their vref, but it should track Remotables by their object identity (i.e. put them in the real WeakMap), because 1: there is only one Remotable for this vref, ever, and 2: the values should go away when the Remotable goes away. Presences and Representatives are different (there is only one JS object at a time, but that object might be released and then we'll get a new+different object for it later).
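To make the dual-mode key handling concrete, here is a minimal sketch of a vref-aware WeakMap, assuming a hypothetical `getVrefFor()` helper that returns the vref string for Presences/Representatives and `undefined` for Remotables (the real #2993 implementation may differ, e.g. in where the vref-keyed values are stored):

```js
class VrefAwareWeakMap {
  constructor() {
    this.byObject = new WeakMap(); // Remotables: keyed by object identity
    this.byVref = new Map(); // Presences/Representatives: keyed by vref string
  }
  set(key, value) {
    const vref = getVrefFor(key); // hypothetical helper, see lead-in
    if (vref !== undefined) {
      this.byVref.set(vref, value);
    } else {
      this.byObject.set(key, value);
    }
    return this;
  }
  get(key) {
    const vref = getVrefFor(key);
    return vref !== undefined ? this.byVref.get(vref) : this.byObject.get(key);
  }
  has(key) {
    const vref = getVrefFor(key);
    return vref !== undefined ? this.byVref.has(vref) : this.byObject.has(key);
  }
  delete(key) {
    const vref = getVrefFor(key);
    return vref !== undefined ? this.byVref.delete(vref) : this.byObject.delete(key);
  }
}
```

Because the vref string (not the ephemeral object) carries the key identity, a Presence can be dropped and replaced between cranks without the map losing the association.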

Build Reachability Manager

  • create src/kernel/reachabilityManager.js
  • liveSlots.js imports both virtualObjectManager.js and reachabilityManager.js

This factors the "reachable? recognizable?" state machines into their own file, which will interact with liveslots.js and the virtual object manager. Liveslots will deliver the end-of-crank "dead set" to this manager, which will decide what syscall.dropImport-type events need to be emitted. The virtual object manager will be consulted to help figure out whether a given vref is safe to dropImport and/or retireImport. The VOM will deliver changes as well (e.g. when virtualized data is changed, causing refcount changes of some sort). dispatch.dropExport-type events will be delivered to this manager, which will tell liveslots and the VOM what to do.
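A hypothetical skeleton of the manager's shape, assuming the VOM and syscall object are handed in at construction (all method names here are illustrative, not a final API):

```js
// src/kernel/reachabilityManager.js (path from the task list above)
export function makeReachabilityManager({ vom, syscall }) {
  return {
    // liveslots delivers the end-of-crank "dead set" of collected Presences;
    // the manager decides which syscall.dropImport-type events to emit
    processDeadSet(deadVrefs) { /* see the per-stage sketches below */ },
    // the VOM reports refcount changes when virtualized data is modified
    virtualizedDataChanged(vref) { /* update reachable/recognizable state */ },
    // liveslots forwards kernel dispatch.dropExport-type deliveries; the
    // manager tells liveslots and the VOM what to unpin
    dropExports(vrefs) { /* see "Free Remotables upon dispatch.dropExport" */ },
  };
}
```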

VOM tracks Presence reachability in virtualized data, add-only

  • add const reachableVrefs = new Set() to virtualObjectManager.js
  • update with all vrefs used in serialized state properties or makeWeakStore values
  • nothing is ever removed from it

When a Presence is collected, liveslots might be able to send a syscall.dropImport for it, but only if the vat cannot still reach the vref. "Reachable" means the vat has a way to generate a reference to the vref in the future: either because it has a Presence, or because some virtualized data (like a .state property of a virtual object, or the value of our current makeWeakStore instances, or the contents of some as-yet-undesigned virtual collection) holds the vref. Note that the vref merely being used as the key of a weak collection does not make it reachable, although it does make it recognizable (which is tracked in a different way).

So when the Presence is collected, liveslots needs to know whether or not the vref is still in use by some virtualized data. The full task requires proper refcounting, but we can make a good start by merely having the VOM keep a Set of all vrefs that have ever been used as the value of virtualized data. Each time the VOM serializes a .state property, it merges the resulting slots into this Set. It does the same each time a makeWeakStore gets an init or set request and the new/replacement value is serialized.
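A sketch of that add-only tracking, with an assumed `noteVirtualizedData()` hook that the VOM would call after each serialization:

```js
const reachableVrefs = new Set(); // every vref ever stored in virtualized data

function noteVirtualizedData(capdata) {
  // capdata.slots lists the vrefs referenced by the serialized value
  for (const vref of capdata.slots) {
    reachableVrefs.add(vref); // nothing is ever removed at this stage
  }
}

// call after serializing a virtual object's `.state` property, and after
// each makeWeakStore init/set serializes a new or replacement value
```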

For our current codebase, we only have two users:

  • each Payment is a virtual object (payment.js), whose one state value is a Brand (which is a Remotable of the same vat)
  • the Payment is used as the key of a makeWeakStore named paymentLedger, whose value is an Amount (which has a Brand and a BigInt)

So at this point, we never put Presences into virtualized data, and we only put one Remotable (per Issuer) there. So the RAM cost of the Set(vref..) isn't a problem, nor is the DB cost of not being able to syscall.dropImport Presences-in-virtualized-data.

Remove liveslots safety pins, enable syscall.dropImport

  • remove liveslots.js const safetyPins = new Set() and the code that adds Presences to it
    • maybe retain device nodes
    • maybe retain Promises (but continue to remove them when retired)
  • configure liveslots/reachability manager to emit syscall.dropImport

Once we can correctly determine that a vat can no longer generate an imported vref (i.e. there is no Presence for it, and it does not appear in any virtualized data), we can safely inform the kernel with a syscall.dropImport.

This will reduce the RAM usage somewhat, because forgotten Presences can now be released. I don't expect it to save very much, though, because Presences don't have behavior or state, so apart from the HandledPromise machinery, they aren't keeping a lot of other data alive. But it's a good start.
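A minimal sketch of the end-of-crank decision, assuming the add-only reachableVrefs Set from the previous stage (the plural `dropImports` matches the syscall name used in the later kernel work; the function name is illustrative):

```js
function processDeadSet(deadVrefs) {
  const toDrop = [];
  for (const vref of deadVrefs) {
    // droppable only if no Presence exists and no virtualized data holds it
    if (!reachableVrefs.has(vref)) {
      toDrop.push(vref);
    }
  }
  if (toDrop.length > 0) {
    syscall.dropImports(toDrop.sort()); // sorted for deterministic syscalls
  }
}
```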

VOM must retain Remotables used in virtualized data

If a vat creates a Remotable (e.g. Far(type, {})), stores it in virtualized data (.state of a virtual object, or value of a makeWeakStore instance), and then drops the Remotable, without ever having sent it off-vat, then we're in a situation where liveslots (slotToVal and the exportedRemotables Set) is not holding on to it, but the object is still supposed to be recoverable.

Since Remotables cannot be reconstructed on demand (that's what virtual objects and makeKind is for), we must hold on to them with a strong reference for as long as they might be reachable.

The full task is to have the VOM maintain a refcount for each Remotable. Every time a virtual object's .state is modified, we must update the refcount with the delta, covering all four quadrants of (old state slots include the vref, or not) times (new state slots include the vref, or not). We must do the same for all values of all makeWeakStore instances. The VOM must then maintain a strong reference to the Remotable for any non-virtual exported vref whose refcount is nonzero. We'll probably use a (strong) Map for this, whose keys are vrefs, and whose values are { remotable, refcount }. If we delete the entry when the refcount goes to zero, the cardinality of this Map will match the cardinality of the Remotables we have to keep alive, which is probably the best we can do (as far as saving RAM). Therefore we don't need this table to be stored offline: at best we'd save one Number and a wrapper object per Remotable. If userspace wants to save RAM, it should use virtual objects or virtual collections, not Remotables.

Note that capdata .slots might have duplicate entries, so the refcounting implementation should use Sets instead of simply iterating through .slots. Our marshal serialization, coupled with the convertValToSlot that liveslots provides, is complex enough that I don't want to rely upon it only creating unique slots in the array.

But since we're currently only ever including a single Remotable (a Brand) per Issuer, we don't need this full refcounting yet. So the immediate task is for the VOM to maintain a strong Set of every Remotable that was ever used in the .state of a virtual object, or as the value of a makeWeakStore instance. To do this, immediately after serialization, we must scan the resulting slots for vrefs that are 1: exported and 2: not virtual (i.e. not hierarchical). We then use slotToVal.get(slot).deref() to get the Remotable, and then add it to a reachableRemotables Set. We never remove anything from the Set at this stage.
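A sketch of that conservative pinning, run immediately after serialization. The vref-shape tests assume exports look like o+NN and virtual-object vrefs are hierarchical (contain a /), per the description above; slotToVal maps a vref to a WeakRef:

```js
const reachableRemotables = new Set();

function retainExportedRemotables(slots) {
  for (const slot of new Set(slots)) { // dedupe: .slots may repeat entries
    const exported = slot.startsWith('o+'); // imports are o-NN
    const virtual = slot.includes('/'); // hierarchical => virtual object
    if (exported && !virtual) {
      const wr = slotToVal.get(slot);
      const remotable = wr && wr.deref();
      if (remotable) {
        reachableRemotables.add(remotable); // never removed at this stage
      }
    }
  }
}
```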

This will make it safe to have liveslots drop its own reference to a Remotable when the kernel sends dispatch.dropExport.

Free Remotables upon dispatch.dropExport

  • liveslots notifies the Reachability Manager of dispatch.dropExport calls
  • the Reachability Manager tells liveslots to drop the export
  • liveslots uses slotToVal.get(slot).deref() to find the Remotable
  • liveslots does exportedRemotables.delete(remotable)
  • liveslots does not delete the slotToVal entry
    • if/when virtualized data is deserialized, slotToVal will be used to find the Remotable again
  • the valToSlot mapping is a WeakMap, and will go away on its own
  • eventually a finalizer might fire for the Remotable
    • the slotToVal entry should be deleted
    • the reachability manager should be informed
      • but it is currently a no-op, a future stage will change this
      • because it represents a potential change in recognizability status

When the vat receives dispatch.dropExport, the kernel is indicating that nothing on the kernel side can still reach an exported vref (either a Remotable or a virtual object). Note that the kernel can still recognize that vref: some other vat might have a WeakMap/WeakSet which used a Presence as the key, and we cannot forget the identity mapping until that recognizer goes away or the vref becomes un-emittable by the exporting vat. But we can still save RAM by dropping the exported Remotable, unless it could be kept alive by other means.

Once the VOM is holding (strongly) onto Remotables that are reachable via virtualized data, liveslots no longer needs to retain its own hold on the object.

So the task is for liveslots to drop the exportedRemotables pin when the kernel says nothing on the kernel side needs it anymore.

If the VOM is also holding onto the Remotable (i.e. it is still reachable internally, by the vat, just not by the kernel), then the Remotable will stay alive. The slotToVal mapping will remain, so if/when someone looks at the virtual object .state or makeWeakStore value, they will get back the original Remotable. The kernel won't send anything that needs slotToVal, because that's what dispatch.dropExport means. valToSlot is a WeakMap, and the entry will be kept alive by the Remotable, so if/when the vat re-sends the Remotable out to the kernel, liveslots will be able to translate it correctly (using the original vref, which will still be in the clist, because it might still be recognizable by the kernel even though it's no longer reachable by the kernel).

If the VOM is not keeping it alive, then eventually the Remotable will be collected. The finalizer will run for it, which will (in a later stage) notify the reachability manager, which might then change the recognizability state. The slotToVal entry should be deleted, just as for Presences. valToSlot will go away on its own.

We do not yet gain any RAM savings for Remotables at this stage, because the kernel is not yet paying attention to syscall.dropImport, nor is it emitting dispatch.dropExport. But at this point, the vat is now fully prepared to emit and act correctly upon such messages.
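Pulling the checklist above together, an illustrative liveslots handler might look like this (the hand-off through the reachability manager is elided):

```js
function dropExports(vrefs) {
  for (const vref of vrefs) {
    const wr = slotToVal.get(vref);
    const remotable = wr && wr.deref();
    if (remotable) {
      exportedRemotables.delete(remotable); // stop pinning it for the kernel
    }
    // deliberately do NOT delete the slotToVal entry: deserializing
    // virtualized data may still need to find the original Remotable
  }
  // valToSlot is a WeakMap, so its entries will go away on their own
}
```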

Implement kernel-side handling of dropImport, clist reachability flag

  • augment clists to include "reachable" flag on imports
  • kernel handles syscall.dropImport by clearing the flag

When a vat calls syscall.dropImport, it means that vat can no longer emit a reference to the given vref. It might still be able to recognize the vref if someone sends it a new copy (the vref might be a key in a vref-aware WeakMap or WeakSet), but the kernel shouldn't keep the original Remotable alive for the sake of this vat.

We must distinguish between this "unreachable" state, and the fully "unrecognizable" state. The clist entry must remain (thus preserving the identity of the object) until it becomes fully unrecognizable. Therefore we need an additional flag to separate these two states.

The task is to augment the kernel's clist tables (specifically kernelDB keys shaped like v$NN.c.$kernelSlot) to indicate whether the entry reflects reachable+recognizable, or unreachable+recognizable. We should do this with a simple 0 or 1 added to the beginning of the value for entries that represent imports (exports are always both reachable and recognizable). So when vat v2 maps ko34 to o-56, and the import is in the reachable state, we'll have the following kernelDB entries:

  • v2.c.ko34 -> 1 o-56
  • v2.c.o-56 -> ko34

and when it moves to the unreachable (but still recognizable) state, we'll see:

  • v2.c.ko34 -> 0 o-56
  • v2.c.o-56 -> ko34

When the clist entry is first created, the reachable flag is set to 1.

The other task is to implement kernel handling for syscall.dropImport. For now, it merely changes the 1 to 0. A later stage will act upon the transition.
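A sketch of the flag handling over kernelDB entries like v2.c.ko34 -> "1 o-56"; `kvStore` stands in for the kernel's key-value store, and the helper name is an assumption:

```js
function clearReachableFlag(kvStore, vatID, kref) {
  const key = `${vatID}.c.${kref}`;
  const [, vref] = kvStore.get(key).split(' '); // e.g. ['1', 'o-56']
  kvStore.set(key, `0 ${vref}`); // unreachable, but still recognizable
}

// kernel handling of syscall.dropImport, at this stage, is just:
//   clearReachableFlag(kvStore, vatID, kref);
// a later stage will act upon the 1 -> 0 transition
```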

Implement kernel-side refcounting for objects, handle dropImport, call dropExport

  • add reachability refcounts to kernel object table
  • update refcounts during clist changes, run-queue changes, and vat termination
    • syscall.dropImports clears the flag and triggers refcount checking
  • manage maybe-unreachable set during refcount decrements
  • process maybe-unreachable set after crank is finished
  • generate and execute dispatch.dropImports cranks

The kernel object table (indexed by kref) needs to be augmented to hold an inbound reference count (in the kernelDB). This count represents the number of sources that could reach the object (i.e. cause a message that carries the object as an argument). It does not (yet) capture the notion of "recognizability". A kref which is recognizable needs to retain its identity, so we cannot delete object table or clist entries merely because they become unreachable. Therefore, at this stage, it is a normal occurrence for the refcount to be zero. Most inactive entries will wind up with a zero refcount.

This count should be incremented by one for every:

  • clist which cites the kref next to an import vref (o-NN, not o+NN)
    • the vat which exports the object does not count: the arrow is pointing in the opposite direction
  • run-queue message whose target or argument slots cites the kref
  • resolved kernel promise table entry whose resolution data slots cites the kref
  • unresolved kernel promise table entry queued message whose argument slots cites the kref
    • it's easier to count each queued message separately, because messages will be added to the queue over time
  • (eventually) every kernel object table entry whose auxdata cites the kref
  • the construction of the bootstrap message should cause a reference to each vat's root object
  • the bootstrap vat's root object should probably be pinned in place by an artificial reference

When the kref is first allocated (during export, as a vat syscall is being translated from vrefs to krefs), the refcount is zero. But every syscall that can cause allocation will also cause the refcount to be immediately incremented by one:

  • a syscall.send which introduces an export in the arguments will cause a run-queue entry to be created
  • a syscall.resolve which exports something in the resolution data will cause the kernel promise table entry to be moved to the resolved state, and its new capdata will have the kref in .slots
  • a syscall.invoke will translate the args into the vref space of the target device, adding them to the device's clist as an import
  • syscall.vatstoreSet is currently pure data: it contains strings which the vat knows to be vrefs, but the kernel does not interpret them at all
    • this might change some day, in which case the secondary storage might contain krefs, which would contribute to the refcount

When any of these references is removed, the refcount should be decremented. This will happen:

  • when kernel.step() has just pulled a message off the run-queue
  • when kernelSyscall.resolve() has removed queued messages
  • when kernelSyscall.dropImports() has marked a clist entry as unreachable
  • when a vat is terminated, deleting all clist entries

If the refcount reaches zero, the kref is added to an in-memory set of potentially-droppable krefs. We do not want to process this set immediately, because the decrement might be followed by an immediate increment, which is especially likely:

  • when kernel.step() has just pulled a message off the run-queue (causing a decrement), but has not yet translated the krefs into vrefs (causing an increment as the receiving vat's clist is updated with new imports)
  • when kernelSyscall.resolve() has removed queued messages (which may have slots in their arguments) from the newly-resolved promise table entry (causing a decrement), but has not yet added those messages to the run-queue (causing an increment)
  • (we might consider special-casing these pairs and avoid any refcount changes, but remember the kernelDB goes through the crank- and block- buffers, so any writes will not actually touch the DB yet, so there may not be much performance improvement to be made)

We do not want to manipulate shared tables too much while a syscall is still being processed. In addition, detecting cycles and traversing the reference graph may take a non-trivial amount of time, so we'd like to wait until all decrements have completed before we begin the process, and only do it once.

So the kernel will wait until the vat worker is done with the crank before processing the maybe-unreachable set. The crank's state changes may be committed first (i.e. the crank buffer is committed), without fear of losing progress, because the kernel will immediately proceed to maybe-unreachable processing before doing anything else, or allowing control to return to the host application. If the process crashes during this time (before the block buffer is committed), the next incarnation of the application will resume from a state that does not include the crank which provoked the decrements, so that crank will be re-executed and the GC work will begin anew.

After the crank is done, the kernel will sort the maybe-unreachable Set and then walk the krefs to see if their refcount remains zero. If so, it will locate the owning vat and add the kref to a per-vat Set of pending dispatch.dropExports to make. The kernel object table entry is not deleted (it may remain recognizable). The fact that the refcount is zero means it is unreachable by the kernel, but its identity (and the clist mappings) must remain intact until it ceases to be recognizable by all vats as well.
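An end-of-crank sketch of that walk; the kernelKeeper accessors shown here are assumptions about its API, not a final interface:

```js
const maybeUnreachable = new Set(); // krefs whose refcount hit zero this crank

function processMaybeUnreachable(kernelKeeper) {
  const vatsToNotify = new Map(); // vatID -> Set of krefs to dropExports
  for (const kref of [...maybeUnreachable].sort()) {
    if (kernelKeeper.getRefCount(kref) === 0) { // not re-introduced meanwhile
      const vatID = kernelKeeper.ownerOfKernelObject(kref);
      if (!vatsToNotify.has(vatID)) {
        vatsToNotify.set(vatID, new Set());
      }
      vatsToNotify.get(vatID).add(kref);
      // the object table entry is kept: it may remain recognizable
    }
  }
  maybeUnreachable.clear();
  return vatsToNotify;
}
```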

(In the future, we will delete object table entries when they become unrecognizable. This will also free the #2069 auxdata, making the process recursive, and adding more items to the per-vat Sets.)

Once the walk is complete, the kernel will have a list of vats which need to be notified, and a Set of krefs for each.

The kernel walks this list in vatID order. For each vat, it prepares a dispatch.dropExports() message with the set of krefs, and translates all of them through the clist (all of which must be exports: o+NN, not o-NN). It does not delete the clist entries. After translation, we sort the vrefs inside the dropExports message, and perform an immediate dispatch.dropExports() crank to the given vat. This crank may make GC syscalls (dropImports, retireImports, retireExports), but no user-level code is executed, so it should not make any syscall.send or syscall.resolve. The crank is processed and committed as usual, then the kernel moves on to the next vatID. When all vats have received their drops, the kernel.step() function is complete, and it can move on to the next run-queue item (or finish, giving the host application an opportunity to end the block and commit everything to the DB for real).
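A sketch of those per-vat GC cranks; `clistVrefFor`, `deliverToVat`, and `commitCrank` are illustrative stand-ins for kernel internals:

```js
async function deliverDrops(vatsToNotify) {
  for (const vatID of [...vatsToNotify.keys()].sort()) {
    // translate krefs to vrefs through the clist (entries are retained)
    const vrefs = [...vatsToNotify.get(vatID)]
      .map(kref => clistVrefFor(vatID, kref)) // all must be exports: o+NN
      .sort();
    // no user-level code runs in this crank: it may emit GC syscalls
    // (dropImports/retireImports/retireExports) but never send/resolve
    await deliverToVat(vatID, { type: 'dropExports', vrefs });
    commitCrank(); // processed and committed like any other crank
  }
}
```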

In a later iteration, we may want to break up the GC work, by storing the maybe-unreachable set in the DB, and processing only a little bit at a time. If so, we must be prepared for krefs to be re-introduced (i.e. re-exported by the vat) in between the time we observe their refcount go to zero, and the time we actually notify the exporting vat. The concern that might prompt us to break GC up that way is a pathologically large amount of data being freed by a single syscall.dropImport or a vat being terminated, which might cause the GC cranks (which should not be metered, as they do not execute user-level code) to take a very long time and exceed our block-time budget.

Where That Gets Us

Once we implement all of the preceding steps:

  • unreferenced Presences will be removed from RAM
    • unless they were ever used in the .state of a virtual object or the value of a makeWeakStore instance
  • unreferenced Remotables will be removed from RAM
    • unless they were ever used in the .state of a virtual object or the value of a makeWeakStore instance
  • any objects referenced by a Remotable (closed over, or via a Remotable-keyed WeakMap) will be released from RAM

We won't get any RAM savings from:

  • Remotables held by a remote swingset, via the comms vat
  • Remotables that were ever used in virtualized state
  • Presences that were ever used in virtualized state
  • WeakMap values that were referenced by a Presence or Representative

and we won't get any disk savings at all, because we cannot delete database entries (including clists) until we implement recognizable-vs-unrecognizable state management.

After we accomplish all this, the next steps are:

  • implement reachability refcounting in the comms vat, to save RAM held by remote Presences
  • implement "unrecognizable" transition, to save disk space by deleting DB entries
  • implement "unrecognizable" management in the comms vat, to save disk space held by remote Presences
    • or by dead remotes, if we can declare such a thing
  • track recognizability within virtualized data, to save RAM and disk held by more intensive use of these types than we currently exercise
warner added the enhancement (New feature or request) and SwingSet (package: SwingSet) labels May 16, 2021
warner self-assigned this May 16, 2021
warner added a commit that referenced this issue May 21, 2021
Userspace might store a locally-created Remotable (e.g. `Far('iface',
{methods..})`) in the `state` of a virtual object, or somewhere in the value
of a vref-keyed `makeWeakStore()` entry. In either case, the data is
virtualized: serialized and written to disk. This serialized form obviously
cannot keep the Remotable JS `Object` alive directly, however userspace
reasonably expects to get the Remotable back if it reads the `state` or does
a `.get` on the store.

To ensure the Remotable can be looked up from the serialized vref, the
virtual object manager must retain a strong reference to the original
Remotable for as long as its vref is present anywhere in the virtualized
data.

For now, we simply add the Remotable to a strong Set the first time it is
added, and we never remove it. This is safe, but conservative.

To do better (and eventually release the Remotable), we'll need to add some
form of refcount to each vref. When the refcount of the Remotable's vref
drops to zero, the VOM can drop its strong reference to the Remotable.

closes #3132
refs #3106
warner added a commit that referenced this issue May 22, 2021
If userspace puts a Presence into the `state` of a virtual object, or
somewhere inside the value stored in vref-keyed `makeWeakStore()` entry, it
gets serialized and stored as a vref, which doesn't (and should not) keep the
Presence object alive. Allowing this Presence to leave RAM, remembering only
the vref on disk, is a non-trivial part of the memory savings we obtain by
using virtualized data.

However, just because there is currently no Presence (for a given vref)
does *not* mean that the vat cannot reach the vref. Liveslots will observe
the Presence being collected (when the finalizer runs), but if the vref is
still stored somewhere in virtualized data, liveslots must not emit a
`syscall.dropImport` for it.

This changes the virtual object manager to keep track of Presences used in
virtualized data, and remember their vref in a Set. When liveslots wants to
`dropImport` a vref that no longer has a Presence, it will ask the VOM first.
With this Set, the VOM can inhibit the `dropImport` call until later.

At this stage, we simply add the vref to a Set and never remove it. This is
safe but conservative. In the future, we'll need some form of refcounting to
detect when the vref is no longer mentioned anywhere in virtualized data. At
that point, the VOM will need to inform liveslots (or some sort of
"reachability manager") that the VOM no longer needs the vref kept alive. The
`syscall.dropImport` can be sent when neither the VOM nor a Presence is
causing the vref to remain reachable.

closes #3133
refs #3106
warner added a commit that referenced this issue May 25, 2021
This removes the liveslots 'safety pins' that have inhibited collection of
objects until we were ready. I believe the collection of `pendingPromises`,
`exportedRemotables`, `importedDevices`, and the VOM's `reachableRemotables`
should keep everything alive that needs to be.

refs #3106
warner added a commit that referenced this issue May 31, 2021
Adds docs/garbage-collection.md with a detailed description of the design.

There's still a lot of material to add, but this should capture a lot of the
terminology and concepts we've introduced to implement GC, plus most of the
algorithms involved. Required reading to understand how GC works in SwingSet.

refs #3106
warner added a commit that referenced this issue Jun 1, 2021
This makes the final step to activate swingset garbage collection (at least
pieces covered by #3106):

* implement kernel-side handling of the three GC syscalls (`dropImports`,
`retireImports`, `retireExports`), some of which happens during translation,
the remainder in `kernelSyscalls.js`
* implement kernel-side translators for the GC deliveries (`dropExports`,
`retireExports`, `retireImports`)
* populate the GC Actions set during `processRefcounts()`
* change `c.step()/run()` to execute any pending GC Action before consulting
the run-queue
* add test-gc-kernel.js to exercise basic checks

Also, a few miscellaneous error messages were improved.

closes #3109
refs #3106
warner added a commit that referenced this issue Jun 11, 2021
This makes a number of small cleanups in preparation for landing larger GC work.

* update docs/garbage-collection.md with a new algorithm
* add `slogFile` option to `buildVatController()`
* use `provideVatSlogger` within the slogger
* tolerate `policy='none'` in `queueToVatExport`
* log message tweaks
* enable `unregister()` in the liveslots FinalizationRegistry
* comment out GC debug prints in the comms vat

refs #3106
warner added a commit that referenced this issue Jun 14, 2021
Includes some PNG diagrams, and the OmniGraffle source for them.

refs #3106
warner added a commit that referenced this issue Jun 14, 2021
implement dispatch.retireExports for Remotables

refs #3106
warner added a commit that referenced this issue Jun 16, 2021
implement kernel-side GC

refs #3106 
closes #2646 
closes #3109
warner commented Jun 16, 2021

#3298 landed the big changes to activate this feature. I charged ahead and implemented the reachable-vs-recognizable state management, so if all vats retire an import, the kernel should delete the object data from the kernel DB, and we should see a disk savings in addition to a RAM savings.

We still don't have comms support (#3306), and our virtualized-data handling is still entirely conservative (never releasing anything that's touched a virtual object).

So the next steps are:

  • implement reachability refcounting in the comms vat, to save RAM held by remote Presences
  • implement "unrecognizable" management in the comms vat, to save disk space held by remote Presences
    • or by dead remotes, if we can declare such a thing
  • track recognizability within virtualized data, to save RAM and disk held by more intensive use of these types than we currently exercise

warner closed this as completed Jun 18, 2021
warner added a commit that referenced this issue Jun 21, 2021
This should remove some unnecessary work done by `processRefcounts` on
objects which have lost one reference, but not all of them.

refs #3106