-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
using Remotable as weak collection key causes value to be pinned forever #9956
Comments
I've added a failing test in https://github.com/Agoric/agoric-sdk/tree/warner/9956-remotable-weak-key-leak |
The remediation will be for a BOYD call to walk all weak collections, scanning their ordinal table for Liveslots knows exactly which KindIDs are for weak collections (there are only four of them), so the scan will start with their ordinal table (prefix I think this will count all the occurrences:
but it's showing zero hits for a recent mainnet DB snapshot. That's entirely possible, we mostly use durable collections, not merely-virtual ones, so we aren't in the habit of using non-durables as keys. The remediation should also set a flag in the vatstore (maybe
The BOYD should not perform the remediation if the version is high enough. Newly initialized vats should start with the higher version (and never do remediation). My fix for #9939 will probably add code to |
Oh that
so for purposes of estimating remediation work, we could just look at 3/5/7/9. |
dropping a Remotable now checks for local recognizers, in preparation for fixing #9956
The symptoms of this one overlap with #8756 : the key might get retained because deleting the collection didn't work, or the value might get retained because it's Remotable and the collection is weak and the key got retired and we don't have enough recognition records to know which collection to delete it from. |
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Remotable-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Remotable/Representative/Presence-style keys, consuming c-list entries longer than necessary. * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has() to incorrectly return true, and .init() to throw a "already registered" error, as well as consuming DB space longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary This commit fixes those bugs, which fixes the immediate cause of issues #8756 , #7355 , and #9956 . As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools.
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of issues #8756 , #7355 , and #9956 . As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools.
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of: * fixes #8756 * fixes #7355 * fixes #9956 As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools. This commit marks the new tests as expected to pass again. It adds one new (failing) test to demonstrate the lack of remediation code. Thanks to @gibson042 for recommendations.
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of: * fixes #8756 * fixes #7355 * fixes #9956 As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools. This commit marks the new tests as expected to pass again. It adds one new (failing) test to demonstrate the lack of remediation code. Thanks to @gibson042 for recommendations.
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of: * fixes #8756 * fixes #7355 * fixes #9956 As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools. This commit marks the new tests as expected to pass again. It adds one new (failing) test to demonstrate the lack of remediation code. Thanks to @gibson042 for recommendations.
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of: * fixes #8756 * fixes #7355 * fixes #9956 As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools. This commit marks the new tests as expected to pass again. It adds one new (failing) test to demonstrate the lack of remediation code. Thanks to @mhofman and @gibson042 for recommendations.
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of issues #8756 , #7355 , and #9956 . As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools.
Impact measurement requires an audit of the vatstore, looking for two properties:
Remediation will require that check, followed by either deleting the entry (for non-reachable keys) or adding the |
While investigating #9939 I found a second bug:
rem = Far('iface', {})
, the kind of ephemeral referenceable object that liveslots calls a "Remotable", with a vref likeo+1
wms = makeScalarBigWeakMapStore()
data
, perhaps with an imported Presencewms.init(rem, data)
rem
call out of scope, and wait for a BOYD to garbage-collect itThe expected behavior is that the store data is collected too. However, because of the bug, the data remains alive. In fact, the entire key+value entry remains in the weak collection (indexed by the Remotable's
o+1
vref). That will keep a Presence alive, which may keep data alive in other vats.I think the problem is in
addRecognizableValue
. Note that the(type === 'object' && (!allocatedByVat || virtual || durable))
condition accepts Presences (which are notallocatedByVat
) and virtual/durable Representatives, but excludes Remotables. And there is noelse
clause afterwards. SoaddRecognizableValue(remotable)
will just be ignored.agoric-sdk/packages/swingset-liveslots/src/virtualReferences.js
Lines 538 to 542 in 7296b47
That means we don't record any pointers from the Remotable's vref to the collection whose entry ought to be deleted once the vref is retired. So the entry will be left forever, or at least until the entire collection is cleared or deleted.
The function's comment reveals that
addRecognizableValue
was written to supportvoAwareWeakMap/Set
, in which any Remotables used as keys are stored strongly, as the key of aMap
. When we started callingaddRecognizableValue
from the virtual-weak-collection code, we didn't revisit its limitations.agoric-sdk/packages/swingset-liveslots/src/virtualReferences.js
Lines 510 to 514 in 7296b47
Impact
This isn't a safety issue, because by the time this bug is triggered, the key is already unreachable (by definition), so the matching value is unretrievable: no userspace code can generate the Remotable to do a
.get()
, and of course weak collections are non-enumerable.But it is a performance issue, specifically a disk-space consumption problem. As a virtual (rather than ephemeral) collection, the values are stored in the
vatstore
database, and we'll use more DB space than we're supposed to (but since the values must also be virtual, no RAM will be consumed).The bug cannot occur for a fully-durable collection, because we refuse to accept non-durable keys (or values), and Remotables are not durable. It doesn't appear for strongly-keyed collections, of course, because the key is held strongly, and won't go away until someone explicitly deletes it from the collection (or clears/deletes the whole collection). It only affects merely-virtual weak-keyed collections, so only
makeScalarBigWeakMapStore
andmakeScalarBigWeakSetStore
. And only the Map form can entrain remote objects.We don't use merely-virtual stores very much (when we go virtual, we go durable), so the immediate impact is probably low.
Likely Fix
I think we'll need to add
vom.ir.${vref}
keys for theseo+1
-format Remotable vrefs. That will require changes toscanForDeadObjects
to make sure we're reacting to their retirement properly.The text was updated successfully, but these errors were encountered: