Skip to content
This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Soft check for WeakRef with reclaimed referent? #188

Closed
tjcrowder opened this issue Mar 2, 2020 · 15 comments
Closed

Soft check for WeakRef with reclaimed referent? #188

tjcrowder opened this issue Mar 2, 2020 · 15 comments

Comments

@tjcrowder
Copy link
Contributor

Apologies if I'm retreading trodden ground here, as seems likely. I see this note in this PDF in the history:

  • Is there a query operation for the states of a WeakRef
    • Fallback: No

which seems like it might be relevant, but I'm not sure what "Fallback: No" means.

The explainer's makeWeakCached example leaks memory in the absense of finalizer callbacks from GC, which implementations can skip "...for any reason or no reason..." That made me wonder if a cache like that should do proactive cleanup in some way, removing WeakRefs whose referent has been reclaimed. But the only way to know if a referent has been reclaimed is to use deref, which makes the referent strongly reachable again and keeps it alive until the end of the current job. (I also wonder if the GC might take that "use" [which isn't really a use] of the referent as evidence it should keep the referent in preference to other reclaimable objects. [It's probably painfully obvious that I know very little about modern GC techniques used in JavaScript engines.])

Would some kind of soft check for a reclaimed referent (isReclaimed) make sense, something that doesn't make the referent strongly reachable, and isn't taken as evidence of use? Then caches like the makeWeakCached example could remove entries for which ref.isReclaimed() returns true, perhaps in an idle callback if the host supports them. (Not necessarily in a big loop, it could be incremental.)

It would be important to emphasize that only a true return actually tells you anything useful. A false return is no guarantee that the referent won't be GC'd milliseconds later.

Semantics:

  • If isReclaimed returns true, it will always return true in the future for that same WeakRef.
  • If it returns false, smarter people than me would need to decide whether it would consistently return false for that same WeakRef until the end of the current job (e.g., isReclaimed adds the referent to the [[KeptAlive]] list) or if it might return true if called a second time.
@tjcrowder
Copy link
Contributor Author

tjcrowder commented Mar 2, 2020

Just FWIW, here's an idea what makeWeakCached might look like with incremental cleanup which:

  1. Tosses its proactive component if it sees a finalization callback, on the basis it doesn't need proactive cleanup (an assumption, but seems like a reasonable one).
  2. Tries to do cleanup during idle callbacks.
  3. Falls back to doing one cleanup check on each call to the cache (if no idle and hasn't seen a finalization callback).

(Tried to keep the same style.)

[Edit 2020/03/05 - My updated understanding -- thank you @littledan!! -- is that a scan like the one below is not a good idea, so don't copy this code or the idea behind it.]

function makeWeakCached(f) {
  const cache =  new Map();
  let itHoover = cache.entries();
  const hasIdle = typeof requestIdleCallback !== "undefined";
  const cleanup = new FinalizationRegistry(iterator => {
    hoover = itHoover = null; // Saw a finalization callback, we don't need the hoover
    for (const key of iterator) {
      // See note below on concurrency considerations.
      const ref = cache.get(key);
      if (ref && !ref.deref()) cache.delete(key);
    }
  });

  let hoover = () => {
    if (!hoover) return;
    let result = itHoover.next();
    if (result.done) {
      // Restart
      itHoover = cache.entries();
      result = itHoover.next();
    }
    if (!result.done) {
      const [key, ref] = result.value;
      if (ref.isReclaimed()) cache.delete(key);
    }
    if (hasIdle) requestIdleCallback(hoover);
  };

  if (hasIdle) requestIdleCallback(hoover);

  return key => {
    if (hoover && !hasIdle) hoover();
    const ref = cache.get(key);
    if (ref) {
      const cached = ref.deref();
      // See note below on concurrency considerations.
      if (cached !== undefined) return cached;
    }

    const fresh = f(key);
    cache.set(key, new WeakRef(fresh));
    cleanup.register(fresh, key, fresh);
    return fresh;
  };
}

Obviously it's more complicated. It also adds a bit more concurrency btw the program and GC, but in the same sort of class as the existing concurrency.

There's more one could do (not running the hoover if the cache is empty, only running it periodically rather than constantly when idle, etc.) but for an example, seems sufficient.

@littledan
Copy link
Member

There are definitely possible improvements that we could make to the weak cache (for example, strongly hold an LRU list, which would probably be necessary to make this useful). Maybe it should be deleted from the README since it's not a great idea in its current state.

However, we cannot add any APIs to check whether something is being referenced--they would be too inefficient. The WeakRef and FinalizationRegistry APIs don't give guarantees of precision because this would imply inefficient traversals over the heap, but you can expect implementations to make the tradeoff they feel is appropriate in terms of nulling things out in a timely manner, so isReferenced would not do any better.

@tjcrowder
Copy link
Contributor Author

@littledan - Thanks for the reply! I'm a bit confused, though:

However, we cannot add any APIs to check whether something is being referenced...

That's not what I'm suggesting/asking about -- or at least, if it is I'm afraid I don't understand. :-D

isReclaimed (not isReferenced) would logically be:

class WeakRef {
    // ...
    isReclaimed() {
        return this.deref() === undefined;
    }
}

...but where the GC is not supposed to infer from that check that we actually wanted the referent (e.g., if it prioritizes based on recent usage). That's its only purpose, checking if something has already been reclaimed (so the WeakRef is pointless). It's not asking if something is referenced, it's asking if it's already known to no longer exist.

@mhofman
Copy link
Member

mhofman commented Mar 2, 2020

I think the "peeking without impact" issue has been raised a couple times before. I've personally wondered if that API could be useful as well.

This would allow for the equivalent of ref.deref() == null without side effects on the liveness of the referant.

Semantics:

  • If isReclaimed returns true, it will always return true in the future for that same WeakRef.
  • If it returns false, smarter people than me would need to decide whether it would consistently return false for that same WeakRef until the end of the current job (e.g., isReclaimed adds the referent to the [[KeptAlive]] list) or if it might return true if called a second time.

I'm not sure why we'd need this API be stable for the true false case in the same turn? I think the only guarantee should be that it cannot return true false after having returned false true.

Edit: mixed up my true and false,

@tjcrowder
Copy link
Contributor Author

tjcrowder commented Mar 2, 2020

@mhofman -

I'm not sure why we'd need this API be stable for the true case in the same turn? I think the only guarantee should be that it cannot return true after having returned false.

true means "reclaimed" (e.g., calling deref would return undefined). A WeakRef's referent never goes from reclaimed to un-reclaimed, so true remains true forever (not just the current job). Perhaps isGone would be clearer. :-) (I wouldn't suggest making the flag work the other way, because isPresent is too strong a statement -- there's a big difference between "not known to be gone" and "known to be present.")

It's the false case ("it isn't known to be gone") that may change to true ("it's gone now"). Whether that's allowed to happen during the same job is for those more in the know to decide (well, all of this is, but you get the idea). FWIW, my tentative view is: No, there's no current job stability guarantee for a false return, because A) We don't want the object put on [[KeptAlive]], and B) A programmer should never infer anything from a false return value other than that the referent wasn't already known to be gone as of the call. I'd suggest that it's entirely possible for isReclaimed to return false but then an immediate call to deref to return undefined. Docs would need to be very clear that if (!ref.isReclaimed()) { const obj = ref.deref(); } is an anti-pattern; just use const obj = ref.deref(); if (obj) { ... }.

@hotsphink
Copy link
Contributor

I'm definitely -1 on this proposal. WeakRefs make GC observable, which is hugely problematic, and the proposal goes to some effort to reduce that observability (by restricting it to the end of the current synchronous execution, for example.) This proposal, if implemented usefully, would reintroduce that observability unless I'm missing something.

I haven't really worked through the details of WeakRefs + caching, but I'm sympathetic to the use case. You could have the weakCache use a Map from key to (metadata, WeakRef(value)) tuples, where the metadata is sufficient to compute the eviction priority of an item. But the most straightforward time to decide to evict something is during finalization, at which point it's too late to not evict the entry being finalized.

I guess you could evict during insertion: eg for LRU eviction, you maintain an LRU list of strong refs to the values. When you insert something that pushes you above your desired cache size, you remove some number of LRU list entries.

Deallocation is not very prompt, though. You'd probably want to do the eviction before the allocation of the new item if possible, because its allocation would be a signal to the engine to do a GC if things are getting tight. If you did it the other way around, that GC would happen while the LRU list was still holding the doomed values, and the LRU list removal isn't enough to tell the engine that there is an opportunity to clean some big things up.

I don't know what the minimal API would be to handle this more promptly. You kind of want a whole new API entry for the engine to ask "would you like to keep this?" before discarding registered targets. But that in itself would probably need to be delayed to the end of a turn (synch JS execution) to avoid leaking timing and internals, and would probably delay the cleanup until the following turn anyway.

Err... I don't think it works. Cleanup would be delayed at least until the following GC, which is even less prompt. The "would you like to keep this?" code requires a GC collection in order to be called in the first place. And then you need another GC to finalize the target for real, since the target and everything reachable from the target would have to be traced in that first GC. The engine would first do a full GC treating all WeakRefs as weak, at which time it would accumulate a list of dead stuff. Then it would take the subset of that stuff that is keepalive-able, trace it, and run the "would you like to keep this?" keepalive callbacks. Anything that returns "no" would then go into an internal "condemned" set that would be ineligible for keepalive callbacks in the next GC. Bleh.

@tjcrowder
Copy link
Contributor Author

@hotsphink -

This proposal, if implemented usefully, would reintroduce that observability unless I'm missing something.

I don't think this introduces any observability that isn't already there via deref. See my comment above, it's basically return this.deref() === undefined; except "without impact" as @mhofman cleverly put it (e.g., the GC should not take an isReclaimed check to mean that the code actually wanted the referent, if it uses usage information when prioritizing ejecting weakly held referents).

@hotsphink
Copy link
Contributor

Making deref() have an impact is to reduce observability, as I understand it. Otherwise, you could repeatedly poll deref() during a turn and know almost exactly when a GC happened.

@mhofman
Copy link
Member

mhofman commented Mar 2, 2020

Making deref() have an impact is to reduce observability, as I understand it. Otherwise, you could repeatedly poll deref() during a turn and know almost exactly when a GC happened.

If that's the case, then yes there would be no way to make this API not fully equivalent to this.deref() === undefined which inserts into the [KeptObjects] list.

However, maybe we could add a note that mentions something like:

Calling deref() doesn't imply that the program is interested in the identity of the referent

Maybe implementations would be able to avoid moving objects between generation buckets if they are only in the [KeptObjects] list? I consider this an optimization only, and the program couldn't rely on that behavior.

@tjcrowder
Copy link
Contributor Author

@hotsphink -

Making deref() have an impact is to reduce observability, as I understand it. Otherwise, you could repeatedly poll deref() during a turn and know almost exactly when a GC happened.

That isn't the impact I'm talking about. That's the [[KeptAlive]] impact (which would be fine). I'm talking about any impact on the garbage collector: deref actively gets the reference, which a GC could use as evidence the referent was "used" (for instance, for prioritization purposes, as I called out originally).

You're right about multiple deref calls in the same job being consistent is about granularity, thanks for that! From the proposal:

The intention is to coarsen the granularity of observability of garbage collection, making it less likely that programs will depend too closely on the details of any particular implementation.

I'd said smarter people than me needed to decide whether a false return needed to be locked in for the entire job, but they already have done, and it does. :-) So I can remove that open question. Clarifying the earlier semantics:

  • If isReclaimed returns true, it will always return true for that same ref, forever. (Since a referent doesn't go from reclaimed to un-reclaimed; once it's gone, it's gone.)
  • If it returns false, it will return false for the remainder of the job.

@tjcrowder
Copy link
Contributor Author

tjcrowder commented Mar 3, 2020

Note: All of this is predicated on the idea that garbage collectors may take recent usage of a referent as an input to their processing (now, or at some point in the future, since weakly held references are a new design input). If they don't and won't, if calling deref and immediately throwing away its result has no impact on GC (edit: other than [[KeptAlive]]) and isReclaimed is literally just deref() === undefined and thus unnecessary. To me it seems likely that recent usage would be something GC would consider at some point, but that needs input from implementers.

Separately:

isReclaimed can be added as a follow-on if it's too late to consider it now -- the proposal is at Stage 3 and being actively added to engines, after all. It's an addition to the API, it doesn't change anything currently in the proposal.

Here's prior art for an API providing this kind of information: C#'s WeakReference has an IsAlive accessor property, which is basically the inverse of isReclaimed().

Property Value:
true if the object referenced by the current WeakReference object has not been garbage collected and is still accessible; otherwise, false.

(Again, please note those return values are the inverse of what I'd originally suggested for isReclaimed.)

That "has not been garbage collected" is a stronger statement than I'd originally suggested for isReclaimed, but in keeping with the revised version given the granularity that locking in a "no, it hasn't been reclaimed" answer for the current job creates. Purely for reasons of maintaining that coarseness, if (!ref.isReclaimed()) { const obj = ref.deref(); } should reliably get the object into obj (but is still an anti-pattern). Otherwise, isReclaimed would increase the observability of the GC (as @hotsphink pointed out) which the proposal intentionally keeps coarse.

@littledan
Copy link
Member

It sounds like we're down to a version which wouldn't affect observability of GC, as described in #188 (comment) .

Note: All of this is predicated on the idea that garbage collectors may take recent usage of a referent as an input to their processing (now, or at some point in the future, since weakly held references are a new design input). If they don't and won't, if calling deref and immediately throwing away its result has no impact on GC (edit: other than [[KeptAlive]]) and isReclaimed is literally just deref() === undefined and thus unnecessary. To me it seems likely that recent usage would be something GC would consider at some point, but that needs input from implementers.

I can't rule out that some GCs might prioritize things this way, but I haven't heard of it. Can we consider adding this API in a follow-on proposal if there is interest from implementers in the future? Right now, we're in a phase where we're considering removing some possibly-overengineered APIs that try to give more signals to implementations, e.g., #187 .

@tjcrowder
Copy link
Contributor Author

@littledan - Thanks Dan. FWIW, that seems reasonable to me, perhaps with the caveat that if implementers are taking deref as a prioritization signal at some point during implementation of this proposal, we might revisit bundling it in.

I ran some tests with the current support in V8 and SpiderMonkey that suggest neither of them cares whether you use deref or not (in my very simple synthetic scenario), it doesn't seem to affect their cleanup of the object (other than [[KeptAlive]], of course) relative to other objects. Can't guarantee my test was rigorous (or clever), but...

@hotsphink
Copy link
Contributor

You are correct, SpiderMonkey currently only uses deref() for [[KeptAlive]]. Especially with the spec still changing, we are only worried about correctness.

For scheduling and tuning to be relevant, we would have to be holding back and/or reordering finalization callbacks. Right now, when anything dies, we immediately queue up any relevant callbacks and then yield/invoke all of them in the order they were discovered. I suspect all the other implementations are at the same point, though the ordering may vary depending on the underlying data structures used. (eg, I think we're iterating over a set for part of this.)

I'd guess we're a ways off from any form of scheduling.

@littledan
Copy link
Member

OK, given #188 (comment) , I'm closing this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants