-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom key equality for weakKeys (equals-based lookup) #344
Comments
I think the most natural argument for weakKeys by equality is interning, e.g. Guava's I'm hard pressed to think of other cases where I might have wanted weakKey equality in my own code. The identity is perfect when scoping the lifecycle to a single consumer, such as a per-session cache of a shared resource. Any other usage I can think of, especially when considering soft references, is better handled by an explicit size bounding. I think naming of this feature in the builder is my biggest concern. |
Thinking about this anew, and I think we can do this cleanly without addition codegen bloat, poor builder names, and minimal complexity. In MapMaker of yore, there were settings In our case we have a The builder would default to its current behavior, but allow overriding by a It could also make a stronger argument for offering #294 as an optimization to avoid the lookup key wrapper. It might be something to bake into our What do you think? |
hmm, I suppose this wouldn't work nicely for custom key equivalence and is why I originally dismissed it. It might still work for equals and reference, though. The identity case has to be wrapped for both lookup and storing. Therefore
The weak reference has to wrap the key for GC'ing, so a wrapper is always required. We'd prefer not storing an extra I think that would still work and remains simpler by not needing to codegen different |
Looks good, I only wanted to point out that soft keys will add another set of rows. |
Right, thanks. We would probably ignore soft keys + identity and use its weak key instead, since it has identical application semantics while also being more GC hygienic. We can also probably trim the generated class heirarchy by combining weak/soft variants by using |
Sorry, I don't understand, why can soft reference be replaced by weak reference for identity? Or do you mean during lookup? |
A weak reference is collected when there are no strong references remaining. This can be collected immediately. A soft key is collected when there are no strong references and gc pressure requires freeing memory. It can be resurrected to a strong reference. For identity there is no way to have the same instance to look up with unless you have a strong key. If you lack a strong reference, it cannot be retrieved again. This makes it externally identical to a weakKey setting. Potentially you might have a soft reference of the key elsewhere to resurrect from, thereby being able to access in the cache. That redundancy isn’t realistic in practice, so I don’t think it’s a violation to support. |
I am having a hard time justifying this feature. The technical aspects are clear, but the use-case is not. That leads back to Guava's rational for why it was removed, as in Google's large code-base there was not a valid use-case. I'll summarize what understand so far, Weak internerThis is the use-case that Guava observed throughout Google's code base that caused the feature to remain as package-private (see Function<E, E> newWeakInterner() {
Cache<E, Boolean> cache = Caffeine.newBuilder()
.keyEquivilance(Equivalence.equals())
.weakKeys()
.build();
return e -> {
for (;;) {
var entry = cache.policy().getEntryIfPresentQuietly(e);
if (entry) {
return entry.getKey();
}
var value = cache.asMap().putIfPresent(e, Boolean.TRUE);
if (value == null) {
return e;
}
}
};
} Custom equalityThe primary example for this is that Java's arrays use identity-based equality, e.g. The problemSo while interning is a desirable case to support, it allows for a lot of misuse. Guava's larger scope allowed it to hide this feature and solve the singular problem in a nice interface, whose implementation complexity is hidden from the user. Since many might not write an interner correctly, it does seem better for them to have made that into a first-class concept. As @kevinb9n rational summarizes the user's intent and misunderstanding well. They would expect that using any equivalent key would find the entry and show a strong desire to retain it. Once all references to an equivalent key are reclaimed then it would become prudent to remove. And that is exactly where the confusion lies, the weak reference is only applicable to the instance used by the mapping. The fact that the mapping is popular or equivalent instances are abound has no relevance to the garbage collector. Once the instance itself has no strong references then it is eligible for collection, which results in the cache discarding the value earlier than the user intended. Therefore, to get the desired behavior one should use both an interner and a cache. These then serve different purposes and combining that responsibility would be confusing logic. The interner explicitly reads as replacing the element with a canonical instance that should be used, while the cache explicitly provides a policy for how long the data should be held in-memory. One can easily understand the intent: What to do?If there are no other cases then we should be very wary of exposing a footgun. We could avoid this by following Guava's example by providing our own Interner type backed by Caffeine instead of MapMaker. That is not completely unreasonable as a cache-like use-case. However this library isn't meant to advocate for not using Guava - its still wonderful - so in practice using their Interner seems perfectly fine. The reasons not to don't seem to be about performance, but minimizing dependencies for those who want to avoid the disk footprint. That's a practical ask, but quite niche. Hence I have not been able to justify implementing this feature and would lean towards closing over duplicating Guava's Interner. |
I agree that WeakReference require careful handling. In the systems I've worked on, and where I wanted a WeakReference key, it was often something like interning (aka maximal sharing), or quicker clearing of a cache. So with that in mind, three observations:
Hope this helps a bit. |
Thanks @DavyLandman. Do you think that those cases would fit the interner + weak keyed cache approach? If so, then it should make the intent and behavior much clearer. The interner is responsible for returning a canonical key and the cache is responsible for evicting that key's heavy-weight value once it is no longer strongly referenced. If the argument can be reduced down to a preference of avoiding Guava or a micro optimization, then I think this feature doesn't carry its conceptual weight. |
I've had two scenarios where I wanted weak keys with
I think from an esthetic point of view, it makes sense to allow equality on WeakReferences. |
Shouldn’t the heavy key be interned first or else runs into the misunderstanding that Kevin highlighted? |
Not necessarily, especially if the key can come from multiple places. So you have some fleeting access (that reconstruct the key) and some longer access, that keep the key around via some form of interning. |
Sure, but after reconstruction there is a canonical key that can be used and held by the cache. The longer accessed that interned help keep it alive, while the short ones only need it transiently for the lookup. It seems to me that using an interner to get the cache key before the lookup has similar results. |
Ah, true, that's completely possible, at the cost of an extra tree lookup you could chain them after each other. Do you want to propose special case for interning logic? |
Yes, exactly. Guava instead kept this feature internal and offers an Interner interface with Interners.newWeakInterner() for this feature. Then one performs If that satisfies the requirements then for now I don't think we need to offer our own |
I don't want to take-over @Maaartinus issue. But I do want to note that it's a pity we need guava as extra dependency. Most of our intern caches we've done using caffeine. |
If we did this then maybe we could make it into a subtle feature. Even though I do think a proper interface like Guava's is very desirable, it also seems a bit beyond the scope here and could lead to feature creep asks. We could do something subtle like |
My bad, I was assuming most of the wiring for a proper "weak" interner was already present in your internal classes. |
It isn't far from it, but some work is needed. A full abstraction for pluggable key / value |
A weak keyed cache uses identity equivalence, as this is the only sensible approach when caching a value. If Objects.equals was used then the user would not be inclined to hold a reference to the canonical key and expect any equivalent key to cause the mapping to be retained. As that is an impossible expectation, the cache would seemingly discard prematurely. Therefore we disallow this case to hint that users should be more thoughtful about their desired behavior. An interner is a Set-based cache where the key is the only item of interest. This allows usages to resolve to a canonical instance, discard duplicates as determined by Object.equals, and thereby reduce memory usage. A weak interner allows the garbage collector to discard the canonical instance when all usages have been reclaimed. This special case of a weak keyed cache with Object.equals behavior is now supported, but hidden through an interface to make the behavior explicit and avoid misuses. For cases that want to cache by a canonical weak key to a value, the two type should be used in conjunction. Instead of incorrectly trying to combine into a single cache, use intern the key and use that for the cache lookup.
A weak keyed cache uses identity equivalence, as this is the only sensible approach when caching a value. If Objects.equals was used then the user would not be inclined to hold a reference to the canonical key and expect any equivalent key to cause the mapping to be retained. As that is an impossible expectation, the cache would seemingly discard prematurely. Therefore we disallow this case to hint that users should be more thoughtful about their desired behavior. An interner is a Set-based cache where the key is the only item of interest. This allows usages to resolve to a canonical instance, discard duplicates as determined by Object.equals, and thereby reduce memory usage. A weak interner allows the garbage collector to discard the canonical instance when all usages have been reclaimed. This special case of a weak keyed cache with Object.equals behavior is now supported, but hidden through an interface to make the behavior explicit and avoid misuses. For cases that want to cache by a canonical weak key to a value, the two type should be used in conjunction. Instead of incorrectly trying to combine into a single cache, use intern the key and use that for the cache lookup.
A weak keyed cache uses identity equivalence, as this is the only sensible approach when caching a value. If Objects.equals was used then the user would not be inclined to hold a reference to the canonical key and expect any equivalent key to cause the mapping to be retained. As that is an impossible expectation, the cache would seemingly discard prematurely. Therefore we disallow this case to hint that users should be more thoughtful about their desired behavior. An interner is a Set-based cache where the key is the only item of interest. This allows usages to resolve to a canonical instance, discard duplicates as determined by Object.equals, and thereby reduce memory usage. A weak interner allows the garbage collector to discard the canonical instance when all usages have been reclaimed. This special case of a weak keyed cache with Object.equals behavior is now supported, but hidden through an interface to make the behavior explicit and avoid misuses. For cases that want to cache by a canonical weak key to a value, the two type should be used in conjunction. Instead of incorrectly trying to combine into a single cache, use intern the key and use that for the cache lookup.
A weak keyed cache uses identity equivalence, as this is the only sensible approach when caching a value. If Objects.equals was used then the user would not be inclined to hold a reference to the canonical key and expect any equivalent key to cause the mapping to be retained. As that is an impossible expectation, the cache would seemingly discard prematurely. Therefore we disallow this case to hint that users should be more thoughtful about their desired behavior. An interner is a Set-based cache where the key is the only item of interest. This allows usages to resolve to a canonical instance, discard duplicates as determined by Object.equals, and thereby reduce memory usage. A weak interner allows the garbage collector to discard the canonical instance when all usages have been reclaimed. This special case of a weak keyed cache with Object.equals behavior is now supported, but hidden through an interface to make the behavior explicit and avoid misuses. For cases that want to cache by a canonical weak key to a value, the two type should be used in conjunction. Instead of incorrectly trying to combine into a single cache, use intern the key and use that for the cache lookup.
I brought over Guava's Interner<E> interner = Interner.newWeakInterner();
E canonical = interner.intern(sample); The strong implementation wraps a The weak implementation uses a weak keyed Neither implementation offer other bounding options as that doesn't make sense for this use-case. For caching where bounding of the values is useful, as discussed above one can combine a |
A weak keyed cache uses identity equivalence, as this is the only sensible approach when caching a value. If Objects.equals was used then the user would not be inclined to hold a reference to the canonical key and expect any equivalent key to cause the mapping to be retained. As that is an impossible expectation, the cache would seemingly discard prematurely. Therefore we disallow this case to hint that users should be more thoughtful about their desired behavior. An interner is a Set-based cache where the key is the only item of interest. This allows usages to resolve to a canonical instance, discard duplicates as determined by Object.equals, and thereby reduce memory usage. A weak interner allows the garbage collector to discard the canonical instance when all usages have been reclaimed. This special case of a weak keyed cache with Object.equals behavior is now supported, but hidden through an interface to make the behavior explicit and avoid misuses. For cases that want to cache by a canonical weak key to a value, the two type should be used in conjunction. Instead of incorrectly trying to combine into a single cache, use intern the key and use that for the cache lookup.
A weak keyed cache uses identity equivalence, as this is the only sensible approach when caching a value. If Objects.equals was used then the user would not be inclined to hold a reference to the canonical key and expect any equivalent key to cause the mapping to be retained. As that is an impossible expectation, the cache would seemingly discard prematurely. Therefore we disallow this case to hint that users should be more thoughtful about their desired behavior. An interner is a Set-based cache where the key is the only item of interest. This allows usages to resolve to a canonical instance, discard duplicates as determined by Object.equals, and thereby reduce memory usage. A weak interner allows the garbage collector to discard the canonical instance when all usages have been reclaimed. This special case of a weak keyed cache with Object.equals behavior is now supported, but hidden through an interface to make the behavior explicit and avoid misuses. For cases that want to cache by a canonical weak key to a value, the two type should be used in conjunction. Instead of incorrectly trying to combine into a single cache, use intern the key and use that for the cache lookup.
A weak keyed cache uses identity equivalence, as this is the only sensible approach when caching a value. If Objects.equals was used then the user would not be inclined to hold a reference to the canonical key and expect any equivalent key to cause the mapping to be retained. As that is an impossible expectation, the cache would seemingly discard prematurely. Therefore we disallow this case to hint that users should be more thoughtful about their desired behavior. An interner is a Set-based cache where the key is the only item of interest. This allows usages to resolve to a canonical instance, discard duplicates as determined by Object.equals, and thereby reduce memory usage. A weak interner allows the garbage collector to discard the canonical instance when all usages have been reclaimed. This special case of a weak keyed cache with Object.equals behavior is now supported, but hidden through an interface to make the behavior explicit and avoid misuses. For cases that want to cache by a canonical weak key to a value, the two type should be used in conjunction. Instead of incorrectly trying to combine into a single cache, use intern the key and use that for the cache lookup.
Sounds good, thanks for taking the effort 👍🏼 |
@ben-manes no problem!:) |
Released in 3.1.0 |
I'm not sure at all about this feature, but it's something I surely wanted it a lot some long time ago. Opening this issue to discuss it as requested.
In case of
weakKeys()
, identity-based lookup makes more often sense than usingequals
, but sometimes,equals
is better - imagine, you cache some pure function and the keys get re-created: When using equals, you may be lucky and get a hit, with identity, it's always a bad luck. All workarounds mean more work and more overhead.As Ben wrote, since there's already a wrapper for a weak key, replacing it by a different wrapper (which uses equals rather than identity) should add no overhead.
Allowing equals-based lookup makes
softKeys()
usable as the only reason against them no longer applies. OTOH soft references don't work very well and it's unclear if such a cache is worth it.I'm afraid, you'd need another version of
WeakKeyReference
andLookupKeyReference
and a few newNodeFactory
s and/or to pay some tiny runtime-overhead.Maybe
forceUsingEqualsForKeys
? But that's even longer.This is unfortunately true and it makes the idea less appealing than in case of Guava, which could do it easily using their custom map.
The text was updated successfully, but these errors were encountered: