-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: container: add weak reference maps #43615
Comments
I assume when iterating over a map the map would be "locked" in some way? To avoid async reads/writes to the map. |
I would assume it'd be the same as any other map. Up to you to ensure that you are not modifying it while the iterator is running. |
What about when the a weak reference gets removed by the runtime while iterating over the map? |
Same thing that would happen with anything else involving pointers in a moving GC: It's up to the runtime to have suitable barriers on or around their accesses. |
CC @bradfitz |
Some thoughts:
|
You can use any key type you want. It's just a normal map. No, entries can't be removed if there are references to them. If there's references to them, they still exist, and a weak reference should stay valid. The entire point is to have a reliable lookup for these objects if they still exist, without ensuring that they still exist. |
@seebs, how would weak references interact with finalizers (if at all)? Specifically: is the weak reference removed from the map before the finalizer is run, or after, or is it forbidden to obtain a weak reference to an object that has a finalizer? |
That is a fascinating question. On consideration: If we assume that nothing else ever users finalizers, finalizers might be a credible way to implement this, internally. Runtime can skip the storage for a weak map when marking, and set a finalizer for each item to delete it from the map, roughly. So one solution would be "forbidden to set a finalizer on weak references, or obtain a weak reference to an object with a finalizer". Except... that's horribly brittle, and might result in someone developing a time machine so they can go back to before it was done and add "never use this feature" to Effective Go. But this is a really interestingly-difficult philosophical question. It is permissible for a finalizer to actually do a thing that prevents a thing from being collected. So if we remove things from the map first, the object will potentially still exist but we will report it missing (and thus, in most use cases, create a new one). If we run finalizers first, we can have objects in the map which have had their finalizers run, and which can thus end up referenced again post-finalizer. And you can't tell, from the perspective of the caller of the finalizer, whether that finalizer may have stashed the object somewhere so that it won't really be collected. And it might be possible to fix this if finalizers had a way to indicate whether or not they were vetoing collection, but that could nonetheless be wrong... I think the answer is: Finalizers run after deletion from the map, but if your finalizer preserves the object, it becomes possible that the object exists but the map doesn't refer to it. So don't do that. But not in the sense of "runtime is allowed to explode in a shower of demons", just "yes, so, that previous object is around, and the map doesn't refer to it but might now have a new entry for that key." But philosophically, in general, a finalizer is supposed to run "after the object is dead and never referenced again", and it should not do something that would make the object get referenced again, it's just after-the-fact cleanup, and it seems fine for the object to no longer be in the map, and then get cleaned up, even if a new object for its former key ends up getting created. (Although if you did, say, "directory named after the key" and the finalizer was "delete the directory" that would be a Bad Idea, and also your own fault, don't do that.) |
From a purely philosophical view, is it proper to impose "at most one" on weak references to an object? |
This is a major language and implementation change on the garbage collector side. /cc @aclements @mknyszek |
The "major change" thing is both why it's a hard sell to put it in runtime, and why it's impossible to put it somewhere else. |
This is something I've thought about quite a bit in the past. My past thinking has led to a bit of a counter-proposal: don't allow iteration over weak maps. Right now, the garbage collector is invisible in the language semantics (other than finalizers, which are an unfortunate wart at best). The language specification is built on the illusion of infinite memory. If you have a map that allows lookups and not iteration, we maintain that illusion because the GC can delete unreachable keys from that map without any effect on program execution. It remains invisible. An un-iterable map still provides most functionality that people typically want from a weak map, including string interning and carrying side information for objects without forcing them to remain live. There is also exactly how ECMAScript's I would argue that this should at least wait on generics. At that point, we could provide this as a regular type with all the type-safety of the built-in Implementing the above for pointer keys is actually pretty easy. We'd mark the key storage as non-pointer data, use allocation specials to track the set of weak maps containing an object as a key, and delete keys from those weak maps during sweeping. Disallowing iteration simplifies a lot of questions around races with the GC. Implementing it for non-pointer keys is harder, but probably necessary to make it useful. I think the map would have to find the pointers in a key when it's added and attach specials to each of those allocations. During tracing, if the GC reaches any of these objects, it would have to trace across to everything else referenced by that key as well. Object tracing doesn't currently have to look at specials, so I'm not sure how to do this without a performance penalty to every object scanned during GC. |
That's a fascinating idea. My vague intuition is there might be cases where you'd want to iterate the map to do something about the data, but in all those cases, I think you'd also want it to be non-weak at that point. But I think "unreachable keys" feels like a slightly different question; I was assuming ordinary keys, which might be strings or integers, and potentially-unreachable values. So for instance, with a I guess the answer is that either keys or values could in principle be weak references. |
Could the implementation allocate for these and implicitly unbox when returning them? Then the GC could continue to deal only in pointers. |
A weak map has two types: a key type and a value type. Only the value type is weak. That is, if My point is that with those semantics I think it's fine to restrict the value type of a weak map to be a pointer. It's hard to even understand what it means to store a non-pointer in a weak map. What does it mean to say that there are no remaining references to a non-pointer value? Isn't that equivalent to saying that there are no pointers to the value? Are people thinking of different semantics? |
I was thinking of the value being of type (say) That said, I'd be perfectly happy to restrict the value type of a weak map to be a pointer. I imagine it'd have the same restrictions as the first argument to runtime.SetFinalizer, which are even stronger than "must be a pointer". |
It was pointed out to me that my point about non-pointer keys was really unclear and also wrong. I don't mean pointer-free keys; I'm not even sure what that would mean in a weak-keyed map (probably it would just act like a normal map). I mean keys that contain pointers, like
Ah, that is quite different from what I was talking about. I much less on-board with a weak-valued map than a weak-keyed (non-interable) map because it exposes the behavior and timing of the garbage collector, exactly the way finalizers do. Also, I understand concrete use cases of weak-keyed maps, but I don't actually understand concrete uses of weak-valued maps. Your example of a string-to-*Node map was rather abstract. Can you give a concrete example of what you would use that for?
I was definitely thinking of different semantics, but maybe I'm the only one. |
Well, hmm. You can't GC the int pointed to by one of the pointers unless you set that pointer to nil, because otherwise you'd have an easy way to get an invalid pointer unexpectedly. I think my intuition is that as long as the [2]*int exists, it is in fact a valid reference and the things it points to aren't collectible, but if nothing refers to it, then it can be collected, at which point the things it points to are no longer referenced (by it, anyway). But that implies that the value type really ought to be I am definitely leaning towards Ian's interpretation; the keys are not "weak" in any way, they're just keys, if they contain pointers they are references to them. The only difference from a regular map is that, if the values are pointers, they don't count as references, but if the thing they point to gets GCd, the key/value pair is deleted from the map when that happens. So, a concrete example of utility: Interning strings. You want to have a lot of objects which should contain a string, but you don't want to have multiple different allocations for the same string, and you want comparisons of them for sameness to be cheap. So...
The result of calling So why not just use a normal map? Because if the set of strings is largeish, and changes over time, and sometimes a string will stop being used, a non-weak With the weak map, you can discard any values that are no longer referenced anywhere, and you still have the guarantee that no two users who try to intern the same string will ever see different interned values. This can also be used for some kinds of caching. For instance, say you were doing a naive cache of fetched URLs. If you do it as So basically, the use case is: (1) it would make sense to have a map with these things as values, (2) we definitely don't want to throw them away if anyone's using them (3) it's fine to throw them away if no one's using them (4) it would be annoying to try to track who might be using them. I note that "not iterable" seems fine to me for a weak-valued map as well. It's not really useful to iterate it, generally. I'm not getting a clear picture of how a weak-keyed map would be used. Like, for interning, what would I be storing as keys? The addresses of strings can't be right, because the whole point is that I want to get the same thing from the map given two distinct strings which happen to compare equal. |
Okay, I think I understand your interning example (I certainly understand the code and intent; not sure I've completely wrapped my head around the implications :). And I think interning like this does require a structure that exposes GC behavior, since it has to distinguish whether objects still exist. So weak-keyed maps wouldn't work for interning. But my basic concern with a weak-valued map is that its semantics are deeply tied to GC behavior. It can't be described without explaining what it means for things to become unreachable. Finalizers are the only other thing in the language like that right now, and those are an unfortunate and necessary evil. In my opinion, the bar for adding another is extremely high. (sync.Pool's semantics don't expose GC behavior. It can be described simply as a lossy set. GC behavior is key to Pool's performance characteristics, but that's true for a lot of things.) Also, I believe weak-valued maps are ultimately no different from weak references, and I'm not sure why we would add weak-valued maps rather than just weak references. You can build a weak reference from a weak map using a single-entry map, and you can build a weak map from a weak reference by using weak references for the values (plus either finalizers or phantom reference queues, depending on the details of the weak reference API). My other concern is implementation. First, a weak-valued map (or any form of weak reference) cracks open the assumptions Go's current write barrier is based on. It might be possible to work around that by synchronizing carefully with the garbage collector during the "Get" and "Put" operations, but I'd have to work out those details. It's particularly hairy if an unreachable value is actively being swept during a Get. Sweeping would probably have to lock the map. (Weak-keyed maps have none of these problems.) Second, weak-valued maps have the problem I was (incorrectly) getting at above for values containing multiple pointers, like |
So, as a user, I mostly want a thing like this in a case where I don't care whether the object never existed or used to exist but got GCd, I just don't want to make an object if it already exists, but I don't want to pin a zillion objects that I might never use again. So, it sort of exposes a GC behavior, but in a very loose sense; nothing prevents the implementation from continuing to hold things for a long time. (Strictly speaking, existing maps already satisfy the expected usage behavior, in that they meet the constraint that the retrieved pointer is valid as long as the object hasn't been collected, it's just that it always turns out that the object hasn't been collected because the map counts as a reference to it. But the GC would be allowed to never get around to collecting objects anyway...) The reason I wrote this proposal with a map rather than with references is sort of glossed over in the initial proposal. Basically: (1) I can't think of a real use case where I want a weak-reference, and I don't want to be able to look it up by some kind of key. And then, thinking about it, it occurred to me that a lot of the things that make weak references seem like an annoying mess to track are potentially significantly mitigated by grouping them all together. If all the weak references are being kept in one object, then we have one object/region that needs special treatment. If you have a million structs each of which has a weak reference to something, you have to do whatever your special processing is for every one of those weak references, separately from whatever you do for pointers in those structs. If you have a weak map, you can do your special processing for everything in the map and nothing not in the map, and that seems like it should be simpler. I think a weak-valued map only makes sense if the values in it are themselves pointers to things. Maps aren't addressable; nothing can be pointing-to the map entry, so the map entry can't itself be collectable or pinned. So I think it's only meaningful for a map to be weak when the value type is a pointer type itself. My vague intuition, which I haven't fully thought through, is that if a weak-valued map didn't mark the things it contains, but did follow pointers within them and mark those, and it were prohibited to use SetFinalizer on things that live in a weak-valued map, and things lived in only one such map, it would be adequate for the runtime to SetFinalizer() on them to delete them from the map. Except, as you note, the sync issue. I'm still not quite able to make sense of a weak-keyed map. I can describe it but I'm not sure what I'd use it for. I guess it would be a more implicit usage. With the weak-valued map, I have a key and I want to know whether there's an associated value. With the weak-keyed map, if I have a key, I know the value can't have been deleted. |
“Looking up by key” and “stored in a field alongside other data” often end up being interchangeable. Imagine an expensive immutable cached result. You store it with a weak ref alongside the raw data and return it. If no one is using it any more, it returns to nil, and you recalculate again later. At the end of this conversation, I don’t see the advantage that maps have over plain old weak refs, except possibly in syntax. But with generics, we could have a runtime.WeakRef[T] type. |
Weak references are basically isomorphic to finalizers. It is possible to implement weak references using finalizers, it's just that the result is awkward to use because you can't use ordinary pointers. So if you don't like finalizers, you won't like weak references. |
That was not my experience. Please see the discussion at https://github.com/go4org/intern/blob/3eb7198706b254cddea46d066da7316b2620ba7a/intern.go#L169. I’d love to be wrong about this, but the conclusion I came to was that weak refs are not implementable using finalizers. |
@bcmills made a similar point on Gopher Slack, so cc’ing him here. |
@josharian I sketched it out at #41303 (comment). The problem is that everybody has to use |
Given that Go already has finalizers, I claim that we could enable users to implement weak-reference maps by adding only ordinary weak references to the language. That seems simpler to me, because the semantics of ordinary weak references seem straightforward (just the invariants described in #43615 (comment)), in comparison to the confusion and ambiguity of “weak keys” vs. “weak values”. |
I don't think this addresses the point I was trying to make but as a practical matter I think it's unimportant. |
@ianlancetaylor, I don't believe this is correct. Cycles through a weak reference can be collected. (There are other forms of finalizers that can also deal with cycles, such as phantoms.)
@dsnet, that was partly my fault. I brought up weak references vs weak maps here. There are many variations on weak references. One variation is in how you tell an application that the weak reference has been collected. The runtime can just nil out the reference and leave the application to discover that later, or it can provide a callback or a "reference queue" that lets the user do finalization-style things. Usually when used with weak references, the callback or queue doesn't provide the original object, but some other bit of information you associated with it, for example, the map and key it was stored under. Doing this with weak references and callbacks/queues is nice because it's clear what happens when there are multiple weak references. While there's nothing impossible about having multiple finalizers for an object, we disallow it because it gets messy quickly (what order do they run in? if one of them revives the object, do you run the rest? how do you remove a finalizer?). One reasons Go finalizers are so tricky in the runtime is that they provide the original object to the callback, which means the runtime has to revive the object just to run the callback. This is what makes objects with finalizers in cycles un-collectable, delays reclaiming of memory for finalized objects, and means that a chain of N objects with finalizers takes N GC cycles to clean up. If weak references had a callback, but didn't provide the original object, they don't have these problems. |
Part of the appeal, to me, of the map is that you don't have to think about a specific reference. If you have a reference, it's a strong reference. The weak-reference nature is restricted to grabbing things from a map, and that already has the |
Ephemerons[1] solve some of the problems mentioned and are as close to what
Austin is suggesting as the literature provides. Wikipedia reports use in
Smalltalk, .Net, and a few more languages. Ephemerons aren't in Java since
the Hayes paper didn't come out until a few years later. I doubt an
ephemeron proposal will be accepted but it has a track record and provides
a vocabulary that can be used to discuss the issues.
[1] Hayes, B. Ephemerons: A new finalization mechanism. In *Proceedings of
the 12<sup>th</sup> ACM SIGPLAN Conference on Object-Oriented Programming,
Systems, Languages, and Applications* (Atlanta, GA, Oct. 5--9). ACM, New
York, 1997, 176--183.
…On Fri, Jan 15, 2021 at 4:36 PM seebs ***@***.***> wrote:
Part of the appeal, to me, of the map is that you don't have to think
about a specific reference. If you have a reference, it's a strong
reference. The weak-reference nature is restricted to grabbing things from
a map, and that already has the , ok behavior.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#43615 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAHNNH6MGW5Y5QJ45LWF7ALS2CYPXANCNFSM4V36V3UA>
.
|
I was looking into a memory leak that stemmed from dynamically created types not being garbage collected (#28783). In that issue, @randall77 comments:
I don't know much about the implementation of Even if dynamically constructed types could be garbage collected, we still have the problem where some libraries (e.g., |
I ran into another example of a case where I want a thing that behaves like this, only in this case, I think I actually want the keys to be the weak references. I have a set of test hooks that basically allow tracking open/close pairs (or any equivalent "must be exactly paired" operation, really) on arbitrary objects. So, when you mark an object open, I grab a stack trace, and when you mark it closed, I grab a stack trace. If you double-close it, I can report where it was created, when it was originally closed, and when it got double-closed. So I have a map, keyed by But... That also means that, for the entire life of the program, I'm keeping all of these objects alive in an enormous map. And I have to keep them, forever, in case of a double-close. The only way it would be "safe" (for purposes of this functionality) not to keep an item in the map would be if there were no other references to it. But if I could have the keys of the map be weak references, such that if they are the only reference they are deleted from the map, that would actually work perfectly. If nothing has any references to an object anymore, I don't need to worry about one of those references being used for a double-close, so it's fine if that entry goes away. At the end of whatever operation I'm testing, I can scan the map for things. I still have to check whether or not they've been marked as closed, because they might be closed but not yet garbage-collected, but all the ones that got garbage collected go away. So I think I'm now sold on both "keys are themselves weak references" and "values are weak references, and if the value gets collected, the key stops referring to it", or something. |
@seebs couldn't you set a finalizer on each instance of |
If I understand the suggestion correctly, the conversion to |
Ah, that is an unfortunate point. |
I'm working with an application where some medium-size strings (dozens of characters) are often individually repeated tens of times across a graph structure that is hundreds of megabytes in size. The significance of possible memory savings from string interning for this specific use-case have had me reviewing the options yet again today, but I have been unable to find an appealing option. The go4.org/intern library referenced by Tailscale is probably among the best options for this, but I hate to depend on something that is actively thwarting the garbage collector's best efforts, and could theoretically break in a future Go release. I really, really wish that Go had a built in |
@coder543 have you played with github.com/josharian/intern? |
@josharian thanks, it looks interesting, I think I'll have to give it a closer look soon. Based on a quick review of the (very small!) code, it looks like it won't provide optimal string interning, but it'll probably be a lot better than the nothing that I'm currently using. |
This is a really interesting problem, I implemented my own version of a weak-reference using go 1.18 and will try to do the same for a weak map, but theoretically you could use this already for a weak map. I could as well change the implementation so that I do not "reattach" the finalizer, but let the reference die, but this way it works better, at least with the current GC. In any case, interesting topic. |
@xeus2001, for the reference, you might be interested to check how GC can crash when a pointer/interface is recreated from uintptr even if pointed object is known to be still alive and referenced via another regular pointer: #41303 (comment). |
I also have my own weakref implementation for go1.18 that seems to be doing fine, based on @xeus2001 's code but with a map object too. @navytux I've looked at the #41303 issue and it looks like to be an issue with using direct rematerialization of pointer through transforming I've done some load testing and so far I haven't had any crash, even with |
@MagicalTux : I added a comment to the linked issue why I think that the implementation is buggy and what has to be done to fix it (adding a load barrier). However, you could directly try out my version, which uses two atomics where memory ordering is guaranteed (when I get the specs correct). |
Tried my hand at a weakmap implementation here that uses a single persistent finalizer per non-empty map to automatically evict old entries based on memory pressure. The approach may solve the weak map for memory caching use case but not others. |
I think with #67552 accepted and implemented this is no longer relevant as anyone can implement those? |
This is an outgrowth of #32779, and also of https://mdlayher.com/blog/unsafe-string-interning-in-go/ and ensuing discussion of this on Gopher Slack.
Occasionally, it would be desireable to have weak references. Unfortunately, you can't implement these as a user; you need hooks into the runtime to make a safe weak reference implementation, I think. The string interning hack described above probably works but is not complying with the uintptr conversion rules.
In nearly every case where weak references are desireable, it's desireable to be able to look up specific weak references, which suggests something similar to a map data type. It's also desireable to be able to find out whether the weak reference exists, or to be able to request the reference, getting an actual stable non-weak reference back, without a separation between the query and the retrieval.
Conveniently, a map with a pointer key type already provides exactly this behavior. What it doesn't provide is the potential for GC to remove objects.
Thus, my proposal:
cache := make(weak map[string]*Data)
This is equivalent to
make(map[string]*Data)
, except that things which are in the map but not referenced anywhere else may become deleted from the map at any arbitrary time. However, retrieval from the map is guaranteed to either yield a nil (and a false in the ,ok form) or the pointer last stored for that key, and if it yields a non-nil pointer, that pointer is safe to use.The internal implementation details aren't 100% obvious to me. You could nearly do it with SetFinalizer, except you need a way to ensure that this doesn't clash with any other finalizers used on the item. (For instance, the same value could be present in two weak maps...)
I'm not totally sure about the syntax, but I'm pretty sure that this wouldn't compile now under any possible circumstances, so it doesn't break any existing code.
Why not
sync.WeakMap
, parallel to sync.Map and/or sync.Pool? Because type safety is nice, and because the ability to use or not use the ", ok" form of retrieval seems desireable. Plain old maps are easier and friendlier to use, and less error-prone.That said,
sync.WeakMap
, which would vaguely combine the behaviors of sync.Map and sync.Pool, would be viable too. The main issue this is intended to address is that "would like to keep this if it's convenient and/or if it's being used, but don't need to retain it if there's memory pressure" seems like a common use case.The text was updated successfully, but these errors were encountered: