-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add modify! function for lookup/update/insert/delete in one go #33758
base: master
Are you sure you want to change the base?
Conversation
else | ||
if idx > 0 | ||
h.age += 1 | ||
@inbounds h.keys[idx] = key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this line h.keys[idx] = key
from get!
. But why is this required? (I'm new to Dict
code.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What’s going on here is WeakKeyDict
uses Dict
internally and sets a finalizar to the keys to mutate the Dict
. Unfortunately this usage has leaked into the implementation of Dict
.
Any time the GC is called (to allocate) it may call finalizers which might mutate the dictionary. To protect itself it checks age
whenever an allocation might occur (depends on the dictionary key and element types, etc).
This works in single threaded concurrency - but I have no idea if the implementation is valid under multithreading?
(@vtjnash did I get those details right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! So, is it effectively GC.@preserve key
? And h.keys[idx] = key
is used because of bootstrapping issue or something? Or maybe it's just cheaper this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't actually address your original question (I was thinking of the h.age += 1
line... and actually now I look at it I'm not sure that is necessary here or not, since I don't see where the GC might run after ht_keyindex2!
or how changing h.vals[idx]
should affect the operation WeakKeyDict
, but I would defer to Jameson on that).
So the ht_keyindex2!
function prepares a slot where a key (and value) might reside but doesn't actually populate them with anything. A positive token means the slot already exists, so I think you just need to populate the new value (delete this line).
In the idx < 0
case the _setindex!
function populates h.slots[-idx]
, h.keys[-idx]
and h.vals[-idx]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, existing get!
method I mentioned was this:
Lines 446 to 464 in 6eebbbe
function get!(default::Callable, h::Dict{K,V}, key::K) where V where K | |
index = ht_keyindex2!(h, key) | |
index > 0 && return h.vals[index] | |
age0 = h.age | |
v = convert(V, default()) | |
if h.age != age0 | |
index = ht_keyindex2!(h, key) | |
end | |
if index > 0 | |
h.age += 1 | |
@inbounds h.keys[index] = key | |
@inbounds h.vals[index] = v | |
else | |
@inbounds _setindex!(h, v, key, -index) | |
end | |
return v | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm interesting. Still not sure what it protects against... it would be ideal to hear from Jeff or Jameson, but leaving it in doesn’t hurt (except maybe performance slightly).
base/abstractdict.jl
Outdated
@@ -465,6 +465,63 @@ function hash(a::AbstractDict, h::UInt) | |||
hash(hv, h) | |||
end | |||
|
|||
""" | |||
modify!(f, d::AbstractDict{K, V}, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modify!
sounds a bit like a weird name to me. update!
sounds better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming from SQL nomenclature, I think modify!
is superior to update!
since this operation can also mean removing an entry... not just updating an entry's value. Perhaps set!
is the best choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-visiting this naming and reading @clarkevans's comment, I actually think modify!
is still the best name so far. I think it's better at conveying the point that this API could be used for updating, inserting, or removing an entry. In particular, update!
(and with a lesser extent, set!
) sounds a bit weak at reminding the reader of the code about this. So, my preference is modify!
> set!
>> update!
.
@nalimilan What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both "modify" and "update" refer to the dict rather than the value AFAICT, so they both cover the case where you remove the value.
set!
could be interesting due to the parallel with get!
. Not sure whether it's a good parallel or not...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer set!
over modify!
aesthetically, and modify!
to update!
. What is the resistance of calling it set!
? It's a very interesting function. Given that it'll be most likely used in a do val .... end
block, a few more characters don't matter that much, perhaps, as andyferris suggests, set_or_delete!
. Anyway, in my mind, set!
given a function returning nothing
seems perfectly compatible with removing a key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clarkevans I think @nalimilan mentioned a good point that set!
sounds like a counter part of get
. However, I don't think it's a good parallel. If you consider set!
as a counter part of get(::Container, ::Key) ::Union{Nothing,Some{Value}}
, the signature of set!
should be set!(::Container, ::Union{Nothing,Some{Value}}, ::Key)
. This is how getindex
and setindex!
work.
It is conceivable to have this set!
API since it's very useful when you want to "rollback" a container:
original = get(dict, key) # works even if `key` does not exist
dict[key] = tmp
try
... do something with dict ...
finally
# remove the `key` if it didn't exist; or insert the `original` value if it did.
set!(dict, original, key)
end
So, I prefer using another verb for the API for this PR.
Both "modify" and "update" refer to the dict rather than the value AFAICT
@nalimilan But doesn't get
refer to a "slot" rather than the dict? So, isn't it strange that the mutation API refers to the whole dict rather than a slot? Also, other mutation verbs like setindex!
and setproperty!
seem to refer to a slot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clarkevans I think @nalimilan mentioned a good point that
set!
sounds like a counter part ofget
. However, I don't think it's a good parallel. If you considerset!
as a counter part ofget(::Container, ::Key) ::Union{Nothing,Some{Value}}
, the signature ofset!
should beset!(::Container, ::Union{Nothing,Some{Value}}, ::Key)
. This is howgetindex
andsetindex!
work.
Well currently we have get(f::Function, collection, key)
, whose signature is very the same as modify!(f, d, key)
in this PR. If we added get(collection, key)::Union{Nothing,Some}
it could make sense to add set!(collection, value::Union{Nothing,Some}}, key)
too.
@nalimilan But doesn't
get
refer to a "slot" rather than the dict? So, isn't it strange that the mutation API refers to the whole dict rather than a slot? Also, other mutation verbs likesetindex!
andsetproperty!
seem to refer to a slot.
I find it hard to tell TBH. It's hard to argue about these things, and if we wanted a fully consistent naming scheme maybe get
should be called getkey
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a readability standpoint, if naively read set!
I'd probably expect the key to exist after the operation. Which is also just like get!
. I'd feel surprised if an operation named set!
deleted something, to be honest!
it could make sense to add
I suppose a simpler modify!(d, key, v::Union{Nothing, Some{Value}})
could be nice. I note in other places in Base
we do seem to be struggling whether dispatching on ::Function
is desirable and these arguments seemed to be getting widened to ::Any
. Partly because some callable things are Function
or Callable
(like python objects). (I was also sensing that there seemed to be some more general arguments that we should be able to know the semantics of a method by the function name and number of arguments only.)
But doesn't get refer to a "slot" rather than the dict?
I agree - I have always read these verbs as refering to the "slot" on the container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-Authored-By: Milan Bouchet-Valat <[email protected]>
Bump; I run into wanting this functionality all the time. Quick question though: the docs mention being able to return Also, looks like this PR needs a quick rebase w/ exports.jl |
I've created and been using the mutate-or-widen version I think the only blocker is the naming although it'd be nice if someone who knows the interaction between
Yeah, I thought about requiring |
@vtjnash @JeffBezanson It'd be great if you guys can have a look at it, especially the interaction with |
I also think
@vtjnash @JeffBezanson To clarify my earlier point, I didn't understand why the key would written in the case |
I still think having some functionaility like this would be great - just wanted to note there's a good discussion (FAQ) at the end a JavaScript proposal for adding Comparing with this PR, it doesn't include deletion in that API. Comparing with #12157 it is using a closure-passing style rather than a reference-returning style. |
Bump: I'm messing around with |
base/abstractdict.jl
Outdated
julia> modify!(dict, "a") do val | ||
Some(val === nothing ? 1 : something(val) + 1) | ||
end | ||
Some(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should make this result Some(1) => Some(2)
, so you get back both old and new, instead of just the new value (similar to replacefield!
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now implemented in 28900ad.
Co-authored-by: Jameson Nash <[email protected]> Co-authored-by: Joaquim Dias Garcia <[email protected]>
dictionary `d` is removed. If `f` returns non-`nothing` value `x`, `key => something(x)` | ||
is inserted or updated in `d` (equivalent to `d[key] = something(x)` but more efficient). | ||
|
||
Whether `Some{V}(value)` or `Some{typeof(value)}(value)` is returned is an implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear what value is referred to here, as no single value is "returned". If it's about old
, then better to name it explicitly, e.g.
Whether `Some{V}(value)` or `Some{typeof(value)}(value)` is returned is an implementation | |
Whether `old` has type `Some{V}(value)` or `Some{typeof(value)}(value)` is an implementation |
Whether `Some{V}(value)` or `Some{typeof(value)}(value)` is returned is an implementation | ||
defined behavior. The callback function `f` must use `old === nothing` or `old isa Some` | ||
instead of `old isa Some{valtype(d)}` unless the type of the dictionary `d` is known to | ||
define a certain behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a very rare event to know about a certain specific behavior of a dict type. I would just remove "unless ...", and change "must use" to
The callback function
f
should generally useold === nothing
orold isa Some
instead ...
(just a suggestion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer requreing the dict to return Some{V}
Maybe better be strict for now and require returning either |
Perhaps require |
if haskey(dict, key) | ||
old = Some{V}(dict[key]) | ||
val = f(old) | ||
else | ||
old = nothing | ||
val = f(old) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the compiler do this union splitting automatically?
@inbounds vold = h.vals[idx] | ||
vold = Some{V}(vold) | ||
vnew = f(vold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the abstract dict case we use old
and val
here we use vold
and vnew
and in the documentation we use old
and new
. We should probably be consistent and I prefer old
and new
Whether `Some{V}(value)` or `Some{typeof(value)}(value)` is returned is an implementation | ||
defined behavior. The callback function `f` must use `old === nothing` or `old isa Some` | ||
instead of `old isa Some{valtype(d)}` unless the type of the dictionary `d` is known to | ||
define a certain behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer requreing the dict to return Some{V}
@inbounds _setindex!(h, something(vnew), key, -idx) | ||
end | ||
end | ||
return vold => vnew |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While playing around with the code, I noticed that the last line above allocates unless I define modify!
with @inline
. EDIT: Even with @inline
it allocates if one uses the return value.
I also noticed that the code runs a bit slower on master compared to 1.8.2 and 1.9.0-beta2 (9.0ns vs 7.7ns, always using @inline
and discarding the return value).
Code used
# negate an existing entry
function neg!(h, key)
x, y = modify!(h, key) do sx
sx === nothing ? nothing : -something(sx)
end
# allocation:
# return x == y
# no allocation:
return nothing
end
h = Dict('a' => 4, 'b' => 3)
@btime neg!($h, 'b')
Dict{Any,Any} with 1 entry: | ||
"a" => 1 | ||
|
||
julia> modify!(_ -> nothing, dict, "a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
julia> modify!(_ -> nothing, dict, "a") | |
julia> modify!(Returns(nothing), dict, "a") |
I think that Returns
is now the preferred way. Or would this make the example too complicated?
otherwise `nothing` is passed. If `f` returns `nothing`, corresponding entry in the | ||
dictionary `d` is removed. If `f` returns non-`nothing` value `x`, `key => something(x)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise `nothing` is passed. If `f` returns `nothing`, corresponding entry in the | |
dictionary `d` is removed. If `f` returns non-`nothing` value `x`, `key => something(x)` | |
otherwise `nothing` is passed. If `f` returns `nothing`, the corresponding entry in the | |
dictionary `d` is removed. If `f` returns a non-`nothing` value `x`, `key => something(x)` |
With this PR the method
and the dedicated method in |
@tkf, are you still interested in pushing this through? Is there anything you need to make this happen? |
I think this probably has bitrotted enough that we should just remake the PR. |
An alternate API I suggested in #31199 (comment) doesn't support deletion, but simply extends |
I'm not sure if BTW, I think I have solved the problem with the spurious allocations that I mentioned in a previous post: If one changes |
We could still do deletions with |
I don't know a use case where the old value matters, and I wouldn't mind about using
would be as fast as the current implementation, and so would be the Regarding the name, another option would be something like |
There have been several proposals already, but after thinking about it for some time, I would like to add another one. My apologies if it's not new. When I speak of the current The problemWith the current version of
I see two shortcomings of this approach:
My proposalI want to propose the following for I've played around with the code. It seems that the new Code
Any thoughts? |
Instead of a tuple, |
This implements
set!
function I proposed in #31367 (comment).See #31367 for discussion on alternative API (e.g., token-based).
Quoting the docstring: