Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add modify! function for lookup/update/insert/delete in one go #33758

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 57 additions & 0 deletions base/abstractdict.jl
Original file line number Diff line number Diff line change
Expand Up @@ -465,6 +465,63 @@ function hash(a::AbstractDict, h::UInt)
hash(hv, h)
end

"""
modify!(f, d::AbstractDict{K, V}, key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modify! sounds a bit like a weird name to me. update! sounds better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, update! and set! were suggested to be simpler shorthands of what I call modify! in this PR. I don't mind renaming to something else, though.

Copy link
Member

@clarkevans clarkevans Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming from SQL nomenclature, I think modify! is superior to update! since this operation can also mean removing an entry... not just updating an entry's value. Perhaps set! is the best choice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-visiting this naming and reading @clarkevans's comment, I actually think modify! is still the best name so far. I think it's better at conveying the point that this API could be used for updating, inserting, or removing an entry. In particular, update! (and with a lesser extent, set!) sounds a bit weak at reminding the reader of the code about this. So, my preference is modify! > set! >> update!.

@nalimilan What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both "modify" and "update" refer to the dict rather than the value AFAICT, so they both cover the case where you remove the value.

set! could be interesting due to the parallel with get!. Not sure whether it's a good parallel or not...

Copy link
Member

@clarkevans clarkevans Jul 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer set! over modify! aesthetically, and modify! to update!. What is the resistance of calling it set!? It's a very interesting function. Given that it'll be most likely used in a do val .... end block, a few more characters don't matter that much, perhaps, as andyferris suggests, set_or_delete!. Anyway, in my mind, set! given a function returning nothing seems perfectly compatible with removing a key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clarkevans I think @nalimilan mentioned a good point that set! sounds like a counter part of get. However, I don't think it's a good parallel. If you consider set! as a counter part of get(::Container, ::Key) ::Union{Nothing,Some{Value}}, the signature of set! should be set!(::Container, ::Union{Nothing,Some{Value}}, ::Key). This is how getindex and setindex! work.

It is conceivable to have this set! API since it's very useful when you want to "rollback" a container:

original = get(dict, key)  # works even if `key` does not exist
dict[key] = tmp
try
    ... do something with dict ...
finally
    # remove the `key` if it didn't exist; or insert the `original` value if it did.
    set!(dict, original, key)
end

So, I prefer using another verb for the API for this PR.

Both "modify" and "update" refer to the dict rather than the value AFAICT

@nalimilan But doesn't get refer to a "slot" rather than the dict? So, isn't it strange that the mutation API refers to the whole dict rather than a slot? Also, other mutation verbs like setindex! and setproperty! seem to refer to a slot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clarkevans I think @nalimilan mentioned a good point that set! sounds like a counter part of get. However, I don't think it's a good parallel. If you consider set! as a counter part of get(::Container, ::Key) ::Union{Nothing,Some{Value}}, the signature of set! should be set!(::Container, ::Union{Nothing,Some{Value}}, ::Key). This is how getindex and setindex! work.

Well currently we have get(f::Function, collection, key), whose signature is very the same as modify!(f, d, key) in this PR. If we added get(collection, key)::Union{Nothing,Some} it could make sense to add set!(collection, value::Union{Nothing,Some}}, key) too.

@nalimilan But doesn't get refer to a "slot" rather than the dict? So, isn't it strange that the mutation API refers to the whole dict rather than a slot? Also, other mutation verbs like setindex! and setproperty! seem to refer to a slot.

I find it hard to tell TBH. It's hard to argue about these things, and if we wanted a fully consistent naming scheme maybe get should be called getkey...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a readability standpoint, if naively read set! I'd probably expect the key to exist after the operation. Which is also just like get!. I'd feel surprised if an operation named set! deleted something, to be honest!

it could make sense to add

I suppose a simpler modify!(d, key, v::Union{Nothing, Some{Value}}) could be nice. I note in other places in Base we do seem to be struggling whether dispatching on ::Function is desirable and these arguments seemed to be getting widened to ::Any. Partly because some callable things are Function or Callable (like python objects). (I was also sensing that there seemed to be some more general arguments that we should be able to know the semantics of a method by the function name and number of arguments only.)

But doesn't get refer to a "slot" rather than the dict?

I agree - I have always read these verbs as refering to the "slot" on the container.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about crud! it does create, update, delete... and retrieve. There is also merge! aka upsert! from SQL. I think crud! is a new kind of word that is fun to say and perhaps reflects this hybrid semantics?


Lookup and then update, insert or delete in one go without re-computing the hash.

`f` is a callable object that must accept `Union{Some{V}, Nothing}` and return
tkf marked this conversation as resolved.
Show resolved Hide resolved
`Union{T, Some{T}, Nothing}` where `T` is a type [`convert`](@ref)-able to the value type
tkf marked this conversation as resolved.
Show resolved Hide resolved
`V`. The value `Some(d[key])` is passed to `f` if `haskey(d, key)`; otherwise `nothing`
is passed. If `f` returns `nothing`, corresponding entry in the dictionary `d` is removed.
If `f` returns non-`nothing` value `x`, `something(x)` is inserted to `d`.
tkf marked this conversation as resolved.
Show resolved Hide resolved

`modify!` returns whatever `f` returns as-is.

# Examples
```jldoctest
julia> dict = Dict("a" => 1);

julia> modify!(dict, "a") do val
Some(val === nothing ? 1 : something(val) + 1)
end
Some(2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should make this result Some(1) => Some(2), so you get back both old and new, instead of just the new value (similar to replacefield!).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now implemented in 28900ad.


julia> dict
Dict{String,Int64} with 1 entry:
"a" => 2

julia> dict = Dict();

julia> modify!(dict, "a") do val
Some(val === nothing ? 1 : something(val) + 1)
tkf marked this conversation as resolved.
Show resolved Hide resolved
end
Some(1)

julia> dict
Dict{Any,Any} with 1 entry:
"a" => 1

julia> modify!(_ -> nothing, dict, "a")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
julia> modify!(_ -> nothing, dict, "a")
julia> modify!(Returns(nothing), dict, "a")

I think that Returns is now the preferred way. Or would this make the example too complicated?


julia> dict
Dict{Any,Any} with 0 entries
```
"""
function modify!(f, dict::AbstractDict, key)
if haskey(dict, key)
val = f(Some(dict[key]))
tkf marked this conversation as resolved.
Show resolved Hide resolved
else
val = f(nothing)
end
if val === nothing
delete!(dict, key)
else
dict[key] = something(val)
end
return val
end

function getindex(t::AbstractDict, key)
v = get(t, key, secret_table_token)
if v === secret_table_token
Expand Down
39 changes: 39 additions & 0 deletions base/dict.jl
Original file line number Diff line number Diff line change
Expand Up @@ -391,6 +391,45 @@ function setindex!(h::Dict{K,V}, v0, key::K) where V where K
return h
end

function modify!(f, h::Dict{K}, key0) where K
tkf marked this conversation as resolved.
Show resolved Hide resolved
key = convert(K, key0)
if !isequal(key, key0)
throw(ArgumentError("$(limitrepr(key0)) is not a valid key for type $K"))
end

# Ideally, to improve performance for the case that requires
# resizing, we should use something like `ht_keyindex` while
# keeping computed hash value and then do something like
# `ht_keyindex2!` if `f` returns non-`nothing`.
idx = ht_keyindex2!(h, key)

age0 = h.age
if idx > 0
@inbounds vold = h.vals[idx]
vnew = f(Some(vold))
tkf marked this conversation as resolved.
Show resolved Hide resolved
else
vnew = f(nothing)
end
if h.age != age0
idx = ht_keyindex2!(h, key)
end

if vnew === nothing
if idx > 0
_delete!(h, idx)
end
else
if idx > 0
h.age += 1
@inbounds h.keys[idx] = key
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this line h.keys[idx] = key from get!. But why is this required? (I'm new to Dict code.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What’s going on here is WeakKeyDict uses Dict internally and sets a finalizar to the keys to mutate the Dict. Unfortunately this usage has leaked into the implementation of Dict.

Any time the GC is called (to allocate) it may call finalizers which might mutate the dictionary. To protect itself it checks age whenever an allocation might occur (depends on the dictionary key and element types, etc).

This works in single threaded concurrency - but I have no idea if the implementation is valid under multithreading?

(@vtjnash did I get those details right?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! So, is it effectively GC.@preserve key? And h.keys[idx] = key is used because of bootstrapping issue or something? Or maybe it's just cheaper this way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't actually address your original question (I was thinking of the h.age += 1 line... and actually now I look at it I'm not sure that is necessary here or not, since I don't see where the GC might run after ht_keyindex2! or how changing h.vals[idx] should affect the operation WeakKeyDict, but I would defer to Jameson on that).

So the ht_keyindex2! function prepares a slot where a key (and value) might reside but doesn't actually populate them with anything. A positive token means the slot already exists, so I think you just need to populate the new value (delete this line).

In the idx < 0 case the _setindex! function populates h.slots[-idx], h.keys[-idx] and h.vals[-idx].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, existing get! method I mentioned was this:

julia/base/dict.jl

Lines 446 to 464 in 6eebbbe

function get!(default::Callable, h::Dict{K,V}, key::K) where V where K
index = ht_keyindex2!(h, key)
index > 0 && return h.vals[index]
age0 = h.age
v = convert(V, default())
if h.age != age0
index = ht_keyindex2!(h, key)
end
if index > 0
h.age += 1
@inbounds h.keys[index] = key
@inbounds h.vals[index] = v
else
@inbounds _setindex!(h, v, key, -index)
end
return v
end

h.keys[index] = key was added in ff4706b (#9595)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm interesting. Still not sure what it protects against... it would be ideal to hear from Jeff or Jameson, but leaving it in doesn’t hurt (except maybe performance slightly).

@inbounds h.vals[idx] = something(vnew)
else
@inbounds _setindex!(h, something(vnew), key, -idx)
end
end
return vnew
end

"""
get!(collection, key, default)

Expand Down
1 change: 1 addition & 0 deletions base/exports.jl
Original file line number Diff line number Diff line change
Expand Up @@ -514,6 +514,7 @@ export
mapreduce,
merge!,
merge,
modify!,
pairs,
reduce,
setdiff!,
Expand Down
51 changes: 51 additions & 0 deletions test/dict.jl
Original file line number Diff line number Diff line change
Expand Up @@ -1057,8 +1057,11 @@ end
new{keytype(d), valtype(d)}(d)
end
end
Base.Dict(td::TestDict) = td.dict
Base.haskey(td::TestDict, key) = haskey(td.dict, key)
Base.setindex!(td::TestDict, args...) = setindex!(td.dict, args...)
Base.getindex(td::TestDict, args...) = getindex(td.dict, args...)
Base.delete!(td::TestDict, key) = delete!(td.dict, key)
Base.pairs(D::TestDict) = pairs(D.dict)
testdict = TestDict(:a=>1, :b=>2)
map!(v->v-1, values(testdict))
Expand All @@ -1072,3 +1075,51 @@ end
@test testdict[:b] == 1
end
end

@testset "modify!(f, ::$constructor, key)" for constructor in [
Dict,
TestDict,
]
@testset "update" begin
dict = constructor(Dict("a" => 1))

@test modify!(dict, "a") do val
Some(val === nothing ? 1 : something(val) + 1)
end == Some(2)

@test Dict(dict) == Dict("a" => 2)
end

@testset "insert" begin
dict = constructor(Dict())

@test modify!(dict, "a") do val
Some(val === nothing ? 1 : something(val) + 1)
end == Some(1)

@test Dict(dict) == Dict("a" => 1)
end

@testset "delete" begin
dict = constructor(Dict("a" => 1))
@test modify!(_ -> nothing, dict, "a") === nothing
@test Dict(dict) == Dict()
end

@testset "no-op" begin
dict = constructor(Dict("a" => 1))
@test modify!(_ -> nothing, dict, "b") === nothing
@test Dict(dict) == Dict("a" => 1)
end

@testset "mutation inside `f`" begin
dict = constructor(Dict())

@test modify!(dict, "a") do val
dict["a"] = 0
Some(val === nothing ? 1 : something(val) + 1)
end == Some(1)

@test Dict(dict) == Dict("a" => 1)
end
end