Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove rehash! from Dict constructor #24345

Merged
merged 1 commit into from
Nov 21, 2017
Merged

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Oct 26, 2017

This PR follows https://discourse.julialang.org/t/is-there-a-bug-in-dict/6672/4.
It implements:

  • Dict{K,V}(d::Dict{K,V}) constructor performs a bare copy of d (without rehash!);
  • new test that currently fails and should pass after the PR;
  • rename ht_keyindex2 to ht_keyindex2! (this is not strictly necessary for this PR so can be omitted, but it cleans up the related code, as ht_keyindex2 in some cases may call rehash! thus it may mutate the passed dictionary)

I recommend to remove rehash! from the constructor. Another approach would be to rehash a newly created Dict:

function Dict{K,V}(d::Dict{K,V}) where V where K
    d2 = new(copy(d.slots), copy(d.keys), copy(d.vals), d.ndel, d.count, d.age,
            d.idxfloor, d.maxprobe)
    d2.ndel > 0 && rehash!(d2)
    d2
end

but I do not see if there would be any huge benefit of this - rehash! will be called anyway if needed by setindex! or get! and it adds performance penalty during construction.

new(copy(d.slots), copy(d.keys), copy(d.vals), 0, d.count, d.age, d.idxfloor,
d.maxprobe)
new(copy(d.slots), copy(d.keys), copy(d.vals), d.ndel, d.count, d.age,
d.idxfloor, d.maxprobe)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do dnew = new(...) and then rehash!(dnew) if d.ndel > 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what I have considered (as given in the comment to the PR). I have decided not to add it as it introduces a performance penalty for Dict creation. Probably @JeffBezanson can provide the guidance here.
Another approach would be to test d.ndel > "some value greater than 0" (i.e. if we have a lot of deleted entries we do rehash!, and if there are only a few we leave it without rehashing).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a second thought - my recommendation would be to remove rehash! from the constructor. Its main benefit is that later we do not have to rehash! both dictionaries twice. But this is something that we cannot avoid anyway.
If we believe that being able to a clean dictionary is an important feature I would rather recommend to:

  • rename rehash! to _rehash!;
  • expose a new rehash! as an exported function and define it for Dict and Set. This new rehash! should perform an additional check if the size is not decreased (as _rehash! is designed for growing only).

@stevengj
Copy link
Member

Note that the rehash! was added by @JeffBezanson in 41f3cf8 ... I tend to think we should rehash! the newly created dictionary if needed, but I agree that we shouldn't modify the old dict.

@JeffBezanson
Copy link
Member

Yes, this is ok as a (quasi-)bug fix. As a later enhancement, if the fraction of deleted keys is high we should probably copy the pairs by for (k,v) in d; newd[k]=v; end, rather than copying the whole structure followed by rehashing it.

@bkamins
Copy link
Member Author

bkamins commented Oct 26, 2017

Good idea - I will do some benchmarks to find a reasonable break even and propose another PR.

@bkamins
Copy link
Member Author

bkamins commented Nov 21, 2017

CI failure seems unrelated. Should this be merged?

@JeffBezanson JeffBezanson merged commit 2dad958 into JuliaLang:master Nov 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants