RobinDict #501

eulerkochy · 2019-05-14T08:52:29Z

I call this RobinDict; it implements the Robin Hood hashing technique.

oxinabox · 2019-05-14T09:11:15Z

Some early points, since I know this is not done yet.

Please use 4 spaces for indenting, not tabs.
IIRC, the AbstractDict interface requires get(dict, key, default) to be implemented, and if you do that (along with some other things you've already done), a bunch of things (like equality) will work for free.
For good style, please always use return to return a value from a multiline function.
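To illustrate the second point, here is a minimal sketch of a toy AbstractDict subtype. The type `TinyDict` and its linear-scan lookup are hypothetical and are not the PR's RobinDict; the sketch only assumes that Base's generic fallbacks for `==`, `haskey`, and `getindex` are built on `length`, `iterate`, and `get`.

```julia
# Hypothetical toy type: a linear-scan dictionary, used only to show which
# methods an AbstractDict subtype needs before Base's generic code takes over.
struct TinyDict{K,V} <: AbstractDict{K,V}
    keys::Vector{K}
    vals::Vector{V}
end

Base.length(d::TinyDict) = length(d.keys)

# Iterate over key => value pairs; the state is the next position to visit.
function Base.iterate(d::TinyDict, i::Int = 1)
    i > length(d.keys) && return nothing
    return (d.keys[i] => d.vals[i], i + 1)
end

# The three-argument `get` is what the generic fallbacks rely on.
function Base.get(d::TinyDict, key, default)
    for i in eachindex(d.keys)
        isequal(d.keys[i], key) && return d.vals[i]
    end
    return default
end

td = TinyDict([1, 2], [10, 20])
td == Dict(1 => 10, 2 => 20)   # equality comes from the AbstractDict fallback
haskey(td, 2)                  # so does haskey
td[1]                          # and getindex
```

None of `==`, `haskey`, or `getindex` is defined for `TinyDict` here; they all route through `get`.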

eulerkochy · 2019-05-14T09:19:32Z

I didn't quite get the first point; shouldn't Tab size: 4 do the same thing?
PS: Look at the bottom right-hand corner of the attached screenshot.

src/robin_dict.jl

eulerkochy · 2019-05-14T09:38:51Z

My bad. That block of code was written in a Jupyter Notebook. Thanks for pointing it out. I'll take care of it from now on.

On Tue 14 May, 2019, 3:05 PM Lyndon White wrote:

> In src/robin_dict.jl:
>
> +    h.maxprobe = 0
> +    h.totalcost = 0
> +    h.idxfloor = 0
> +    return h
> +end
> +
> +function rh_search(h::RobinDict{K, V}, key::K) where {K, V}
> +    sz = length(h.keys)
> +    index = hashindex(key, sz)
> +    cdibs = 0
> +    while true
> +        if h.slots[index] == 0x0
> +            return -1
> +        elseif cdibs > h.dibs[index]
> +            return -1
> +        elseif h.keys[index] == key
>
> I'm not sure how to configure your editor, but I can tell you there are definitely tabs on this line.

src/robin_dict.jl

codecov · 2019-06-02T08:48:55Z

Codecov Report

Merging #501 into master will increase coverage by 4.27%.
The diff coverage is 99.08%.

@@            Coverage Diff             @@
##           master     #501      +/-   ##
==========================================
+ Coverage   89.11%   93.39%   +4.27%     
==========================================
  Files          31       32       +1     
  Lines        2408     2529     +121     
==========================================
+ Hits         2146     2362     +216     
+ Misses        262      167      -95

Impacted Files	Coverage Δ
src/DataStructures.jl	`100% <ø> (ø)`	⬆️
src/robin_dict.jl	`99.08% <99.08%> (ø)`
src/int_set.jl	`100% <0%> (+0.79%)`	⬆️
src/heaps/minmax_heap.jl	`100% <0%> (+0.95%)`	⬆️
src/multi_dict.jl	`75% <0%> (+1.08%)`	⬆️
src/mutable_list.jl	`99.32% <0%> (+1.32%)`	⬆️
src/list.jl	`100% <0%> (+1.53%)`	⬆️
src/disjoint_set.jl	`98.07% <0%> (+1.85%)`	⬆️
src/priorityqueue.jl	`98.64% <0%> (+1.95%)`	⬆️
... and 10 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3529577...d464edf. Read the comment docs.

codecov · 2019-06-02T08:48:55Z

Codecov Report

Merging #501 into master will decrease coverage by 1.48%.
The diff coverage is 77.73%.

@@            Coverage Diff             @@
##           master     #501      +/-   ##
==========================================
- Coverage    89.4%   87.91%   -1.49%     
==========================================
  Files          31       32       +1     
  Lines        2407     2690     +283     
==========================================
+ Hits         2152     2365     +213     
- Misses        255      325      +70

Impacted Files	Coverage Δ
src/DataStructures.jl	`100% <ø> (ø)`	⬆️
src/robin_dict.jl	`77.73% <77.73%> (ø)`
src/container_loops.jl	`51.21% <0%> (-4.07%)`	⬇️
src/sorted_dict.jl	`84.03% <0%> (-1.69%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9c55bda...7cb18f2. Read the comment docs.

eulerkochy · 2019-06-18T20:06:19Z

Ready for review @oxinabox

eulerkochy · 2019-06-25T10:26:08Z

I thought I'd just post the benchmarking results here and keep updating them as I refactor the code.
Here are the results after adding rh_insert_for_rehash!:

julia> include("test/bench_robin_dict.jl")
.
Sample #1 Key => Integer , Size => 10^6 entries
.
        add_entries for RobinDict()
  847.246 ms (5914106 allocations: 196.91 MiB)
        add_entries for Dict()
  1.122 s (5985262 allocations: 156.50 MiB)
        add_entries for RobinDict{Int, Int}()
  182.534 ms (48 allocations: 106.67 MiB)
        add_entries for Dict{Int, Int}()
  112.181 ms (54 allocations: 65.17 MiB)
.
Sample #2 Key => Float32 , Size => 10^6 entries
.
        add_entries for RobinDict()
  821.209 ms (5914028 allocations: 196.91 MiB)
        add_entries for Dict()
  1.107 s (5986513 allocations: 156.52 MiB)
        add_entries for RobinDict{Float32, Float32}()
  149.796 ms (46 allocations: 64.00 MiB)
        add_entries for Dict{Float32, Float32}()
  99.266 ms (52 allocations: 34.50 MiB)
.
Sample #3 Key => String , Size => 10^6 entries
.
        add_entries for RobinDict()
  589.279 ms (1956745 allocations: 136.53 MiB)
        add_entries for Dict()
  568.355 ms (2562043 allocations: 104.26 MiB)
        add_entries for RobinDict{String, String}()
  365.851 ms (48 allocations: 106.67 MiB)
        add_entries for Dict{String, String}()
  439.566 ms (54 allocations: 65.17 MiB)
.
Sample #4 Key => Integer , Size => 10^7 entries
.
        add_entries for RobinDict()
  6.338 s (35658047 allocations: 970.77 MiB)
        add_entries for Dict()
  12.688 s (57823447 allocations: 1.39 GiB)
        add_entries for RobinDict{Int, Int}()
  1.656 s (54 allocations: 426.67 MiB)
        add_entries for Dict{Int, Int}()
  1.544 s (72 allocations: 541.17 MiB)
.
Sample #5 Key => Integer , Size => 10^5 entries
.
        add_entries for RobinDict()
  19.470 ms (444164 allocations: 13.45 MiB)
        add_entries for Dict()
  17.759 ms (396983 allocations: 11.73 MiB)
        add_entries for RobinDict{Int, Int}()
  8.130 ms (36 allocations: 6.67 MiB)
        add_entries for Dict{Int, Int}()
  5.527 ms (36 allocations: 5.67 MiB)
.
Sample #6 Key => Float32 , Size => 10^5 entries
.
        add_entries for RobinDict()
  19.748 ms (444207 allocations: 13.45 MiB)
        add_entries for Dict()
  17.693 ms (399228 allocations: 11.76 MiB)
        add_entries for RobinDict{Float32, Float32}()
  7.281 ms (34 allocations: 4.00 MiB)
        add_entries for Dict{Float32, Float32}()
  4.871 ms (34 allocations: 3.00 MiB)
.
Sample #7 Key => String , Size => 10^5 entries
.
        add_entries for RobinDict()
  22.418 ms (121883 allocations: 8.53 MiB)
        add_entries for Dict()
  18.473 ms (115882 allocations: 7.44 MiB)
        add_entries for RobinDict{String, String}()
  15.422 ms (36 allocations: 6.67 MiB)
        add_entries for Dict{String, String}()
  15.398 ms (36 allocations: 5.67 MiB)

Here are the plots, which give some insight into how RobinDict behaves at a load factor of 70%.
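For anyone wanting to reproduce the general shape of these numbers, a rough sketch of an `add_entries`-style measurement follows. The body of `add_entries` is an assumption, since test/bench_robin_dict.jl is not shown in this thread, and the sketch times only Base's Dict using `@elapsed` from Base. It mainly illustrates why the untyped `Dict()` rows allocate so much more than the typed ones.

```julia
# Hypothetical stand-in for the benchmark's `add_entries`; the real body is not shown here.
function add_entries(d::AbstractDict, n::Int)
    for i in 1:n
        d[i] = i + 1
    end
    return d
end

n = 10^5

# Warm up both variants so compilation time is not included in the timings.
add_entries(Dict{Int,Int}(), 10)
add_entries(Dict(), 10)

t_typed   = @elapsed add_entries(Dict{Int,Int}(), n)
t_untyped = @elapsed add_entries(Dict(), n)   # Dict{Any,Any}: keys and values are boxed

println("typed:   ", round(t_typed * 1e3, digits = 3), " ms")
println("untyped: ", round(t_untyped * 1e3, digits = 3), " ms")
```

A single `@elapsed` sample is noisy; a dedicated benchmarking package would give more reliable figures.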

vtjnash · 2019-06-25T22:00:18Z

> which is pretty bad with respect to memory consumption

FWIW, the growth factor we usually use for AbstractDict is:
rehash!(h, length(h) > 64000 ? sz*2 : sz*4)
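
As a small worked sketch of that rule, the helper below computes the next table size from the current entry count and slot count. The name `next_table_size` is hypothetical and is not part of Base or of this PR; only the threshold and the two multipliers are taken from the line above.

```julia
# Hypothetical helper: choose the next table size from the current number of
# entries (`count`) and the current number of slots (`sz`).
# Small tables grow 4x; tables holding more than 64000 entries grow 2x.
next_table_size(count::Int, sz::Int) = count > 64000 ? sz * 2 : sz * 4

small = next_table_size(1_000, 2_048)       # 2048 * 4 = 8192
large = next_table_size(100_000, 262_144)   # 262144 * 2 = 524288
```

Growing large tables by 2 rather than 4 keeps the load factor right after a rehash from dropping as low, which is the memory concern raised above.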

test/test_robin_dict.jl

eulerkochy · 2019-06-28T18:08:18Z

@chethega @vtjnash @oxinabox review? I'll rebase after that.

eulerkochy · 2019-07-02T20:12:59Z

Bump

docs/src/robin_dict.md

Co-Authored-By: Jameson Nash <[email protected]>

eulerkochy · 2019-07-05T06:27:30Z

Bump

chethega

Looks fine by me. Good work!

src/robin_dict.jl

chethega · 2019-07-05T12:39:29Z

src/robin_dict.jl

+function _setindex!(h::RobinDict{K,V}, key::K, v0) where {K, V}
+    v = convert(V, v0)
+    sz = length(h.keys)
+    (h.count > ROBIN_DICT_LOAD_FACTOR * sz) && rehash!(h, sz<<2)


Always resizing by a factor of 4 looks sketchy (leads to very low load factors and hence memory waste for long-lived dicts). RobinDict should now be much cheaper to rehash than Base dict.

Maybe just copy what Base does? (i.e. use threshold to decide whether to grow by factor 2 or 4)

I did that on purpose, as this change showed some improvement in the benchmarks. The details are described here.

chethega · 2019-07-05T12:44:55Z

src/robin_dict.jl

+    index = rh_search(h, key)
+
+    index > 0 && return h.vals[index]
+


In theory, we could use the partial traversal from rh_search and continue insertion from there, instead of starting from desired_index again: rh_search aborts at the first entry that would be relocated on insertion.

Just leaving that here for posterity (can be separate PR to improve perf).
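
A minimal sketch of that idea, for illustration only: the toy table below stores Int keys with their probe distances, and `mini_insert!` resumes from the slot where `probe` gave up instead of restarting at the ideal slot. The names `MiniRobin`, `ideal`, `probe`, and `mini_insert!` are hypothetical and this is not the PR's code; it assumes a power-of-two table size and does no resizing.

```julia
# Toy Robin Hood table (illustrative only). dibs[i] is the distance of the entry
# in slot i from its ideal slot; -1 marks an empty slot.
mutable struct MiniRobin
    keys::Vector{Int}
    dibs::Vector{Int}
end

MiniRobin(sz::Int) = MiniRobin(zeros(Int, sz), fill(-1, sz))

# Ideal 1-based slot for a key; `sz` is assumed to be a power of two.
ideal(key::Int, sz::Int) = ((hash(key) % Int) & (sz - 1)) + 1

# Search that also reports where it stopped: (found, index, distance).
# It aborts at the first empty slot or the first entry that is closer to its
# ideal slot than we are, which is exactly where an insertion would start swapping.
function probe(t::MiniRobin, key::Int)
    sz = length(t.keys)
    index = ideal(key, sz)
    dist = 0
    while dist < sz
        if t.dibs[index] == -1 || dist > t.dibs[index]
            return (false, index, dist)
        elseif t.keys[index] == key
            return (true, index, dist)
        end
        index = (index & (sz - 1)) + 1   # wrap around to slot 1 after slot sz
        dist += 1
    end
    return (false, 0, dist)
end

# Insertion that continues from the position returned by `probe`.
function mini_insert!(t::MiniRobin, key::Int)
    found, index, dist = probe(t, key)
    found && return t
    any(==(-1), t.dibs) || error("table is full")   # this sketch never resizes
    sz = length(t.keys)
    while t.dibs[index] != -1
        if dist > t.dibs[index]
            # Take the slot from the "richer" entry and carry that entry onward.
            t.keys[index], key = key, t.keys[index]
            t.dibs[index], dist = dist, t.dibs[index]
        end
        index = (index & (sz - 1)) + 1
        dist += 1
    end
    t.keys[index] = key
    t.dibs[index] = dist
    return t
end

t = MiniRobin(8)
for k in (3, 11, 19, 4, 27)
    mini_insert!(t, k)
end
probe(t, 19)   # first element of the result is true
probe(t, 99)   # first element of the result is false
```

Because every slot visited before the stopping point holds a different key at least as far from its ideal slot as the new one, resuming there places the key where a restart from the ideal slot would.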

oxinabox · 2019-07-05T13:34:13Z

Given @chethega has approved, I am going to merge this.
Nice work @eulerkochy

Implement the constructors, model setindex! for RobinDict

82fe9e5

oxinabox reviewed May 14, 2019

src/robin_dict.jl (outdated)

oxinabox reviewed May 20, 2019

src/robin_dict.jl (outdated)

vtjnash reviewed May 30, 2019

eulerkochy marked this pull request as ready for review May 30, 2019 13:39

eulerkochy closed this Jun 3, 2019

eulerkochy reopened this Jun 3, 2019

eulerkochy force-pushed the robinhood branch from c79fb22 to 40db5ea Compare June 18, 2019 20:04

eulerkochy added 16 commits June 19, 2019 02:18

write basic form of rh_insert!

7a3ae05

changes to insert fix rh_insert, write rh_search, getindex

fix duplicate keys insertion

7fdc5ce

Fix erronous h.count

write getkey, haskey, iterate for RobinDict

385c781

remove un-necessary print statement

crude form of rh_delete, delete, pop, get

da2b517

rehash! function, inspired by the base/dict.jl

939e798

minor corrections in code, add _tablesz for code readability mistake in constructor with pairs

implement sizehint!

3c3e24d

add comment about ROBIN_DICT_LOAD_FACTOR

7f673e7

minor changes in rh_insert!, write out rh_delete! correct a blunder in rehash! minor change in sizehint! yet another mistake in sizehint!

add test for RobinDict

30503df

Shift backwards, and change in rh_insert!

cacdf2a

change in rh_delete! minor changes in rh_delete! and pop! fix a typo

major changes in rehash!, corrected rh_delete!

bd4e066

remove isslotdeleted 'cuz there ain't no tombstones! 🎉

6a74f9a

introduce max_lf(for benchmarking purposes), write constructor for iterables

add tests

b863818

get function, and trying out grow_to! for dict_with_eltype

c82257f

constructors for tuples and iterables updated to cover all test cases remove un-necessary constant declaration indentation changes

test for RobinDict constructed from vararg of Pairs

275bc63

update get method, add tests

c75fe3f

write get! function for RobinDict

78df37d

write rh_insert_for_rehash, some change in benchmarking code

c3a45fd

minor fix

dfffad1

eulerkochy added 5 commits June 26, 2019 15:13

add tests for invariants, and filter function

70d0aa5

remove isslotfilled from test

057c58b

add documentation for RobinDict

433328d

Add RobinDict to index.md

3bd850a

make the CI run 🎉

7df1c64

eulerkochy changed the title ~~[WIP] Dict with Robinhood hashing technique~~ RobinDict Jun 26, 2019

include src file in test_robin_dict, so as to not export inner-functionalities

7cb71bf

chethega reviewed Jun 27, 2019

test/test_robin_dict.jl

chethega reviewed Jun 27, 2019

test/test_robin_dict.jl (outdated)

eulerkochy added 7 commits June 28, 2019 16:17

add check_invariants

bbfd1ee

modify an assertion

bf62078

scrap totalcost, maxprobe

aaf3fdd

update test, check_invariants

cd5b0ab

change hash_key and update tests

89ad075

catching the bug-attempt 1

ceab89c

hopefully, now it'll pass ! _/\_

25225a6

vtjnash reviewed Jul 3, 2019

docs/src/robin_dict.md (outdated)

docs/src/robin_dict.md (outdated)

docs/src/robin_dict.md

eulerkochy and others added 2 commits July 4, 2019 14:19

correcting a typo, correcting benchmarks

de00dad

Changes in robindict.md

09827d5

Co-Authored-By: Jameson Nash <[email protected]>

chethega approved these changes Jul 5, 2019

Update robin_dict.jl

d464edf

oxinabox merged commit 664847e into JuliaCollections:master Jul 5, 2019
