Hash maps #611
Created the doc/specs/stdlib_hash_maps.md markdown documentation for the PR. [ticket: X]
Fixed typos in stdlib_hash_maps.md [ticket: X]
Removed `max_bits` and `load_factor` from the hash map data types and transformed them into module constants. Removed `load_factor` function from `stdlib_open_hash_map` and added `relative_loading` function. [ticket: X]
Extensive rewrite reconciling two different versions of stdlib_hash_maps.md. [ticket: X]
Three general questions:
- Why represent the key / other type as an array of `int8` rather than as `class(*), allocatable`? At least for the other type, the latter representation would be more flexible, without the need for memory manipulation with `transfer`.
- Would the definition of a container type be useful? This one could be used for all data structures (e.g. lists).
- Should the procedures for the maps be type-bound? First, this makes imports simpler, and it might also allow us to define a common base class.
Regarding the naming of the modules, I would propose adjusting the names to be more concise:
If we can define an abstract base class,
@awvwgk the proposed module names seem to me (and @milancurcic) better than what I currently have, so I will change the modules and their files to what you suggest.
In thinking further I am inclined to rename
I believe that
doc/specs/stdlib_hash_maps.md
Outdated
map noted more for its diagnostics than its performance. Finally the
module, `stdlib_open_hash_map` defines a datatype,
`open_hash_map_type`, implementing a simple open addressing hash
map noted more for its diagnostics than its performance.
So both of the hash maps are "noted more for its diagnostics than its performance"? That's OK, I just want to double-check that this is the intention.
I think the performance is not bad. Generating the information used in the diagnostics adds, I suspect, only a small amount of overhead, but it is an overhead other tables will lack.
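To make the kind of overhead being discussed concrete, here is a minimal Python sketch (not the Fortran code in this PR) of an open addressing map with linear probing that counts calls and probes. The class name `ProbeCountingMap` and its methods are hypothetical, chosen only for illustration; the bookkeeping is two integer increments per lookup.

```python
class ProbeCountingMap:
    """Toy open addressing map (linear probing) that records diagnostics.

    Hypothetical illustration only; names do not come from the PR.
    """

    def __init__(self, nbits=4):
        self.slots = [None] * (1 << nbits)  # each slot: None or (key, value)
        self.count = 0    # number of stored entries
        self.calls = 0    # number of lookups performed
        self.probes = 0   # number of slots inspected over all lookups

    def _find(self, key):
        """Return the slot holding key, or the first empty slot on its probe path."""
        self.calls += 1
        mask = len(self.slots) - 1
        i = hash(key) & mask
        while True:
            self.probes += 1
            entry = self.slots[i]
            if entry is None or entry[0] == key:
                return i
            i = (i + 1) & mask  # linear probing: try the next slot

    def set(self, key, value):
        i = self._find(key)
        if self.slots[i] is None:
            # Always keep one empty slot so that _find terminates.
            if self.count + 1 >= len(self.slots):
                raise RuntimeError("toy map is full; a real map would resize")
            self.count += 1
        self.slots[i] = (key, value)

    def get(self, key, default=None):
        entry = self.slots[self._find(key)]
        return default if entry is None else entry[1]

    def probes_per_call(self):
        """Diagnostic: average number of probes per lookup."""
        return self.probes / self.calls if self.calls else 0.0
```

A ratio close to 1 means most lookups hit their home slot; a growing ratio signals clustering and suggests the table should be resized.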
doc/specs/stdlib_hash_maps.md
Outdated
the ratio of the number of hash map probes to the number of subroutine
calls.
The maps make extensive use of pointers internally, but a private
finalization subroutine avoids memory leaks.
Does this rely on the compiler to automatically call the finalization subroutine when a derived type goes out of scope? Last time I checked (albeit not recently) gfortran still didn't do this.
Yes, it relies on this to avoid memory leaks, though the largest amount of memory will be in arrays and not in the linked list used to avoid memory allocations. I don't think there are memory leaks with my most recent code compiled using the latest version of `gfortran`, though my testing was very indirect.
Another question: why is the index `inmap` part of the public API? Shouldn't this be an implementation detail of the map type and hidden from the user?
Is there a drawback to referencing entries solely with a key? To retain a reference, we could have a pointer to the data returned, if we want to save redundant queries on the map.
doc/specs/stdlib_hash_maps.md
Outdated
#### `FIBONACCI_HASH` - maps an integer to a smaller number of bits
Is this different from the version in `stdlib_hash_32bit`?
It is just giving access to the version in `stdlib_hash_32bit`.
If it is the case, then i suggest to to repeat the specs here. A link to the page of `Fibonacci_hash` should be enough.
Do you mean "then i suggest not to repeat the specs here."?
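For readers unfamiliar with the technique named in the heading under review, here is a minimal Python sketch of Fibonacci (multiplicative) hashing for 32-bit keys. It is not the `stdlib_hash_32bit` code: the function name, argument names, and the choice of constant (the integer nearest 2^32 divided by the golden ratio, a commonly used value) are assumptions made for illustration.

```python
# Integer nearest 2**32 / golden ratio (0x9E3779B9); assumed here,
# not taken from the stdlib source.
GOLDEN_32 = 2654435769


def fibonacci_hash(key, nbits):
    """Map a 32-bit integer key to an nbits-bit index (illustrative sketch)."""
    if not 1 <= nbits <= 32:
        raise ValueError("nbits must be between 1 and 32")
    # Multiply modulo 2**32, then keep the top nbits bits of the product.
    product = (key * GOLDEN_32) & 0xFFFFFFFF
    return product >> (32 - nbits)
```

Taking the high bits of the product, rather than the low bits, is what spreads consecutive keys across the index range.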
Thanks, this is really important for `stdlib`. I don't have expertise in these data structures, but I made a few minor comments.
Then I believe
In thinking further I will rename
Changed `stdlib_32_bit_key_data_wrapper` to `stdlib_hashmap_wrappers`, `stdlib_chaining_hash_map` to `stdlib_hashmap_chaining`, and `stdlib_open_hash_map` to `stdlib_hashmap_open`, and the corresponding file names. [ticket: X]
Revised the first paragraph of stdlib_hash_maps.md so it focusses more on hash maps than on hash functions. [ticket: X]
@awvwgk I have changed the module names, revised the first paragraph to focus more on hash maps than hash functions, and pushed the revised document.
The documentation was begun before the hash function codes were finalized and before the module stdlib_32_bit_hash_functions was renamed stdlib_hash_32bit. [ticket: X]
Improved documentation for copy_key, copy_other, and get. [ticket: X]
Better documented that inmap is only useful as an index if valid. [ticket: X]
Changed two error messages in get_char_keys so that they used error stop instead of stop and provided more detailed information. [ticket: X]
Removed write to error_unit and documented error reporting via inmap. [ticket: X]
@14NGiestas git shows that I still have one change request from you that I have not addressed. Do you know which one that is?
I think you addressed all my issues with this PR, I have nothing to add so far, so I'm giving my approval.
@wclodius2 sorry for the delay. I opened a PR in your repo with some propositions for the specs.
Review of the specs of hash maps
@jvdp1 I believe I have checked in your changes.
Good. I am continuing the review.
@wclodius2 I opened 2 PRs in your repo with some propositions for the docstrings and the tests:
Some additions to hash maps
Addition of test_maps using test-drive
Thank you @wclodius2 for this feature.
@fortran-lang/stdlib @milancurcic @ivan-pi @LKedward @awvwgk @gareth-nx and others interested in hash maps:
I haven't reviewed but agree for this PR to go forward.
I have also updated from fortran-lang/stdlib.
We discussed the status of this PR during the monthly call, and agreed to merge it so that users can test and play with this new feature. So, I will merge it. @wclodius2 Thank you for this PR.
Now that hash functions have been added to the standard library, it is time to add hash maps. Currently this PR only includes the documentation, `doc/specs/stdlib_hash_maps.md`, giving the proposed API. The proposed API has three module files: `stdlib_key_data_wrapper.f90`, which provides wrappers for the key and data components of the entries and some of the hash functions; `stdlib_chaining_hash_map.f90`, which implements a chaining hash map with linked lists; and `stdlib_open_hash_maps.f90`, which implements an open addressing hash map with linear probing. After the (modified) API has been approved by two people I will add the required module files.
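As a rough picture of what "a chaining hash map with linked lists" means, here is a minimal Python sketch. It is not the proposed Fortran API: the names `ChainingMap` and `_Node` and the method signatures are hypothetical, and resizing is left out to keep the example short.

```python
class _Node:
    """One entry in a bucket's singly linked list (hypothetical helper)."""
    __slots__ = ("key", "value", "next")

    def __init__(self, key, value, next_node):
        self.key = key
        self.value = value
        self.next = next_node


class ChainingMap:
    """Toy separate chaining hash map; illustration only."""

    def __init__(self, nbits=4):
        self.buckets = [None] * (1 << nbits)  # each bucket: head node or None
        self.count = 0

    def _index(self, key):
        return hash(key) & (len(self.buckets) - 1)

    def set(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        while node is not None:
            if node.key == key:      # key already present: update in place
                node.value = value
                return
            node = node.next
        # Key not found: prepend a new node to the bucket's chain.
        self.buckets[i] = _Node(key, value, self.buckets[i])
        self.count += 1

    def get(self, key, default=None):
        node = self.buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return default
```

Colliding keys simply extend the chain in their bucket, so insertion never fails for lack of space; the cost is one pointer dereference per chain element during lookup.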