Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash maps #611

Merged
merged 92 commits into from
Jun 20, 2022
Merged

Hash maps #611

merged 92 commits into from
Jun 20, 2022

Conversation

wclodius2
Copy link
Contributor

Now that hash functions have been added to the standard library, it is time to add hash maps. Currently this PR only includes the documentation, doc/specs/stdlib_hash_maps.md, giving the proposed API. The proposed API has three module files: stdlib_key_data_wrapper.f90, which provides wrappers for the key and data components of the entries and some of the hash function; stdlib_chaining_hash_map.f90, which implements a chaining hash map with linked lists; and stdlib_open_hash_maps.f90, which implements a linear addressing open hashing map. After the (modified) API has been approved by two people I will add the required module files.

Created the doc/specs/stdlib_hash_maps.md markdown documentation for the OR.

[ticket: X]
Fixedd typos in stdlib_hash_maps.md

[ticket: X]
Removed `max_bits` and `load_factor` from the hash map data types and transformed
them into module constants. Removed `load_factor` function from
`stdlib_open_hash_map` and added `relative_loading` function.

[ticket: X]
Extensive rewrite reconciling two different versions of stdlib_hash_maps.md.

[ticket: X]
Copy link
Member

@awvwgk awvwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three general questions:

  1. Why represent the key / other type as array of int8 rather than as class(*), allocatable? At least for the other type the latter representation would be more flexible without the need for memory manipulations with transfer
  2. Would the definition of a container type be useful? This one could be used for all data structures (e.g. lists).
  3. Should the procedures for the maps be type bound? First this makes imports simpler and might also allow to define a common base class.

@awvwgk
Copy link
Member

awvwgk commented Dec 28, 2021

Regarding the naming of the modules I would propose to adjust the naming to be more concise:

  • stdlib_key_data_wrapper -> stdlib_container
  • stdlib_chaining_hash_map -> stdlib_hashmap_chaining
  • stdlib_open_hash_map -> stdlib_hashmap_open

If we can define an abstract base class, stdlib_map_class (for both ordered and unordered maps) or stdlib_hashmap_class (for unordered maps only) module might be a good name.

@wclodius2
Copy link
Contributor Author

@awvwgk

  1. I have never used CLASS(*). I assume I can ask for the underlying type using select(type). Does it preserve array information so I can ask rank and array size array size? Are rank independent procedure arguments widely implemented?
  2. I don't know if it will be useful to define a container type beyond what I define as the open_map_entry_type and chaining_map_entry_type. Would it be intended to replace the key and other components of those two types?
  3. I had started to define a parent type (in a module stdlib_hash_maps) to the open_hash_map_type and chaining_hash_map_type. It had some non_overridable type bound procedures and a large number of deferred type bound procedures. Implementing it fully would require decisions as to whether we want the use of inheritance, and whether the detailed implementation of the subtypes would be in modules or sub-modules.

@wclodius2
Copy link
Contributor Author

@awvwgk the proposed module names seem to me (and @milancurcic) better than what I currently have so I will change the modules and their files to what you suggest.

@wclodius2
Copy link
Contributor Author

In thinking further I am inclined tho rename stdlib_key_data_wrapper to stdlib_hashmap_container.

@gareth-nx
Copy link
Contributor

  1. have never used CLASS(*). I assume I can ask for the underlying type using select(type). Does it preserve array information so I can ask rank and array size array size? Are rank independent procedure arguments widely implemented?

I believe that CLASS(*) objects cannot be arrays. If you want it to hold an array then you have to create a derived type that holds the array.

map noted more for its diagnostics than its performance. Finally the
module, `stdlib_open_hash_map` defines a datatype,
`open_hash_map_type`, implementing a simple open addressing hash
map noted more for its diagnostics than its performance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So both of the hash maps are noted more for its diagnostics than its performance ? That's OK, just want to double check that is the intention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the performance is not baad. Generating the information used in the diagnostics is, I suspect, a small amount of overhead, but it is an overhead other tables will lack.

the ratio of the number of hash map probes to the number of subroutine
calls.
The maps make extensive use of pointers internally, but a private
finalization subroutine avoids memory leaks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this rely on the compiler to automatically call the finalization subroutine when a derived type goes out of scope? Last time I checked (albeit not recently) gfortran still didn't do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it relies on this to avoid memory leaks, though the largest amount of memory will be in arrays and not in the linked list used to avoid memory allocations. I don't think there are memory leaks with my most recent code compiled using the latest version of gfortran, though my testing was very indirect.

Copy link
Member

@awvwgk awvwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question, why is the index inmap part of the public API? Shouldn't this be an implementation detail of the map type and hidden from the user?

Is there a drawback to reference entries sorely with a key? To retain a reference we could have a pointer to the data returned, if we want to safe redundant queries on the map.

```


#### `FIBONACCI_HASH` - maps an integer to a smaller number of bits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this different to the version in stdlib_hash_32bit?

Copy link
Contributor Author

@wclodius2 wclodius2 Dec 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just giving access to the version in stdlib_hash_32bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is the case, then i suggest to to repeat the specs here. A link to the page of Fibonacci_hash should be enough,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "then i suggest not to repeat the specs here."?

Copy link
Contributor

@gareth-nx gareth-nx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is really important for stdlib. I don't have expertise in these data structures, but made a few minor comments.

@wclodius2
Copy link
Contributor Author

I believe that CLASS(*) objects cannot be arrays. If you want it to hold an array then you have to create a derived type that holds the array.

Then I believe CLASS(*) will be of very limited use.

@wclodius2
Copy link
Contributor Author

In thinking further I will rename stdlib_key_data_wrapper to stdlib_hashmap_wrappers.

Changed `stdlib_32_bit_key_data_wrapper` to `stdlib_hashmap_wrappers`
`stdlib_chaining_hash_map` to `stdlib_hashmap_chaining` and
`stdlib_open_hash_map` to `stdlib_hashmap_open` and the corresponding file names.

[ticket: X]
Revised the first paragraph of stdlib_hash_maps.md so it focusses more on
hash maps than on hash functions.

[ticket: X]
@wclodius2
Copy link
Contributor Author

@awvwgk I have changed the module names and revised the first paragraph to focus more on hash maps than hash functions., and have pushed the revised document.

The documentation was begun before the final versions of the hash functions codes
and the modue stdlib_32_bit_hash_functions was renamed stdlib_hash_32bit.

[ticket: X]
Improved documentation for copy_key, copy_other, and get.

[ticket: X]
Better documented that inmap is only useful as an index if valid.

[ticket: X]
Changed two error messages in get_char_keys so that they used error stop instead
of stop and provided more detailed information.

[ticket: X]
Removed write to error_unit aand documented error reporting via inmap.

[ticket: X]
@wclodius2
Copy link
Contributor Author

@14NGiestas git shows that I still have one change request from you that I have not addressed. Do you know which one that is?

Copy link
Member

@14NGiestas 14NGiestas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you addressed all my issues with this PR, I have nothing to add so far, so I'm giving my approval.

@jvdp1
Copy link
Member

jvdp1 commented Jun 17, 2022

@wclodius2 sorry for the delay. I opened a PR in your repo with some propositions for the specs.

Review of the specs of hash maps
@wclodius2
Copy link
Contributor Author

@jvdp1 I believe I have checked in your changes.

@jvdp1
Copy link
Member

jvdp1 commented Jun 18, 2022

@jvdp1 I believe I have checked in your changes.

Good. I am continuing the review.

@jvdp1
Copy link
Member

jvdp1 commented Jun 18, 2022

@wclodius2 I opened a 2 PR in your repo with some propositions for the docstrings and the tests:

  • PR 13 - addition of links
  • PR 14 - addition of tests

Addition of test_maps using test-drive
@wclodius2
Copy link
Contributor Author

@jvdp1 I have committed PR #13 and PR #14.

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @wclodius2 for this feature.

@jvdp1
Copy link
Member

jvdp1 commented Jun 19, 2022

@fortran-lang/stdlib @milancurcic @ivan-pi @LKedward @awvwgk @gareth-nx and others interested by hash maps:
There are now 2 complete reviews. I am sure that we can still discuss the API. However, I propose that we merge this PR as experimental such that users can test it and play with it, except if you still see a major issue that would prohibit its merging.

@milancurcic
Copy link
Member

I haven't reviewed but agree for this PR to go forward.

@wclodius2
Copy link
Contributor Author

I have also updated from fortran-lang/stdlib.

@jvdp1
Copy link
Member

jvdp1 commented Jun 20, 2022

We discussed the status of this PR during the monthly call, and agreed to merge it such that users can test and play with this new feature. So, I will merge it.

@wclodius2 Thank you for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reviewers needed This patch requires extra eyes topic: container (Abstract) data structures and containers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants