Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move HashMap to liballoc #51846

Closed
wants to merge 3 commits into from
Closed

Conversation

Amanieu
Copy link
Member

@Amanieu Amanieu commented Jun 27, 2018

This PR adds a weak lang item hashmap_random_keys which returns two random numbers which are used to initialize SipHash for a new HashMap. The previous implementation is moved into libstd which defines the lang item.

This eliminates the only reason for keeping HashMap in libstd, and allows it to be used in no_std projects. They will still need to define the lang item manually, but no_std binaries are still nightly-only at the moment.

r? @SimonSapin

cc @glandium

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 27, 2018
@SimonSapin
Copy link
Contributor

They will still need to define the lang item manually

This is a step in the opposite direction of #51607 and rust-lang/rfcs#2480 :(

As you say it doesn’t make a practical difference at the moment, but hopefully we can eliminate blockers one by one to get there. So I’d rather not add a new blocker without a plan of how to solve it eventually.

@rust-highfive

This comment has been minimized.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jun 27, 2018

You don't need to define the lang item manually. You can just not use RandomState, right? Also, @SimonSapin this doesn't actually preclude stabilization because we can still have unstable APIs in a stable crate, right?

@SimonSapin
Copy link
Contributor

My understanding of weak lang items is that just linking the alloc crate would require that item to be defined, even if HashMap or RandomState are not used. Is that correct @Amanieu? (Defining lang items is unstable.)

This wouldn’t directly preclude stabilization of the crate, only make it less useful.

@hanna-kruppe
Copy link
Contributor

You don't need to define the lang item manually. You can just not use RandomState, right?

If that is true, it's an accidential side effect of how rustc chooses which functions to emit code form, and thus rather fragile.

Besides, "don't use this stable API in certain circumstances or else you get linker errors" is horrible UX.

this doesn't actually preclude stabilization because we can still have unstable APIs in a stable crate, right?

If no_std usage of liballoc is liable to randomly break with linker errors (see above), the intended users of stable liballoc are hosed.

@Amanieu
Copy link
Member Author

Amanieu commented Jun 27, 2018

My understanding of weak lang items is that just linking the alloc crate would require that item to be defined, even if HashMap or RandomState are not used. Is that correct @Amanieu? (Defining lang items is unstable.)

That is correct.

This wouldn’t directly preclude stabilization of the crate, only make it less useful.

In my view, the stabilization of liballoc is primarily aimed at allowing library crates which require memory allocation to be used in no_std binaries. This will become possible without the need to gate it on a "nightly" feature.

However, creating no_std binaries is still only possible on nightly, so this PR isn't really blocking anything that is currently possible on stable.

As you say it doesn’t make a practical difference at the moment, but hopefully we can eliminate blockers one by one to get there. So I’d rather not add a new blocker without a plan of how to solve it eventually.

I'm actually somewhat uncertain on how we can move forward with this. The only reasonable option that I can think of at the moment is to use a dynamic hook like we do for OOM. However I feel that this makes it too easy to forget to set the hook in no_std projects.

@SimonSapin
Copy link
Contributor

I'm actually somewhat uncertain on how we can move forward with this.

Yes this is a tough problem, which is why I’m hesitant to accept this PR as proposed.

@Amanieu
Copy link
Member Author

Amanieu commented Jun 27, 2018

Maybe we could make HashMap in liballoc unstable, while keeping the re-export in libstd stable. This would allow us to explore alternative ways of providing HashMap in liballoc in the future such as:

  • Supporting default type parameters in typedefs. The liballoc HashMap would then need to be explicitly provided with a hasher, while the libstd typedef would automatically fill in the default hasher.
  • Using a newtype in libstd as suggested here, which has the same API as the liballoc HashMap but with a default hasher. I'm not a big fan of this one since it will cause API issues when passing a liballoc HashMap to a function which expects a libstd HashMap.

@rust-highfive

This comment has been minimized.

@SimonSapin
Copy link
Contributor

Defining HashMap in an unstable module and re-exporting it in a stable std module sounds fine for the reasons you gave. But it doesn’t help with the lang item being required, does it?

@Amanieu
Copy link
Member Author

Amanieu commented Jun 27, 2018

True, but no_std binaries require nightly anyways. And I feel that it's better to have HashMap as part of liballoc since it's a very widely used data structure which is useful to have in no_std.

@rust-highfive

This comment has been minimized.

@kennytm kennytm added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jun 27, 2018
@SimonSapin
Copy link
Contributor

More than requiring Nightly, I suppose what bothers me is adding one more obscure step that one might not care about but need to go through anyway (with often unhelpful error messages) in order to get a no_std program to build.

@rust-highfive

This comment has been minimized.

@Ericson2314
Copy link
Contributor

@SimonSapin We agree that's bothersome, but not giving alloc HashMap, arguably tied with Vec as the most widely used datastruture, is also bothersome. The obscure step formalizes the TODO it's otherwise far too easy to forget about.

@glandium
Copy link
Contributor

no_std is very close to not requiring nightly, and this would be a huge step back.

@SimonSapin
Copy link
Contributor

For what it’s worth I’m not arguing for not making this move ever, only for having a plan of what stabilization might look like.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jun 27, 2018

It's not a coincidence that stuff like choosing the global allocator, the OOM hook, panic hook, and now this keeps on coming up. The stabilization has to be one of a) general needs-provides, b) magic-Heap like thing with trait c) default alias on type parameters, d) newtype e) dynamic mutable hooks, based on how much in a rush we decide to be. Some magic cheap yet elegant new type of solution has never come up, and I doubt ever will.

@rust-highfive

This comment has been minimized.

@Amanieu
Copy link
Member Author

Amanieu commented Jun 28, 2018

I feel that having HashMap available in liballoc is better than the current status quo, even though it is not ideal due to the added lang item:

  • As @Ericson2314 said, HashMap is the second most widely used collection after Vec. There is a lot of demand for having it available in no_std, even if it must be unstable.
  • Currently, anyone wanting to use HashMap in no_std must use my hashmap_core crate (nightly-only). This crate breaks every time the allocator API is updated.
  • Authors of no_std binaries have had to deal with a lot of churn recently due to the global allocator changes. Adding a lang item is not a big deal for them, since they are on nightly anyways.
  • We will open a tracking issue for HashMap in liballoc in which we can start working on solutions for proper HashMap support in liballoc.

Source: My current project uses no_std + liballoc.

@rust-highfive

This comment has been minimized.

@SimonSapin
Copy link
Contributor

Yes unstable features churn is painful for everyone, but for allocation we’re reaching the end: with 1.28 + rust-lang/rfcs#2480 I believe that hashmap_core can be made compatible with stable Rust.

@Amanieu
Copy link
Member Author

Amanieu commented Jun 28, 2018

hashmap_core is a straight copy & paste of src/libstd/collections/hash. It uses tons of unstable features internally, such as the Alloc trait, fused iterators, etc.

Full list of features used: alloc, dropck_eyepatch, allocator_api, fused, ptr_internals, try_reserve, nonnull_cast

@rust-highfive

This comment has been minimized.

@pitdicker
Copy link
Contributor

It seems to me the issue with hashmap_random_keys is related to the problem we have with no_std support in rand, rust-random/rand#313. If there is no OS, which I believe is the case for many no_std scenarios, where do you get some randomness from?

The current idea is to use the OS when available (OsRng), a user-space entropy collector as fallback (JitterRng), and to somehow make it possible to use a user-supplied source. That source may for example be a hardware RNG, like RDRAND on x86. See also rust-lang-nursery/portability-wg#3.

It would be nice if any solution for HashMap also worked for rand, as it is basically the same problem. Can I be cc-ed on further discussions?

@bors

This comment has been minimized.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jul 2, 2018

rust-lang/rfcs#2492 provides the single mechanism to solve separate HashMap from both defaults we would want it to have: Alloc and BuildHasher.

@rust-highfive

This comment has been minimized.

@Amanieu
Copy link
Member Author

Amanieu commented Jul 2, 2018

I've changed HashMap (and its associated modules) to be unstable in liballoc but stable in libstd. This allows use to change the API later on, for example by using a different default hasher in liballoc.

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[00:03:26]    Compiling std_unicode v0.0.0 (file:///checkout/src/libstd_unicode)
[00:03:27]    Compiling alloc_system v0.0.0 (file:///checkout/src/liballoc_system)
[00:03:27]    Compiling panic_abort v0.0.0 (file:///checkout/src/libpanic_abort)
[00:03:31]    Compiling panic_unwind v0.0.0 (file:///checkout/src/libpanic_unwind)
[00:03:35] warning: function is never used: `hashmap_random_keys`
[00:03:35]   --> libstd/sys/unix/rand.rs:14:1
[00:03:35]    |
[00:03:35] 14 | pub fn hashmap_random_keys() -> (u64, u64) {
[00:03:35]    |
[00:03:35]    = note: #[warn(dead_code)] on by default
[00:03:35] 
[00:03:35] warning: function is never used: `getrandom`
[00:03:35] warning: function is never used: `getrandom`
[00:03:35]   --> libstd/sys/unix/rand.rs:36:5
[00:03:35]    |
[00:03:35] 36 |     fn getrandom(buf: &mut [u8]) -> libc::c_long {
[00:03:35] 
[00:03:35] warning: function is never used: `getrandom_fill_bytes`
[00:03:35]   --> libstd/sys/unix/rand.rs:45:5
[00:03:35]    |
[00:03:35]    |
[00:03:35] 45 |     fn getrandom_fill_bytes(v: &mut [u8]) -> bool {
[00:03:35] 
[00:03:35] 
[00:03:35] warning: function is never used: `is_getrandom_available`
[00:03:35]   --> libstd/sys/unix/rand.rs:67:5
[00:03:35]    |
[00:03:35] 67 |     fn is_getrandom_available() -> bool {
[00:03:35] 
[00:03:35] warning: function is never used: `fill_bytes`
[00:03:35]   --> libstd/sys/unix/rand.rs:93:5
[00:03:35]    |
[00:03:35]    |
[00:03:35] 93 |     pub fn fill_bytes(v: &mut [u8]) {
[00:03:35] 
[00:03:42]     Finished release [optimized] target(s) in 39.13s
[00:03:42] Copying stage0 std from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
[00:03:42] travis_fold:end:stage0-std
---
[00:55:02]    Compiling alloc v0.0.0 (file:///checkout/src/liballoc)
[00:55:04] error[E0308]: mismatched types
[00:55:04]     --> liballoc/collections/hash/map.rs:3666:20
[00:55:04]      |
[00:55:04] 3666 |         if let Err(CapacityOverflow) = empty_bytes.try_reserve(MAX_USIZE) {
[00:55:04]      |                    ^^^^^^^^^^^^^^^^ expected enum `collections::CollectionAllocErr`, found enum `std::collections::CollectionAllocErr`
[00:55:04]      = note: expected type `collections::CollectionAllocErr`
[00:55:04]                 found type `std::collections::CollectionAllocErr`
[00:55:04] 
[00:55:04] error[E0308]: mismatched types
[00:55:04] error[E0308]: mismatched types
[00:55:04]     --> liballoc/collections/hash/map.rs:3670:24
[00:55:04]      |
[00:55:04] 3670 |             if let Err(CapacityOverflow) = empty_bytes.try_reserve(max_no_ovf) {
[00:55:04]      |                        ^^^^^^^^^^^^^^^^ expected enum `collections::CollectionAllocErr`, found enum `std::collections::CollectionAllocErr`
[00:55:04]      = note: expected type `collections::CollectionAllocErr`
[00:55:04]                 found type `std::collections::CollectionAllocErr`
[00:55:04] 
[00:55:04] error[E0308]: mismatched types
[00:55:04] error[E0308]: mismatched types
[00:55:04]     --> liballoc/collections/hash/map.rs:3673:24
[00:55:04]      |
[00:55:04] 3673 |             if let Err(AllocErr) = empty_bytes.try_reserve(max_no_ovf) {
[00:55:04]      |                        ^^^^^^^^ expected enum `collections::CollectionAllocErr`, found enum `std::collections::CollectionAllocErr`
[00:55:04]      = note: expected type `collections::CollectionAllocErr`
[00:55:04]                 found type `std::collections::CollectionAllocErr`
[00:55:04] 
[00:55:06] error: aborting due to 3 previous errors

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@Ericson2314
Copy link
Contributor

CC @alexcrichton per your comments on the alloc RFC, I think would agree this is a good first step to take while things are figured out long term?

@alexcrichton
Copy link
Member

I would personally prefer to not merge this until that RFC is decided on. I don't think we're able to commit long-term to HashMap being available in liballoc yet, and moving it back to libstd would be an uphill and difficult battle

@Ericson2314
Copy link
Contributor

Since it's unstable in alloc but stable in std as of the latest revision of this PR, I don't think it would be hard to move it back at all. But fair enough, waiting a little bit shouldn't be too bad.

@pietroalbini
Copy link
Member

Closing this until the RFC is merged.

@pietroalbini pietroalbini added S-blocked-closed and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 9, 2018
@jyn514 jyn514 added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-blocked-closed labels Mar 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-blocked Status: Blocked on something else such as an RFC or other implementation work. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.