Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid rehashing Fingerprint as a map key #76233

Merged
merged 1 commit into from
Sep 3, 2020
Merged

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Sep 2, 2020

This introduces a no-op Unhasher for map keys that are already hash-
like, for example Fingerprint and its wrapper DefPathHash. For these
we can directly produce the u64 hash for maps. The first use of this
is def_path_hash_to_def_id: Option<UnhashMap<DefPathHash, DefId>>.

cc #56308
r? @eddyb

This introduces a no-op `Unhasher` for map keys that are already hash-
like, for example `Fingerprint` and its wrapper `DefPathHash`. For these
we can directly produce the `u64` hash for maps. The first use of this
is `def_path_hash_to_def_id: Option<UnhashMap<DefPathHash, DefId>>`.
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 2, 2020
@Mark-Simulacrum
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Sep 2, 2020

⌛ Trying commit 469ca37 with merge 780ecb71aa4946e12f8857c770a687286834e33d...

@bors
Copy link
Contributor

bors commented Sep 2, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 780ecb71aa4946e12f8857c770a687286834e33d (780ecb71aa4946e12f8857c770a687286834e33d)

@rust-timer
Copy link
Collaborator

Queued 780ecb71aa4946e12f8857c770a687286834e33d with parent 130359c, future comparison URL.

@jyn514 jyn514 added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 2, 2020
@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (780ecb71aa4946e12f8857c770a687286834e33d): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@jyn514
Copy link
Member

jyn514 commented Sep 2, 2020

-1% on hello world :) smaller decreases on the rest

#[inline]
default fn write_fingerprint(&mut self, fingerprint: &Fingerprint) {
self.write_u64(fingerprint.0);
self.write_u64(fingerprint.1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try a panic! here instead to try and track down all the cases where we are hashing fingerprints?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't, but I tried just now and it ran into a StableHasher for the parent DefPathHash:

fn compute_stable_hash(&self, parent_hash: DefPathHash) -> DefPathHash {
let mut hasher = StableHasher::new();
// We hash a `0u8` here to disambiguate between regular `DefPath` hashes,
// and the special "root_parent" below.
0u8.hash(&mut hasher);
parent_hash.hash(&mut hasher);

I suppose we could have that hash without the parent at first, and then Fingerprint::combine them for the final value. I'll give that a shot and see if anything else comes up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next one that comes up is item_ids_hash.hash_stable(...):

// Combining the `DefPathHash`s directly is faster than feeding them
// into the hasher. Because we use a commutative combine, we also don't
// have to sort the array.
let item_ids_hash = item_ids
.iter()
.map(|id| {
let (def_path_hash, local_id) = id.id.to_stable_hash_key(hcx);
debug_assert_eq!(local_id, hir::ItemLocalId::from_u32(0));
def_path_hash.0
})
.fold(Fingerprint::ZERO, |a, b| a.combine_commutative(b));
item_ids.len().hash_stable(hcx, hasher);
item_ids_hash.hash_stable(hcx, hasher);

I wonder if we could instead add a combine operation directly in the StableHasher?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could specialize to allow Fingerprints to be used with StableHasher too, and unimplement the rest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, supporting combine directly with StableHasher seems like a good idea -- we presumably want that (or similar) a lot.

I'm not sure if combine is "as good" as hashing though, from a "hash quality" perspective. I would sort of assume no because then we wouldn't get any wins from using it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another hit:

node_to_node_index: Sharded<FxHashMap<DepNode<K>, DepNodeIndex>>,

... where DepNode<K> is a K and a Fingerprint.

I'm inclined to let all of these hash normally for now. Unhasher is already a bit hacky itself...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted though, DepNode<K> could ignore its K for hashing purposes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does sound like there's sort of a lot of potential here but the current Hash/Hasher API doesn't readily allow specializing like we want to.

I was looking at the Hasher API, and it feels like it might be worth adding something like a write_hash or similar, where the Hasher can expect the incoming value to already be "hashed" or at least nicely distributed across the range. To start we could just take a u64 since that's what Hasher::finish() returns, though e.g. for Fingerprint we really want u128. Maybe fn write_hash(impl Into<u128>) makes sense, not sure.

I am leaning towards saying that we should just merge this PR as-is: it seems like a clear, if small, win, and while there may be more hidden through careful hash-skipping it's probably better to evaluate each in a standalone manner, particularly given the relative complexity of the Unhasher design. We can consider other improvements, like the one I suggested in the previous paragraph, later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, supporting combine directly with StableHasher seems like a good idea -- we presumably want that (or similar) a lot.

FWIW, I was just wanting something like this for span's hash implementation, which currently re-hashes a hash of the file name.

@Mark-Simulacrum
Copy link
Member

@cuviper -- unless you specifically want review from @eddyb, given #76233 (comment) sounds reasonable to you, r=me

@cuviper
Copy link
Member Author

cuviper commented Sep 2, 2020

I'm OK with exploring others separately, and I had only chosen the reviewer based on #56308, so let's go!

@bors r=Mark-Simulacrum

@bors
Copy link
Contributor

bors commented Sep 2, 2020

📌 Commit 469ca37 has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 2, 2020
@bors
Copy link
Contributor

bors commented Sep 2, 2020

⌛ Testing commit 469ca37 with merge 51f79b6...

@bors
Copy link
Contributor

bors commented Sep 3, 2020

☀️ Test successful - checks-actions, checks-azure
Approved by: Mark-Simulacrum
Pushing 51f79b6 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 3, 2020
@bors bors merged commit 51f79b6 into rust-lang:master Sep 3, 2020
@cuviper
Copy link
Member Author

cuviper commented Sep 3, 2020

@Mark-Simulacrum I got an idea to create new map/set types where the key controls its own hash value:
https://github.com/cuviper/autohash

I haven't published that yet because I still need to convert the tests and documentation (forked from hashbrown), but I'd love to hear any thoughts you have on that approach.

@Mark-Simulacrum
Copy link
Member

I like that approach. I wish we didn't have to fork hashbrown though -- I suspect it might be avoidable, by using the raw tables similar to (I think) indexmap's approach, or even having a fake Hash/Hasher pair into the hashbrown HashMap/HashSet types directly -- I'd need to sit down and sketch it out in detail to explore that, and without delegation of some kind you still need to duplicate the whole API surface :/

@cuviper
Copy link
Member Author

cuviper commented Sep 3, 2020

Sorry, that wasn't clear -- I forked hashbrown to get the starting implementation of HashMap, HashSet, and everything that goes with them, but I did make this use hashbrown::raw::RawTable as a dependency.

The fake Hash/Hasher pair is kind of what I did here with Unhasher, but I don't really feel like it's a great solution.

@Mark-Simulacrum
Copy link
Member

Ah, okay, yeah -- in that case seems fine. The RawTable is where the really interesting bits are, after all, AFAIK.

I still feel like something along the lines of the write_hash idea from #76233 (comment) would be good to add to std, and then in the case of autohash the Hasher could always expect only a write_hash call. (You can obviously also emulate that with just a write_u64 call that's "known good").

I'm not sure that there's a great solution -- you kind of want trait Hash<H: Hasher> instead of trait Hash { fn hash<H: Hasher> } so that you could have less generic impls. But that ship has long sailed, and might've been a bad idea too, I'm not sure :).

@cuviper cuviper added this to the 1.48.0 milestone Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants