Avoid rehashing Fingerprint as a map key #76233
This introduces a no-op `Unhasher` for map keys that are already hash-like, for example `Fingerprint` and its wrapper `DefPathHash`. For these we can directly produce the `u64` hash for maps. The first use of this is `def_path_hash_to_def_id: Option<UnhashMap<DefPathHash, DefId>>`.
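To make the idea concrete, here is a minimal sketch of a pass-through hasher and a map alias built on it. It is not the code from this PR: the `Fingerprint` struct below is a stand-in, folding the two halves with `wrapping_add` is an assumption made for illustration, and `hash_of` is a hypothetical helper.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Pass-through hasher: the single `u64` written to it becomes the hash.
#[derive(Default, Clone, Copy)]
pub struct Unhasher {
    value: u64,
}

impl Hasher for Unhasher {
    fn finish(&self) -> u64 {
        self.value
    }

    fn write(&mut self, _bytes: &[u8]) {
        // This sketch only accepts values that are already hashes.
        unimplemented!("use write_u64");
    }

    fn write_u64(&mut self, value: u64) {
        // No mixing happens, so a second write would silently lose data.
        debug_assert_eq!(self.value, 0, "Unhasher does not mix values");
        self.value = value;
    }
}

/// A `HashMap` whose keys supply their own hash value.
pub type UnhashMap<K, V> = HashMap<K, V, BuildHasherDefault<Unhasher>>;

/// Stand-in for a 128-bit fingerprint (not rustc's actual type).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Fingerprint(pub u64, pub u64);

impl Hash for Fingerprint {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Assumption: both halves are already well distributed, so
        // folding them into one `u64` is enough for a hash table.
        state.write_u64(self.0.wrapping_add(self.1));
    }
}

/// Hypothetical helper: the `u64` a map would see for this key.
pub fn hash_of(fp: Fingerprint) -> u64 {
    let mut hasher = Unhasher::default();
    fp.hash(&mut hasher);
    hasher.finish()
}
```

With this in place, `UnhashMap::<Fingerprint, V>::default()` behaves like an ordinary `HashMap`, except that lookups skip the second round of hashing.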
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 469ca37 with merge 780ecb71aa4946e12f8857c770a687286834e33d...
☀️ Try build successful - checks-actions, checks-azure
Queued 780ecb71aa4946e12f8857c770a687286834e33d with parent 130359c, future comparison URL.
Finished benchmarking try commit (780ecb71aa4946e12f8857c770a687286834e33d): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
-1% on hello world :) smaller decreases on the rest
#[inline]
default fn write_fingerprint(&mut self, fingerprint: &Fingerprint) {
    self.write_u64(fingerprint.0);
    self.write_u64(fingerprint.1);
Did you try a `panic!` here instead to try and track down all the cases where we are hashing fingerprints?
I hadn't, but I tried just now and it ran into a `StableHasher` for the parent `DefPathHash`:
rust/compiler/rustc_hir/src/definitions.rs
Lines 111 to 117 in 130359c
fn compute_stable_hash(&self, parent_hash: DefPathHash) -> DefPathHash {
    let mut hasher = StableHasher::new();
    // We hash a `0u8` here to disambiguate between regular `DefPath` hashes,
    // and the special "root_parent" below.
    0u8.hash(&mut hasher);
    parent_hash.hash(&mut hasher);
I suppose we could have that hash without the parent at first, and then `Fingerprint::combine` them for the final value. I'll give that a shot and see if anything else comes up.
The next one that comes up is `item_ids_hash.hash_stable(...)`:
rust/compiler/rustc_middle/src/ich/impls_hir.rs
Lines 52 to 65 in 130359c
// Combining the `DefPathHash`s directly is faster than feeding them
// into the hasher. Because we use a commutative combine, we also don't
// have to sort the array.
let item_ids_hash = item_ids
    .iter()
    .map(|id| {
        let (def_path_hash, local_id) = id.id.to_stable_hash_key(hcx);
        debug_assert_eq!(local_id, hir::ItemLocalId::from_u32(0));
        def_path_hash.0
    })
    .fold(Fingerprint::ZERO, |a, b| a.combine_commutative(b));
item_ids.len().hash_stable(hcx, hasher);
item_ids_hash.hash_stable(hcx, hasher);
I wonder if we could instead add a combine operation directly in the `StableHasher`?
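For readers unfamiliar with the two combine flavours mentioned in this thread, the following is a rough sketch of how an order-sensitive and an order-insensitive combine might look. `Fp` is a stand-in type, and the exact arithmetic (multiply by 3 and add, and 128-bit wrapping addition) is an assumption for illustration rather than a quote of rustc's `Fingerprint`.

```rust
/// Stand-in for a 128-bit fingerprint stored as two `u64` halves.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Fp(pub u64, pub u64);

impl Fp {
    pub const ZERO: Fp = Fp(0, 0);

    /// Order-sensitive combine: swapping the operands generally
    /// changes the result.
    pub fn combine(self, other: Fp) -> Fp {
        Fp(
            self.0.wrapping_mul(3).wrapping_add(other.0),
            self.1.wrapping_mul(3).wrapping_add(other.1),
        )
    }

    /// Order-insensitive combine: treat each value as one `u128`
    /// (field 0 is the low half) and add with wrap-around.
    pub fn combine_commutative(self, other: Fp) -> Fp {
        let a = (u128::from(self.1) << 64) | u128::from(self.0);
        let b = (u128::from(other.1) << 64) | u128::from(other.0);
        let c = a.wrapping_add(b);
        Fp(c as u64, (c >> 64) as u64)
    }
}

/// Folds a slice with the commutative combine, so the input does not
/// need to be sorted first.
pub fn combine_all(items: &[Fp]) -> Fp {
    items.iter().fold(Fp::ZERO, |a, b| a.combine_commutative(*b))
}
```

Because addition is commutative and associative, `combine_all` gives the same result for any permutation of its input, which is why the snippet above can skip sorting `item_ids`.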
Alternatively, we could specialize to allow `Fingerprint`s to be used with `StableHasher` too, and unimplement the rest.
Hm, supporting `combine` directly with StableHasher seems like a good idea -- we presumably want that (or similar) a lot.
I'm not sure if combine is "as good" as hashing though, from a "hash quality" perspective. I would sort of assume no because then we wouldn't get any wins from using it...
Here's another hit:
node_to_node_index: Sharded<FxHashMap<DepNode<K>, DepNodeIndex>>,
... where `DepNode<K>` is a `K` and a `Fingerprint`.
I'm inclined to let all of these hash normally for now. `Unhasher` is already a bit hacky itself...
If we wanted though, `DepNode<K>` could ignore its `K` for hashing purposes.
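A sketch of what "ignoring `K` for hashing" could look like, assuming a simplified `DepNode` with just a kind and a fingerprint. The struct layout and the `hash_value` helper are illustrative only. Equality still compares both fields, so the `Hash`/`Eq` contract (equal values hash equally) continues to hold.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a 128-bit fingerprint.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Fingerprint(pub u64, pub u64);

/// Simplified dependency node: a kind plus a fingerprint.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct DepNode<K> {
    pub kind: K,
    pub hash: Fingerprint,
}

// Manual `Hash` that skips `kind` entirely. Nodes that differ only in
// `kind` will collide, which is acceptable if fingerprints rarely repeat.
impl<K> Hash for DepNode<K> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash.0.wrapping_add(self.hash.1));
    }
}

/// Hypothetical helper: hash any value with the std default hasher.
pub fn hash_value<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Since this impl only ever calls `write_u64` once, such a key could also be paired with a pass-through hasher instead of a mixing one.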
It does sound like there's sort of a lot of potential here but the current `Hash/Hasher` API doesn't readily allow specializing like we want to.
I was looking at the Hasher API, and it feels like it might be worth adding something like a `write_hash` or similar, where the Hasher can expect the incoming value to already be "hashed" or at least nicely distributed across the range. To start we could just take a u64 since that's what `Hasher::finish()` returns, though e.g. for Fingerprint we really want u128. Maybe `fn write_hash(impl Into<u128>)` makes sense, not sure.
I am leaning towards saying that we should just merge this PR as-is: it seems like a clear, if small, win, and while there may be more hidden through careful hash-skipping it's probably better to evaluate each in a standalone manner, particularly given the relative complexity of the Unhasher design. We can consider other improvements, like the one I suggested in the previous paragraph, later.
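The `write_hash` idea from this comment can be sketched as an extension trait. Nothing like this exists in `std`; `WriteHash` and `PassThrough` are hypothetical names, the `u64` signature is only one of the options floated above, and the byte-mixing fallback is deliberately simplistic.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical extension of `Hasher` for inputs that are already
/// well-distributed hash values.
pub trait WriteHash: Hasher {
    fn write_hash(&mut self, hash: u64) {
        // Default: treat the value like any other input and mix it.
        self.write_u64(hash);
    }
}

/// A hasher that takes a pre-hashed value verbatim.
#[derive(Default)]
pub struct PassThrough(u64);

impl Hasher for PassThrough {
    fn finish(&self) -> u64 {
        self.0
    }

    fn write(&mut self, bytes: &[u8]) {
        // Fallback for ordinary data: a simple multiplicative mix,
        // for illustration only.
        for &b in bytes {
            self.0 = (self.0 ^ u64::from(b)).wrapping_mul(0x100_0000_01b3);
        }
    }
}

impl WriteHash for PassThrough {
    fn write_hash(&mut self, hash: u64) {
        // The caller promises `hash` is already good, so skip mixing.
        self.0 = hash;
    }
}

// Existing hashers could opt in and keep their usual mixing.
impl WriteHash for DefaultHasher {}
```

A key type could then call `write_hash` when the hasher supports it, and each hasher decides whether to trust the value or mix it again.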
> Hm, supporting `combine` directly with StableHasher seems like a good idea -- we presumably want that (or similar) a lot.
FWIW, I was just wanting something like this for span's hash implementation, which currently re-hashes a hash of the file name.
@cuviper -- unless you specifically want review from @eddyb, given #76233 (comment) sounds reasonable to you, r=me
📌 Commit 469ca37 has been approved by
☀️ Test successful - checks-actions, checks-azure
@Mark-Simulacrum I got an idea to create new map/set types where the key controls its own hash value: I haven't published that yet because I still need to convert the tests and documentation (forked from hashbrown), but I'd love to hear any thoughts you have on that approach.
I like that approach. I wish we didn't have to fork hashbrown though -- I suspect it might be avoidable, by using the raw tables similar to (I think) indexmap's approach, or even having a fake Hash/Hasher pair into the hashbrown HashMap/HashSet types directly -- I'd need to sit down and sketch it out in detail to explore that, and without delegation of some kind you still need to duplicate the whole API surface :/
Sorry, that wasn't clear -- I forked hashbrown to get the starting implementation of The fake
Ah, okay, yeah -- in that case seems fine. The RawTable is where the really interesting bits are, after all, AFAIK. I still feel like something along the lines of the write_hash idea from #76233 (comment) would be good to add to std, and then in the case of autohash the Hasher could always expect only a write_hash call. (You can obviously also emulate that with just a write_u64 call that's "known good"). I'm not sure that there's a great solution -- you kind of want
cc #56308
r? @eddyb