Don't store hashes in GroupOrdering #7029
Looks great to me
FYI @mustafasrepo and @ozankabak -- this should effectively improve the speed of streamed / bounded group by
for (idx, &hash) in hashes.iter().enumerate() {
    self.map.insert(hash, (hash, idx), |(hash, _)| *hash);
self.group_ordering.remove_groups(n);
// SAFETY: self.map outlives iterator and is not modified concurrently
I double checked: https://docs.rs/hashbrown/latest/hashbrown/raw/struct.RawTable.html#method.iter 👍
unsafe {
    for bucket in self.map.iter() {
        match bucket.as_ref().1.checked_sub(n) {
            None => self.map.erase(bucket),
            Some(sub) => bucket.as_mut().1 = sub,
        }
    }
I think this is both wonderfully elegant and cryptic. How about some comments (so I don't have to figure this out again the next time I see this code):
unsafe {
    for bucket in self.map.iter() {
        // decrement group index by n
        match bucket.as_ref().1.checked_sub(n) {
            // group index was < n, so remove from table
            None => self.map.erase(bucket),
            // group index was >= n, shift value down
            Some(sub) => bucket.as_mut().1 = sub,
        }
    }
I double checked the requirements listed at https://docs.rs/hashbrown/latest/hashbrown/raw/struct.RawIter.html:
You must not free the hash table while iterating (including via growing/shrinking).
It is fine to erase a bucket that has been yielded by the iterator.
Erasing a bucket that has not yet been yielded by the iterator may still result in the iterator yielding that bucket (unless reflect_remove is called).
It is unspecified whether an element inserted after the iterator was created will be yielded by that iterator (unless reflect_insert is called).
The order in which the iterator yields bucket is unspecified and may change in the future.
All of which seem to be followed here 👍
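For future readers, here is a rough safe analogue of the loop above. It is only a sketch: it swaps hashbrown's `RawTable` for `std::collections::HashMap` and uses `retain` instead of raw bucket iteration, and the helper name `shift_group_indices` and the `u64 -> usize` map shape are made up for illustration, not taken from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of the PR): after the first `n` groups
/// have been emitted, drop their entries and shift the remaining group
/// indices down by `n`.
fn shift_group_indices(map: &mut HashMap<u64, usize>, n: usize) {
    map.retain(|_hash, group_idx| match group_idx.checked_sub(n) {
        // group index was < n, so the entry is removed
        None => false,
        // group index was >= n, so store the shifted index and keep it
        Some(sub) => {
            *group_idx = sub;
            true
        }
    });
}
```

`retain` visits each entry exactly once and only removes the entry it is currently visiting, which mirrors the "erase a bucket that has been yielded" case the docs allow; the table is never grown during the pass.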
Which issue does this PR close?
Closes #.
Rationale for this change
Storing hashes in GroupOrdering was causing merge conflicts for #7016 and is not actually necessary.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?