Make hash table memory layout SIMD-independent #16
Conversation
I am just passing by here, but both x86_64 and Apple Silicon have SIMD instructions. Why would this cause the problem that was reported?
Right, SIMD might not be the correct term: it's SSE2 specifically that we are checking for.
I expect that non-x86 architectures will become more prevalent in the near- and middle-term, so it may be worth avoiding conflating the two ideas when next that code is touched.
I don't really understand. When cross-compiling to ARM the …
The problem comes from cross compiling the standard library. We use an x86_64 compiler (which has SSE2) to generate the crate metadata that ends up in the AArch64 standard library RLIBs. This crate metadata has …
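To make the failure mode above concrete, here is a minimal sketch, assuming simplified names. `GROUP_SIZE`, `REFERENCE_GROUP_SIZE`, and `entry_count` are taken from this thread; the concrete values and the `metadata_len` helper are hypothetical, not the crate's actual code.

```rust
// Hypothetical sketch: GROUP_SIZE is selected at compile time, so an
// x86_64 host compiler (SSE2 available) and an AArch64 target build
// can disagree about its value.
#[cfg(target_feature = "sse2")]
const GROUP_SIZE: usize = 16;
#[cfg(not(target_feature = "sse2"))]
const GROUP_SIZE: usize = 8;

// A SIMD-independent constant, as introduced by this PR (the value 16
// here is an assumption for illustration).
const REFERENCE_GROUP_SIZE: usize = 16;

// Anything layout-relevant must use the platform-independent constant,
// not the cfg-dependent one.
fn metadata_len(entry_count: usize) -> usize {
    entry_count + REFERENCE_GROUP_SIZE
}

fn main() {
    // GROUP_SIZE may still drive how many entries are probed at once,
    // but it no longer influences the serialized layout.
    let _ = GROUP_SIZE;
    assert_eq!(metadata_len(100), 116);
    println!("metadata_len(100) = {}", metadata_len(100));
}
```

With the host and target both computing `metadata_len` from `REFERENCE_GROUP_SIZE`, the crate metadata generated during cross-compilation agrees with what the target build expects.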
LGTM, and the rest of the code seems to use `REFERENCE_GROUP_SIZE` in the appropriate places as well.
As the term SIMD is used in this context, it is actually correct. The crate has a … So my comment from before should actually have been something like: some platforms have a specialized SIMD-enabled implementation, and prior to this PR the hash table memory layout depended on whether that implementation was used or not, when the memory layout really is required to be the same on all platforms, independent of whether any form of SIMD is used.
Are these given 16-byte alignment when marshaled into memory?
@workingjubilee Can you elaborate what you mean?
I'm guessing there's a risk of baking the SIMD alignment (which is 16 bytes for SSE2, as opposed to a maximum of 8 for anything not involving a SIMD vector type) into the shape of the on-disk data. However, it looks like the problem was more with a count (like a "throughput factor"?), so I don't think it's likely that alignment factors into it. (OTOH, if there are SIMD loads straight from memory-mapped storage, they would all have to be unaligned ones, but that's a separate concern.)

EDIT: oh yeah, this is pretty straightforward, and byte-oriented: see `odht/src/swisstable_group_query/sse2.rs`, lines 13 to 19 at 22d9bb6.
Yeah, these are all unaligned loads.
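For readers skimming the thread, the kind of unaligned group load being discussed looks roughly like the following sketch. This is illustrative only; `load_group` and the scalar fallback are hypothetical, not the crate's actual code. On x86_64, `_mm_loadu_si128` (the unaligned-load variant) reads 16 control bytes starting at any byte offset.

```rust
fn load_group(meta: &[u8], offset: usize) -> [u8; 16] {
    let src = &meta[offset..offset + 16];
    let mut out = [0u8; 16];

    #[cfg(target_arch = "x86_64")]
    unsafe {
        use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_storeu_si128};
        // loadu/storeu are the *unaligned* variants: no 16-byte
        // alignment of `src` is required.
        let v = _mm_loadu_si128(src.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, v);
    }

    // Portable fallback so the sketch runs on non-x86 targets too.
    #[cfg(not(target_arch = "x86_64"))]
    out.copy_from_slice(src);

    out
}

fn main() {
    let meta: Vec<u8> = (0u8..32).collect();
    // Offset 5 is deliberately not a multiple of 16.
    let group = load_group(&meta, 5);
    assert_eq!(group[0], 5);
    assert_eq!(group[15], 20);
}
```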
Yeah, I was mostly just glancing over the code and wasn't sure what the alignments wound up as. My understanding is that it's somewhat preferable to just always align to 16 bytes in memory if you expect to use SIMD sometimes, as you don't pay a huge penalty for such an over-alignment while in-motion, but it doesn't matter when the data is at rest on disk because the first pulls from disk will be somewhat slow anyway. Higher alignments (e.g. 32 bytes for …
I suspect these "groups" are always at an offset multiple of 16 in a larger byte array - could that byte array be aligned as a whole? Seems like it shouldn't have much of an overhead. I'm actually curious now what the impact would be; we can probably run perf with rustc depending on a patched version of …
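Whether whole-array alignment would help can be seen in miniature. A hypothetical sketch (the `AlignedMetadata` wrapper is invented for illustration): a `#[repr(align(16))]` wrapper does align the allocation as a whole, but that only makes aligned SIMD loads possible for offsets that are multiples of 16.

```rust
// Hypothetical wrapper: forces the whole metadata buffer onto a
// 16-byte boundary, as suggested in the comment above.
#[repr(C, align(16))]
struct AlignedMetadata {
    bytes: [u8; 128],
}

fn main() {
    let m = AlignedMetadata { bytes: [0u8; 128] };
    // The allocation itself is 16-byte aligned...
    assert_eq!(&m as *const AlignedMetadata as usize % 16, 0);
    // ...but a group starting at index 5 is still not, so aligned SIMD
    // loads would only cover groups at offsets that are multiples of 16.
    assert_ne!(m.bytes[5..].as_ptr() as usize % 16, 0);
    println!("whole-buffer alignment does not align arbitrary offsets");
}
```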
Unfortunately a group can start at arbitrary indices, i.e. it will start at …
Ahh, my bad, I was expecting groups to be chunks, but I guess they're windows (of buckets); that makes sense now.
Yes, "window" is a more fitting term. I used "groups" because that's what hashbrown calls them (not sure about the original SwissTable implementation).
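The chunk-vs-window distinction can be sketched as follows. This is a hypothetical illustration: `probe_window` is invented, `REFERENCE_GROUP_SIZE = 16` is an assumption, and the sizing of the metadata array follows the PR summary (`entry_count + REFERENCE_GROUP_SIZE`, so a window starting at the last index stays in bounds).

```rust
const REFERENCE_GROUP_SIZE: usize = 16;

// A probe group is a *window* starting at any index derived from the
// hash, not a chunk at a multiple of 16.
fn probe_window(metadata: &[u8], entry_count: usize, hash: u64) -> &[u8] {
    let start = (hash as usize) % entry_count;
    &metadata[start..start + REFERENCE_GROUP_SIZE]
}

fn main() {
    let entry_count = 100;
    // Sized as described in the PR summary: entry_count plus one group,
    // so even a window starting at index entry_count - 1 is in bounds.
    let metadata = vec![0u8; entry_count + REFERENCE_GROUP_SIZE];
    let window = probe_window(&metadata, entry_count, 0xDEAD_BEEF);
    assert_eq!(window.len(), REFERENCE_GROUP_SIZE);
    // Window starts can be arbitrary indices: 0xDEADBEEF % 100 == 59.
    assert_eq!((0xDEAD_BEEFu64 as usize) % entry_count, 59);
}
```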
Unfortunately, on some processors unaligned loads and stores don't work well for SIMD, being severely pessimized. Once upon a time, this included x86-64. For AArch64, the major "second target", however, the penalties are relatively light and narrow, applying only to …
So that's something to take into account. Probably not a serious concern, fwiw, since it mostly means a pessimization to scalar at worst.
This PR makes sure that the memory layout of the hash table does not depend on whether SIMD is available on a platform or not. Previously, the size of the metadata array would be `entry_count + GROUP_SIZE`, where `GROUP_SIZE` depends on how many metadata entries we can look at at once, which in turn depends on whether SIMD is present or not. This PR changes this to the platform- and SIMD-support-independent `REFERENCE_GROUP_SIZE`, so that the metadata array will always have the same size everywhere.

This should fix the issue reported in rust-lang/rust#89085.