
Make hash table memory layout SIMD-independent #16

Merged

Conversation

@michaelwoerister commented Sep 20, 2021

This PR makes sure that the memory layout of the hash table does not depend on whether SIMD is available on a platform. Previously, the size of the metadata array was entry_count + GROUP_SIZE, where GROUP_SIZE depends on how many metadata entries we can examine at once, which in turn depends on whether SIMD is present.

This PR changes the layout to use the platform- and SIMD-independent REFERENCE_GROUP_SIZE, so that the metadata array always has the same size everywhere.

This should fix the issue reported in rust-lang/rust#89085.
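To make the mechanism concrete, here is a minimal sketch of the before/after (the cfg conditions and constant values are illustrative assumptions, not the crate's exact code):

```rust
// Before: GROUP_SIZE depended on which group-query implementation was
// compiled in, so the serialized table size differed between platforms.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2", not(feature = "no_simd")))]
const GROUP_SIZE: usize = 16; // one SSE2 register's worth of metadata bytes

#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2", not(feature = "no_simd"))))]
const GROUP_SIZE: usize = 8; // scalar fallback operates on u64 words

// After: the on-disk layout is derived from a fixed constant instead,
// so the metadata array has the same size on every platform.
const REFERENCE_GROUP_SIZE: usize = 16;

fn metadata_array_len(entry_count: usize) -> usize {
    entry_count + REFERENCE_GROUP_SIZE // previously: entry_count + GROUP_SIZE
}
```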

@michaelwoerister force-pushed the fix-simd-vs-no-simd-group-sizes branch 3 times, most recently from 739a8b9 to 9ff30c6 on September 20, 2021 at 10:20
@michaelwoerister force-pushed the fix-simd-vs-no-simd-group-sizes branch from 541bed1 to 91c66ce on September 20, 2021 at 11:31
@michaelwoerister changed the title from [WIP] Fix target-dependent encoding to Make hash table memory layout SIMD-independent on Sep 20, 2021
@michaelwoerister marked this pull request as ready for review on September 20, 2021 at 12:03
@shepmaster

> whether SIMD is available on a platform

I am just passing by here, but both x86_64 and Apple Silicon have SIMD instructions. Why would this cause the problem that was reported?

@michaelwoerister force-pushed the fix-simd-vs-no-simd-group-sizes branch from f62aaee to 920ce12 on September 20, 2021 at 12:28
@michaelwoerister

Right, SIMD might not be the correct term: it's SSE2 specifically that we are checking for.

@shepmaster

> SIMD might not be the correct term

I expect that non-x86 architectures will become more prevalent in the near- and middle-term, so it may be worth avoiding conflating the two ideas the next time that code is touched.

@hkratz commented Sep 20, 2021

> Right, SIMD might not be the correct term: it's SSE2 specifically that we are checking for.

I don't really understand. When cross-compiling to ARM, the target_arch is aarch64 and sse2 should not be in the target_feature list. Otherwise, a whole lot of other things would break as well, IMHO.

@michaelwoerister

The problem comes from cross-compiling the standard library. We use an x86_64 compiler (which has SSE2) to generate the crate metadata that ends up in the AArch64 standard library RLIBs. This crate metadata contains odht hash tables, which are then decoded by an AArch64 compiler (which does not have SSE2) when that standard library is used to compile something.
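Put differently (a hypothetical sketch; the constant values are assumptions): odht runs inside the compiler, so its GROUP_SIZE reflected the architecture of the rustc binary itself rather than the compilation target, and a table written by one compiler could fail a layout check in the other:

```rust
// x86_64 rustc (SSE2):    GROUP_SIZE == 16 => writes entry_count + 16 metadata bytes
// aarch64 rustc (scalar): GROUP_SIZE == 8  => expects entry_count + 8 metadata bytes
//
// Any size validation of this shape then rejects the cross-compiled table:
fn layout_matches(metadata_len: usize, entry_count: usize, group_size: usize) -> bool {
    metadata_len == entry_count + group_size
}
```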

@wesleywiser left a comment

LGTM and the rest of the code seems to use REFERENCE_GROUP_SIZE in the appropriate places as well.

@michaelwoerister

> I expect that non-x86 architectures will become more prevalent in the near- and middle-term, so it may be worth avoiding conflating the two ideas the next time that code is touched.

As the term SIMD is used in this context, it is actually correct. The crate has a no_simd feature which force-disables the use of any SIMD instructions. The fix in this PR also makes the table layout independent of whether any form of SIMD is used. It just happens that SSE2 is currently the only form of SIMD supported.

So my comment from before should actually have been something like: some platforms have a specialized SIMD-enabled implementation, and prior to this PR the hash table memory layout depended on whether that implementation was used, even though the memory layout is required to be the same on all platforms, independent of whether any form of SIMD is used.

@workingjubilee

Are these given 16-byte alignment when marshaled into memory?

@michaelwoerister

@workingjubilee Can you elaborate on what you mean?

@eddyb commented Sep 21, 2021

I'm guessing there's a risk of baking the SIMD alignment (which is 16 bytes for SSE2, as opposed to a maximum of 8 for anything not involving a SIMD vector type) into the shape of the on-disk data.

However, it looks like the problem was more with a count (like a "throughput factor"?), so I don't think it's likely that alignment factors into it. (OTOH, if there are SIMD loads straight from memory-mapped storage, they would all have to be unaligned ones, but that's a separate concern.)

EDIT: oh yeah, this is pretty straightforward and byte-oriented:

```rust
impl GroupQuery {
    #[inline]
    pub fn from(group: &[u8; GROUP_SIZE], h2: u8) -> GroupQuery {
        assert!(std::mem::size_of::<x86::__m128i>() == GROUP_SIZE);
        unsafe {
            let group = x86::_mm_loadu_si128(group as *const _ as *const x86::__m128i);
            // ... (rest of the quoted snippet truncated)
        }
    }
}
```

@michaelwoerister

Yeah, these are all unaligned loads.

@workingjubilee

Yeah, I was mostly just glancing over the code and wasn't sure what the alignments wound up as.

My understanding is that it's somewhat preferable to just always align to 16 bytes in memory if you expect to use SIMD sometimes, as you don't pay a huge penalty for such an over-alignment while in motion, but it doesn't matter when the data is at rest on disk because the first pulls from disk will be somewhat slow anyways. Higher alignments (e.g. 32 bytes for _mm256) don't truly matter because the unaligned load/store penalty evaporates on AVX with VEX prefixes, and other higher-than-16-byte-SIMD processors don't require alignment, but SSE, AltiVec, and Neon all prefer 16-byte alignment to greater or lesser degrees.
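For reference, here is the aligned-vs-unaligned distinction under discussion, sketched with the stable core::arch intrinsics (illustrative only; not the crate's code):

```rust
#[cfg(target_arch = "x86_64")]
fn load_group(bytes: &[u8; 16]) -> core::arch::x86_64::__m128i {
    use core::arch::x86_64::*;
    // _mm_load_si128 requires a 16-byte-aligned pointer and may fault on
    // an unaligned one; _mm_loadu_si128 accepts any alignment, which is
    // why the GroupQuery snippet above uses the latter.
    unsafe { _mm_loadu_si128(bytes.as_ptr() as *const __m128i) }
}
```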

@eddyb commented Sep 21, 2021

I suspect these "groups" are always at an offset multiple of 16 in a larger byte array - could that byte array be aligned as a whole? Seems like it shouldn't have much of an overhead.

I'm actually curious now what the impact would be; we can probably run perf with rustc depending on a patched version of odht.

@michaelwoerister

Unfortunately, a group can start at an arbitrary index, i.e. it will start at hash % table_size. That's just how SwissTable works.
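A hypothetical sketch of that indexing (not the crate's actual code) shows why groups cannot all be aligned:

```rust
const GROUP_SIZE: usize = 16;

// The probe sequence starts wherever the hash lands; this can be any
// index, not necessarily a multiple of GROUP_SIZE.
fn probe_start(hash: u64, slot_count: usize) -> usize {
    (hash as usize) % slot_count
}

// Groups are overlapping windows into the metadata array; the trailing
// padding bytes let a window that starts near the last slot read past
// it without going out of bounds.
fn first_group(metadata: &[u8], hash: u64, slot_count: usize) -> &[u8] {
    let start = probe_start(hash, slot_count);
    &metadata[start..start + GROUP_SIZE]
}
```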

@eddyb commented Sep 22, 2021

Ahh, my bad, I was expecting groups to be chunks, but I guess they're windows (of buckets); that makes sense now.

@michaelwoerister

Yes, "window" is a more fitting better term. I used groups because that's what hashbrown calls them (not sure about the original SwissTable implementation).

@workingjubilee commented Sep 23, 2021

Unfortunately, on some processors unaligned SIMD loads and stores are severely pessimized. Once upon a time, this included x86-64. For AArch64, the major "second target", however, the penalties are relatively light and narrow, applying only to:

- load operations that cross a cache-line (64-byte) boundary
- store operations that cross a 16-byte boundary

So that's something to take into account. Probably not a serious concern, fwiw, since it mostly means a pessimization to scalar at worst.
