Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use intrinsics for Mask::{to,from}_array #189

Merged
merged 1 commit into from
Jan 20, 2022

Conversation

workingjubilee
Copy link
Member

This significantly simplifies codegen and should improve mask perf.
Thanks to @programmerjake for the example for from_array!
This fixes #184.

@workingjubilee workingjubilee force-pushed the cast-mask-from-array branch 2 times, most recently from a32f2cf to d2ab646 Compare November 13, 2021 23:42
@workingjubilee
Copy link
Member Author

This version should work except that I believe it requires simd_cast to not vomit on Simd<isize, _>.

This significantly simplifies codegen and should improve mask perf.

Co-authored-by: Jacob Lifshay <[email protected]>
Copy link
Member

@programmerjake programmerjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, once simd_cast is fixed

// false: 0b_0000_0000
// Thus, an array of bools is also a valid array of bytes: [u8; N]
// This would be hypothetically valid as an "in-place" transmute,
// but these are "dependently-sized" types, so copy elision it is!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transmute_copy doesn't check if the input and output types have the same size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, but we know they're the same size since the input type is [bool; LANES] and the output type is [u8; LANES] and Rust guarantees that bool is a single byte.

Comment on lines +119 to +120
let bools: Simd<i8, LANES> =
intrinsics::simd_ne(Simd::from_array(bytes), Simd::splat(0u8));
Copy link
Member

@bjorn3 bjorn3 Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? Can't this be just Simd::from_array(bytes)?

Edit: is it because this needs -1 for true instead of 1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. It could also be simd_neg, come to think of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simd_ne is best because llvm knows the result is supposed to be a mask, with neg llvm doesn't know that, so will likely optimize more poorly

@workingjubilee
Copy link
Member Author

Now that rust-lang/rust#92425 is merged, that should allow this to land. Just have to wait for nightly to roll over.

@workingjubilee workingjubilee merged commit a4f5f01 into master Jan 20, 2022
@calebzulawski calebzulawski deleted the cast-mask-from-array branch October 17, 2022 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

poor performance for Mask::from_array
3 participants