Use std::arch for SIMD and target_feature #46
The preferred approach would be to move the heavy lifting and inner loops (dot product, etc.) to a separate crate in the style of https://github.com/bluss/numeric-loops, or to use another existing crate that is already SIMD-accelerated.
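As a rough sketch of the kind of inner loop that could move into such a helper crate, here is an unrolled scalar dot product. The name `unrolled_dot` and its signature are hypothetical; this is not the API of numeric-loops or ndarray.

```rust
/// Dot product of two equal-length f64 slices.
/// Hypothetical helper-crate function: unrolled by 4 with independent
/// accumulators, plus a scalar loop for the leftover elements.
pub fn unrolled_dot(xs: &[f64], ys: &[f64]) -> f64 {
    assert_eq!(xs.len(), ys.len(), "slices must have equal length");
    // Largest multiple of 4 that fits in the input length.
    let n = xs.len() / 4 * 4;
    let (mut a0, mut a1, mut a2, mut a3) = (0.0, 0.0, 0.0, 0.0);
    // Main loop: four independent accumulators, one per lane.
    for (x, y) in xs[..n].chunks_exact(4).zip(ys[..n].chunks_exact(4)) {
        a0 += x[0] * y[0];
        a1 += x[1] * y[1];
        a2 += x[2] * y[2];
        a3 += x[3] * y[3];
    }
    // Tail: the remaining 0..=3 elements.
    let tail: f64 = xs[n..].iter().zip(&ys[n..]).map(|(x, y)| x * y).sum();
    (a0 + a1) + (a2 + a3) + tail
}
```

Splitting the sum across four accumulators breaks the dependency chain between iterations, which can let the compiler vectorize and overlap the multiply-adds. It also reassociates floating-point addition, so the result may differ in the last bits from a strictly sequential loop.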
@bluss I am contributing to std::arch to help make it a stable feature as soon as possible. I would like to take on the SIMD implementation for ndarray. I think we could create a new branch from master for implementing and discussing it. The following is a very simple example:
Output:
Hey, it's good that we talk about this before you get started. Notice that this issue is not intended to be about arrays that use those explicit SIMD types at all (that would be a different design); it is about accelerating operations on ordinary arrays. IMO, the SIMD that we are most interested in, for x86 at least, is already stable. Notice also that in this issue I have suggested that any SIMD code like that should live in a new crate that we depend on. That means it is not part of the ndarray crate.
I tried using SIMD in the operator overloading for multiplication, here.
The result is as follows:
The f64 operation became more than 2x faster, and the i32 operation more than 4x faster. I'm wondering whether I am working in the right direction.
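The benchmark code from that comment is not reproduced here. As a hedged sketch of the general technique (not the commenter's code), this element-wise f64 multiplication uses AVX through `std::arch` when runtime detection reports it, and falls back to a scalar loop otherwise. The names `mul_f64` and `mul_f64_avx` are hypothetical, and the SIMD path assumes an x86_64 target.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Element-wise product `out[i] = a[i] * b[i]` for f64 slices.
/// Uses 256-bit AVX (4 f64 lanes) if the CPU supports it at runtime,
/// otherwise a plain scalar loop.
pub fn mul_f64(a: &[f64], b: &[f64], out: &mut [f64]) {
    assert!(a.len() == b.len() && a.len() == out.len(), "length mismatch");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just checked at runtime.
            unsafe { mul_f64_avx(a, b, out) };
            return;
        }
    }
    // Scalar fallback (non-x86_64 targets, or CPUs without AVX).
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x * y;
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn mul_f64_avx(a: &[f64], b: &[f64], out: &mut [f64]) {
    let n = a.len() / 4 * 4;
    let mut i = 0;
    while i < n {
        // Unaligned loads/stores: slices have no 32-byte alignment guarantee.
        let va = _mm256_loadu_pd(a.as_ptr().add(i));
        let vb = _mm256_loadu_pd(b.as_ptr().add(i));
        _mm256_storeu_pd(out.as_mut_ptr().add(i), _mm256_mul_pd(va, vb));
        i += 4;
    }
    // Scalar tail for the last len % 4 elements.
    for j in n..a.len() {
        out[j] = a[j] * b[j];
    }
}
```

An i32 version could follow the same shape with `_mm256_loadu_si256`, `_mm256_mullo_epi32` (AVX2) and `_mm256_storeu_si256`. A 256-bit register holds eight i32 lanes but only four f64 lanes, which may be one factor behind the larger i32 speedup.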
@bluss Could you help point out which methods in ndarray should use SIMD first?
Here is my plan:
I think you may be interested in this project. When SIMD is in std, ndarray could possibly support this to further improve its performance.
Is anybody working on this, or is there any reason I shouldn't attempt it? Just to clarify, I'm assuming this means extracting the internals (like loops and basic operations) of the existing ndarray functions into a separate crate.
See rust-lang/rust/issues/29717
Use to select impl for unrolled dot product and scalar sum.
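One way to select an implementation with `target_feature` is at compile time. The sketch below, assuming an x86_64 target, compiles an AVX scalar sum only when AVX is statically enabled (for example via `RUSTFLAGS="-C target-feature=+avx"`), and a portable unrolled fallback otherwise. The function name `sum` is hypothetical; this is not ndarray's actual code.

```rust
/// Sum of an f64 slice (AVX path, compiled only when AVX is statically enabled).
#[cfg(all(target_arch = "x86_64", target_feature = "avx"))]
pub fn sum(xs: &[f64]) -> f64 {
    use std::arch::x86_64::*;
    let n = xs.len() / 4 * 4;
    // SAFETY: AVX is enabled for this whole build (guaranteed by the cfg above).
    let mut total = unsafe {
        let mut acc = _mm256_setzero_pd();
        let mut i = 0;
        while i < n {
            acc = _mm256_add_pd(acc, _mm256_loadu_pd(xs.as_ptr().add(i)));
            i += 4;
        }
        // Horizontal reduction of the four lanes.
        let mut lanes = [0.0f64; 4];
        _mm256_storeu_pd(lanes.as_mut_ptr(), acc);
        (lanes[0] + lanes[1]) + (lanes[2] + lanes[3])
    };
    // Scalar tail for the last len % 4 elements.
    total += xs[n..].iter().sum::<f64>();
    total
}

/// Sum of an f64 slice (portable fallback: four independent accumulators).
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx")))]
pub fn sum(xs: &[f64]) -> f64 {
    let mut acc = [0.0f64; 4];
    let chunks = xs.chunks_exact(4);
    let tail: f64 = chunks.remainder().iter().sum();
    for c in chunks {
        for k in 0..4 {
            acc[k] += c[k];
        }
    }
    (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail
}
```

Runtime selection with `is_x86_feature_detected!` plus `#[target_feature(enable = "avx")]` is the alternative when one binary must also run on CPUs without AVX. Both paths here reassociate the additions, so results may differ in the last bits from a strictly sequential sum.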