-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support interleaved complex numbers #22
Comments
Hi @calebzulawski, Thank you for bringing up this issue. I'm assuming what you mean is that we should be able to provide the input to Namely, we should have a version of pub fn fft_64(input: Vec<Complex64>, direction: Direction) {
// <-- snipped -->
} The original reason why this was never tried in Let me know if I'm way off base or any other thoughts you may have. Thanks! Best, |
Yep, that's exactly what I mean. I agree that it would definitely be a bit slower, so it would be best to offer it both ways--mostly as a convenience because sometimes you can't choose your data format. Perhaps even offering combinations (interleaving only the input or output, for an out of place operation) would be helpful too. |
Thank you for getting back to me. I wonder if it could be more performant to just "pre-process" the array-of-structs into the struct-of-arrays and then back again (also taking into account how the output should be formatted). I know that the interleaved format needed a lot of permute instructions when I used it in the past. You may have more insight to offer when it comes to that though. Let me know what you think. Thanks! Best, |
What do you think of starting out by offering a version of pub fn separate_re_im(signal: &[Complex<f64>], chunk_size: usize) -> (Vec<f64>, Vec<f64>) {
let mut reals = Vec::with_capacity(signal.len());
let mut imaginaries = Vec::with_capacity(signal.len());
// We don't assume power of 2 size, so we don't use `chunks_exact`.
// Process each chunk, including the last chunk which may be smaller.
for chunk in signal.chunks(chunk_size) {
for num in chunk {
reals.push(num.re);
imaginaries.push(num.im);
}
}
(reals, imaginaries)
} Note that this function isn't generalized for before calling the Given how simple this function is, it should be possible to vectorize the heck out of it, just as @Shnatsel did for twiddle factor generation. Of course, we'd need the inverse function as well (i.e., split reals/imags back into I wonder if the overhead of this would be okay considering that it would allow us to avoid permute instructions when we utilize SIMD. In that past, I wasn't able to get around that when complex numbers were stored together. Looking forward to hearing your thoughts. Thanks! Best, |
Performance tip: always use And yes, it should be possible to vectorize the heck out of this. |
Yes, I think something like that is probably sufficient. With some trickery, it would also be possible to do it in place, since I think there's a little bit more performance to pick up by combining the input/output transformations with the first or last loops of the FFT, but is that really important? Probably not to start. |
I have found it can actually be slightly more efficient to instead segment like this, imagine you have 11 elements in chunks of 4: [0, 1, 2, 3], [4, 5, 6, 7], [7, 8, 9, 10]. The 7th element gets computed twice (assuming you have working memory that is not invalidated) but this allows you to do only vector loads, rather than a slow sequential load at the end. |
@calebzulawski The PR (#27) for this issue is finally merged. Please let me know if anything else should be added or addressed in order to close out this issue. Thank you!! |
A naive first shot at this could simply reorder the inputs and outputs, but a better implementation could embed this in the read and write loops.
The text was updated successfully, but these errors were encountered: