Add functions to support interleaved complex num #27

smu160 · 2024-04-30T16:07:55Z

@calebzulawski @Shnatsel

A draft PR to address issue #22.

codecov-commenter · 2024-05-01T17:44:32Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.16%. Comparing base (20fafba) to head (713b3a6).

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #27   +/-   ##
=======================================
  Coverage   99.16%   99.16%           
=======================================
  Files           8        8           
  Lines         841      841           
=======================================
  Hits          834      834           
  Misses          7        7

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

smu160 · 2024-05-01T18:11:00Z

Hi @calebzulawski,

We now have the basics of what we'd need to support signals that are stored as an array of complex nums. Looking forward to hearing your thoughts!

Best,
Saveliy

Shnatsel · 2024-05-01T18:47:38Z

Cargo.toml

@@ -13,6 +13,7 @@ exclude = ["assets", "scripts", "benches"]
 [dependencies]
 num-traits = "0.2.18"
 multiversion = "0.7"
+num-complex = "0.4.5"


I think this should be an optional feature, maybe even a non-default one but enabled on docs.rs. The associated functions then need to be annotated as available with this feature only; there is a rustdoc feature for that.

I made the following changes:

num-complex is now a optional and non-default feature

associated functions and tests are annotated as available with this feature only

doc will not creates docs for all features (not sure if that's the desired outcome here?)

update the CI pipeline to run tests for all features (since I had to annotate tests that use Complex)

src/lib.rs

calebzulawski · 2024-05-03T20:34:12Z

Something like this looks good to me! Are you open to possibly using a more trait-oriented design in the future? I wouldn't want to hold up this functionality, but I think various input and output types could be combined into one generic function.

smu160 · 2024-05-03T20:38:39Z

Hi @calebzulawski,

Thank you for the review! I think your trait-oriented solution would be great. Just to clarify, do we want to wait on merging this PR? Or do we want this PR as an interim solution?

Best,
Saveliy

calebzulawski · 2024-05-03T21:00:48Z

I would probably merge it to have something to start from--just something to consider before a 1.0 release

Shnatsel · 2024-05-03T21:53:29Z

I've taken the liberty of rewriting separate_re_im and pushing to your branch.

I couldn't get it to vectorize on godbolt, but at least it should have the right public API, and we can always optimize it later.

Shnatsel · 2024-05-03T22:20:29Z

I tried a bunch more things and couldn't get it to autovectorize: https://godbolt.org/z/7Kd8v4vPG

We might need something like bytemuck here to view a Complex as a &[T; 2] or even better &[Complex<T>] as &[T] with double the length

calebzulawski · 2024-05-03T22:26:27Z

I think the approach I would take is iterate by chunks, bytemuck to [T; N * 2], Simd::deinterleave, wrapped in #[multiversion]

Shnatsel · 2024-05-03T22:29:14Z

Yeah that sounds good to me

calebzulawski · 2024-05-03T22:29:16Z

I am working on a simd-complex crate--this seems like an important enough algorithm to include there as an AoSoA <-> AoS conversion

- Make num-complex optional and non-default, and all functions that take and test interleaved FFT are now under the same feature - Bump num-complex version 0.4.5 --> 0.4.6 - Enable num-complex for docs/docs.rs - Update github action workflow to run tests for all features - Fix formatting in docs to fix links

smu160 · 2024-05-15T23:38:53Z

@Shnatsel

I have a rough version of bytemuck + Simd deinterleave, on godbolt.

Let me know your thoughts. Thank you!

Shnatsel · 2024-05-20T23:51:34Z

I've taken a look at the Godbolt version. Wow, that is some aggressive loop unrolling! It looks good to me though.

One more thing to try: we need to benchmark if iterating over vec![0; n] is faster or slower than Vec::with_capacity(n) plus Vec::extend_from_slice(deinterleaved.to_array()) in the main loop. The latter incurs slightly more instructions but avoids a memset to zero out both vectors.

Use bytemuck + SIMD::deinterleave to rearrange input data from a slice of Complex values into 2 slices of f32 or f64 values

src/lib.rs

smu160 · 2024-05-22T16:39:01Z

@Shnatsel I added the implementation using bytemuck + portable SIMD deinterleave. The solution in godbolt had a bounds check, so that was removed as well. You can see the difference in godbolt.

smu160 · 2024-05-22T17:35:05Z

@Shnatsel

One more thing to try: we need to benchmark if iterating over vec![0; n] is faster or slower than Vec::with_capacity(n) plus Vec::extend_from_slice(deinterleaved.to_array()) in the main loop. The latter incurs slightly more instructions but avoids a memset to zero out both vectors.

I added a benchmark, using criterion, to test this idea. The benchmark is probably not leveraging criterion to its full potential, so let me know.

Running benches/bench.rs (target/release/deps/bench-3ef076ccef6e83be)
Gnuplot not found, using plotters backend
separate_re_im/with_zeroed_vecs/1024
                        time:   [405.64 ns 408.02 ns 411.05 ns]
                        thrpt:  [2.4912 Gelem/s 2.5097 Gelem/s 2.5244 Gelem/s]
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe
separate_re_im/with_capacity/1024
                        time:   [422.60 ns 425.43 ns 428.32 ns]
                        thrpt:  [2.3907 Gelem/s 2.4070 Gelem/s 2.4231 Gelem/s]
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  15 (15.00%) high severe
separate_re_im/with_zeroed_vecs/4096
                        time:   [1.9314 µs 1.9407 µs 1.9508 µs]
                        thrpt:  [2.0996 Gelem/s 2.1106 Gelem/s 2.1207 Gelem/s]
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe
separate_re_im/with_capacity/4096
                        time:   [1.7073 µs 1.7213 µs 1.7380 µs]
                        thrpt:  [2.3567 Gelem/s 2.3795 Gelem/s 2.3991 Gelem/s]
Found 19 outliers among 100 measurements (19.00%)
  17 (17.00%) low mild
  2 (2.00%) high severe
separate_re_im/with_zeroed_vecs/16384
                        time:   [8.1350 µs 8.4114 µs 8.7453 µs]
                        thrpt:  [1.8735 Gelem/s 1.9478 Gelem/s 2.0140 Gelem/s]
separate_re_im/with_capacity/16384
                        time:   [8.5694 µs 8.6258 µs 8.7400 µs]
                        thrpt:  [1.8746 Gelem/s 1.8994 Gelem/s 1.9119 Gelem/s]
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
separate_re_im/with_zeroed_vecs/65536
                        time:   [32.902 µs 33.322 µs 33.821 µs]
                        thrpt:  [1.9377 Gelem/s 1.9668 Gelem/s 1.9918 Gelem/s]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
separate_re_im/with_capacity/65536
                        time:   [39.719 µs 39.987 µs 40.303 µs]
                        thrpt:  [1.6261 Gelem/s 1.6389 Gelem/s 1.6500 Gelem/s]
Found 27 outliers among 100 measurements (27.00%)
  3 (3.00%) low severe
  11 (11.00%) low mild
  3 (3.00%) high mild
  10 (10.00%) high severe
separate_re_im/with_zeroed_vecs/262144
                        time:   [243.93 µs 247.17 µs 251.19 µs]
                        thrpt:  [1.0436 Gelem/s 1.0606 Gelem/s 1.0747 Gelem/s]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
separate_re_im/with_capacity/262144
                        time:   [276.91 µs 279.97 µs 283.36 µs]
                        thrpt:  [925.12 Melem/s 936.31 Melem/s 946.67 Melem/s]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking separate_re_im/with_zeroed_vecs/1048576: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 60.
separate_re_im/with_zeroed_vecs/1048576
                        time:   [1.2126 ms 1.2465 ms 1.2828 ms]
                        thrpt:  [817.42 Melem/s 841.24 Melem/s 864.72 Melem/s]
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  13 (13.00%) high severe
Benchmarking separate_re_im/with_capacity/1048576: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.4s, enable flat sampling, or reduce sample count to 50.
separate_re_im/with_capacity/1048576
                        time:   [1.4784 ms 1.4848 ms 1.4942 ms]
                        thrpt:  [701.76 Melem/s 706.22 Melem/s 709.26 Melem/s]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

Edit: Added in results

Shnatsel

Looks good, let's get this merged!

Add a function to separate AoS to SoA

21d77db

smu160 added the enhancement New feature or request label Apr 30, 2024

Fix formatting

58374f1

smu160 self-assigned this May 1, 2024

Add reverse separate fn and add test

59de096

smu160 added 2 commits May 1, 2024 14:05

Add fft_*_interleaved impls and tests

a716cd7

Fix formatting

ecca75b

smu160 marked this pull request as ready for review May 1, 2024 18:08

smu160 requested a review from Shnatsel May 1, 2024 18:11

Shnatsel reviewed May 1, 2024

View reviewed changes

calebzulawski approved these changes May 3, 2024

View reviewed changes

Simplify separate_re_im

23c1d21

smu160 added 5 commits May 8, 2024 12:57

Update docs for interleaved fft

13fe64c

Fix formatting

05713ff

Merge branch 'main' into feature/interleaved-complex-nums

2923d53

Fix formatting

1377ead

Vectorize deinterleaving of AoS --> SoA

9699f9e

Use bytemuck + SIMD::deinterleave to rearrange input data from a slice of Complex values into 2 slices of f32 or f64 values

Put macro definition begind feature flag

4d43190

Shnatsel reviewed May 22, 2024

View reviewed changes

src/lib.rs Outdated Show resolved Hide resolved

smu160 added 4 commits May 22, 2024 12:40

Fix formatting

356e468

Remove index tracker from hot path

fbb1bad

Add benchmark using criterion

3957834

Fix formatting

b429a32

smu160 added 2 commits May 22, 2024 16:19

Account for different signal sizes

b8affb2

Merge branch 'main' into feature/interleaved-complex-nums

c7856d0

Shnatsel mentioned this pull request Jun 19, 2024

Add deinterleaving function #32

Merged

smu160 added 8 commits June 27, 2024 14:11

Merge branch 'main' into feature/interleaved-complex-nums

f639900

Merge branch 'main' into feature/interleaved-complex-nums

5c09b89

Add new python benchmarking "framework"

98503e9

Update todo

a201e2c

Make examples output time elapsed in nanoseconds

403684b

Remove array_chunks

1343268

Use new de-interleaving function

e94f2fc

Update pre-commit hook to run clippy everywhere

713b3a6

Shnatsel approved these changes Jul 17, 2024

View reviewed changes

smu160 merged commit 08eedcb into main Jul 17, 2024
8 checks passed

smu160 mentioned this pull request Jul 19, 2024

Support interleaved complex numbers #22

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add functions to support interleaved complex num #27

Add functions to support interleaved complex num #27

smu160 commented Apr 30, 2024

codecov-commenter commented May 1, 2024 •

edited

Loading

smu160 commented May 1, 2024

Shnatsel May 1, 2024

smu160 May 8, 2024

calebzulawski commented May 3, 2024

smu160 commented May 3, 2024

calebzulawski commented May 3, 2024

Shnatsel commented May 3, 2024

Shnatsel commented May 3, 2024

calebzulawski commented May 3, 2024

Shnatsel commented May 3, 2024

calebzulawski commented May 3, 2024

smu160 commented May 15, 2024 •

edited

Loading

Shnatsel commented May 20, 2024

smu160 commented May 22, 2024

smu160 commented May 22, 2024 •

edited

Loading

Shnatsel left a comment

Add functions to support interleaved complex num #27

Add functions to support interleaved complex num #27

Conversation

smu160 commented Apr 30, 2024

codecov-commenter commented May 1, 2024 • edited Loading

Codecov Report

smu160 commented May 1, 2024

Shnatsel May 1, 2024

Choose a reason for hiding this comment

smu160 May 8, 2024

Choose a reason for hiding this comment

calebzulawski commented May 3, 2024

smu160 commented May 3, 2024

calebzulawski commented May 3, 2024

Shnatsel commented May 3, 2024

Shnatsel commented May 3, 2024

calebzulawski commented May 3, 2024

Shnatsel commented May 3, 2024

calebzulawski commented May 3, 2024

smu160 commented May 15, 2024 • edited Loading

Shnatsel commented May 20, 2024

smu160 commented May 22, 2024

smu160 commented May 22, 2024 • edited Loading

Shnatsel left a comment

Choose a reason for hiding this comment

codecov-commenter commented May 1, 2024 •

edited

Loading

smu160 commented May 15, 2024 •

edited

Loading

smu160 commented May 22, 2024 •

edited

Loading