Optimize qubit order for better data locality #26

totikom · 2024-02-13T18:27:42Z

(This may be completely unnecessary/inapplicable, so feel free to just close it)

The performance of simulating a gate on a large state depends drastically on the position of the target qubit (e.g., it is much faster to apply H to the 0th qubit of a 30-qubit state than to the 29th).

So, for "unbalanced" circuits, it is very beneficial to move the most-used qubits to lower indices.
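A minimal sketch of the idea, assuming a hypothetical circuit representation as a list of `(gate_name, qubits)` tuples (this is not the library's actual format, and `usage_order` and `relabel` are illustrative names): count how often each qubit is used, then relabel so the busiest qubits get the lowest indices.

```python
from collections import Counter


def usage_order(circuit, num_qubits):
    """Map old qubit index -> new index, most-used qubits first."""
    counts = Counter(q for _, qubits in circuit for q in qubits)
    # sorted() is stable, so qubits with equal counts keep their relative order.
    ranked = sorted(range(num_qubits), key=lambda q: -counts[q])
    return {old: new for new, old in enumerate(ranked)}


def relabel(circuit, mapping):
    """Rewrite every gate so it acts on the remapped qubit indices."""
    return [(name, tuple(mapping[q] for q in qubits)) for name, qubits in circuit]


# Hypothetical 4-qubit circuit in which qubit 3 does most of the work.
circuit = [("h", (3,)), ("h", (3,)), ("cx", (3, 1)), ("h", (0,))]
mapping = usage_order(circuit, 4)
reordered = relabel(circuit, mapping)
```

Note that relabeling permutes the qubits of the final state as well, so any measurement results would have to be mapped back through the inverse of `mapping`.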


smu160 · 2024-02-13T19:31:28Z

Hi @totikom,

Thank you for your insight! This would be a great start to a "circuit optimizer" that can be run prior to executing the actual circuit. If you're interested in implementing your suggestion, you are more than welcome to submit a PR.

For the sake of completeness and posterity, I will include some other findings/insights:

In my recent tests, I found that the single-threaded performance difference is negligible whether applying H to the 0th qubit, or to the 29th qubit; however, once you throw more threads at the problem, the story changes. Applying H to the 0th qubit is 2x faster than applying H to the 29th qubit. In fact, applying H to the 29th qubit with more than a single thread degrades performance. Hence, your suggestion may be a way to fix this issue in the case where we only apply a gate to a subset of the qubits.

Note that applying a gate to the kth qubit has exactly the same memory access pattern as the kth stage in the decimation-in-time FFT. I've been looking into leveraging strategies from high-performance FFT implementations in order to improve data locality. In principle, we can bring all pairs right next to each other by looking at how the FFT interleaves data points at each stage.

This may be a way to tackle the implementation of your suggestion.
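To make the access pattern concrete, here is a small sketch, assuming the usual state-vector layout in which qubit k corresponds to bit k of the amplitude index (`pair_indices` and `apply_h` are illustrative names, not functions from this project). A gate on qubit k combines amplitudes `i` and `i + 2**k`, so the distance between paired elements doubles with each qubit index.

```python
import math


def pair_indices(num_qubits, target):
    """Yield the (i, j) amplitude pairs touched by a gate on `target`."""
    stride = 1 << target  # distance between the two amplitudes of a pair
    for base in range(0, 1 << num_qubits, 2 * stride):
        for offset in range(stride):
            i = base + offset
            yield i, i + stride


def apply_h(state, target):
    """Apply a Hadamard gate in place to a list of amplitudes."""
    num_qubits = len(state).bit_length() - 1  # len(state) == 2**num_qubits
    s = 1 / math.sqrt(2)
    for i, j in pair_indices(num_qubits, target):
        a, b = state[i], state[j]
        state[i] = s * (a + b)
        state[j] = s * (a - b)
    return state


low = list(pair_indices(3, 0))   # neighbouring elements
high = list(pair_indices(3, 2))  # elements half the state apart
```

For three qubits, `low` pairs adjacent elements while `high` pairs elements four positions apart; for a 30-qubit state the highest qubit pairs elements 2**29 positions apart.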

Best,
Saveliy

totikom · 2024-02-13T20:32:14Z

> Note that applying a gate to the kth qubit has exactly the same memory access pattern as the kth stage in the decimation-in-time FFT. I've been looking into leveraging strategies from high-performance FFT implementations in order to improve data locality. In principle, we can bring all pairs right next to each other by looking at how the FFT interleaves data points at each stage.

Yes! I've also thought about it (I was writing a toy QC simulator for a course two years ago), but I couldn't find a data structure that would "remain local" for all qubits, so I decided that it would be more beneficial to reorder circuits.

Unfortunately, the course ended and I was busy working on my Master's, so the project was abandoned.

I'll try to implement a naive optimizer for circuits and make a PR. :)

totikom · 2024-02-13T20:36:45Z

The optimizer can be based on a cost model that decides whether a transformation is beneficial. I'm going to use LLVM's VPlan as an example.

(Well, it is not going to be such a naive optimizer.)
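A rough sketch of what such a check could look like, assuming a hypothetical circuit format of `(gate_name, qubits)` tuples and a caller-supplied per-qubit cost function. The `placeholder_cost` weights below are invented purely for illustration and are not measurements; a real model would have to be calibrated against benchmarks.

```python
def circuit_cost(circuit, qubit_cost):
    """Sum the estimated cost of every qubit touched by every gate."""
    return sum(qubit_cost(q) for _, qubits in circuit for q in qubits)


def is_beneficial(circuit, mapping, qubit_cost, overhead=0.0):
    """Accept a relabeling only if it lowers the estimated total cost.

    `overhead` stands for any one-off cost of applying the transformation.
    """
    remapped = [(name, tuple(mapping[q] for q in qubits)) for name, qubits in circuit]
    return circuit_cost(remapped, qubit_cost) + overhead < circuit_cost(circuit, qubit_cost)


def placeholder_cost(k):
    """Illustrative assumption only: cost grows linearly with qubit index."""
    return 1.0 + k


circuit = [("h", (3,)), ("h", (3,)), ("h", (0,))]
swap_0_3 = {0: 3, 1: 1, 2: 2, 3: 0}  # exchange the labels of qubits 0 and 3
accept = is_beneficial(circuit, swap_0_3, placeholder_cost)
reject = is_beneficial(circuit, swap_0_3, placeholder_cost, overhead=5.0)
```

Keeping the cost function as a parameter means the same check could be reused with single-threaded and multi-threaded cost estimates, which may differ considerably.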

smu160 · 2024-02-13T20:42:56Z

@totikom Thank you!

I found a great resource that discusses the same data locality issue, but in the context of FFT. Please see chapter 7 in Construction of a High-Performance FFT.

Best,
Saveliy

totikom changed the title ~~Optimize qubit order for batter data locality~~ Optimize qubit order for better data locality Feb 13, 2024

smu160 added the performance Poor performance or performance regression label Feb 13, 2024
