Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add new multithreaded TwoQubitPeepholeOptimization pass #13419

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

mtreinish
Copy link
Member

@mtreinish mtreinish commented Nov 10, 2024

Summary

This commit adds a new transpiler pass for physical optimization,
TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks,
ConsolidateBlocks, and UnitarySynthesis in the optimization stage for
a default pass manager setup. The pass logically works the same way
where it analyzes the dag to get a list of 2q runs, calculates the matrix
of each run, and then synthesizes the matrix and substitutes it inplace.
The distinction this pass makes though is it does this all in a single
pass and also parallelizes the matrix calculation and synthesis steps
because there is no data dependency there.

This new pass is not meant to fully replace the Collect2qBlocks,
ConsolidateBlocks, or UnitarySynthesis passes as those also run in
contexts where we don't have a physical circuit. This is meant instead
to replace their usage in the optimization stage only. Accordingly this
new pass also changes the logic on how we select the synthesis to use
and when to make a substituion. Previously this logic was primarily done
via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if
the number of basis gates needed based on the weyl chamber coordinates
was less than the number of 2q gates in the block (see #11659 for
discussion on this). Since this new pass skips the explicit
consolidation stage we go ahead and try all the available synthesizers

Right now this commit has a number of limitations, the largest are:

  • Only supports the target
  • It doesn't support the XX decomposer because it's not in rust (the TwoQubitBasisDecomposer and TwoQubitControlledUDecomposer are used)

For plugin handling I left the logic as running the three pass series,
but I'm not sure this is the behavior we want. We could say keep the
synthesis plugins for UnitarySynthesis only and then rely on our
built-in methods for physical optimiztion only. But this also seems less
than ideal because the plugin mechanism is how we support synthesizing
to custom basis gates, and also more advanced approximate synthesis
methods. Both of those are things we need to do as part of the synthesis
here.

Additionally, this is currently missing tests and documentation and while
running it manually "works" as in it returns a circuit that looks valid,
I've not done any validation yet. This also likely will need several
rounds of performance optimization and tuning.

Details and comments

Fixes #12007
Fixes #11659

TODO:

This PR is based on top of #13410 and will need to be rebased once that PR merges. You can view the contents of just this PR by looking at the HEAD commit: d75ebf3

@mtreinish mtreinish added performance Changelog: New Feature Include in the "Added" section of the changelog Rust This PR or issue is related to Rust code in the repository mod: transpiler Issues and PRs related to Transpiler labels Nov 10, 2024
@mtreinish mtreinish added this to the 2.0.0 milestone Nov 10, 2024
@coveralls
Copy link

coveralls commented Nov 10, 2024

Pull Request Test Coverage Report for Build 12113716869

Details

  • 57 of 398 (14.32%) changed or added relevant lines in 8 files are covered.
  • 12 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-0.3%) to 88.61%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/accelerate/src/two_qubit_decompose.rs 31 32 96.88%
crates/accelerate/src/unitary_synthesis.rs 1 2 50.0%
qiskit/transpiler/passes/optimization/two_qubit_peephole.py 0 20 0.0%
crates/accelerate/src/two_qubit_peephole.rs 4 323 1.24%
Files with Coverage Reduction New Missed Lines %
crates/accelerate/src/two_qubit_decompose.rs 1 92.11%
crates/accelerate/src/unitary_synthesis.rs 1 92.32%
crates/qasm2/src/lex.rs 4 92.98%
crates/qasm2/src/parse.rs 6 97.62%
Totals Coverage Status
Change from base Build 12106478374: -0.3%
Covered Lines: 79426
Relevant Lines: 89635

💛 - Coveralls

This commit adds a new transpiler pass for physical optimization,
TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks,
ConsolidateBlocks, and UnitarySynthesis in the optimization stage for
a default pass manager setup. The pass logically works the same way
where it analyzes the dag to get a list of 2q runs, calculates the matrix
of each run, and then synthesizes the matrix and substitutes it inplace.
The distinction this pass makes though is it does this all in a single
pass and also parallelizes the matrix calculation and synthesis steps
because there is no data dependency there.

This new pass is not meant to fully replace the Collect2qBlocks,
ConsolidateBlocks, or UnitarySynthesis passes as those also run in
contexts where we don't have a physical circuit. This is meant instead
to replace their usage in the optimization stage only. Accordingly this
new pass also changes the logic on how we select the synthesis to use
and when to make a substituion. Previously this logic was primarily done
via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if
the number of basis gates needed based on the weyl chamber coordinates
was less than the number of 2q gates in the block (see Qiskit#11659 for
discussion on this). Since this new pass skips the explicit
consolidation stage we go ahead and try all the available synthesizers

Right now this commit has a number of limitations, the largest are:

- Only supports the target
- It doesn't support any synthesizers besides the TwoQubitBasisDecomposer,
  because it's the only one in rust currently.

For plugin handling I left the logic as running the three pass series,
but I'm not sure this is the behavior we want. We could say keep the
synthesis plugins for `UnitarySynthesis` only and then rely on our
built-in methods for physical optimiztion only. But this also seems less
than ideal because the plugin mechanism is how we support synthesizing
to custom basis gates, and also more advanced approximate synthesis
methods. Both of those are things we need to do as part of the synthesis
here.

Additionally, this is currently missing tests and documentation and while
running it manually "works" as in it returns a circuit that looks valid,
I've not done any validation yet. This also likely will need several
rounds of performance optimization and tuning. t this point this is
just a rough proof of concept and will need a lof refinement along with
larger changes to Qiskit's rust code before this is ready to merge.

Fixes Qiskit#12007
Fixes Qiskit#11659
Since Qiskit#13139 merged we have another two qubit decomposer available to
run in rust, the TwoQubitControlledUDecomposer. This commit updates the
new TwoQubitPeepholeOptimization to call this decomposer if the target
supports appropriate 2q gates.
Clippy is correctly warning that the size difference between the two
decomposer types in the TwoQubitDecomposer enumese two types is large.
TwoQubitBasisDecomposer is 1640 bytes and TwoQubitControlledUDecomposer
is only 24 bytes. This means each element of ControlledU is wasting
> 1600 bytes. However, in this case that is acceptable in order to
avoid a layer of pointer indirection as these are stored temporarily
in a vec inside a thread to decompose a unitary. A trait would be more
natural for this to define a common interface between all the two qubit
decomposers but since we keep them instantiated for each edge in a Vec
they need to be sized and doing something like
`Box<dyn TwoQubitDecomposer>` (assuming a trait `TwoQubitDecomposer`
instead of a enum) to get around this would have additional runtime
overhead. This is also considering that TwoQubitControlledUDecomposer
has far less likelihood in practice as it only works with some targets
that have RZZ, RXX, RYY, or RZX gates on an edge which is less common.
Also don't run scoring more than needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog mod: transpiler Issues and PRs related to Transpiler performance Rust This PR or issue is related to Rust code in the repository
Projects
None yet
2 participants