-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Sabre extended set population order #10651
Fix Sabre extended set population order #10651
Conversation
One or more of the the following people are requested to review this:
|
Pull Request Test Coverage Report for Build 5883464567
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic behind doing this looks sensible to me - it was always easier to write the BFS traversal in the order I did because it's append-only, but maintaining a separate vector for eager iteration looks like a good choice.
It might be worth running the large-scale QFT/QV benchmarks from the asv suite to see how runtime performance changes here, if at all.
5be81d0
to
1ab85c4
Compare
Pull Request Test Coverage Report for Build 7024159580
💛 - Coveralls |
There did not appear to be any significant changes when I ran the transpilation benchmarks 🙂. asv continuous -E virtualenv-py3.8 --split --no-only-change upstream/main wt-terra-sabre-ext-set-order --bench "Level.*time" --bench "transpil"
asv compare --only-changed upstream/main wt-terra-sabre-ext-set-order
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your copy in actually seems to suggest an improvement I think? I ran a different set of benchmarks and saw the expected no change:
jake@ninetales$ asv compare -s ibm/main kevinh/wt-terra-sabre-ext-set-order
Benchmarks that have stayed the same:
before after ratio
[50e81374] [1ab85c42]
<tmp-sabre^2> <tmp-sabre~1>
257±3ms 272±5ms 1.06 mapping_passes.PassBenchmarks.time_sabre_layout(14, 1024)
427±2ms 443±7ms 1.04 mapping_passes.PassBenchmarks.time_sabre_layout(20, 1024)
68.7±0.6ms 72.1±1ms 1.05 mapping_passes.PassBenchmarks.time_sabre_layout(5, 1024)
133±1ms 134±1ms 1.00 mapping_passes.PassBenchmarks.time_sabre_swap(14, 1024)
213±2ms 211±1ms 0.99 mapping_passes.PassBenchmarks.time_sabre_swap(20, 1024)
41.2±0.6ms 40.4±0.6ms 0.98 mapping_passes.PassBenchmarks.time_sabre_swap(5, 1024)
147±1ms 146±0.7ms 0.99 qft.LargeQFTMappingTimeBench.time_sabre_swap(115, 'decay')
152±2ms 152±0.7ms 1.00 qft.LargeQFTMappingTimeBench.time_sabre_swap(115, 'lookahead')
2.30±0.01s 2.31±0s 1.00 qft.LargeQFTMappingTimeBench.time_sabre_swap(409, 'decay')
2.23±0s 2.27±0.01s 1.02 qft.LargeQFTMappingTimeBench.time_sabre_swap(409, 'lookahead')
4.11±0.01s 4.07±0.01s 0.99 quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 10, 'decay')
2.54±0.02s 2.52±0.02s 1.00 quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 10, 'lookahead')
n/a n/a n/a quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 100, 'decay')
n/a n/a n/a quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 100, 'lookahead')
34.1±0.7ms 34.3±0.3ms 1.01 quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 10, 'decay')
31.4±0.3ms 31.0±0.6ms 0.99 quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 10, 'lookahead')
295±3ms 293±1ms 0.99 quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 100, 'decay')
280±2ms 285±4ms 1.02 quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 100, 'lookahead')
389±4ms 392±4ms 1.01 quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'decay')
412±3ms 403±4ms 0.98 quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'lookahead')
n/a n/a n/a quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 100, 'decay')
n/a n/a n/a quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 100, 'lookahead')
Maybe it's random-seed effects we're seeing in yours more? Either way, this looks good to me.
Summary
When building the extended set, the current traversal order uses a BFS of all nodes as they become unblocked. This means that the order in which we encounter 2Q operations isn't necessarily the ones closest to inclusion in the front layer first. Since we limit the size of the extended set, operations that are closer to the front layer might not actually be included.
To fix this, we change the traversal order to visit non-2Q nodes (namely, 1Q gates and control flow ops) as soon as they're unblocked. Otherwise, we do the original BFS. This way, we get a BFS specifically of 2Q nodes as they become unblocked.
Details and comments
Here's an example of the traversal orders before and after this PR for a 4 qubit circuit with a run of 1Q H gates, where the extended set is limited to size 4.
The magenta arrows show the traversal orders, and the numbers (1., 2., 3., 4.) denote the nodes that make up the extended set after traversal. Notice that the extended set on
main
doesn't contain the ECR gate that happens onq0
andq1
, even though it is closer to being pulled into the front layer than the latter 2 ECR gates onq2
andq3
.