Always run vf2 scoring serially #11112
Conversation
One or more of the following people are requested to review this:
For comparison, these are the performance results from this PR compared to main running in parallel. I generated them using this script:

import time

from qiskit.transpiler.preset_passmanagers import common
from qiskit.transpiler.passes import VF2PostLayout
from qiskit.transpiler.preset_passmanagers import plugin
from qiskit.transpiler.passmanager_config import PassManagerConfig
from qiskit.providers.fake_provider import FakeSherbrooke
from qiskit import QuantumCircuit

backend = FakeSherbrooke()
pm_config = PassManagerConfig.from_backend(backend, seed_transpiler=42)
pm = common.generate_unroll_3q(backend.target)
pm += plugin.PassManagerStagePluginManager().get_passmanager_stage("layout", "default", pm_config, optimization_level=3)
call_limit, max_trials = common.get_vf2_limits(optimization_level=3)
vf2_pass = VF2PostLayout(target=backend.target, call_limit=call_limit, max_trials=max_trials, seed=42, strict_direction=False)
times = []
for n in range(2, 127):
    qc = QuantumCircuit(n, n - 1)
    qc.x(n - 1)
    qc.h(range(n))
    qc.cx(range(n - 1), n - 1)
    qc.h(range(n - 1))
    qc.barrier()
    qc.measure(range(n - 1), range(n - 1))
    to_post_layout = pm.run(qc)
    start = time.perf_counter()
    vf2_pass(to_post_layout)
    stop = time.perf_counter()
    times.append(stop - start)
    print(stop - start)
print(times)
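As a hypothetical follow-up (not part of the original benchmark), the collected times list can be summarized and plotted by appending something like the following to the end of the script above, assuming matplotlib is installed:

import statistics
import matplotlib.pyplot as plt

# `times` comes from the benchmark loop above; qubit_counts mirrors range(2, 127).
qubit_counts = list(range(2, 127))
print(f"min: {min(times):.3f}s  median: {statistics.median(times):.3f}s  max: {max(times):.3f}s")
plt.plot(qubit_counts, times, marker="o")
plt.xlabel("Number of qubits")
plt.ylabel("VF2PostLayout runtime (s)")
plt.savefig("vf2_post_layout_times.png")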
As part of the VF2Layout and VF2PostLayout passes, when a large number of matches are found we spend an inordinate amount of time in scoring rebuilding the same views of the interaction graph over and over again for each scoring call. For example, in one test cProfile showed that with Qiskit#11112, when running transpile() on a 65-qubit Bernstein-Vazirani circuit with a secret of all 1s for FakeSherbrooke with optimization_level=3, we were calling vf2_utils.score_layout() 161,761 times, which took a cumulative 14.33 secs. Of that time, 5.865 secs were spent building the edge list view. These views are fixed for a given interaction graph, which doesn't change during the duration of the run() method on these passes. To remove this inefficiency, this commit moves the construction of the views to the beginning of the passes and reuses them by reference for each scoring call, avoiding the reconstruction overhead.
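For illustration only, here is a minimal, self-contained sketch of that pattern: the fixed view is built once per run() and then reused by reference in every scoring call. The function names and the scoring formula are hypothetical and are not the real Qiskit internals.

def score_candidate(layout, edge_list, error_map):
    # Toy stand-in for noise-aware scoring: 1 minus the product of
    # (1 - error) over the mapped 2q interactions (lower is better).
    fidelity = 1.0
    for logical_u, logical_v in edge_list:
        pair = (layout[logical_u], layout[logical_v])
        fidelity *= 1.0 - error_map.get(pair, 0.0)
    return 1.0 - fidelity

def run_pass(interaction_edges, error_map, candidate_layouts):
    # Build the fixed edge-list view exactly once per run() ...
    edge_list = list(interaction_edges)
    best = None
    for layout in candidate_layouts:
        # ... and reuse it by reference in every scoring call instead of
        # rebuilding it for each candidate layout.
        score = score_candidate(layout, edge_list, error_map)
        if best is None or score < best[0]:
            best = (score, layout)
    return best

edges = [(0, 1), (1, 2)]
errors = {(0, 1): 0.01, (1, 2): 0.02, (2, 3): 0.05}
layouts = [{0: 0, 1: 1, 2: 2}, {0: 1, 1: 2, 2: 3}]
print(run_pass(edges, errors, layouts))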
The VF2Layout and VF2PostLayout passes' Rust scoring function has the option to use a parallel iterator for computing the score of a layout for a sufficiently large circuit (in either the number of 2q interactions or the number of qubits). Previously the scoring would be multithreaded if there were > 50 qubits or > 50 2q interactions in the circuit. However, as recent testing has shown, in most cases doing this operation in parallel performs much worse than serial execution. To address this, this commit always uses the serial version. The parallel option is left in place for now in case someone was manually calling the scoring function with the parallel option set to ``True``.
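To make the shape of the change concrete, here is a rough Python sketch; the real scoring code is written in Rust (with rayon providing the parallel iterator), so the names, threshold, and scoring formula below are illustrative only.

from concurrent.futures import ThreadPoolExecutor

PARALLEL_THRESHOLD = 50  # old heuristic: parallel if > 50 qubits or > 50 2q interactions

def edge_score(edge, layout, error_map):
    u, v = edge
    return error_map.get((layout[u], layout[v]), 0.0)

def score_layout(edge_list, layout, error_map, run_in_parallel=False):
    # The parallel branch is kept only for callers that explicitly opt in.
    if run_in_parallel:
        with ThreadPoolExecutor() as pool:
            scores = list(pool.map(lambda edge: edge_score(edge, layout, error_map), edge_list))
    else:
        scores = [edge_score(edge, layout, error_map) for edge in edge_list]
    return sum(scores)

def pass_scores_layout(edge_list, layout, error_map, num_qubits):
    # Old behaviour (removed): opt into parallel scoring for "big" problems, e.g.
    # use_parallel = num_qubits > PARALLEL_THRESHOLD or len(edge_list) > PARALLEL_THRESHOLD
    # New behaviour: the passes always request serial scoring.
    return score_layout(edge_list, layout, error_map, run_in_parallel=False)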
Force-pushed from 8c741af to f7375c1
This looks good. Technically, changing the default behavior in vf2_utils would be a breaking change, right? Though the impact is non-functional, and even then, going from parallel to single-threaded wouldn't really have caller implications. Still, it might be good to have a release note.
Yeah, I'm not sure it's a breaking change,
I added the release note in: a868032 |
LGTM
* Always run vf2 scoring serially * Add release note (cherry picked from commit 0764a33)
* Reuse VF2 scoring views for all scoring (the same change described above) * Add EdgeList Rust pyclass to avoid repeated conversion: This commit adds a new pyclass, written in Rust, that wraps a Rust Vec. Previously the scoring function also used a dict->IndexMap conversion, but the mapping structure wasn't necessary and added additional overhead, so it was converted to a list/Vec to speed up execution even further. By using this new pyclass as the input to the Rust scoring function, we avoid converting the edge list from a Python list to a Vec on each call, which reduces the overhead even further.
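For illustration, here is a small Python sketch of the idea behind that wrapper: pay the cost of converting the Python edge list into native storage once, then reuse the converted object on every scoring call. The class and function names are hypothetical, and `array` stands in for the Rust Vec held by the real pyclass.

from array import array

class EdgeListView:
    """Hypothetical stand-in for the Rust EdgeList pyclass."""

    def __init__(self, edges):
        # One-time conversion from a Python list of (u, v) pairs into flat,
        # native-typed storage (the Rust version holds a Vec instead).
        self._flat = array("q", [idx for edge in edges for idx in edge])

    def __len__(self):
        return len(self._flat) // 2

    def edge(self, i):
        return self._flat[2 * i], self._flat[2 * i + 1]

def score(view, layout, error_map):
    # Every scoring call reuses the already-converted view by reference; no
    # per-call list -> native conversion happens here.
    total = 0.0
    for i in range(len(view)):
        u, v = view.edge(i)
        total += error_map.get((layout[u], layout[v]), 0.0)
    return total

view = EdgeListView([(0, 1), (1, 2)])
print(score(view, {0: 0, 1: 1, 2: 2}, {(0, 1): 0.01, (1, 2): 0.02}))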
Summary
The VF2Layout and VF2PostLayout passes' Rust scoring function has the option to use a parallel iterator for computing the score of a layout for a sufficiently large circuit (in either the number of 2q interactions or the number of qubits). Previously the scoring would be multithreaded if there were > 50 qubits or > 50 2q interactions in the circuit. However, as recent testing has shown, in most cases doing this operation in parallel performs much worse than serial execution. To address this, this commit always uses the serial version. The parallel option is left in place for now in case someone was manually calling the scoring function with the parallel option set to ``True``.
Details and comments