Add adaptive limits for VF2Layout in preset passmanagers #7705
Comments
Hey @mtreinish, as a starting point in benchmarking of `vf2_mapping`, is there a limit on how large the coupling graph can be, or what graph sizes should I sweep?
Well, there is technically no limit on the size of the coupling graph. For benchmarking, to start I would probably do a sweep from around 10 nodes to 10000 nodes (which, yeah, is really big) and see if we can find a trend, but if 10000 is too big and takes too long you can start smaller. Using some of the retworkx generator functions: https://qiskit.org/documentation/retworkx/api.html#generators (especially https://qiskit.org/documentation/retworkx/apiref/retworkx.generators.directed_heavy_square_graph.html and https://qiskit.org/documentation/retworkx/apiref/retworkx.generators.directed_heavy_hex_graph.html) will be useful here to quickly create large graphs. I expect that the performance of the …
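As a minimal sketch (not from the thread) of how such a sweep could be set up with the linked generators, assuming the `directed_heavy_hex_graph(d)` signature from the retworkx docs and with distances chosen purely for illustration:

```python
# Illustrative only: build a family of increasingly large heavy-hex coupling
# graphs to benchmark against. The distances below are arbitrary choices.
import retworkx as rx

coupling_graphs = {}
for d in (3, 5, 7, 9, 11, 13, 15):  # heavy-hex code distance (odd)
    graph = rx.generators.directed_heavy_hex_graph(d)
    coupling_graphs[d] = graph
    print(f"d={d}: {len(graph)} nodes")  # node count grows roughly quadratically in d
```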
Isn't it most time-consuming when the circuit is closer to half the total graph size? There are only so many options for placing an N-1 size circuit.
Well, I was thinking more about the time to find that initial isomorphic mapping. When the sizes of the graphs are closer, the search space to verify that a mapping is valid is a lot bigger because you have to check more nodes. Yeah, there are fewer overall possibilities of valid mappings when the graphs are closer in size, but to find that first mapping you have to do a lot more work. I did a very small benchmark to test this with:

```python
import csv
import time

import numpy as np
import retworkx as rx

with open("vf2_times.csv", "w", newline="") as csvfile:
    times_writer = csv.writer(csvfile)
    times_writer.writerow(["cmap_size", "imap_size", "time"])
    cmap = rx.generators.directed_grid_graph(7, 7, bidirectional=True, multigraph=False)
    for i in range(5, 45):
        imap = rx.generators.directed_path_graph(i)
        times = []
        for _ in range(5):
            start = time.perf_counter()
            next(rx.vf2_mapping(cmap, imap, subgraph=True, id_order=False, induced=False))
            stop = time.perf_counter()
            run_time = stop - start
            times.append(run_time)
        avg_run_time = float(np.mean(times))
        times_writer.writerow([len(cmap), len(imap), avg_run_time])
```

Graphing that output gave:

[Figure: average vf2_mapping run time vs. interaction graph size; plot not preserved]
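A small follow-up sketch (mine, not from the thread; it assumes matplotlib is installed) of one way the vf2_times.csv output above could be plotted:

```python
# Read the CSV written by the benchmark above and plot average time vs. |H|.
import csv

import matplotlib.pyplot as plt

imap_sizes, avg_times = [], []
with open("vf2_times.csv") as csvfile:
    for row in csv.DictReader(csvfile):
        imap_sizes.append(int(row["imap_size"]))
        avg_times.append(float(row["time"]))

plt.plot(imap_sizes, avg_times, "o-")
plt.xlabel("|H| (interaction graph size)")
plt.ylabel("average vf2_mapping time (s)")
plt.title("vf2_mapping time on a 7x7 grid coupling graph")
plt.show()
```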
Have you thought about using McKay's nauty?
I wasn't familiar with nauty. There's also a more practical concern around integrating an external C library/standalone tool as a dependency, which we really don't have a good way to do directly in terra. If we did decide to start leveraging it, we'd probably need to make a standalone Python library that wrapped nauty and published precompiled binaries for nauty, and then add that as a dependency for terra.
Given one graph, … [1]

[1] http://users.cecs.anu.edu.au/~bdm/data/formats.txt

That said, I am interested in looking at benchmarks for the …
Misc comments
Proposal

@mtreinish says [via Slack chat], "It [ … ]"

HM0880: The best predictor of run time is the size of the subgraph H. |H| = 21 runs under 0.1 seconds for up to the 100x100 grid graph G. For the three levels of the preset pass managers, I propose the following settings for the …

[Table 1: Proposed … ; table contents not preserved]
Plots
[Plots not preserved]
Sorry I wasn't clear in my offline comment; the call_limit values were picked so that they correspond roughly to those times (e.g. ~100 ms for level 1), which were our baseline times for running this pass. The open question now is: based on the data that you've collected, do you think that dynamically scaling that call_limit …?
Oh, now I understand how “3e7” corresponds to ~60 seconds of run time; thanks. Hopefully, the text below answers your question.
Yes.
In Figure 1 from yesterday, the running times start to saturate when |G| is 20x20, so it is reasonable to treat |G| = 100x100 as a limiting size. I tried fitting the exponential curve A·10^(B·|H|):

[Figure 4: Fit of the exponential curve A·10^(B·|H|); plot not preserved]

To be independent of the classical device computing power, the best method would be to run similar benchmarks, but instead of reporting the time, report the value of the internal vf2 call counter.
Well, it depends on how you want to return it. The easiest way would probably be to just print it with …
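As an alternative that doesn't require touching retworkx internals, here is a sketch of my own (assuming `vf2_mapping` accepts the `call_limit` kwarg that this issue says is passed through to retworkx): binary-search for the smallest call limit at which a mapping is still found, and use that as a machine-independent cost measure.

```python
# Sketch: estimate the internal vf2 search cost without timing, by finding the
# smallest call_limit at which vf2_mapping still yields a mapping.
import retworkx as rx

def min_call_limit(cmap, imap, hi=int(3e7)):
    """Binary search for the smallest call_limit that still finds a mapping."""
    lo = 1
    while lo < hi:
        mid = (lo + hi) // 2
        found = next(
            rx.vf2_mapping(cmap, imap, subgraph=True, id_order=False,
                           induced=False, call_limit=mid),
            None,
        )
        if found is None:
            lo = mid + 1  # limit too small, no mapping reached; raise it
        else:
            hi = mid      # mapping found; try a tighter limit
    return lo
```

The returned number is comparable across machines, at the cost of rerunning the search O(log hi) times.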
Here are … I ended up using a combination of … The line that goes through the green circles is the best-fit line A·10^(B·|H|) to those points. The other line (slightly above the fit line) is a scaled version of the …

[Figure 5: Log plot of …; plot not preserved]
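For reference, a sketch (mine, reusing the vf2_times.csv schema from the earlier benchmark) of how such an A·10^(B·|H|) fit can be computed: take log10 of the times and do a linear least-squares fit.

```python
# Fit log10(time) = log10(A) + B*|H|, then recover A and B for A * 10**(B*|H|).
import csv

import numpy as np

h_sizes, times = [], []
with open("vf2_times.csv") as csvfile:
    for row in csv.DictReader(csvfile):
        h_sizes.append(int(row["imap_size"]))
        times.append(float(row["time"]))

B, log10_A = np.polyfit(h_sizes, np.log10(times), 1)  # slope, intercept
A = 10 ** log10_A
print(f"time ≈ {A:.3g} * 10**({B:.3g} * |H|)")
```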
What is the expected enhancement?
In #7213 we added the VF2Layout pass to all the preset passmanagers. For the instantiation of that pass at each optimization level we had to set limits on the pass, both on the number of internal vf2 state visits (which gets passed directly to retworkx) and on other options that limit the amount of time we spend searching for an ideal solution. For example:
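(The original snippet is not preserved here; the following is only an illustrative sketch. The 3e7 / 60 s figures are the level 3 numbers mentioned in this thread, and the exact kwargs are assumptions based on the pass's documented options.)

```python
# Illustrative sketch of the kind of instantiation described above; not the
# literal code from the preset passmanagers.
from qiskit.transpiler.passes import VF2Layout

vf2_pass = VF2Layout(
    coupling_map,         # CouplingMap of the target backend (assumed in scope)
    seed=seed,            # transpiler seed (assumed in scope)
    call_limit=int(3e7),  # cap on internal vf2 state visits, forwarded to retworkx
    time_limit=60,        # wall-clock cap in seconds
)
```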
In each optimization level we set an increasing limit to roughly model the maximum amount of time we want to spend on the pass: 100 ms for level 1, 10 s for level 2, and 60 s for level 3. For a first approach this makes sense, as it was easy to implement and will improve the quality of the transpiler significantly when there is a perfect mapping available. However, the limitation of this approach is that it doesn't take into account the complexity of the input problem. As both the size of the interaction graph and the coupling graph increase, the amount of time it takes for vf2 to find a potential solution can increase significantly. To accommodate this we should come up with a scaling factor for the hard limits we place on the pass, so that we can scale these limits to something appropriate for inputs that will require more time. This shouldn't be a problem, as the other passes typically scale linearly with the number of qubits (and routing passes scale exponentially), so spending extra time up front to potentially avoid the slower passes later is a decent tradeoff, as long as we don't spend too much time.
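One possible shape for such a scaling rule, sketched purely for illustration (the base limit, exponent b, and cap are placeholders that would come out of the benchmark fit, not values proposed in this issue):

```python
# Hypothetical scaling rule: grow the call limit with the interaction-graph
# size |H| following the A * 10**(B*|H|) trend, with a hard cap.
def scaled_call_limit(h_size, base_limit=100, b=0.1, cap=int(3e7)):
    """Return a call_limit that grows exponentially with |H|, capped at `cap`."""
    return min(int(base_limit * 10 ** (b * h_size)), cap)
```

For example, a preset passmanager could call something like `scaled_call_limit(len(interaction_graph))` per optimization level, with different base limits and caps per level.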
To implement this we'll need to do some benchmarking of retworkx's vf2_mapping() function to figure out the scaling properties of how long it takes to find a result. Once we have a good model of that, we can come up with a formula for how to scale the limits appropriately for each optimization level and apply that in the preset passmanager based on the input target backend.