Performance improvements in is_isomorphic algorithm #288

Merged
merged 15 commits into from
May 4, 2021


georgios-ts
Collaborator

@georgios-ts georgios-ts commented Mar 22, 2021

Introduced changes:

  • The VF2 algorithm begins with an empty mapping and gradually extends it, trying to match nodes in a specified order. Currently, nodes are sorted by their ids. This PR implements a simplified version (it does not take node labels into account) of the matching order proposed in Section 4.1 of the VF2++ paper.
  • The VF2 algorithm uses cutting rules to prune, early on, partial mappings that cannot lead to a complete mapping. This PR adds some new rules (R_out, R_in, R_new) based on the original VF2 paper.
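
To make the first bullet concrete, here is a minimal pure-Python sketch of a label-free, VF2++-style matching order for an undirected graph. It is not the Rust code in this PR: the function name `matching_order` and the adjacency-dict input are assumptions made for illustration, and the tie-breaking follows my reading of Section 4.1 (BFS from a highest-degree node; within each BFS level, repeatedly pick the node with the most already-ordered neighbours, breaking ties by degree).

```python
def matching_order(adj):
    """Return a heuristic node order for `adj` (dict: node -> set of neighbours).

    Hypothetical helper: a label-free simplification of a VF2++-style order.
    """
    order = []      # final matching order
    placed = set()  # nodes already appended to `order`
    seen = set()    # nodes already discovered by some BFS

    # Start a BFS from the highest-degree undiscovered node of each component.
    for root in sorted(adj, key=lambda n: -len(adj[n])):
        if root in seen:
            continue
        seen.add(root)
        level = [root]
        while level:
            # Discover the next BFS level before ordering the current one.
            next_level = []
            for n in level:
                for m in adj[n]:
                    if m not in seen:
                        seen.add(m)
                        next_level.append(m)

            # Order the current level: most connections to already-ordered
            # nodes first, then largest degree.
            remaining = list(level)
            while remaining:
                best = max(
                    remaining,
                    key=lambda n: (len(adj[n] & placed), len(adj[n])),
                )
                remaining.remove(best)
                order.append(best)
                placed.add(best)

            level = next_level
    return order


# A path 0 - 1 - 2 - 3: the BFS starts at node 1 (degree 2).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
path_order = matching_order(path)

# A star with centre 0: the centre is always matched first.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
star_order = matching_order(star)
```

The idea behind such an order is that tightly connected, high-degree nodes are matched early, so infeasible branches tend to fail sooner than under a plain id order.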

@georgios-ts
Collaborator Author

A benchmark:

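import time

import numpy as np
import retworkx as rx


def permute(g, seed=None):
    # Build a copy of g whose node indices are randomly relabelled.
    # Node and edge payloads are not carried over.
    nodes = list(g.node_indexes())

    # Compare against None so that seed=0 is honoured as well.
    if seed is not None:
        np.random.seed(seed)

    np.random.shuffle(nodes)

    edges = map(lambda e: (nodes[e[0]], nodes[e[1]]), g.edge_list())
    edges = list(edges)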
    
    if isinstance(g, rx.PyDiGraph):
        res = rx.PyDiGraph()
    else:
        res = rx.PyGraph()
        
    res.add_nodes_from([None for _ in range(len(nodes))])
    res.add_edges_from_no_data(edges)
    
    return res


n = 10000
degrees = [10, 15, 50, 100]


total = 0
for deg in degrees:
    p = 2 * deg / (n - 1)

    g_a = rx.directed_gnp_random_graph(n, p, seed=42)
    g_b = permute(g_a, seed=4242)
        
    start = time.time()
    res = rx.is_isomorphic(g_a, g_b)
    assert res
    stop = time.time()
        
    dt = stop - start
    total += dt

print(total)
| Branch | Run time (sec) |
| ------ | -------------- |
| master | 135.61         |
| #288   | 2.73           |

@coveralls

coveralls commented Mar 22, 2021

Pull Request Test Coverage Report for Build 808147690

  • 172 of 177 (97.18%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.5%) to 96.543%

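| Changes Missing Coverage | Covered Lines | Changed/Added Lines | %      |
| ------------------------ | ------------- | ------------------- | ------ |
| src/isomorphism.rs       | 166           | 171                 | 97.08% |

| Totals                           | Coverage Status |
| -------------------------------- | --------------- |
| Change from base Build 804841300 | 0.5%            |
| Covered Lines                    | 6842            |
| Relevant Lines                   | 7087            |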

💛 - Coveralls

Member

@mtreinish mtreinish left a comment

I've just skimmed the code so far (I'll do an in-depth review later), but besides the doc build failing, can you also add a release note for the new default_order kwarg on the is_isomorphic function?

I'm also wondering if there is any backwards compatibility concern here. I haven't had a chance to read the vf2++ paper yet, but does using the heuristic matching order expose us to any risk of breaking users? I would assume not and this is fine then because we definitely want to default to using it because of the much faster performance.

There is also is_isomorphic_node_match(). Do we want to expose default_order there too?

@georgios-ts
Collaborator Author

@mtreinish Sure. Release note added, and the id_order kwarg is now exposed in is_isomorphic_node_match() as well (I renamed default_order to id_order).

No, I don't think there is any backwards-compatibility concern here. The algorithm will still work as expected. I should note, however, that there might be cases where the id-based order actually runs faster, since the new order is only a heuristic. This was one of the reasons I decided to add id_order as a user-facing argument.

@mtreinish
Member

So I was curious how the heuristic works on qiskit's DAGCircuit.__eq__ which is the primary user of the vf2 method and I'm thinking this actually results in a pretty big regression if we change the default. I ran:

import time

from qiskit.circuit.random import random_circuit
from qiskit.converters import circuit_to_dag

circuit_a = circuit_to_dag(random_circuit(15, 4096, seed=42))
circuit_b = circuit_to_dag(random_circuit(15, 4096, seed=512))

start = time.time()
circuit_a == circuit_b
stop = time.time()
print(stop - start)

start = time.time()
circuit_a == circuit_a
stop = time.time()
print(stop - start)

Which produced:

| version | unequal time (sec)     | equal time (sec)  |
| ------- | ---------------------- | ----------------- |
| 0.8.0   | 1.8835067749023438e-05 | 122.7758412361145 |
| master  | 7.867813110351562e-05  | 128.5686149597168 |
| #288    | 0.018624544143676758   | 135.3075144290924 |

I'm thinking we probably shouldn't change the default order if it's going to have such a large impact on qiskit's performance.

@georgios-ts
Collaborator Author

In this example you are "cheating", since you are comparing identical circuits, for which the order based on node ids is "optimal". In fact, the only work is_isomorphic does here is calling into Python to check semantic equality between the nodes.
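
A toy backtracking matcher illustrates why id order is "optimal" for identical graphs: the first unused candidate is always the right one, so no backtracking happens. The sketch below is not retworkx's VF2 implementation. The name `count_tries`, the adjacency-dict input and the degree-based feasibility check are assumptions chosen to keep the example small, and it only handles simple undirected graphs.

```python
def count_tries(adj_a, adj_b, order):
    """Backtracking isomorphism test for simple undirected graphs.

    Hypothetical helper. Nodes of A are matched in `order`; candidates in B
    are always tried in ascending id order.
    Returns (is_isomorphic, number of candidate pairs tried).
    """
    if len(adj_a) != len(adj_b):
        return False, 0

    mapping = {}  # node of A -> node of B
    used = set()  # nodes of B already taken
    tries = 0
    candidates = sorted(adj_b)

    def feasible(a, b):
        # Degrees must agree, and every already-mapped neighbour of `a`
        # must be mapped to a neighbour of `b`.
        if len(adj_a[a]) != len(adj_b[b]):
            return False
        return all(mapping[x] in adj_b[b] for x in adj_a[a] if x in mapping)

    def extend(depth):
        nonlocal tries
        if depth == len(order):
            return True
        a = order[depth]
        for b in candidates:
            if b in used:
                continue
            tries += 1
            if feasible(a, b):
                mapping[a] = b
                used.add(b)
                if extend(depth + 1):
                    return True
                # Undo the tentative match and try the next candidate.
                del mapping[a]
                used.discard(b)
        return False

    found = extend(0)
    return found, tries


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
# The same path with relabelled nodes: 2 - 0 - 3 - 1.
relabelled = {0: {2, 3}, 1: {3}, 2: {0}, 3: {0, 1}}

same = count_tries(path, path, order=[0, 1, 2, 3])
other = count_tries(path, relabelled, order=[0, 1, 2, 3])
```

For the identical pair, `same` reports exactly one try per node, while the relabelled pair needs extra tries. On a graph this small the gap is tiny, but it is the same effect that makes the identical-circuit comparison an unusually favourable case for id order.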

I guess a more realistic example would be:

import retworkx as rx
import numpy as np
import time

from qiskit.circuit.random import random_circuit
from qiskit.converters import circuit_to_dag

def permute(graph):
    num_nodes = len(graph)

    data = list(graph.nodes())
    
    nodes = np.random.permutation(num_nodes)
    edges = map(lambda edge: (nodes[edge[0]], nodes[edge[1]]), 
                graph.edge_list())
    edges = list(edges)
    
    if isinstance(graph, rx.PyDiGraph):
        res = rx.PyDiGraph()
    else:
        res = rx.PyGraph()
        
    pdata = [None for _ in range(num_nodes)]
    for i, ni in enumerate(nodes):
        pdata[ni] = data[i]

    res.add_nodes_from(pdata)
    res.add_edges_from_no_data(edges)
    
    return res

circuit_a = circuit_to_dag(random_circuit(15, 4096, seed=42))
graph_a = circuit_a._multi_graph
graph_a_perm = permute(graph_a)


start = time.time()
res = rx.is_isomorphic(graph_a, graph_a, id_order=False)
assert res
stop = time.time()
print(stop - start)
---
1.999 sec


start = time.time()
res = rx.is_isomorphic(graph_a, graph_a, id_order=True)
assert res
stop = time.time()
print(stop - start)
---
1.026 sec


start = time.time()
res = rx.is_isomorphic(graph_a, graph_a_perm, id_order=False)
assert res
stop = time.time()
print(stop - start)
---
1.927 sec


start = time.time()
res = rx.is_isomorphic(graph_a, graph_a_perm, id_order=True)
assert res
stop = time.time()
print(stop - start)
---
No termination even after 20mins

@georgios-ts
Collaborator Author

@mtreinish After some experiments, I guess you are right: keeping the default order is indeed the right path for Qiskit performance. I guess the reason is that, if two quantum circuits have isomorphic DAGCircuit representations, we are restricted to a small subset of all possible node permutations, so matching the nodes in id order is faster.

Outside of qiskit, i believe this PR makes sense.

@mtreinish
Member

> @mtreinish After some experiments, I guess you are right: keeping the default order is indeed the right path for Qiskit performance. I guess the reason is that, if two quantum circuits have isomorphic DAGCircuit representations, we are restricted to a small subset of all possible node permutations, so matching the nodes in id order is faster.
>
> Outside of Qiskit, I believe this PR makes sense.

Yeah, I agree with that. I think the path forward here is for this first PR we just add the option to change the order now but don't switch to the heuristic by default for the first release. This will give us a chance to adjust terra to explicitly say it wants the id order. Then we can deprecate and change the default order to use the heuristic. For the non-qiskit case the best we can do for this first PR is add a detailed docstring about setting the id_order flag to false for better performance on general graphs.

@mtreinish mtreinish added this to the 0.9.0 milestone Apr 6, 2021
Collaborator

@IvanIsCoding IvanIsCoding left a comment

This is a great addition to the library! It's good to have one of the best algorithms that can compete with LEMON and other graph libraries.

I requested changes mostly to modify the name of Graph<Ty> and to see if we can avoid copying and pasting those for loops in try_match.

Lastly, with regards to the performance regression: keep in mind the authors of the VF2++ paper never tested it on directed graphs or DAGs. So for our specific case in Qiskit, the difference between VF2 and VF2++ won't be as impressive as for random graphs.

To avoid a regression in Qiskit, we should default to no heuristic. Later we can explicitly disable the heuristic in Qiskit's code and enable the heuristic by default on our side.

@georgios-ts
Collaborator Author

@IvanIsCoding Thanks for the review!
Regarding performance, my understanding is that VF2++ is still faster on DAGs (although the paper indeed shows no results for them). The reason we see a slight performance regression in Qiskit is the following: using the Qiskit circuit API, the number of ways we can end up with isomorphic circuit DAGs is the number of different topological orders of the DAG. This is considerably smaller than the number of permutations.
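
To get a feel for the gap between topological orders and arbitrary permutations, here is a small self-contained sketch that counts the topological orders of a toy DAG and compares the result with n!. The function `count_topological_orders` is a hypothetical helper written for this illustration; it uses a bitmask dynamic program, so it is only practical for small n.

```python
from functools import lru_cache
from math import factorial


def count_topological_orders(n, edges):
    """Count the topological orders of a DAG on nodes 0..n-1.

    Hypothetical helper; `edges` is a list of (source, target) pairs.
    """
    # preds[v] is a bitmask of the direct predecessors of v.
    preds = [0] * n
    for u, v in edges:
        preds[v] |= 1 << u

    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def count(done):
        # `done` is the bitmask of nodes already placed in the order.
        if done == full:
            return 1
        total = 0
        for v in range(n):
            not_placed = not (done >> v) & 1
            preds_placed = (preds[v] & ~done) == 0
            if not_placed and preds_placed:
                total += count(done | (1 << v))
        return total

    return count(0)


# Diamond-shaped DAG: 0 -> 1 -> 3 and 0 -> 2 -> 3.
diamond_orders = count_topological_orders(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
all_permutations = factorial(4)
```

For the diamond there are 2 topological orders against 24 permutations of the four nodes. The more dependencies a DAG has, the wider this gap tends to become.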

// repeatedly bring largest element in front.
for i in 0..vd.len() {
    let (index, &item) = vd[i..]
        .par_iter()
Member

I'm curious did you benchmark this? I'm a bit worried that using a parallel iterator here will actually make this slower, especially as i approaches vd.len(). Parallel iterators aren't always faster, especially for smaller iterators where the overhead often is higher than the execution time. So we should be tactical about where we leverage rayon and not just use it everywhere because we can.

Collaborator

@IvanIsCoding IvanIsCoding Apr 28, 2021

@mtreinish I support benchmarking it, but I can give some insight on why I suggested Georgios to parallelize it.

That loop looks like it could be one of the most expensive for sparse graphs. It does a procedure similar to Selection Sort, and it is quadratic on vd.len() (and from what I understood vd contains all graph nodes at a given BFS level). So it makes sense to parallelize the inner loop that is executed multiple times, because for most of the iterations it will be an expensive loop.
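
The selection-style procedure described here can be sketched as follows. This is Python rather than the Rust under review, kept in the same language as the benchmarks in this thread, and `bring_largest_to_front` is a hypothetical name. It counts how many elements the inner scan inspects, which shows the quadratic growth in the length of the input.

```python
def bring_largest_to_front(values):
    """Selection-sort-like pass: repeatedly move the largest remaining
    element to the front of the unsorted suffix.

    Hypothetical helper. Returns (sorted copy, elements inspected).
    """
    items = list(values)
    inspected = 0
    for i in range(len(items)):
        best = i
        # Inner scan over the suffix items[i:]; this is the part the
        # review discusses running as a parallel iterator.
        for j in range(i, len(items)):
            inspected += 1
            if items[j] > items[best]:
                best = j
        items[i], items[best] = items[best], items[i]
    return items, inspected


result, inspected = bring_largest_to_front([3, 1, 2])
_, inspected_100 = bring_largest_to_front(range(100))
```

For n elements the scan inspects n(n+1)/2 items in total. The suffix also shrinks on every iteration, so the late iterations scan only a handful of items and have little work to spread across threads.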

For the benchmark, I suggest running it against a sparse graph and comparing it. The simplest one that comes to my mind is a Star Graph.

Ideally, we should do as in all our other functions and decide whether to parallelize the loop based on the vector size:

for i in 0..vd.len() {
  let (index, &item) = if vd.len() - i >= parallel_threshold {
     // parallel loop
  }
  else {
     // fall back to sequential loop
  }
}

Collaborator Author

After some benchmarks on complete, star and gnp graphs, the parallel iterator is a bit slower. I tried setting a parallel_threshold, but saw no improvement either, so I switched back to the serial iterator.

Member

@mtreinish mtreinish left a comment

LGTM thanks for doing this, just a couple small doc formatting issues and a question about the type for the new argument in the rust code. But other than that I think this is good to go

Member

@mtreinish mtreinish left a comment

LGTM, thanks for the fast update.

@mtreinish mtreinish merged commit e940fd5 into Qiskit:main May 4, 2021
@georgios-ts georgios-ts deleted the pr-speedup-vf2 branch May 8, 2021 07:41
@georgios-ts georgios-ts mentioned this pull request Jun 27, 2021
2 tasks