Speed up Clifford.compose
#7483
Conversation
Pull Request Test Coverage Report for Build 3891390584
💛 - Coveralls
Three FYI comments:
@merav-aharoni is currently working on speeding up 2-qubit RB in qiskit-experiments.
I'd like to have this feature. The RB module in qiskit-experiments is going to have its own implementation of 1Q/2Q Cliffords for optimising the performance of 1Q/2Q RB (in that, I'm planning to borrow the hash idea shown in this PR). However, 3Q and larger RB will continue to use the general Cliffords in Terra, so the speed-up from this PR is still useful for RB experiments.
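The "hash idea" referred to here can be sketched in a few lines. This is an illustration only, not qiskit-experiments' actual implementation: `tableau_key` and `cached_compose` are hypothetical names, and the operators are stood in for by 0/1 NumPy arrays. The point is simply that an operator's boolean tableau can be turned into a hashable key, so compositions can be cached in a plain dict:

```python
import numpy as np

def tableau_key(tableau):
    """Canonical, hashable key for a 0/1 tableau (hypothetical helper)."""
    arr = np.asarray(tableau, dtype=bool)
    return (arr.shape, arr.tobytes())

_compose_cache = {}

def cached_compose(a, b, compose_fn):
    """Look up the composition of two tableaus, computing it only on a cache miss."""
    key = (tableau_key(a), tableau_key(b))
    if key not in _compose_cache:
        _compose_cache[key] = compose_fn(a, b)
    return _compose_cache[key]
```

For small groups (such as the 1-qubit Cliffords) the cache quickly saturates, and every subsequent composition becomes a dict lookup.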
Now that #7269 is merged, I will revisit this PR once Terra 0.22 is out the door and we can get it in Terra 0.23 - it's been held up for a while because we didn't know what the internals of
This is a fairly "dumb" optimisation; the algorithm remains exactly the same, but the loops are refactored to do everything meaningful within Numpy vectorised operations. This has a slightly larger memory footprint than the previous version, in order to vectorise most efficiently, but it is the same time complexity. In practice, however, the speed-ups are very large. On my machine, the timings went (for Cliffords of `n` qubits, each made from 10 gates):

| `n` | new (ms) | old (ms) |
|----:|---------:|---------:|
| 1   | 0.025    | 0.057    |
| 3   | 0.146    | 0.162    |
| 10  | 0.366    | 1.30     |
| 30  | 1.11     | 13.5     |
| 100 | 8.51     | 337      |

The scaling should not (in theory) be different, but in practice the faster Numpy vectorisation means larger matrices get bigger speedups. This transforms the 1-qubit case into a lookup table, populated on the first call to `Clifford.compose` with 1-qubit operators. There are 576 possible combinations of 1-qubit Clifford operators, which is small enough to simply look up. Having the special case is an optimisation for randomised benchmarking scripts. (In those cases, doing the lookup manually can save another factor of 2x, because half the time spent in this function is just copying the mutable values from the lookup table.) This also adds a `copy` argument to the constructor of `Clifford` to avoid copying the stabilizer table when it is not necessary.
Force-pushed dee30d1 to fdbd751 (Compare)
This is now rebased over `main`.
@ikkoham or @t-imamichi, please could I ask one of you to review this ahead of Terra 0.23? There's no massive rush - I'm about to be on holiday until January - I'm just keen for it not to get dropped, and I know @ihincks is also interested in having a faster platform for
This PR is looking really clean! Although `test_compose_method()` looks pretty trustworthy, I have also further confirmed that the new compose implementation is equal to the old one up to tableau bits by running the following script on `main` and then on this branch:
```python
from qiskit.quantum_info import random_clifford, Clifford
import qiskit
import pickle
import sys

def to_bools(cliff):
    return list(map(list, cliff.tableau))

def from_bools(bools):
    return Clifford(bools)

x = random_clifford(10)
assert x == from_bools(to_bools(x))

if __name__ == "__main__":
    print(qiskit.__file__)
    if sys.argv[1] == "write":
        data = []
        for n in [1, 2, 3, 10, 13]:
            for _ in range(10):
                a = random_clifford(n)
                b = random_clifford(n)
                c = a.compose(b)
                data.append(tuple(map(to_bools, (a, b, c))))
        with open("cliffords.data", "wb") as f:
            pickle.dump(data, f)
        print("wrote.")
    else:
        with open("cliffords.data", "rb") as f:
            data = pickle.load(f)
        count = 0
        for a, b, c in data:
            assert Clifford(a).compose(Clifford(b)) == Clifford(c)
            count += 1
        print(f"{count} valid assertions")
```
Co-authored-by: Ian Hincks <ian.hincks@ibm.com>
LGTM!
LGTM! Note that @ihincks has already reviewed and approved this PR in the past. Nevertheless, I have carefully double-checked the code (including manually working out
@Mergifyio requeue
✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically |
Summary
This is a fairly "dumb" optimisation; the algorithm remains exactly the same, but the loops are refactored to do everything meaningful within Numpy vectorised operations. This has a slightly larger memory footprint than the previous version, in order to vectorise most efficiently, but it is the same time complexity.
In practice, however, the speed-ups are very large. On my machine, the timings went (for Cliffords of `n` qubits, each made from 10 gates):

| `n` | new (ms) | old (ms) |
|----:|---------:|---------:|
| 1   | 0.025    | 0.057    |
| 3   | 0.146    | 0.162    |
| 10  | 0.366    | 1.30     |
| 30  | 1.11     | 13.5     |
| 100 | 8.51     | 337      |

The scaling should not (in theory) be different, but in practice the faster Numpy vectorisation means larger matrices get bigger speedups.
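The flavour of refactor being described (identical arithmetic, with explicit Python loops replaced by a single vectorised NumPy operation) can be illustrated with a toy GF(2) example. This is not the PR's actual code; `compose_loop` and `compose_vec` are made-up names, and the real tableau update also tracks phases:

```python
import numpy as np

rng = np.random.default_rng(42)
m = rng.integers(0, 2, (64, 64))
t = rng.integers(0, 2, (64, 64))

def compose_loop(m, t):
    # Row i of the result is the GF(2) sum (XOR) of the rows of t
    # selected by row i of m, accumulated one row at a time.
    out = np.zeros_like(t)
    for i in range(m.shape[0]):
        acc = np.zeros(t.shape[1], dtype=t.dtype)
        for j in range(m.shape[1]):
            if m[i, j]:
                acc ^= t[j]
        out[i] = acc
    return out

def compose_vec(m, t):
    # Exactly the same arithmetic, expressed as one mod-2 matrix product;
    # for 0/1 entries, summing then reducing mod 2 equals XOR accumulation.
    return (m @ t) % 2

assert np.array_equal(compose_loop(m, t), compose_vec(m, t))
```

The vectorised form allocates a full integer product before the `% 2` reduction, which mirrors the "slightly larger memory footprint, same time complexity" trade-off described above.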
This transforms the 1-qubit case into a lookup table, populated on the first call to `Clifford.compose` with 1-qubit operators. There are 576 possible combinations of 1-qubit Clifford operators, which is small enough to simply look up. Having the special case is an optimisation for randomised benchmarking scripts. (In those cases, doing the lookup manually can save another factor of 2x, because half the time spent in this function is just copying the mutable values from the lookup table.)
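A minimal sketch of the lookup-table idea, using GL(2, GF(2)) (6 elements, so a 36-entry table) as a stand-in for the 24-element 1-qubit Clifford group and its 576 = 24 × 24 combinations, since enumerating real Cliffords would require qiskit. All names here are hypothetical:

```python
import itertools
import numpy as np

def _gl2_f2():
    """All invertible 2x2 matrices over GF(2): a 6-element stand-in group."""
    out = []
    for bits in itertools.product([0, 1], repeat=4):
        m = np.array(bits).reshape(2, 2)
        # Keep matrices with determinant 1 mod 2 (ad + bc is odd).
        if (m[0, 0] * m[1, 1] + m[0, 1] * m[1, 0]) % 2 == 1:
            out.append(m)
    return out

ELEMENTS = _gl2_f2()

# Populate the full composition table once, keyed by element indices.
TABLE = {
    (i, j): (a @ b) % 2
    for i, a in enumerate(ELEMENTS)
    for j, b in enumerate(ELEMENTS)
}

def compose(i, j):
    # Return a copy so callers cannot mutate the shared table entries;
    # this copy is the residual cost mentioned in the text above.
    return TABLE[(i, j)].copy()
```

Skipping the defensive copy when the caller is trusted is exactly where the extra "factor of 2x" from doing the lookup manually would come from.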
This also adds a `copy` argument to the constructor of `Clifford` to avoid copying the stabilizer table when it is not necessary.
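The `copy` constructor-argument pattern can be sketched as follows; `Tableau` is a hypothetical stand-in class, not qiskit's actual `Clifford` API:

```python
import numpy as np

class Tableau:
    """Hypothetical stand-in sketching a `copy` constructor argument."""

    def __init__(self, data, copy=True):
        data = np.asarray(data)
        # copy=True (the safe default) takes a defensive copy; copy=False
        # lets a trusted internal caller adopt the array without the
        # extra allocation and copy.
        self.data = data.copy() if copy else data

raw = np.zeros((4, 5), dtype=bool)
shared = Tableau(raw, copy=False)  # aliases raw
safe = Tableau(raw)                # independent copy
raw[0, 0] = True
assert shared.data[0, 0]           # sees the mutation
assert not safe.data[0, 0]         # unaffected
```

The usual trade-off applies: `copy=False` is only safe when the caller promises not to mutate the array afterwards, which is why it suits internal fast paths rather than public defaults.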
Details and comments
There may well be a better speed-up available with a different algorithm, and the memory usage could certainly be reduced if I rewrote this in Cython because the accumulator variables used to facilitate vectorisation wouldn't be necessary. I don't know if the method is used enough to warrant the extra compile time, though.
edit: I've now rebased this over `main`. I don't know exactly what I did with the original benchmarks, but I just re-ran them now, and there are still good speedups (though the performance of the old `Clifford.compose` doesn't seem to be quite as bad as I originally had it).