Investigate variance in benchmarking results #251
Haven't found a clear issue yet, but I noticed a few things that could be uncontrolled sources of variance.
I was able to recreate a bit of this locally by focusing on the large change in PyTKET compile times. To test this on the GitHub runner, I made a change to set the max threads to 1 when constructing the AerSimulator instances (https://github.com/unitaryfund/ucc/tree/bc-test-AER-1-thread). Running the benchmarks shows this reverts back toward the prior timing:

- Baseline (current master branch): [benchmark results]
- With max threads set to 1: [benchmark results]

The single-thread result is closer to the older averages, but still off by a factor of two. I'll next try separating the timing runs from the expectation value runs to see if that normalizes things better (maybe something else is causing contention between these processes). I would expect the performance isn't so sensitive that we'd need to pin to specific cores.
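For reference, here is a minimal sketch of the kind of change tested in that branch, assuming the benchmarks construct qiskit-aer's AerSimulator directly (the exact construction site in ucc's benchmark code may differ):

```python
from qiskit_aer import AerSimulator

# Pin Aer's OpenMP parallelism to a single thread so the simulator does not
# compete with other benchmark processes for CPU cores. The default
# (max_parallel_threads=0) lets Aer use all available cores.
simulator = AerSimulator(max_parallel_threads=1)

# The option can also be set after construction:
# simulator.set_options(max_parallel_threads=1)
```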
Following on from the above investigations (which can now be tracked in https://github.com/unitaryfund/ucc/tree/bc-test-AER-1-thread), it looks like both adding the expectation value runs AND varying the parallelism specified to the AerSimulator instances contribute to the variance. Below, I'll summarize these observations and then make some recommendations on next steps.

Observations

Limiting …
Thanks for investigating @bachase. I'm still not clear on why updating PyTKET was the change where this showed up. The parallelism has been set to the same number since we began using the Actions workflow, so again I'm not sure why this would be an issue here.
@jordandsullivan I admit I don't think I've got it fully explained, but here's my best attempt focusing on just that change in PyTKET compile times. To confirm we are talking about the same observation, the question is: Why did the benchmark's average compilation time increase so much for PyTKET around Feb 19, given the benchmarks have been using the same version (1.40.0) of PyTKET since Jan 28? Looking at benchmark runs around this date, the jump started in #250 as you had originally noticed and flagged.
The main change with #250 was to upgrade to the latest versions of the project's dependencies. Instead I tried to recreate this locally and noticed the parallelism was impacting the duration of the benchmarks. I confirmed that also happened on the GitHub runners.

The successive results in the comment above show how changing different aspects of parallelization dropped the PyTKET compile times. The biggest impact does look to be from having the expectation value benchmark runs enabled. You can see this drift upwards starting around Feb 11 or so as well. So I think that does match your instinct here, where something other than the PyTKET upgrade itself is driving the change.

Following up on that instinct, rather than my original proposal to remove all parallelism, perhaps we focus on just having the expected value benchmarks run separately from the timing benchmarks? That seems to control for most of this variance.
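As a rough sketch of that proposal (the script names here are hypothetical; ucc's actual benchmark entry points may differ), the idea is just to run the two benchmark groups one after the other instead of launching them at the same time:

```python
import subprocess

# Hypothetical entry points; substitute the repo's real benchmark scripts.
timing_cmd = ["python", "benchmarks/run_timing_benchmarks.py"]
expectation_cmd = ["python", "benchmarks/run_expectation_value_benchmarks.py"]

# Run sequentially so the expectation value simulations never contend
# for CPU cores with the compile-time measurements.
subprocess.run(timing_cmd, check=True)
subprocess.run(expectation_cmd, check=True)
```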
Yep, I think running the expected value benchmarks separately (or successively, making sure they do not get run in parallel with the compile time/gate count benchmarks) would be very logical.
Originally posted by @jordandsullivan in #250 (comment)