
Improve reproducibility and consistency of benchmarking workflow #253

Open · wants to merge 2 commits into main
Conversation


@bachase bachase commented Feb 24, 2025

These changes will

  1. Reduce parallelism in the benchmarking runs to minimize contention that might otherwise affect the results. This includes restricting Qiskit-AER simulations to a maximum of 1 parallel thread, running the expected value benchmarks in parallel but separately from the timing benchmarks, and forcing just 1 parallel job in the benchmark script itself.

  2. Use the locked version of dependencies in the benchmarking runs to avoid an uncontrolled change in a dependent library or transitive dependency.

  3. Ensure only one instance of the benchmark GitHub workflow runs at a time by specifying a concurrency group.

This partially addresses #251
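
The locked-dependency and concurrency-group changes could look roughly like the following workflow fragment. This is a hedged sketch, not the actual ucc-benchmarks.yml: the job name, concurrency group name, and the choice of poetry as the lockfile tool are all assumptions.

```yaml
# Hypothetical fragment of a benchmark workflow; names are illustrative.
concurrency:
  # Only one benchmark run at a time; a queued run waits rather than overlaps.
  group: ucc-benchmarks
  cancel-in-progress: false

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install locked dependencies
        # Installing from the committed lockfile pins transitive dependencies,
        # so results are not perturbed by an uncontrolled upstream release.
        run: poetry install --sync
      - name: Run benchmarks
        run: ./run_benchmarks.sh
```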

Member
@natestemen natestemen left a comment


Makes sense to me. Do you know by how much time this increases the runtime of run_benchmarks.sh?


bachase commented Feb 25, 2025

> Makes sense to me. Do you know by how much time this increases the runtime of run_benchmarks.sh?

Looking at https://github.com/unitaryfund/ucc/actions/workflows/ucc-benchmarks.yml?query=branch%3Abc-test-AER-1-thread, from about 10 minutes to about 30 minutes.

Per some more recent discussion in #251, we could

  1. Retain parallelism for the expected value benchmark runs, since those do not measure timing
  2. Retain parallelism for all benchmark runs, but schedule the expected value ones separately from the timing ones.

Option 2 would keep the overall runtime lowest, but option 1 is more conservative until we have a stronger handle on how the parallel jobs are scheduled on the runner. Given the historical benchmark runs, though, this might be overly conservative, as results were fairly steady for a while.
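
Option 2 can be sketched with Python's standard library: the expected value benchmarks fan out across workers, while the timing benchmarks stay strictly serial. The benchmark names and the runner function below are hypothetical stand-ins, not the actual benchmark script.

```python
from concurrent.futures import ThreadPoolExecutor

def run_benchmark(name: str) -> str:
    """Placeholder for invoking one benchmark; hypothetical stand-in."""
    return f"ran {name}"

expected_value = ["ev_qft", "ev_qaoa"]   # correctness checks, not timing-sensitive
timing = ["time_qft", "time_qaoa"]       # timing-sensitive

# Expected value benchmarks do not measure wall-clock time, so contention
# between parallel workers cannot skew their results.
with ThreadPoolExecutor(max_workers=4) as pool:
    ev_results = list(pool.map(run_benchmark, expected_value))

# Timing benchmarks run one at a time to avoid contention.
timing_results = [run_benchmark(name) for name in timing]

print(ev_results + timing_results)
```

Note that `pool.map` preserves input order, so results line up with the benchmark list even though execution order is nondeterministic.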

@bachase bachase force-pushed the 251-improve-benchmarks branch from 3a57b16 to 10b8bf5 Compare February 25, 2025 20:28

bachase commented Feb 25, 2025

Updated this to be Option 2, per latest discussion with @jordandsullivan in #251.


bachase commented Feb 26, 2025

Note the commit with benchmark results showing that this addresses the recent increase in PyTKET (thanks @jordandsullivan for kicking them off).

[Image: benchmark results]
