
Refactor benchmark jobs in ci.yml #16431

Closed
ScottTodd opened this issue Feb 16, 2024 · 6 comments · Fixed by #18144
Assignees: ScottTodd
Labels: cleanup 🧹 · infrastructure/benchmark (Relating to benchmarking infrastructure) · infrastructure (Relating to build systems, CI, or testing)

Comments

@ScottTodd (Member)

Following this discussion on Discord.

These jobs in the ci.yml workflow are increasing infrastructure complexity and are a bottleneck on total presubmit time:

  • build_benchmark_tools
  • build_e2e_test_artifacts
  • test_benchmark_suites
  • compilation_benchmarks (opt-in)
  • execution_benchmarks (opt-in)
  • process_benchmark_results (opt-in)

When no benchmarks are enabled, the build_e2e_test_artifacts -> test_benchmark_suites jobs still run. The intent was to verify that benchmark suite models compile and execute correctly even when they are not benchmarked. In reality, the test_benchmark_suites workflows run only one test (iree/tools/test/iree_run_module_correctness_test), and the build_e2e_test_artifacts workflow spends 9 minutes building multiple platform variants of the same program. The coverage provided is redundant, and the overall setup adds multiple excess minutes of overhead (not even counting time spent waiting for expensive self-hosted runners).

I propose we:

  • remove the test_benchmark_suites matrix entirely
  • move build_benchmark_tools and build_e2e_test_artifacts to opt-in based on benchmarks being requested (using a preset? see the sketch after this list)
  • (as needed) add correctness integration test cases for benchmark programs to experimental/regression_suite rather than trying to bolt something onto the existing benchmarking infrastructure like those "run module tests"
  • (as cleanup) eject the benchmark jobs from ci.yml and place them in benchmark.yml
    • they can still use presubmit artifacts via GCS storage or GitHub Actions Artifacts. Ideally they could also run on release artifacts
    • they can continue to use configure_ci.py and PR labels as they currently do
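
To make the opt-in idea concrete, here is a rough sketch of how configure_ci.py could gate the benchmark jobs on a requested preset. This is a hypothetical simplification, not the actual script: the label prefix and function name are made up for illustration, while the job names come from the list above.

```python
# Hypothetical simplification of preset-based gating in configure_ci.py.
BENCHMARK_PRESET_PREFIX = "benchmarks:"  # assumed label scheme, e.g. "benchmarks:x86_64"

BENCHMARK_JOBS = {
    "build_benchmark_tools",
    "build_e2e_test_artifacts",
    "compilation_benchmarks",
    "execution_benchmarks",
    "process_benchmark_results",
}


def enabled_jobs(all_jobs, pr_labels, is_postsubmit):
    """Drops benchmark jobs unless a benchmark preset was requested or this is postsubmit."""
    benchmarks_requested = is_postsubmit or any(
        label.startswith(BENCHMARK_PRESET_PREFIX) for label in pr_labels
    )
    if benchmarks_requested:
        return set(all_jobs)
    return set(all_jobs) - BENCHMARK_JOBS
```

The setup job in ci.yml would then expose the resulting job set as an output, and each benchmark job would check it in an `if:` condition, so the jobs (and their self-hosted runners) are skipped entirely on ordinary presubmit runs.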

cc @pzread

@ScottTodd added the infrastructure (Relating to build systems, CI, or testing) label on Feb 16, 2024
@ScottTodd self-assigned this on Feb 16, 2024
@ScottTodd added the cleanup 🧹 and infrastructure/benchmark (Relating to benchmarking infrastructure) labels on Feb 16, 2024
@ScottTodd (Member, Author)

FWIW, I looked at the logs for a sample run of build_e2e_test_artifacts to see if there were some easy optimizations to make, and it doesn't look like it:

  • 4m spent compiling various programs (no huge outlier program like other jobs have had before)
  • 1m40s spent uploading outputs
  • 1m fetching Python deps from pip
  • 40s downloading a Dockerfile
  • 10s configuring CMake
  • < 5s downloading sources and importing them

Just... lots of overhead, and then a decent number of programs to compile.

@pzread (Contributor) commented on Feb 16, 2024

SG. I'll say that from time to time we do catch problems with build_e2e_test_artifacts, but we can switch and observe failures in postsubmit, then pick the important models into regression_suite. And thanks for looking into this.

@pzread (Contributor) commented on Feb 16, 2024

As for test_benchmark_suites, @hcindyl might know more about it.

@ScottTodd (Member, Author)

I'm seeing if I can pull some statistics for how long each job spends queued right now. (If that ends up taking longer than I'd like, I may just proceed without data, but I'd like to know...) For queueing, we could also add more runners, or switch those jobs to use GitHub-hosted runners instead of our own.

I'd rather just eliminate the overhead and complexity instead of throwing more/different resources at it, though. Queueing time / job orchestration is only part of the cost.
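
For reference, one rough way to pull those queue-time numbers is via the public GitHub Actions REST API. This is a sketch rather than the exact script used here; it assumes the job objects expose created_at/started_at timestamps and uses the repo slug of that era.

```python
import os
from datetime import datetime

import requests

REPO = "openxla/iree"  # repo slug at the time; adjust as needed
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}


def parse(ts):
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


# Recent completed runs of ci.yml.
runs = requests.get(
    f"https://api.github.com/repos/{REPO}/actions/workflows/ci.yml/runs",
    headers=HEADERS,
    params={"status": "completed", "per_page": 20},
).json()["workflow_runs"]

queue_seconds = {}  # job name -> list of seconds spent queued
for run in runs:
    jobs = requests.get(
        run["jobs_url"], headers=HEADERS, params={"per_page": 100}
    ).json()["jobs"]
    for job in jobs:
        if job.get("created_at") and job.get("started_at"):
            # Time between the job being queued and a runner picking it up.
            delta = parse(job["started_at"]) - parse(job["created_at"])
            queue_seconds.setdefault(job["name"], []).append(delta.total_seconds())

for name, samples in sorted(queue_seconds.items()):
    print(f"{name}: avg queue time {sum(samples) / len(samples):.0f}s over {len(samples)} jobs")
```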

@hcindyl (Contributor) commented on Feb 16, 2024

See #16438 (comment) for the use of test_benchmark_suites. It is placed in the CI to check for LLVM codegen regressions on a few realistic models. I have lost track of when they got turned off in https://github.com/openxla/iree/tree/main/tests/e2e, but historically the test gave us early warnings of RISC-V codegen breakage before it trickled down to the projects that use IREE, e.g., https://opensecura.googlesource.com/sw/vec_iree and https://opensecura.googlesource.com/ml-models-public

@ScottTodd added a commit that referenced this issue on Feb 16, 2024:
Progress on #16431.

The `build_e2e_test_artifacts` job produces artifacts used for
benchmarking, but it has also been providing test coverage that the
programs in the benchmark suite can be compiled correctly. That test
coverage would be a better fit for a test job like `test_gpu` (ci.yml)
or `regression_test_cpu` (pkgci.yml). As it stands, that job builds
multiple very similar configurations and then uploads those files to
the cloud (to the tune of 70-130GB per CI run).

This also adds back coverage that was missing from
`test_benchmark_suites`. Those tests should be migrated to another
workflow not connected to benchmarks so they can be run and debugged in
isolation. This setup is too brittle and complex.
@ScottTodd (Member, Author)

Slight adjustments to the plans from the initial issue description:

  • build_benchmark_tools, build_e2e_test_artifacts, and test_benchmark_suites are now opt-in and will only run when benchmarks are requested (including all postsubmit runs)
  • test_benchmark_suites has some useful coverage again, but those tests should be moved to a regular (non-benchmark) presubmit CI job - likely under experimental/regression_suite (a rough sketch of such a test case follows after this list), but cross_compile_and_test in ci.yml could also be used in the short term
  • the benchmark jobs should still be moved out of ci.yml
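
To illustrate what a correctness case under experimental/regression_suite might look like, here is a rough pytest sketch that shells out to the IREE CLI tools. The model path, function name, inputs, and golden values are placeholders, and the real suite has its own fixtures and artifact handling.

```python
import subprocess

import pytest

# Placeholder program; the real suite would pull an imported benchmark model.
MODEL_MLIR = "artifacts/model.mlir"


@pytest.mark.parametrize("backend,device", [("llvm-cpu", "local-task")])
def test_benchmark_model_correctness(tmp_path, backend, device):
    vmfb = tmp_path / "module.vmfb"
    # Compile the imported program for the requested backend.
    subprocess.run(
        ["iree-compile", MODEL_MLIR,
         f"--iree-hal-target-backends={backend}", "-o", str(vmfb)],
        check=True,
    )
    # Execute it and let iree-run-module compare against golden outputs.
    subprocess.run(
        ["iree-run-module",
         f"--device={device}",
         f"--module={vmfb}",
         "--function=main",                   # placeholder entry point
         "--input=4xf32=1 2 3 4",             # placeholder input
         "--expected_output=4xf32=1 2 3 4"],  # placeholder golden values
        check=True,
    )
```

Keeping a case like this in a regular test workflow means the compile-and-run correctness check survives even after the benchmark jobs move out of ci.yml.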
