
Refactor benchmark jobs in ci.yml #16431

Closed
ScottTodd opened this issue Feb 16, 2024 · 6 comments · Fixed by #18144
Assignees: ScottTodd
Labels: cleanup 🧹 · infrastructure/benchmark (Relating to benchmarking infrastructure) · infrastructure (Relating to build systems, CI, or testing)

Comments

@ScottTodd (Member)

Following this discussion on Discord.

These jobs in the ci.yml workflow are increasing infrastructure complexity and are a bottleneck on total presubmit time:

  • build_benchmark_tools
  • build_e2e_test_artifacts
  • test_benchmark_suites
  • compilation_benchmarks (opt-in)
  • execution_benchmarks (opt-in)
  • process_benchmark_results (opt-in)

When no benchmarks are enabled, the build_e2e_test_artifacts -> test_benchmark_suites jobs still run. The intent was to verify that benchmark suite models compile and execute correctly even when they are not benchmarked. In reality, the test_benchmark_suites workflows run only one test (iree/tools/test/iree_run_module_correctness_test), and the build_e2e_test_artifacts workflow spends 9 minutes building multiple platform variants of the same program. The coverage provided is redundant, and the overall setup adds multiple excess minutes of overhead (not even counting time spent waiting for expensive self-hosted runners).

I propose we:

  • remove the test_benchmark_suites matrix entirely
  • move build_benchmark_tools and build_e2e_test_artifacts to opt-in based on benchmarks being requested (using a preset? see the sketch after this list)
  • (as needed) add correctness integration test cases for benchmark programs to experimental/regression_suite rather than trying to bolt something onto the existing benchmarking infrastructure like those "run module tests"
  • (as cleanup) eject the benchmark jobs from ci.yml and place them in benchmark.yml
    • they can still use presubmit artifacts via GCS storage or GitHub Actions Artifacts. Ideally they could also run on release artifacts
    • they can continue to use configure_ci.py and PR labels as they currently do
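
To make the opt-in idea concrete, here is a rough sketch of how configure_ci.py could gate the benchmark jobs on a requested preset. This is a hypothetical simplification, not the actual script: the label prefix and function name are made up for illustration, while the job names come from the list above.

```python
# Hypothetical simplification of preset-based gating in configure_ci.py.
BENCHMARK_PRESET_PREFIX = "benchmarks:"  # assumed label scheme, e.g. "benchmarks:x86_64"

BENCHMARK_JOBS = {
    "build_benchmark_tools",
    "build_e2e_test_artifacts",
    "compilation_benchmarks",
    "execution_benchmarks",
    "process_benchmark_results",
}


def enabled_jobs(all_jobs, pr_labels, is_postsubmit):
    """Drops benchmark jobs unless a benchmark preset was requested or this is postsubmit."""
    benchmarks_requested = is_postsubmit or any(
        label.startswith(BENCHMARK_PRESET_PREFIX) for label in pr_labels
    )
    if benchmarks_requested:
        return set(all_jobs)
    return set(all_jobs) - BENCHMARK_JOBS
```

The setup job in ci.yml would then expose the resulting job set as an output, and each benchmark job would check it in an `if:` condition, so the jobs (and their self-hosted runners) are skipped entirely on ordinary presubmit runs.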

cc @pzread

@ScottTodd added the infrastructure (Relating to build systems, CI, or testing) label on Feb 16, 2024
@ScottTodd self-assigned this on Feb 16, 2024
@ScottTodd added the cleanup 🧹 and infrastructure/benchmark (Relating to benchmarking infrastructure) labels on Feb 16, 2024
@ScottTodd (Member, Author)

FWIW, I looked at the logs for a sample run of build_e2e_test_artifacts to see if there were some easy optimizations to make, and it doesn't look like it:

  • 4m spent compiling various programs (no huge outlier program like other jobs have had before)
  • 1m40s spent uploading outputs
  • 1m fetching Python deps from pip
  • 40s downloading a Dockerfile
  • 10s configuring CMake
  • < 5s downloading sources and importing them

Just... lots of overhead, and then a decent number of programs to compile.

@pzread (Contributor) commented on Feb 16, 2024

SG. I'll say that from time to time we do catch problems with build_e2e_test_artifacts, but we can switch and observe failures in postsubmit, then pick the important models into regression_suite. And thanks for looking into this.

@pzread (Contributor) commented on Feb 16, 2024

As for test_benchmark_suites, @hcindyl might know more about it.

@ScottTodd (Member, Author)

I'm seeing if I can pull some statistics for how long each job spends queued right now. (If that ends up taking longer than I'd like, I may just proceed without data, but I'd like to know...) For queueing, we could also add more runners, or switch those jobs to use GitHub-hosted runners instead of our own.

I'd rather just eliminate the overhead and complexity instead of throwing more/different resources at it, though. Queueing time / job orchestration is only part of the cost.
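
For reference, one rough way to pull those queue-time numbers is via the public GitHub Actions REST API. This is a sketch rather than the exact script used here; it assumes the job objects expose created_at/started_at timestamps and uses the repo slug of that era.

```python
import os
from datetime import datetime

import requests

REPO = "openxla/iree"  # repo slug at the time; adjust as needed
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}


def parse(ts):
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


# Recent completed runs of ci.yml.
runs = requests.get(
    f"https://api.github.com/repos/{REPO}/actions/workflows/ci.yml/runs",
    headers=HEADERS,
    params={"status": "completed", "per_page": 20},
).json()["workflow_runs"]

queue_seconds = {}  # job name -> list of seconds spent queued
for run in runs:
    jobs = requests.get(
        run["jobs_url"], headers=HEADERS, params={"per_page": 100}
    ).json()["jobs"]
    for job in jobs:
        if job.get("created_at") and job.get("started_at"):
            # Time between the job being queued and a runner picking it up.
            delta = parse(job["started_at"]) - parse(job["created_at"])
            queue_seconds.setdefault(job["name"], []).append(delta.total_seconds())

for name, samples in sorted(queue_seconds.items()):
    print(f"{name}: avg queue time {sum(samples) / len(samples):.0f}s over {len(samples)} jobs")
```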

@hcindyl (Contributor) commented on Feb 16, 2024

See #16438 (comment) for the use of test_benchmark_suites. It is placed in the CI to check for LLVM codegen regressions on a few realistic models. I have lost track of when they got turned off in https://github.com/openxla/iree/tree/main/tests/e2e, but historically the test gave us early warnings of RISC-V codegen breakage before it trickled down to the projects that use IREE, e.g., https://opensecura.googlesource.com/sw/vec_iree and https://opensecura.googlesource.com/ml-models-public

@ScottTodd added a commit that referenced this issue on Feb 16, 2024:
Progress on #16431.

The `build_e2e_test_artifacts` job produces artifacts used for
benchmarking, but it has also been providing test coverage that the
programs in the benchmark suite can be compiled correctly. That test
coverage would be a better fit for a test job like `test_gpu` (ci.yml)
or `regression_test_cpu` (pkgci.yml). As it stands, that job builds
multiple very similar configurations and then uploads those files to
the cloud (to the tune of 70-130GB per CI run).

This also adds back coverage that was missing from
`test_benchmark_suites`. Those tests should be migrated to another
workflow not connected to benchmarks so they can be run and debugged in
isolation. This setup is too brittle and complex.
@ScottTodd (Member, Author)

Slight adjustments to the plans from the initial issue description:

  • build_benchmark_tools, build_e2e_test_artifacts, and test_benchmark_suites are now opt-in and will only run when benchmarks are requested (including all postsubmit runs)
  • test_benchmark_suites has some useful coverage again, but those tests should be moved to a regular (non-benchmark) presubmit CI job - likely under experimental/regression_suite (a rough sketch of such a test case follows after this list), but cross_compile_and_test in ci.yml could also be used in the short term
  • the benchmark jobs should still be moved out of ci.yml
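
To illustrate what a correctness case under experimental/regression_suite might look like, here is a rough pytest sketch that shells out to the IREE CLI tools. The model path, function name, inputs, and golden values are placeholders, and the real suite has its own fixtures and artifact handling.

```python
import subprocess

import pytest

# Placeholder program; the real suite would pull an imported benchmark model.
MODEL_MLIR = "artifacts/model.mlir"


@pytest.mark.parametrize("backend,device", [("llvm-cpu", "local-task")])
def test_benchmark_model_correctness(tmp_path, backend, device):
    vmfb = tmp_path / "module.vmfb"
    # Compile the imported program for the requested backend.
    subprocess.run(
        ["iree-compile", MODEL_MLIR,
         f"--iree-hal-target-backends={backend}", "-o", str(vmfb)],
        check=True,
    )
    # Execute it and let iree-run-module compare against golden outputs.
    subprocess.run(
        ["iree-run-module",
         f"--device={device}",
         f"--module={vmfb}",
         "--function=main",                   # placeholder entry point
         "--input=4xf32=1 2 3 4",             # placeholder input
         "--expected_output=4xf32=1 2 3 4"],  # placeholder golden values
        check=True,
    )
```

Keeping a case like this in a regular test workflow means the compile-and-run correctness check survives even after the benchmark jobs move out of ci.yml.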
