
Reduce complexity and cloud storage needs across benchmarking workflows #16396

Closed
ScottTodd opened this issue Feb 13, 2024 · 0 comments · Fixed by #18144
Labels: cleanup 🧹 · infrastructure/benchmark (Relating to benchmarking infrastructure) · infrastructure (Relating to build systems, CI, or testing)

Comments

@ScottTodd (Member)

I spotted some low hanging fruit here: https://discord.com/channels/689900678990135345/1166024193599615006/1207085419959947294 and here: https://groups.google.com/g/iree-discuss/c/uy0L4Vdl3hs/m/YLe0iLCGAAAJ.

  • The process_benchmark_results step takes around 2 minutes to download a mysterious iree-oss/benchmark-report Docker image. The generation scripts only need a few Python deps (markdown_strings, requests), so they could just pip install what they need directly.
  • Benchmark execution jobs spend upwards of 30 seconds checking out runtime submodules. They likely don't need any submodules at all.
  • The compilation_benchmarks job could be folded into build_e2e_test_artifacts. Then we wouldn't need to upload and store large compile-stats/module.vmfb files, or spend 30+ seconds downloading those files for 0-2 seconds of statistics aggregation and uploading. If the build machine turns out not to have the right setup for uploading to the dashboard server (for whatever reason), we could pass results via workflow artifacts instead. There's no need to send 50-100GB of (what should be) transient files over the network.
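The first two items could be addressed with small workflow changes. A hypothetical sketch of what those steps might look like (step names and the report-script path are illustrative, not taken from the actual workflows):

```yaml
# Install only the Python deps the report scripts need, instead of
# pulling the iree-oss/benchmark-report Docker image.
- name: Generate benchmark report
  run: |
    python -m pip install markdown_strings requests
    python ./generate_report.py  # hypothetical script name

# Skip submodule checkout on benchmark execution runners.
- uses: actions/checkout@v4
  with:
    submodules: false
```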
@ScottTodd ScottTodd added infrastructure Relating to build systems, CI, or testing cleanup 🧹 infrastructure/benchmark Relating to benchmarking infrastructure labels Feb 13, 2024
ScottTodd added a commit that referenced this issue May 14, 2024
This drops support for capturing traces as part of CI benchmarking to fix #16856. This PR is a synced and updated version of #16857.

While traces are invaluable in analyzing performance:
* This implementation is difficult to maintain (updating IREE's Tracy version can't be done without also building for multiple operating systems and uploading binary files to a cloud bucket with limited permissions).
* Trace collection nearly doubles CI time for every benchmark run (e.g. 10m→20m or 20m→40m), occasionally leading to long queueing (2h+).
* Trace collection contributes to cloud storage and network usage (there are larger offenders: #16396, but we still need to trim costs).

---------

Co-authored-by: Benoit Jacob <jacob.benoit.1@gmail.com>
bangtianliu pushed a commit to bangtianliu/iree that referenced this issue Jun 5, 2024
(same commit message as above)
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this issue Jul 30, 2024
(same commit message as above)