Measure generate_time on iOS benchmark #8580
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8580
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit aa54be8 with merge base b6ffe1a.
BROKEN TRUNK - The following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
if metric_name == "Clock Monotonic Time, s":
    benchmark_result["metric"] = "generate_time(ms)"
    benchmark_result["actualValue"] = metric_value * 1000

elif metric_name == "Tokens Per Second, t/s":
    benchmark_result["metric"] = "token_per_sec"
    benchmark_result["actualValue"] = metric_value
Seems like they are measuring the same thing; we won't need both if that's the case. Left a comment in #8576 (comment) for @shoumikhin to clarify.
elif method == "generate":
    if metric_name == "Clock Monotonic Time, s":
        benchmark_result["metric"] = "generate_time(ms)"
        benchmark_result["actualValue"] = metric_value * 1000
I think in the code, TPS and generate_time are calculated independently. That's why I see some fields report a regression only on TPS but not on generate_time. See the screenshot below. @shoumikhin Should we consolidate the measurement (use the one that is more reliable, which seems to be generate_time?) and only report one metric instead? I'd prefer TPS as it's more human readable.
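For reference, the two metrics would only be redundant if both came from the same generate() measurement; in that case one can be derived from the other. A rough sketch of the relationship (illustrative only; num_generated_tokens is a hypothetical name, not a field from this PR):

    # Rough sketch, assuming both metrics were taken from the same generate() run.
    generate_time_ms = 2500.0     # e.g. 2.5 s to produce the whole sequence
    num_generated_tokens = 50     # hypothetical token count for the run
    token_per_sec = num_generated_tokens / (generate_time_ms / 1000.0)  # 20.0 t/s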
Oh, I think this is what the bug is about: the current generate_time is actually avg_inference_latency from forward. After this change fixes the issue, we can consider keeping just TPS, I guess.
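A minimal sketch of how scoping the mapping by method avoids that mislabeling (illustrative only; the parse_metric name, the forward branch, and the "avg_inference_latency(ms)" label are assumptions, not necessarily the exact code in this PR):

    # Sketch: route "Clock Monotonic Time, s" to different metric names
    # depending on which method produced it, so forward latency is never
    # reported as generate_time.
    def parse_metric(method, metric_name, metric_value):
        benchmark_result = {}
        if method == "forward":
            if metric_name == "Clock Monotonic Time, s":
                benchmark_result["metric"] = "avg_inference_latency(ms)"
                benchmark_result["actualValue"] = metric_value * 1000
        elif method == "generate":
            if metric_name == "Clock Monotonic Time, s":
                benchmark_result["metric"] = "generate_time(ms)"
                benchmark_result["actualValue"] = metric_value * 1000
            elif metric_name == "Tokens Per Second, t/s":
                benchmark_result["metric"] = "token_per_sec"
                benchmark_result["actualValue"] = metric_value
        return benchmark_result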
@huydhn Do you want to remove/hide generate_time from the dashboard in a follow-up PR? The Android side needs to be corrected first, I guess, before doing that. cc: @kirklandsign
Yup, the PR for that is ready here: pytorch/test-infra#6314
Left a few comments inline. Idk if Anthony will have time to take a look as he is OOO this week. @huydhn if you agree with my comments, we can go ahead and fix the confusion now, and stay open to further adjustments based on feedback.
The title should be "Measure avg_inference_time", right?
@kirklandsign we should make the same fix on the Android side as well.
Nah, it measures generate_time.
Addresses the iOS part of #8576 (comment)
Let's see if this fixes the issue.
Testing
https://github.com/pytorch/executorch/actions/runs/13423283360
Another test: https://github.com/pytorch/executorch/actions/runs/13442356702
cc @guangy10 @kirklandsign @shoumikhin