
Measure generate_time on iOS benchmark #8580

Merged
merged 1 commit into main from fix-ios-generate-time-benchmark-metric on Feb 20, 2025

Conversation

@huydhn huydhn (Contributor) commented Feb 19, 2025

pytorch-bot bot commented Feb 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8580

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit aa54be8 with merge base b6ffe1a:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Feb 19, 2025
@huydhn huydhn added the module: benchmark and topic: not user facing labels Feb 19, 2025
@huydhn huydhn had a problem deploying to upload-benchmark-results February 19, 2025 23:09 — with GitHub Actions Failure
Comment on lines +241 to +247
if metric_name == "Clock Monotonic Time, s":
benchmark_result["metric"] = "generate_time(ms)"
benchmark_result["actualValue"] = metric_value * 1000

elif metric_name == "Tokens Per Second, t/s":
benchmark_result["metric"] = "token_per_sec"
benchmark_result["actualValue"] = metric_value
Contributor commented:

Seems like they are measuring the same thing; we won't need both if that's the case. Left a comment in #8576 (comment) for @shoumikhin to clarify.
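
(For context, a rough sketch of how a mapping like the hunk above might sit inside the iOS benchmark parser. This is an illustration, not the actual parser: the function name map_ios_metric and the forward branch are assumptions; only the metric strings and the metric/actualValue keys come from the diff.)

def map_ios_metric(method: str, metric_name: str, metric_value: float) -> dict:
    # Illustrative only: maps one raw XCTest measurement to a dashboard metric.
    benchmark_result = {}
    if method == "forward":
        if metric_name == "Clock Monotonic Time, s":
            # Per-forward wall-clock time, reported as average inference latency in ms
            benchmark_result["metric"] = "avg_inference_latency(ms)"
            benchmark_result["actualValue"] = metric_value * 1000
    elif method == "generate":
        if metric_name == "Clock Monotonic Time, s":
            # Wall-clock time of the whole generate() call, reported in ms
            benchmark_result["metric"] = "generate_time(ms)"
            benchmark_result["actualValue"] = metric_value * 1000
        elif metric_name == "Tokens Per Second, t/s":
            # Already in the desired unit, pass through unchanged
            benchmark_result["metric"] = "token_per_sec"
            benchmark_result["actualValue"] = metric_value
    return benchmark_result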

elif method == "generate":
if metric_name == "Clock Monotonic Time, s":
benchmark_result["metric"] = "generate_time(ms)"
benchmark_result["actualValue"] = metric_value * 1000
@guangy10 guangy10 (Contributor) commented Feb 20, 2025

I think in the code, TPS and generate_time are calculated independently. That's why I see some fields report a regression only on TPS but not on generate_time; see the screenshot below. @shoumikhin Should we consolidate the measurement (using the more reliable one, which seems to be generate_time?) and report only one metric instead? I'd prefer TPS as it's more human readable.

[Screenshot 2025-02-20 at 10:54:28 AM]

@huydhn huydhn (Contributor, Author) commented:

Oh, I think this is what the bug is about: the current generate_time is actually avg_inference_latency from forward. After this change fixes the issue, we can consider keeping just TPS, I guess.
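
(To make the distinction concrete, a minimal sketch of how the three numbers relate; the model API and names here are assumptions for illustration, not the benchmark code. avg_inference_latency times individual forward() calls, generate_time covers the whole generation loop, and TPS follows from generate_time and the token count.)

import time

def benchmark_generate(model, prompt_tokens, max_new_tokens):
    # Hypothetical model with a single-step forward() API; assumes max_new_tokens > 0.
    forward_latencies = []
    tokens = list(prompt_tokens)
    start = time.monotonic()
    for _ in range(max_new_tokens):
        t0 = time.monotonic()
        next_token = model.forward(tokens)            # one decoding step
        forward_latencies.append(time.monotonic() - t0)
        tokens.append(next_token)
    generate_time_s = time.monotonic() - start        # whole generate() duration
    avg_inference_latency_s = sum(forward_latencies) / len(forward_latencies)
    token_per_sec = max_new_tokens / generate_time_s  # TPS derived from generate_time
    return generate_time_s, avg_inference_latency_s, token_per_sec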

Contributor commented:

@huydhn Do you want to remove/hide generate_time from the dashboard in a follow-up PR? The Android side needs to be corrected first, I guess. cc: @kirklandsign

@huydhn huydhn (Contributor, Author) commented:

Yup, the PR for that is ready here: pytorch/test-infra#6314

@guangy10 guangy10 (Contributor) left a comment

Left a few comments inline. Idk if Anthony would have time to take a look, as he is OOO this week. @huydhn, if you agree with my comments, we can go ahead and fix the confusion now, and stay open to further adjustments based on feedback.

@guangy10 guangy10 (Contributor) commented:

The title should be "measure avg_inference_time", right?

@guangy10 guangy10 (Contributor) commented:

@kirklandsign we should make the same fix on the Android side as well

@huydhn huydhn temporarily deployed to upload-benchmark-results February 20, 2025 19:59 — with GitHub Actions Inactive
@huydhn huydhn (Contributor, Author) commented Feb 20, 2025

The title should be "measure avg_inference_time", right?

Nah, it measures generate_time. What is currently called generate_time is actually avg_inference_time.

@huydhn huydhn merged commit fc5a492 into main Feb 20, 2025
67 of 72 checks passed
@huydhn huydhn deleted the fix-ios-generate-time-benchmark-metric branch February 20, 2025 22:51
Labels
CLA Signed, module: benchmark, topic: not user facing
Projects
Status: Done

4 participants