Measure generate_time on iOS benchmark #8580
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8580
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit aa54be8 with merge base b6ffe1a.
BROKEN TRUNK - The following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
if metric_name == "Clock Monotonic Time, s":
    benchmark_result["metric"] = "generate_time(ms)"
    benchmark_result["actualValue"] = metric_value * 1000

elif metric_name == "Tokens Per Second, t/s":
    benchmark_result["metric"] = "token_per_sec"
    benchmark_result["actualValue"] = metric_value
Seems like they are measuring the same thing; we won't need both if that's the case. Left a comment in #8576 (comment) for @shoumikhin to clarify.
elif method == "generate":
    if metric_name == "Clock Monotonic Time, s":
        benchmark_result["metric"] = "generate_time(ms)"
        benchmark_result["actualValue"] = metric_value * 1000
I think in the code, TPS and generate_time are calculated independently. That's why I see some fields report a regression only on TPS but not on generate_time. See the screenshot below. @shoumikhin Should we consolidate the measurement (use the one that is more reliable, which seems to be generate_time?) and only report one metric instead? I'd prefer TPS as it's more human readable.
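For reference, the two metrics would only be redundant if both came from the same generate() measurement; in that case one can be derived from the other. A rough sketch of the relationship (illustrative only; num_generated_tokens is a hypothetical name, not a field from this PR):

    # Rough sketch, assuming both metrics were taken from the same generate() run.
    generate_time_ms = 2500.0     # e.g. 2.5 s to produce the whole sequence
    num_generated_tokens = 50     # hypothetical token count for the run
    token_per_sec = num_generated_tokens / (generate_time_ms / 1000.0)  # 20.0 t/s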
Oh, I think this is what the bug is about: the current generate_time is actually avg_inference_latency from forward. After this change fixes the issue, we can consider keeping just TPS, I guess.
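A minimal sketch of how scoping the mapping by method avoids that mislabeling (illustrative only; the parse_metric name, the forward branch, and the "avg_inference_latency(ms)" label are assumptions, not necessarily the exact code in this PR):

    # Sketch: route "Clock Monotonic Time, s" to different metric names
    # depending on which method produced it, so forward latency is never
    # reported as generate_time.
    def parse_metric(method, metric_name, metric_value):
        benchmark_result = {}
        if method == "forward":
            if metric_name == "Clock Monotonic Time, s":
                benchmark_result["metric"] = "avg_inference_latency(ms)"
                benchmark_result["actualValue"] = metric_value * 1000
        elif method == "generate":
            if metric_name == "Clock Monotonic Time, s":
                benchmark_result["metric"] = "generate_time(ms)"
                benchmark_result["actualValue"] = metric_value * 1000
            elif metric_name == "Tokens Per Second, t/s":
                benchmark_result["metric"] = "token_per_sec"
                benchmark_result["actualValue"] = metric_value
        return benchmark_result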
@huydhn Do you want to remove/hide generate_time from the dashboard in a follow-up PR? The Android side needs to be corrected first, I guess, before doing that. cc: @kirklandsign
Yup, the PR for that is ready here: pytorch/test-infra#6314
Left a few comments inline. Idk if Anthony will have time to take a look as he is OOO this week. @huydhn if you agree with my comments, we can go ahead and fix the confusion now, and stay open to further adjustments based on feedback.
The title should be "Measure avg_inference_time", right?
@kirklandsign we should make the same fix on the Android side as well.
Nah, it measures generate_time.
Addresses the iOS part of #8576 (comment)
Let's see if this fixes the issue.
Testing
https://github.com/pytorch/executorch/actions/runs/13423283360
Another test: https://github.com/pytorch/executorch/actions/runs/13442356702
cc @guangy10 @kirklandsign @shoumikhin