add request routers - least kv cache, least expected latency #543
* add scheduler metrics
* add metrics into mock app
* refactor CacheUsagePerc of CPU and GPU
* add instance label into promQL
Change-Id: Icc2a017cb2db445fb760ced2c0034a65f9b37fa8
Change-Id: I36a0f54ca1c8a3c16b89c0077df77a119440bed3
e1e2fee to ec2cff9
pkg/metrics/metrics.go
	MetricType: MetricType{
		Query: PromQL,
	},
	PromQL: `increase(vllm:request_prompt_tokens_sum{instance="${instance}", model_name="${model_name}", job="pods"}[1d]) / increase(vllm:request_prompt_tokens_count{instance="${instance}", model_name="${model_name}", job="pods"}[1d])`,
How was 1d decided?
The metrics AvgPromptToksPerReq and AvgGenerationToksPerReq are used for token length prediction, which should be stable and reflect the overall distribution, so we chose a long time window of 1d.
Makes sense.
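As a rough illustration of what the query in this thread computes, here is a minimal Go sketch. It assumes the `${instance}` and `${model_name}` placeholders are filled by plain string substitution; how the project actually expands them is not shown in this conversation. `buildQuery` and `avgTokensPerRequest` are hypothetical helper names, and the label values in `main` are made up.

```go
package main

import (
	"fmt"
	"os"
)

// avgPromptTokensQuery mirrors the PromQL template from the diff above.
const avgPromptTokensQuery = `increase(vllm:request_prompt_tokens_sum{instance="${instance}", model_name="${model_name}", job="pods"}[1d]) / increase(vllm:request_prompt_tokens_count{instance="${instance}", model_name="${model_name}", job="pods"}[1d])`

// buildQuery is a hypothetical helper: it fills ${...} placeholders from
// labels. Placeholders missing from the map expand to the empty string.
func buildQuery(template string, labels map[string]string) string {
	return os.Expand(template, func(key string) string { return labels[key] })
}

// avgTokensPerRequest shows the arithmetic behind the query: the growth of
// the token sum over the window divided by the growth of the request count.
// It reports false when no requests were seen in the window.
func avgTokensPerRequest(sumIncrease, countIncrease float64) (float64, bool) {
	if countIncrease <= 0 {
		return 0, false
	}
	return sumIncrease / countIncrease, true
}

func main() {
	// Illustrative label values, not taken from the PR.
	q := buildQuery(avgPromptTokensQuery, map[string]string{
		"instance":   "10.0.0.1:8000",
		"model_name": "example-model",
	})
	fmt.Println(q)

	if avg, ok := avgTokensPerRequest(120000, 400); ok {
		fmt.Println("average prompt tokens per request:", avg)
	}
}
```

A long window such as 1d averages over many requests, which is why the result is stable enough for length prediction; a short window would follow recent traffic more closely but fluctuate more.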
			cntGeneration += 1
		}
	}
	guessPromptTokens := 10.0
I am curious about the 10.0 and the following 100.0. Are these numbers meaningful or just for initialization?
Just for initialization; they need to be tuned through experiments.
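One way to read the snippet above is as a default that applies only when no pod reports a usable value. The following Go sketch is written under that assumption and is not the PR's actual code: `meanOrDefault` and the constant names are hypothetical, and the values 10.0 and 100.0 are the untuned placeholders mentioned in this thread.

```go
package main

import (
	"fmt"
	"math"
)

// Untuned placeholder defaults, as discussed in the review thread.
const (
	defaultPromptTokens     = 10.0
	defaultGenerationTokens = 100.0
)

// meanOrDefault averages the usable samples and returns fallback when there
// are none. NaN and non-positive samples are skipped; NaN can appear when a
// PromQL ratio divides by a zero request count.
func meanOrDefault(samples []float64, fallback float64) float64 {
	sum, cnt := 0.0, 0
	for _, v := range samples {
		if math.IsNaN(v) || v <= 0 {
			continue
		}
		sum += v
		cnt++
	}
	if cnt == 0 {
		return fallback
	}
	return sum / float64(cnt)
}

func main() {
	// Per-pod averages are illustrative values only.
	fmt.Println(meanOrDefault([]float64{20, 40}, defaultPromptTokens))
	fmt.Println(meanOrDefault(nil, defaultGenerationTokens))
}
```

Keeping the defaults as named constants makes them easy to find once experiments suggest better values.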
		continue
	}

	busyTimeRatio, err := r.cache.GetPodMetric(pod.Name, "gpu_busy_time_ratio") // todo: replace mock
what metric do you expect here? GPU utilization?
Yes, GPU utilization measured by busy time ratio. We've discussed with @brosoul and this metric will be added later.
Sounds good. For metrics that are not yet mature, we can add some // TODOs and not enable the policy at initialization, which means we can still merge the PR without waiting.
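For readers following this thread, a least-busy-time selection could look roughly like the Go sketch below. It is an assumption-laden illustration, not the router from this PR: `metricSource`, `fakeCache`, and `pickLeastBusy` are hypothetical names, `GetPodMetric` is assumed to return `(float64, error)`, and pods are represented by name only.

```go
package main

import (
	"errors"
	"fmt"
)

// metricSource is a hypothetical stand-in for the router's cache. The
// float64 return type of GetPodMetric is an assumption.
type metricSource interface {
	GetPodMetric(podName, metricName string) (float64, error)
}

// fakeCache is a mock keyed by pod name, used only for this sketch.
type fakeCache map[string]float64

func (c fakeCache) GetPodMetric(podName, metricName string) (float64, error) {
	v, ok := c[podName]
	if !ok {
		return 0, errors.New("metric not available for pod " + podName)
	}
	return v, nil
}

// pickLeastBusy returns the pod with the lowest busy time ratio. Pods whose
// metric cannot be read are skipped, mirroring the `continue` in the diff.
func pickLeastBusy(src metricSource, podNames []string) (string, error) {
	best, bestRatio, found := "", 0.0, false
	for _, name := range podNames {
		ratio, err := src.GetPodMetric(name, "gpu_busy_time_ratio")
		if err != nil {
			continue
		}
		if !found || ratio < bestRatio {
			best, bestRatio, found = name, ratio, true
		}
	}
	if !found {
		return "", errors.New("no pod with a usable busy time metric")
	}
	return best, nil
}

func main() {
	// Mock values; pod-c has no metric and is skipped.
	cache := fakeCache{"pod-a": 0.7, "pod-b": 0.2}
	pod, err := pickLeastBusy(cache, []string{"pod-a", "pod-b", "pod-c"})
	if err != nil {
		fmt.Println("routing failed:", err)
		return
	}
	fmt.Println("selected:", pod)
}
```

Hiding the metric lookup behind a small interface means the mock can later be replaced by the real metric without touching the selection logic.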
b4d67ba to 22e0fa6
* Add random adapter scheduler
* Add leastExpectedLatency request router
* Add least latency scheduler
* Add least kv cache router
* Add bin packing scheduler (first-fit as example)
* Add least utilization scheduler (RPM, TPM, kv_cache, busy_time as utilization)
* Add least busy time (or least gpu utilization) router
* Add weighted round robin router
* Add metrics that scheduling needed (#486)
* add scheduler metrics
* add metrics into mock app
* refactor CacheUsagePerc of CPU and GPU
* add instance label into promQL
* adapt to the metrics interface
Change-Id: Icc2a017cb2db445fb760ced2c0034a65f9b37fa8
* add .vscode to gitignore
Change-Id: I36a0f54ca1c8a3c16b89c0077df77a119440bed3
* fix mock cpu_cache_usage_perc metrics
* feat: add least kv cache into route strategy
* add 2 new routers
* rm stateful router: weighted round robin
* rm scheduler changes

---------

Co-authored-by: chenbinbin <chenbinbin.1996@bytedance.com>
Co-authored-by: chenzuzhi <chenzuzhi@bytedance.com>
Co-authored-by: brosoul <brosoul@126.com>
Pull Request Description
Related Issues
Resolves: #303