optimize benchmark scripts for autoscaler, add more logs #356

kr11 · 2024-11-08T06:10:11Z

Pull Request Description

After optimizing some KPA configurations, KPA shows faster upscaling and better replica decisions, resulting in lower latency and higher throughput.

The main changes are:

Changed the resync period from 30s to 10s.
Fixed the KPA YAML config: the target value (50%) should be 0.5, not 50.
Added more detailed logs.
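To illustrate why the target value matters, here is a minimal sketch of a simplified KPA-style sizing rule (total observed load divided by the per-pod target). It assumes the metric is reported as a fraction between 0 and 1; the function name `desired_replicas` and the sample readings are hypothetical and are not taken from the aibrix code.

```python
import math


def desired_replicas(per_pod_usage, target_value, min_replicas=1):
    """Simplified KPA-style rule: total observed load / per-pod target, rounded up."""
    total = sum(per_pod_usage)
    return max(min_replicas, math.ceil(total / target_value))


# Hypothetical readings: three pods, each at 90% utilization, reported as fractions.
usage = [0.9, 0.9, 0.9]

wrong = desired_replicas(usage, target_value=50)   # 50 is read as 5000%, not 50%
right = desired_replicas(usage, target_value=0.5)  # 0.5 is the intended 50%
print(wrong, right)  # prints "1 6"
```

Under these assumptions, a target of 50 can never be reached by fractional readings, so the autoscaler stays at the minimum replica count, while 0.5 triggers the expected scale-up.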

Related Issues

Resolves: #[Insert issue number(s)]

Important: Before submitting, please complete the description above and review the checklist below.

Contribution Guidelines (Expand for Details)

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

[Bug]: Corrections to existing functionality
[CI]: Changes to build process or CI pipeline
[Docs]: Updates or additions to documentation
[API]: Modifications to aibrix's API or interface
[CLI]: Changes or additions to the Command Line Interface
[Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

PR title includes appropriate prefix(es)
Changes are clearly explained in the PR description
New and existing tests pass successfully
Code adheres to project style and best practices
Documentation updated to reflect changes (if applicable)
Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

Jeffwan

Since we deploy the autoscaler alongside other controllers, let's make sure we only print necessary logs. If you need additional debug logs, let's use the right log level.

Jeffwan · 2024-11-08T17:43:53Z

benchmarks/autoscaling/benchmark.py

-)
+def setup_logging(log_filename, level=logging.INFO):
+    """
+    设置全局日志配置，日志将被写入指定的文件，并同时输出到控制台。


Let's remove non-English characters in the GitHub repo. The policy is a bit different from our internal repo.

Corrected it. Thanks.
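For reference, a helper with this signature and an English docstring could look roughly like the sketch below. Only the signature and the stated behavior (write to a file and echo to the console) come from the diff; the format string and the handler wiring are assumptions, not the merged code.

```python
import logging


def setup_logging(log_filename, level=logging.INFO):
    """Configure global logging to write to a file and also print to the console."""
    # Assumed format; the actual benchmark script may use a different one.
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    root = logging.getLogger()
    root.setLevel(level)

    # One handler per destination: the log file and the console (stderr).
    # Calling this function more than once would attach duplicate handlers.
    file_handler = logging.FileHandler(log_filename)
    console_handler = logging.StreamHandler()
    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        root.addHandler(handler)
    return root
```

Calling `setup_logging("benchmark.log")` once at startup would then let plain `logging.info(...)` calls reach both destinations.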

Jeffwan · 2024-11-08T17:44:06Z

benchmarks/autoscaling/benchmark.py

@@ -3,6 +3,7 @@
 import os
 import random
 import time
+from datetime import datetime


@happyandslow Can you help review the benchmark changes?

Jeffwan · 2024-11-08T17:44:39Z

benchmarks/autoscaling/kpa.yaml

@@ -11,8 +11,8 @@ spec:
    apiVersion: apps/v1
    kind: Deployment
    name: aibrix-model-deepseek-coder-7b-instruct
-  minReplicas: 1
+  minReplicas: 2


Why do we change this to 2 here?

@Jeffwan It's to align with HPA.

When our experiment started, the initial replica count of the LLM service was set to 2, and we waited 30 seconds before adding traffic. We found that KPA quickly scaled down to 1 replica at the beginning, while HPA did not scale down immediately even without traffic. This made the comparison unfair.

Should we uniformly set minReplicas to 2 in both hpa.yaml and kpa.yaml? I will modify hpa.yaml to keep it consistent with kpa.yaml.

Talked with @kr11 offline about this. We decided to change it to 1.

Jeffwan · 2024-11-08T17:45:17Z

pkg/controller/podautoscaler/scaler/kpa.go

@@ -202,6 +202,14 @@ func (k *KpaAutoscaler) Scale(originalReadyPodsCount int, metricKey metrics.Name

 	isOverPanicThreshold := dppc/readyPodsCount >= spec.PanicThreshold

+	klog.InfoS("--- KPA Details", "readyPodsCount", readyPodsCount,


Is this a debug message? If so, let's use V(4).

I have fixed it together with two other debug klogs.

* optimize workload scripts and result output
* add more logs. resync period: 30->10, fix kpa.yaml
* fix lint
* add right klog level. unify min-replica in hpa.yaml and hpa.yaml
* unify hpa and kpa yaml min-replica to 1

kr11 added 2 commits November 8, 2024 13:56

optimize workload scripts and result output

505f090

add more logs. resync period: 30->10, fix kpa.yaml

7975865

kr11 changed the title ~~optimize benchmark scripts for autoscaler~~ optimize benchmark scripts for autoscaler, add more logs Nov 8, 2024

fix lint

c06eb6a

kr11 requested a review from Jeffwan November 8, 2024 06:48

Jeffwan reviewed Nov 8, 2024


kr11 added 2 commits November 11, 2024 14:42

add right klog level. unify min-replica in hpa.yaml and hpa.yaml

b48c07e

unify hpa and kpa yaml min-replica to 1

107b497

Jeffwan approved these changes Nov 11, 2024


Jeffwan merged commit 7a45b60 into main Nov 11, 2024
9 checks passed

Jeffwan deleted the benchmark/autoscaling_bench_optimize branch November 11, 2024 07:04
