Sync main branch changes to release-0.1 branch #375

Merged: 42 commits from main into release-0.1, Nov 12, 2024
Commits
b541e42
Update manifests version to v0.1.0-rc.3 (#287)
Jeffwan Oct 9, 2024
8b8d120
[Misc] Add sync images step and scripts in release process (#283)
Jeffwan Oct 9, 2024
2f32a01
[batch] E2E works with driver and request proxy (#272)
xinchen384 Oct 10, 2024
8a067d2
Fix address already in use when AIRuntime start in pod (#289)
brosoul Oct 11, 2024
7fd4be4
Read model name from request body (#290)
varungup90 Oct 11, 2024
9bdf772
Fix redis bootstrap flaky connection issue (#293)
varungup90 Oct 14, 2024
f8ff7f6
skip docs CI if no changes in /docs dir (#294)
varungup90 Oct 14, 2024
d2fd044
Improve Rayclusterreplicaset Status (#295)
Yicheng-Lu-llll Oct 14, 2024
73f91c0
Add request trace for profiling (#291)
varungup90 Oct 15, 2024
5aad38c
Update the crd definiton due to runtime upgrade (#298)
Jeffwan Oct 15, 2024
64a0d7f
Push images to Github registry in release pipeline (#301)
Jeffwan Oct 17, 2024
f19f5d8
Build autoscaler abstractions like fetcher, client and scaler (#300)
Jeffwan Oct 17, 2024
d753c88
Support pod autoscaler periodically check (#306)
Jeffwan Oct 20, 2024
a4eb7a9
Add timeout in nc check for redis bootstrap (#309)
varungup90 Oct 22, 2024
77cfee2
Refactor AutoScaler: metricClient, context, reconcile (#308)
kr11 Oct 22, 2024
5d8d843
Cut v0.1.0-rc.4 release (#314)
Jeffwan Oct 22, 2024
75e5cfc
[doc] update runtime readme (#318)
brosoul Oct 25, 2024
4d756aa
Add env for routing strategy override (#323)
varungup90 Oct 26, 2024
2fd50cd
Fix pod autoscaler enqueue issues (#329)
Jeffwan Oct 27, 2024
ea5dc77
Autoscaling benchmark (#337)
kr11 Oct 28, 2024
6fda762
Initial lora benchmark result (#321)
Jeffwan Oct 28, 2024
19c9a10
Adding plotting script (#338)
happyandslow Oct 28, 2024
3ce4659
Update the downloader performance plot (#341)
Jeffwan Oct 29, 2024
43b989f
Reduce pod metrics refresh interval (#343)
varungup90 Oct 29, 2024
a1f3117
Enable ipv6 for envoy proxy (#342)
varungup90 Oct 29, 2024
d5f8e8d
Add benchmark scrips for gateway client side changes (#340)
Jeffwan Oct 31, 2024
73a49be
Update the plots based on feedback (#346)
Jeffwan Oct 31, 2024
33e21d0
[batch] use volcano TOS as batch storage (#344)
xinchen384 Nov 5, 2024
89cafe1
Add check if no pods are present (#345)
varungup90 Nov 5, 2024
65d3e56
Add model exists check (#353)
varungup90 Nov 7, 2024
aa16fa9
[Misc] Disable fastapi docs in runtime default action (#350)
brosoul Nov 7, 2024
106992f
Add check for acceptable routing strategies (#352)
varungup90 Nov 7, 2024
32c3a8a
optimize PA messages: const 'HPA' -> actual pa type (#354)
kr11 Nov 8, 2024
84bb220
[Misc] Runtime server startup with args (#355)
brosoul Nov 8, 2024
65b74ed
[Misc] Add python format script (#357)
brosoul Nov 8, 2024
7a45b60
Optimize benchmark scripts for autoscaler, add more logs (#356)
kr11 Nov 11, 2024
aa5edec
Update the mocked app to cleaner state (#361)
Jeffwan Nov 11, 2024
19a6093
Update manifests & docs about service httproute naming trick (#362)
Jeffwan Nov 11, 2024
8364605
Add reference grant to support httprouting for different namespace (#…
varungup90 Nov 11, 2024
ec8e4f7
Validate routing strategy bug fix (#364)
varungup90 Nov 11, 2024
fa3176e
Bug fix for setting routing strategy via env var (#369)
varungup90 Nov 11, 2024
2e0179c
Improve the routing env value & flag retrieval (#373)
Jeffwan Nov 12, 2024
Files changed
3 changes: 3 additions & 0 deletions .github/workflows/docker-build-images.yml
@@ -6,6 +6,9 @@ on:

jobs:
build:
# This prevents the job from running as other steps cover its functionality.
# We use 'if: false' to keep the file for future reference without deleting it.
if: false
runs-on: ubuntu-latest
steps:
- name: Check out code
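The `if: false` guard above disables the job while keeping the workflow file in the tree for reference. A variant of the same pattern (an assumption for illustration, not part of this PR) gates the job on a repository configuration variable instead, so it can be re-enabled from the repository settings without a code change:

```yaml
jobs:
  build:
    # Runs only when the repository variable ENABLE_DOCKER_BUILD is 'true';
    # with the variable unset or set to anything else, the job is skipped,
    # mirroring the effect of 'if: false'.
    if: ${{ vars.ENABLE_DOCKER_BUILD == 'true' }}
    runs-on: ubuntu-latest
```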
22 changes: 20 additions & 2 deletions .github/workflows/release-build.yaml
@@ -24,16 +24,34 @@ jobs:
username: ${{ secrets.DOCKER_HUB_USERNAME }}
password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}

# Build container images
# Log in to Github Registry
- name: Login to the Container registry
uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

# Build container images with docker registry namespace
- name: Build Container Images
run: |
GIT_COMMIT_HASH=${{ github.ref_name }} make docker-build-all

# Push container image to container registry
# Push container image to DockerHub
- name: Push container image to container registry
run: |
GIT_COMMIT_HASH=${{ github.ref_name }} make docker-push-all

# Build container images with Github registry namespace
- name: Build Container Images with Github Container Registry prefix
run: |
GIT_COMMIT_HASH=${{ github.ref_name }} AIBRIX_CONTAINER_REGISTRY_NAMESPACE=ghcr.io/aibrix make docker-build-all

# Push container image to Github container registry
- name: Push Container Images to Github Container Registry
run: |
GIT_COMMIT_HASH=${{ github.ref_name }} AIBRIX_CONTAINER_REGISTRY_NAMESPACE=ghcr.io/aibrix make docker-push-all

python-wheel-release:
runs-on: ubuntu-latest
strategy:
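The two build/push pairs above differ only in `AIBRIX_CONTAINER_REGISTRY_NAMESPACE`, which redirects the same `make` targets at ghcr.io instead of Docker Hub. A minimal sketch of how such a namespace variable typically composes the final image reference (the `image_ref` helper and the `aibrix` default are illustrative assumptions, not taken from the project's Makefile):

```shell
#!/bin/sh
# Compose a full image reference from an optional registry namespace.
# Falls back to a Docker Hub style namespace when the variable is unset.
# NOTE: image_ref and the 'aibrix' default are illustrative only.
image_ref() {
  ns="${AIBRIX_CONTAINER_REGISTRY_NAMESPACE:-aibrix}"
  echo "${ns}/$1:$2"
}

image_ref controller-manager v0.1.0
# prints: aibrix/controller-manager:v0.1.0

AIBRIX_CONTAINER_REGISTRY_NAMESPACE=ghcr.io/aibrix
image_ref controller-manager v0.1.0
# prints: ghcr.io/aibrix/controller-manager:v0.1.0
```

This is why the workflow can reuse `docker-build-all`/`docker-push-all` verbatim for both registries: only the environment variable changes between steps.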
5 changes: 5 additions & 0 deletions .gitignore
@@ -33,3 +33,8 @@ __pycache__
docs/build/
!**/*.template.rst


# benchmark logs, result and figs
benchmarks/autoscaling/logs
benchmarks/autoscaling/output_stats
benchmarks/autoscaling/workload_plot
13 changes: 13 additions & 0 deletions .readthedocs.yaml
@@ -10,6 +10,19 @@ build:
os: ubuntu-22.04
tools:
python: "3.10"
jobs:
post_checkout:
# Cancel building pull requests when there aren't changes in the docs directory or YAML file.
# You can add any other files or directories that you'd like here as well,
# like your docs requirements file, or other files that will change your docs build.
#
# If there are no changes (git diff exits with 0) we force the command to return with 183.
# This is a special exit code on Read the Docs that will cancel the build immediately.
- |
if [ "$READTHEDOCS_VERSION_TYPE" = "external" ] && git diff --quiet origin/main -- docs/ .readthedocs.yaml;
then
exit 183;
fi

# Build documentation in the "docs/" directory with Sphinx
sphinx:
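The `post_checkout` hook above cancels pull request ("external") builds that touch nothing under `docs/` or the YAML file itself: exit code 183 is Read the Docs' special signal for cancelling a build immediately. The decision it encodes can be sketched as a small function (the function name and its echoed outputs are illustrative):

```shell
#!/bin/sh
# Decide whether a Read the Docs build should be cancelled.
#   $1: version type ("external" means a pull request build)
#   $2: exit code of `git diff --quiet origin/main -- docs/ .readthedocs.yaml`
#       (0 means no docs-related changes)
should_cancel_build() {
  if [ "$1" = "external" ] && [ "$2" -eq 0 ]; then
    echo cancel    # the real hook does: exit 183
  else
    echo build
  fi
}

should_cancel_build external 0   # -> cancel (PR with no docs changes)
should_cancel_build external 1   # -> build  (PR touched docs/)
should_cancel_build branch 0     # -> build  (non-PR builds always run)
```

Non-PR builds (tags, branches) always proceed, so published documentation versions are never skipped by this check.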
4 changes: 2 additions & 2 deletions README.md
@@ -34,10 +34,10 @@ kubectl create -k config/default
Install stable distribution
```shell
# Install component dependencies
kubectl create -k "github.com/aibrix/aibrix/config/dependency?ref=v0.1.0-rc.1"
kubectl create -k "github.com/aibrix/aibrix/config/dependency?ref=v0.1.0-rc.4"

# Install aibrix components
kubectl create -k "github.com/aibrix/aibrix/config/default?ref=v0.1.0-rc.1"
kubectl create -k "github.com/aibrix/aibrix/config/default?ref=v0.1.0-rc.4"
```

## Documentation
174 changes: 174 additions & 0 deletions benchmarks/autoscaling/7b.yaml
@@ -0,0 +1,174 @@
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
model.aibrix.ai/name: deepseek-coder-7b-instruct
model.aibrix.ai/port: "8000"
name: aibrix-model-deepseek-coder-7b-instruct
namespace: default
spec:
replicas: 1
selector:
matchLabels:
model.aibrix.ai/name: deepseek-coder-7b-instruct
strategy:
type: Recreate
template:
metadata:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8000"
prometheus.io/path: "/metrics"
labels:
model.aibrix.ai/name: deepseek-coder-7b-instruct
spec:
containers:
- command:
- python3
- -m
- vllm.entrypoints.openai.api_server
- --host
- "0.0.0.0"
- --port
- "8000"
- --model
- /models/deepseek-coder-6.7b-instruct
- --served-model-name
- deepseek-coder-7b-instruct
- --trust-remote-code
- --max-model-len
- "10240"
- --api-key
- sk-kFJ12nKsFVfVmGpj3QzX65s4RbN2xJqWzPYCjYu7wT3BlbLi
image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/vllm-openai:v0.6.2-distributed
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /health
port: 8000
scheme: HTTP
initialDelaySeconds: 90
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
name: vllm-openai
ports:
- containerPort: 8000
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /health
port: 8000
scheme: HTTP
initialDelaySeconds: 90
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
nvidia.com/gpu: "1"
requests:
nvidia.com/gpu: "1"
# We need to use dataset cache
volumeMounts:
- mountPath: /models
name: model-hostpath
- name: dshm
mountPath: /dev/shm
- name: aibrix-runtime
image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/runtime:v0.1.0-rc.4
command:
- gunicorn
- -b
- :8080
- app:app
- -k
- uvicorn.workers.UvicornWorker
ports:
- containerPort: 8080
protocol: TCP
volumeMounts:
- mountPath: /models
name: model-hostpath
initContainers:
- name: init-model
image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/runtime:v0.1.0-rc.4
command:
- python
- -m
- aibrix.downloader
- --model-uri
- tos://aibrix-artifact-testing/models/deepseek-ai/deepseek-coder-6.7b-instruct/
- --local-dir
- /models/
env:
- name: DOWNLOADER_MODEL_NAME
value: deepseek-coder-6.7b-instruct
- name: DOWNLOADER_NUM_THREADS
value: "16"
- name: DOWNLOADER_ALLOW_FILE_SUFFIX
value: json, safetensors
- name: TOS_ACCESS_KEY
valueFrom:
secretKeyRef:
name: tos-credential
key: TOS_ACCESS_KEY
- name: TOS_SECRET_KEY
valueFrom:
secretKeyRef:
name: tos-credential
key: TOS_SECRET_KEY
- name: TOS_ENDPOINT
value: tos-cn-beijing.ivolces.com
- name: TOS_REGION
value: cn-beijing
volumeMounts:
- mountPath: /models
name: model-hostpath
volumes:
- name: model-hostpath
hostPath:
path: /root/models
type: DirectoryOrCreate
- name: dshm
emptyDir:
medium: Memory
sizeLimit: "4Gi"
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: machine.cluster.vke.volcengine.com/gpu-name
operator: In
values:
- NVIDIA-A10

---

apiVersion: v1
kind: Service
metadata:
labels:
model.aibrix.ai/name: deepseek-coder-7b-instruct
prometheus-discovery: "true"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8000"
name: deepseek-coder-7b-instruct
namespace: default
spec:
ports:
- name: serve
port: 8000
protocol: TCP
targetPort: 8000
- name: http
port: 8080
protocol: TCP
targetPort: 8080
selector:
model.aibrix.ai/name: deepseek-coder-7b-instruct
type: LoadBalancer
18 changes: 18 additions & 0 deletions benchmarks/autoscaling/apa.yaml
@@ -0,0 +1,18 @@
apiVersion: autoscaling.aibrix.ai/v1alpha1
kind: PodAutoscaler
metadata:
name: deepseek-coder-7b-instruct-apa
labels:
app.kubernetes.io/name: aibrix
app.kubernetes.io/managed-by: kustomize
namespace: default
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: aibrix-model-deepseek-coder-7b-instruct
minReplicas: 1
maxReplicas: 10
targetMetric: "vllm:gpu_cache_usage_perc"
targetValue: "50"
scalingStrategy: "APA"
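The PodAutoscaler above targets `vllm:gpu_cache_usage_perc` at 50, bounded between 1 and 10 replicas. As a rough illustration only (not the actual APA algorithm, whose tolerances and fluctuation handling live in the controller), a proportional autoscaler derives the desired replica count from the ratio of observed to target metric, then clamps it to the configured bounds:

```python
import math

def desired_replicas(current: int, observed: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Proportional scaling sketch: scale replicas by observed/target,
    round up, then clamp to [min_replicas, max_replicas]."""
    raw = math.ceil(current * observed / target)
    return max(min_replicas, min(max_replicas, raw))

# GPU cache usage at 80% against a 50% target with 2 replicas -> scale to 4
print(desired_replicas(2, 80.0, 50.0))  # -> 4
```

With usage below target the same formula scales down (e.g. 4 replicas at 25% usage against the 50% target yields 2), which is why min/max bounds matter for keeping the deployment within capacity limits.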