docs: update readme with new arch figure (#555)

This change slightly revises the main readme to add tuning-related context when explaining the Kaito project. The arch figure is also updated to be more specific about each subcomponent.
kaito-project · Aug 12, 2024 · 370007f
1 parent 9ec071e
commit 370007f

Showing 2 changed files with 6 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -10,29 +10,28 @@
 | Latest Release: July 12th, 2024. Kaito v0.3.0.  |
 | First Release: Nov 15th, 2023. Kaito v0.1.0.    |
 
-Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster.
-The target models are popular large open-sourced inference models such as [falcon](https://huggingface.co/tiiuae) and [llama2](https://github.com/facebookresearch/llama).
+Kaito is an operator that automates the AI/ML model inference or tuning workload in a Kubernetes cluster.
+The target models are popular open-sourced large models such as [falcon](https://huggingface.co/tiiuae) and [phi-3](https://huggingface.co/docs/transformers/main/en/model_doc/phi3).
 Kaito has the following key differentiations compared to most of the mainstream model deployment methodologies built on top of virtual machine infrastructures:
 
 - Manage large model files using container images. A http server is provided to perform inference calls using the model library.
-- Avoid tuning deployment parameters to fit GPU hardware by providing preset configurations.
+- Provide preset configurations to avoid adjusting workload parameters based on GPU hardware.
 - Auto-provision GPU nodes based on model requirements.
 - Host large model images in the public Microsoft Container Registry (MCR) if the license allows.
 
 Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
 
-
 ## Architecture
 
-Kaito follows the classic Kubernetes Custom Resource Definition(CRD)/controller design pattern. User manages a `workspace` custom resource which describes the GPU requirements and the inference specification. Kaito controllers will automate the deployment by reconciling the `workspace` custom resource.
+Kaito follows the classic Kubernetes Custom Resource Definition(CRD)/controller design pattern. User manages a `workspace` custom resource which describes the GPU requirements and the inference or tuning specification. Kaito controllers will automate the deployment by reconciling the `workspace` custom resource.
 <div align="left">
   <img src="docs/img/arch.png" width=80% title="Kaito architecture" alt="Kaito architecture">
 </div>
 
 The above figure presents the Kaito architecture overview. Its major components consist of:
 
-- **Workspace controller**: It reconciles the `workspace` custom resource, creates `machine` (explained below) custom resources to trigger node auto provisioning, and creates the inference workload (`deployment` or `statefulset`) based on the model preset configurations.
-- **Node provisioner controller**: The controller's name is *gpu-provisioner* in [gpu-provisioner helm chart](https://github.com/Azure/gpu-provisioner/tree/main/charts/gpu-provisioner). It uses the `machine` CRD originated from [Karpenter](https://sigs.k8s.io/karpenter) to interact with the workspace controller. It integrates with Azure Kubernetes Service(AKS) APIs to add new GPU nodes to the AKS cluster. 
+- **Workspace controller**: It reconciles the `workspace` custom resource, creates `machine` (explained below) custom resources to trigger node auto provisioning, and creates the inference or tuning workload (`deployment`, `statefulset` or `job`) based on the model preset configurations.
+- **Node provisioner controller**: The controller's name is *gpu-provisioner* in [gpu-provisioner helm chart](https://github.com/Azure/gpu-provisioner/tree/main/charts/gpu-provisioner). It uses the `machine` CRD originated from [Karpenter](https://sigs.k8s.io/karpenter) to interact with the workspace controller. It integrates with Azure Resource Manager REST APIs to add new GPU nodes to the AKS cluster.
 > Note: The [*gpu-provisioner*](https://github.com/Azure/gpu-provisioner) is an open sourced component. It can be replaced by other controllers if they support [Karpenter-core](https://sigs.k8s.io/karpenter) APIs.
 
 ## Installation

diff --git a/docs/img/arch.png b/docs/img/arch.png