feat: Options for Building and Running Private/Custom Models (#598)
**Reason for Change**: This PR provides a README and an example deployment for running a private/custom model, i.e. models not currently listed in [`supported_models.yaml`](https://github.com/Azure/kaito/blob/main/presets/models/supported_models.yaml). The PR also introduces a new package: https://github.com/Azure/kaito/pkgs/container/kaito%2Fllm-reference-preset

Resolves: #594

Signed-off-by: ishaansehgal99 <ishaanforthewin@gmail.com>
1 parent bcc0276 · commit f3d6e09 — showing 5 changed files with 188 additions and 0 deletions.
```dockerfile
FROM python:3.10-slim@sha256:684b1aaf96a7942b3c3af438d162e0baa3510aa7af25ad76d238e0c746bdec79

# Specify the repository source URL for reference and access in KAITO packages.
LABEL org.opencontainers.image.source=https://github.com/azure/kaito

ARG MODEL_TYPE
ARG VERSION

# Set the working directory
WORKDIR /workspace/tfs

# Write the version to a file
RUN echo $VERSION > /workspace/tfs/version.txt

# First, copy just the preset files and install dependencies.
# This is done before copying the code to utilize Docker's layer caching and
# avoid reinstalling dependencies unless the requirements file changes.

# Inference
COPY presets/inference/${MODEL_TYPE}/requirements.txt /workspace/tfs/inference-requirements.txt
RUN pip install --no-cache-dir -r inference-requirements.txt

COPY presets/inference/${MODEL_TYPE}/inference_api.py /workspace/tfs/inference_api.py
```
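A hedged build sketch for this Dockerfile, assuming it is saved as `Dockerfile.reference` and the build runs from the repository root so the `presets/` paths resolve. The registry and tag are hypothetical; the `echo` prefix makes this a dry run — remove it to actually build:

```shell
MODEL_TYPE="text-generation"
VERSION="0.0.1"
# Hypothetical registry/repository; substitute your own.
TAG="myregistry.example.com/llm-reference-preset:${VERSION}"

# Print the build command (dry run); drop the echo to execute it.
echo docker build -t "$TAG" -f Dockerfile.reference \
  --build-arg MODEL_TYPE="$MODEL_TYPE" \
  --build-arg VERSION="$VERSION" .
```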
**docs/custom-model-integration/custom-deployment-template.yaml** — 36 additions, 0 deletions
```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-custom-llm
resource:
  instanceType: "Standard_NC12s_v3" # Replace with the required VM SKU based on model requirements
  labelSelector:
    matchLabels:
      apps: custom-llm
inference:
  template:
    spec:
      containers:
        - name: custom-llm-container
          image: modelsregistry.azurecr.io/custom-llm:0.0.1 # Replace with the actual image name
          command: ["accelerate"]
          args:
            - "launch"
            - "--num_processes"
            - "1"
            - "--num_machines"
            - "1"
            - "--gpu_ids"
            - "all"
            - "inference_api.py"
            - "--pipeline"
            - "text-generation"
            - "--torch_dtype"
            - "float16" # Set to "float16" for compatibility with V100 GPUs; use "bfloat16" for A100, H100 or newer GPUs
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
```
**docs/custom-model-integration/custom-model-integration-guide.md** — 81 additions, 0 deletions
# Custom Model Integration Guide

## Option 1: Use Pre-Built Docker Image Without Model Weights

If you want to avoid building a Docker image with model weights, use our pre-built reference image (`ghcr.io/azure/kaito/llm-reference-preset:latest`). This image, built with [Dockerfile.reference](./Dockerfile.reference), dynamically downloads model weights from HuggingFace at runtime, reducing the need to create and maintain custom images.

- **[Sample Deployment YAML](./reference-image-deployment.yaml)**
## Option 2: Build a Custom Docker Image with Model Weights

### Step 1: Clone the Repository

```sh
git clone https://github.com/Azure/kaito.git
```
### Step 2: Download Your Private/Custom Model Weights

For example, assuming HuggingFace weights:

```sh
git lfs install
git clone git@hf.co:<MODEL_ID> # Example: git clone git@hf.co:bigscience/bloom
# OR
git clone https://huggingface.co/bigscience/bloom
```
Alternatively, use curl:

```sh
curl -sSL https://huggingface.co/bigscience/bloom/resolve/main/config.json?download=true -o config.json
```
More information on downloading models from HuggingFace can be found [here](https://huggingface.co/docs/hub/en/models-downloading).
### Step 3: Log In to Your Container Registry

Before pushing the Docker image, ensure you're logged into the appropriate container registry. Here are general login methods depending on the registry you use:

1. GitHub Container Registry (ghcr.io):

   ```sh
   echo $CR_PAT | docker login ghcr.io -u USERNAME --password-stdin
   ```

   Replace `CR_PAT` with your GitHub Personal Access Token and `USERNAME` with your GitHub username. This token should have the `write:packages` and `read:packages` permissions.
2. Azure Container Registry (ACR):

   ```sh
   az acr login --name <REGISTRY_NAME>
   ```

   Replace `<REGISTRY_NAME>` with your Azure Container Registry name.
3. Docker Hub or other container registries:

   ```sh
   docker login <REGISTRY_URL>
   ```

   Enter your username and password when prompted. Replace `<REGISTRY_URL>` with your registry URL, such as `docker.io` for Docker Hub.
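The three options above differ only in how the registry host maps to a login command. As a hedged sketch, a hypothetical helper (not part of KAITO) that prints the matching command for a given host:

```shell
# Map a registry host to the matching login command (printed, not executed).
login_cmd() {
  case "$1" in
    ghcr.io)      echo "docker login ghcr.io -u USERNAME --password-stdin" ;;
    *.azurecr.io) echo "az acr login --name ${1%%.*}" ;;  # ACR name is the first DNS label
    *)            echo "docker login $1" ;;
  esac
}

login_cmd modelsregistry.azurecr.io
```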
### Step 4: Build Docker Image with Private/Custom Weights

1. Set environment variables

   Before building the Docker image, set the relevant environment variables for the image name, version, and weights path:

   ```sh
   export IMAGE_NAME="modelsregistry.azurecr.io/phi-3-mini-4k-instruct:0.0.1"
   export VERSION="0.0.1"
   export WEIGHTS_PATH="kaito/phi-3-mini-4k-instruct/weights"
   ```
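The image name above is just `<registry>/<model>:<version>`, so composing it from parts keeps the tag and the `VERSION` build argument in sync. A small sketch using the example values from above:

```shell
REGISTRY="modelsregistry.azurecr.io"
MODEL="phi-3-mini-4k-instruct"
export VERSION="0.0.1"
export IMAGE_NAME="${REGISTRY}/${MODEL}:${VERSION}"
export WEIGHTS_PATH="kaito/${MODEL}/weights"

echo "$IMAGE_NAME"
```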
2. Build and push the Docker image

   Navigate to the KAITO base directory and build the Docker image, using the environment variables set above and ensuring the weights directory is included in the build context:

   ```sh
   docker build -t $IMAGE_NAME \
     --file docker/presets/models/tfs/Dockerfile \
     --build-arg WEIGHTS_PATH=$WEIGHTS_PATH \
     --build-arg MODEL_TYPE=text-generation \
     --build-arg VERSION=$VERSION .

   docker push $IMAGE_NAME
   ```
### Step 5: Deploy

Follow the [Custom Template](./custom-deployment-template.yaml).
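Deployment itself is a standard `kubectl apply` of the template, assuming a cluster with the KAITO controller installed. Shown here as a dry run via `echo`; remove it to execute:

```shell
TEMPLATE="custom-deployment-template.yaml"
WORKSPACE="workspace-custom-llm"  # metadata.name from the template

echo kubectl apply -f "$TEMPLATE"
echo kubectl get workspace "$WORKSPACE"
```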
**docs/custom-model-integration/reference-image-deployment.yaml** — 40 additions, 0 deletions
```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-custom-llm
resource:
  instanceType: "Standard_NC12s_v3" # Replace with the required VM SKU based on model requirements
  labelSelector:
    matchLabels:
      apps: custom-llm
inference:
  template:
    spec:
      containers:
        - name: custom-llm-container
          image: ghcr.io/azure/kaito/llm-reference-preset:latest
          command: ["accelerate"]
          args:
            - "launch"
            - "--num_processes"
            - "1"
            - "--num_machines"
            - "1"
            - "--gpu_ids"
            - "all"
            - "inference_api.py"
            - "--pipeline"
            - "text-generation"
            - "--trust_remote_code"
            - "--allow_remote_files"
            - "--pretrained_model_name_or_path"
            - "<MODEL_ID>" # Replace <MODEL_ID> with the specific HuggingFace model identifier
            - "--torch_dtype"
            - "float16" # Set to "float16" for compatibility with V100 GPUs; use "bfloat16" for A100, H100 or newer GPUs
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
```
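The `--torch_dtype` comment encodes a GPU-dependent choice: NCv3-series VMs carry V100 GPUs, which lack bfloat16 support, while A100/H100 GPUs support it. A hypothetical helper (illustrative, not part of KAITO) mapping the Azure VM SKU to a dtype:

```shell
# Pick a torch dtype from the Azure GPU VM SKU (illustrative, not exhaustive).
dtype_for_sku() {
  case "$1" in
    Standard_NC*v3) echo "float16"  ;;  # NCv3 = V100: no bfloat16 support
    *A100*|*H100*)  echo "bfloat16" ;;  # Ampere/Hopper support bfloat16
    *)              echo "float16"  ;;  # conservative default
  esac
}

dtype_for_sku Standard_NC12s_v3
```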