feat: Options for Building and Running Private/Custom Models (#598)
**Reason for Change**:
This PR provides a README and an example deployment for running a
private/custom model, that is, a model not currently listed in
[`supported_models.yaml`](https://github.com/Azure/kaito/blob/main/presets/models/supported_models.yaml).

This PR also introduces a new package:
`https://github.com/Azure/kaito/pkgs/container/kaito%2Fllm-reference-preset`

Resolves: #594

---------

Signed-off-by: ishaansehgal99 <ishaanforthewin@gmail.com>
ishaansehgal99 authored Sep 18, 2024
1 parent bcc0276 commit f3d6e09
Showing 5 changed files with 188 additions and 0 deletions.
9 changes: 9 additions & 0 deletions Makefile
@@ -220,6 +220,15 @@ docker-build-dataset: docker-buildx
		--pull \
		--tag $(REGISTRY)/e2e-dataset2:0.0.1 .

.PHONY: docker-build-llm-reference-preset
docker-build-llm-reference-preset: docker-buildx
	docker buildx build \
		-t ghcr.io/azure/kaito/llm-reference-preset:$(VERSION) \
		-t ghcr.io/azure/kaito/llm-reference-preset:latest \
		-f docs/custom-model-integration/Dockerfile.reference \
		--build-arg MODEL_TYPE=text-generation \
		--build-arg VERSION=$(VERSION) .

## --------------------------------------
## Kaito Installation
## --------------------------------------
22 changes: 22 additions & 0 deletions docs/custom-model-integration/Dockerfile.reference
@@ -0,0 +1,22 @@
FROM python:3.10-slim@sha256:684b1aaf96a7942b3c3af438d162e0baa3510aa7af25ad76d238e0c746bdec79

# Specify the repository source URL for reference and access in Kaito packages.
LABEL org.opencontainers.image.source=https://github.com/azure/kaito

ARG MODEL_TYPE
ARG VERSION

# Set the working directory
WORKDIR /workspace/tfs

# Write the version to a file
RUN echo $VERSION > /workspace/tfs/version.txt

# First, copy just the preset files and install dependencies
# This is done before copying the code to utilize Docker's layer caching and
# avoid reinstalling dependencies unless the requirements file changes.
# Inference
COPY presets/inference/${MODEL_TYPE}/requirements.txt /workspace/tfs/inference-requirements.txt
RUN pip install --no-cache-dir -r inference-requirements.txt

COPY presets/inference/${MODEL_TYPE}/inference_api.py /workspace/tfs/inference_api.py
36 changes: 36 additions & 0 deletions docs/custom-model-integration/custom-deployment-template.yaml
@@ -0,0 +1,36 @@
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-custom-llm
resource:
  instanceType: "Standard_NC12s_v3" # Replace with the required VM SKU based on model requirements
  labelSelector:
    matchLabels:
      apps: custom-llm
inference:
  template:
    spec:
      containers:
        - name: custom-llm-container
          image: modelsregistry.azurecr.io/custom-llm:0.0.1 # Replace with the actual image name
          command: ["accelerate"]
          args:
            - "launch"
            - "--num_processes"
            - "1"
            - "--num_machines"
            - "1"
            - "--gpu_ids"
            - "all"
            - "inference_api.py"
            - "--pipeline"
            - "text-generation"
            - "--torch_dtype"
            - "float16" # Set to "float16" for compatibility with V100 GPUs; use "bfloat16" for A100, H100 or newer GPUs
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
81 changes: 81 additions & 0 deletions docs/custom-model-integration/custom-model-integration-guide.md
@@ -0,0 +1,81 @@
# Custom Model Integration Guide

## Option 1: Use Pre-Built Docker Image Without Model Weights
If you want to avoid building a Docker image with model weights, use our pre-built reference image (`ghcr.io/azure/kaito/llm-reference-preset:latest`). This image, built with [Dockerfile.reference](./Dockerfile.reference), dynamically downloads model weights from HuggingFace at runtime, reducing the need to create and maintain custom images.


- **[Sample Deployment YAML](./reference-image-deployment.yaml)**


## Option 2: Build a Custom Docker Image with Model Weights

### Step 1: Clone the Repository

```sh
git clone https://github.com/Azure/kaito.git
```

### Step 2: Download Your Private/Custom Model Weights

For example, assuming HuggingFace weights:
```sh
git lfs install
git clone git@hf.co:<MODEL_ID> # Example: git clone git@hf.co:bigscience/bloom
# OR
git clone https://huggingface.co/bigscience/bloom
```

Alternatively, use curl:
```sh
curl -sSL https://huggingface.co/bigscience/bloom/resolve/main/config.json?download=true -o config.json
```

More information on downloading models from HuggingFace can be found [here](https://huggingface.co/docs/hub/en/models-downloading).
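The direct-download URL used by the curl example follows a fixed pattern, so it can be composed for any file in a model repo. A small sketch, using the example model above (`MODEL_ID` and `FILE` are illustrative values):

```shell
# Compose the HuggingFace direct-download URL for one file of a repo,
# matching the pattern used in the curl example above.
MODEL_ID="bigscience/bloom"
FILE="config.json"
URL="https://huggingface.co/${MODEL_ID}/resolve/main/${FILE}?download=true"
echo "$URL"
# Then download it with: curl -sSL "$URL" -o "$FILE"
```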


### Step 3: Log In to Your Container Registry

Before pushing the Docker image, ensure you’re logged into the appropriate container registry. Here are general login methods depending on the registry you use:

1. GitHub Container Registry (ghcr.io):
```sh
echo $CR_PAT | docker login ghcr.io -u USERNAME --password-stdin
```
Replace `CR_PAT` with your GitHub Personal Access Token and `USERNAME` with your GitHub username. The token must have the `write:packages` and `read:packages` scopes.

2. Azure Container Registry (ACR):

```sh
az acr login --name <REGISTRY_NAME>
```
Replace `<REGISTRY_NAME>` with your Azure Container Registry name.

3. Docker Hub or Other Container Registries:
```sh
docker login <REGISTRY_URL>
```
Enter your username and password when prompted. Replace `<REGISTRY_URL>` with your registry URL, such as `docker.io` for Docker Hub.


### Step 4: Build Docker Image with Private/Custom Weights

1. Set Environment Variables

Before building the Docker image, set the relevant environment variables for the image name, version, and weights path:
```sh
export IMAGE_NAME="modelsregistry.azurecr.io/phi-3-mini-4k-instruct:0.0.1"
export VERSION="0.0.1"
export WEIGHTS_PATH="kaito/phi-3-mini-4k-instruct/weights"
```
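As a quick sanity check, you can print the fully expanded build command before starting a potentially long image build. A sketch, reusing the example values above:

```shell
export IMAGE_NAME="modelsregistry.azurecr.io/phi-3-mini-4k-instruct:0.0.1"
export VERSION="0.0.1"
export WEIGHTS_PATH="kaito/phi-3-mini-4k-instruct/weights"

# Print the docker build command with every variable expanded,
# so typos in the image name or paths are caught up front.
printf 'docker build -t %s --build-arg WEIGHTS_PATH=%s --build-arg VERSION=%s .\n' \
    "$IMAGE_NAME" "$WEIGHTS_PATH" "$VERSION"
```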

2. Build and Push the Docker Image

Navigate to the Kaito base directory and build the Docker image, ensuring the weights directory is included in the build context:
```sh
docker build -t "$IMAGE_NAME" \
  --file docker/presets/models/tfs/Dockerfile \
  --build-arg WEIGHTS_PATH="$WEIGHTS_PATH" \
  --build-arg MODEL_TYPE=text-generation \
  --build-arg VERSION="$VERSION" .

docker push "$IMAGE_NAME"
```

### Step 5: Deploy
Follow the [Custom Template](./custom-deployment-template.yaml), replacing the placeholder image with the image you pushed above.
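A minimal sketch of applying the template from the repository root (assumes `kubectl` is configured against a cluster with Kaito installed; the guard makes the snippet a no-op where `kubectl` is unavailable):

```shell
TEMPLATE="docs/custom-model-integration/custom-deployment-template.yaml"

if command -v kubectl >/dev/null 2>&1; then
    # Create the Workspace, then check its status.
    kubectl apply -f "$TEMPLATE"
    kubectl get workspace workspace-custom-llm
else
    echo "kubectl not found; skipping deploy of $TEMPLATE"
fi
```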
40 changes: 40 additions & 0 deletions docs/custom-model-integration/reference-image-deployment.yaml
@@ -0,0 +1,40 @@
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-custom-llm
resource:
  instanceType: "Standard_NC12s_v3" # Replace with the required VM SKU based on model requirements
  labelSelector:
    matchLabels:
      apps: custom-llm
inference:
  template:
    spec:
      containers:
        - name: custom-llm-container
          image: ghcr.io/azure/kaito/llm-reference-preset:latest
          command: ["accelerate"]
          args:
            - "launch"
            - "--num_processes"
            - "1"
            - "--num_machines"
            - "1"
            - "--gpu_ids"
            - "all"
            - "inference_api.py"
            - "--pipeline"
            - "text-generation"
            - "--trust_remote_code"
            - "--allow_remote_files"
            - "--pretrained_model_name_or_path"
            - "<MODEL_ID>" # Replace <MODEL_ID> with the specific HuggingFace model identifier
            - "--torch_dtype"
            - "float16" # Set to "float16" for compatibility with V100 GPUs; use "bfloat16" for A100, H100 or newer GPUs
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
