feat: Options for Building and Running Private/Custom Models (#598)
**Reason for Change**: This PR provides a README and an example deployment for running a private/custom model, i.e. models not currently listed in [`supported_models.yaml`](https://github.com/Azure/kaito/blob/main/presets/models/supported_models.yaml). The PR also introduces a new package: https://github.com/Azure/kaito/pkgs/container/kaito%2Fllm-reference-preset

Resolves: #594

Signed-off-by: ishaansehgal99 <ishaanforthewin@gmail.com>
1 parent bcc0276 · commit f3d6e09 — showing 5 changed files with 188 additions and 0 deletions.
```dockerfile
FROM python:3.10-slim@sha256:684b1aaf96a7942b3c3af438d162e0baa3510aa7af25ad76d238e0c746bdec79

# Specify the repository source URL for reference and access in KAITO packages.
LABEL org.opencontainers.image.source=https://github.com/azure/kaito

ARG MODEL_TYPE
ARG VERSION

# Set the working directory
WORKDIR /workspace/tfs

# Write the version to a file
RUN echo $VERSION > /workspace/tfs/version.txt

# First, copy just the preset files and install dependencies.
# This is done before copying the code to utilize Docker's layer caching and
# avoid reinstalling dependencies unless the requirements file changes.

# Inference
COPY presets/inference/${MODEL_TYPE}/requirements.txt /workspace/tfs/inference-requirements.txt
RUN pip install --no-cache-dir -r inference-requirements.txt

COPY presets/inference/${MODEL_TYPE}/inference_api.py /workspace/tfs/inference_api.py
```
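A hedged build sketch for this Dockerfile, assuming it is saved as `Dockerfile.reference` and the build runs from the repository root so the `presets/` paths resolve. The registry and tag are hypothetical; the `echo` prefix makes this a dry run — remove it to actually build:

```shell
MODEL_TYPE="text-generation"
VERSION="0.0.1"
# Hypothetical registry/repository; substitute your own.
TAG="myregistry.example.com/llm-reference-preset:${VERSION}"

# Print the build command (dry run); drop the echo to execute it.
echo docker build -t "$TAG" -f Dockerfile.reference \
  --build-arg MODEL_TYPE="$MODEL_TYPE" \
  --build-arg VERSION="$VERSION" .
```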
**docs/custom-model-integration/custom-deployment-template.yaml** — 36 additions, 0 deletions
```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-custom-llm
resource:
  instanceType: "Standard_NC12s_v3" # Replace with the required VM SKU based on model requirements
  labelSelector:
    matchLabels:
      apps: custom-llm
inference:
  template:
    spec:
      containers:
        - name: custom-llm-container
          image: modelsregistry.azurecr.io/custom-llm:0.0.1 # Replace with the actual image name
          command: ["accelerate"]
          args:
            - "launch"
            - "--num_processes"
            - "1"
            - "--num_machines"
            - "1"
            - "--gpu_ids"
            - "all"
            - "inference_api.py"
            - "--pipeline"
            - "text-generation"
            - "--torch_dtype"
            - "float16" # Set to "float16" for compatibility with V100 GPUs; use "bfloat16" for A100, H100 or newer GPUs
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
```
**docs/custom-model-integration/custom-model-integration-guide.md** — 81 additions, 0 deletions
# Custom Model Integration Guide

## Option 1: Use Pre-Built Docker Image Without Model Weights

If you want to avoid building a Docker image with model weights, use our pre-built reference image (`ghcr.io/azure/kaito/llm-reference-preset:latest`). This image, built with [Dockerfile.reference](./Dockerfile.reference), dynamically downloads model weights from HuggingFace at runtime, reducing the need to create and maintain custom images.

- **[Sample Deployment YAML](./reference-image-deployment.yaml)**
## Option 2: Build a Custom Docker Image with Model Weights

### Step 1: Clone the Repository

```sh
git clone https://github.com/Azure/kaito.git
```
### Step 2: Download Your Private/Custom Model Weights

For example, assuming HuggingFace weights:

```sh
git lfs install
git clone git@hf.co:<MODEL_ID> # Example: git clone git@hf.co:bigscience/bloom
# OR
git clone https://huggingface.co/bigscience/bloom
```
Alternatively, use curl:

```sh
curl -sSL https://huggingface.co/bigscience/bloom/resolve/main/config.json?download=true -o config.json
```
More information on downloading models from HuggingFace can be found [here](https://huggingface.co/docs/hub/en/models-downloading).
### Step 3: Log In to Your Container Registry

Before pushing the Docker image, ensure you're logged into the appropriate container registry. Here are general login methods depending on the registry you use:

1. GitHub Container Registry (ghcr.io):

   ```sh
   echo $CR_PAT | docker login ghcr.io -u USERNAME --password-stdin
   ```

   Replace `CR_PAT` with your GitHub Personal Access Token and `USERNAME` with your GitHub username. This token should have the `write:packages` and `read:packages` permissions.
2. Azure Container Registry (ACR):

   ```sh
   az acr login --name <REGISTRY_NAME>
   ```

   Replace `<REGISTRY_NAME>` with your Azure Container Registry name.
3. Docker Hub or other container registries:

   ```sh
   docker login <REGISTRY_URL>
   ```

   Enter your username and password when prompted. Replace `<REGISTRY_URL>` with your registry URL, such as `docker.io` for Docker Hub.
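The three options above differ only in how the registry host maps to a login command. As a hedged sketch, a hypothetical helper (not part of KAITO) that prints the matching command for a given host:

```shell
# Map a registry host to the matching login command (printed, not executed).
login_cmd() {
  case "$1" in
    ghcr.io)      echo "docker login ghcr.io -u USERNAME --password-stdin" ;;
    *.azurecr.io) echo "az acr login --name ${1%%.*}" ;;  # ACR name is the first DNS label
    *)            echo "docker login $1" ;;
  esac
}

login_cmd modelsregistry.azurecr.io
```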
### Step 4: Build Docker Image with Private/Custom Weights

1. Set environment variables

   Before building the Docker image, set the relevant environment variables for the image name, version, and weights path:

   ```sh
   export IMAGE_NAME="modelsregistry.azurecr.io/phi-3-mini-4k-instruct:0.0.1"
   export VERSION="0.0.1"
   export WEIGHTS_PATH="kaito/phi-3-mini-4k-instruct/weights"
   ```
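The image name above is just `<registry>/<model>:<version>`, so composing it from parts keeps the tag and the `VERSION` build argument in sync. A small sketch using the example values from above:

```shell
REGISTRY="modelsregistry.azurecr.io"
MODEL="phi-3-mini-4k-instruct"
export VERSION="0.0.1"
export IMAGE_NAME="${REGISTRY}/${MODEL}:${VERSION}"
export WEIGHTS_PATH="kaito/${MODEL}/weights"

echo "$IMAGE_NAME"
```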
2. Build and push the Docker image

   Navigate to the KAITO base directory and build the Docker image, using the environment variables set above and ensuring the weights directory is included in the build context:

   ```sh
   docker build -t $IMAGE_NAME \
     --file docker/presets/models/tfs/Dockerfile \
     --build-arg WEIGHTS_PATH=$WEIGHTS_PATH \
     --build-arg MODEL_TYPE=text-generation \
     --build-arg VERSION=$VERSION .

   docker push $IMAGE_NAME
   ```
### Step 5: Deploy

Follow the [Custom Template](./custom-deployment-template.yaml).
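Deployment itself is a standard `kubectl apply` of the template, assuming a cluster with the KAITO controller installed. Shown here as a dry run via `echo`; remove it to execute:

```shell
TEMPLATE="custom-deployment-template.yaml"
WORKSPACE="workspace-custom-llm"  # metadata.name from the template

echo kubectl apply -f "$TEMPLATE"
echo kubectl get workspace "$WORKSPACE"
```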
**docs/custom-model-integration/reference-image-deployment.yaml** — 40 additions, 0 deletions
```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-custom-llm
resource:
  instanceType: "Standard_NC12s_v3" # Replace with the required VM SKU based on model requirements
  labelSelector:
    matchLabels:
      apps: custom-llm
inference:
  template:
    spec:
      containers:
        - name: custom-llm-container
          image: ghcr.io/azure/kaito/llm-reference-preset:latest
          command: ["accelerate"]
          args:
            - "launch"
            - "--num_processes"
            - "1"
            - "--num_machines"
            - "1"
            - "--gpu_ids"
            - "all"
            - "inference_api.py"
            - "--pipeline"
            - "text-generation"
            - "--trust_remote_code"
            - "--allow_remote_files"
            - "--pretrained_model_name_or_path"
            - "<MODEL_ID>" # Replace <MODEL_ID> with the specific HuggingFace model identifier
            - "--torch_dtype"
            - "float16" # Set to "float16" for compatibility with V100 GPUs; use "bfloat16" for A100, H100 or newer GPUs
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
```
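The `--torch_dtype` comment encodes a GPU-dependent choice: NCv3-series VMs carry V100 GPUs, which lack bfloat16 support, while A100/H100 GPUs support it. A hypothetical helper (illustrative, not part of KAITO) mapping the Azure VM SKU to a dtype:

```shell
# Pick a torch dtype from the Azure GPU VM SKU (illustrative, not exhaustive).
dtype_for_sku() {
  case "$1" in
    Standard_NC*v3) echo "float16"  ;;  # NCv3 = V100: no bfloat16 support
    *A100*|*H100*)  echo "bfloat16" ;;  # Ampere/Hopper support bfloat16
    *)              echo "float16"  ;;  # conservative default
  esac
}

dtype_for_sku Standard_NC12s_v3
```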