NIM on GKE Tutorial #737

Merged: 16 commits (Jul 29, 2024)

tutorials-and-examples/nvidia-nim/README.md (159 additions, 0 deletions)
# NVIDIA NIM on GKE

## Before you begin

1. Get access to NVIDIA NIMs
> [!IMPORTANT]
> Before you proceed, ensure you have an NVIDIA AI Enterprise (NVAIE) license to access the NIMs. To get started, go to [build.nvidia.com](https://build.nvidia.com/explore/discover?signin=true) and provide your company email address.

2. In the [Google Cloud console](https://console.cloud.google.com), on the project selector page, select or create a new project with [billing enabled](https://cloud.google.com/billing/docs/how-to/verify-billing-enabled#console)

3. Ensure you have the following tools installed on your workstation
* [gcloud CLI](https://cloud.google.com/sdk/docs/install)
* [kubectl](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl#install_kubectl)
* [git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
* [jq](https://jqlang.github.io/jq/)
* [ngc](https://ngc.nvidia.com/setup)

4. Enable the required APIs
```bash
gcloud services enable \
  container.googleapis.com \
  file.googleapis.com
```
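
To confirm the APIs are enabled before moving on, you can run an optional check (this assumes the gcloud CLI is already authenticated against the selected project):
```bash
# List enabled services and filter for the two APIs used in this tutorial
gcloud services list --enabled | grep -E 'container.googleapis.com|file.googleapis.com'
```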

## Set up your GKE Cluster

1. Choose your region and set your project and machine variables:
```bash
export PROJECT_ID=$(gcloud config get project)
export REGION=us-central1
export ZONE=${REGION?}-b
export MACH=a2-highgpu-1g
export GPU_TYPE=nvidia-tesla-a100
export GPU_COUNT=1
```
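
Optionally, confirm that the accelerator type you selected is offered in the chosen zone. This is a quick sketch using standard gcloud filter syntax; adjust `ZONE` or `GPU_TYPE` if nothing is returned:
```bash
# Show the accelerator types available in the selected zone and match the GPU type set above
gcloud compute accelerator-types list --filter="zone:${ZONE?} AND name:${GPU_TYPE?}"
```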


2. Create a GKE cluster:
```bash
gcloud container clusters create nim-demo --location ${REGION?} \
  --workload-pool ${PROJECT_ID?}.svc.id.goog \
  --enable-image-streaming \
  --enable-ip-alias \
  --node-locations ${ZONE?} \
  --addons=GcpFilestoreCsiDriver \
  --machine-type n2d-standard-4 \
  --enable-autoscaling \
  --num-nodes 1 --min-nodes 1 --max-nodes 5 \
  --ephemeral-storage-local-ssd=count=2
```
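
If your local kubeconfig is not already pointing at the new cluster, fetch credentials for it so the `kubectl` and `helm` commands later in this tutorial target `nim-demo`:
```bash
# Configure kubectl credentials and context for the new cluster
gcloud container clusters get-credentials nim-demo --region ${REGION?}
```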

3. Create a nodepool
```bash
gcloud container node-pools create ${MACH?}-node-pool --cluster nim-demo \
  --accelerator type=${GPU_TYPE?},count=${GPU_COUNT?},gpu-driver-version=latest \
  --machine-type ${MACH?} \
  --ephemeral-storage-local-ssd=count=${GPU_COUNT?} \
  --enable-autoscaling --enable-image-streaming \
  --num-nodes=1 --min-nodes=1 --max-nodes=3 \
  --node-locations ${ZONE?} \
  --region ${REGION?} \
  --spot
```
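
Node pool provisioning can take a few minutes. Once it completes, you can verify that the GPU nodes registered with the cluster; `cloud.google.com/gke-accelerator` is the label GKE attaches to accelerator nodes, shown here as an optional sanity check:
```bash
# List nodes together with the GPU type attached to each one
kubectl get nodes -L cloud.google.com/gke-accelerator
```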


## Set up access to NVIDIA NIMs and prepare the environment

1. Get your NGC API key from NGC and export it as an environment variable
```bash
export NGC_CLI_API_KEY="<YOUR_API_KEY>"
```
> [!NOTE]
> If you have not set up NGC, see [NGC Setup](https://ngc.nvidia.com/setup) to get your access key and begin using NGC.

2. As part of the NGC setup, set your configuration
```bash
ngc config set
```

3. Ensure you have access to the repository by listing the models
```bash
ngc registry model list
```

4. Create a Kubernetes namespace and switch your kubectl context to it
```bash
kubectl create namespace nim
kubectl config set-context --current --namespace=nim
```

## Deploy a PVC to persist the model
1. Create a PVC to persist the model weights; this is recommended for deployments with more than one replica. Save the following YAML as `pvc.yaml`, or use the existing file in this repository
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-store-pvc
  namespace: nim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  storageClassName: standard-rwx
```

2. Apply the PVC
```bash
kubectl apply -f pvc.yaml
```
> [!NOTE]
> This PVC will [dynamically provision a PV](https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes#dynamic_provisioning) with the necessary storage to persist model weights across replicas of your pods.
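
You can inspect the claim with the command below. Depending on the storage class's volume binding mode, the PVC may stay `Pending` until the first pod that uses it is scheduled, so it does not need to show `Bound` right away:
```bash
# Check the status of the claim created above
kubectl get pvc model-store-pvc --namespace nim
```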

## Deploy the NIM with the generated engine using a Helm chart

1. Clone the nim-deploy repository
```bash
git clone https://github.com/NVIDIA/nim-deploy.git
cd nim-deploy/helm
```

2. Deploy the chart with a minimal configuration
```bash
helm --namespace nim install demo-nim nim-llm/ \
  --set model.ngcAPIKey=$NGC_CLI_API_KEY \
  --set persistence.enabled=true \
  --set persistence.existingClaim=model-store-pvc
```
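
The first start can take a while because the NIM container image is pulled and the model weights are downloaded into the PVC. One way to follow progress is shown below; the `demo-nim-nim-llm` name assumes the chart derives its workload name from the release name, so substitute whatever `kubectl get pods` shows:
```bash
# Watch the pods come up; press Ctrl+C to stop watching
kubectl get pods --namespace nim --watch

# Optionally follow the logs of the NIM workload while the model loads
kubectl logs --namespace nim statefulset/demo-nim-nim-llm --follow
```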

## Test the NIM
1. Expose the service
```bash
kubectl port-forward --namespace nim services/demo-nim-nim-llm 8000
```

2. Send a test prompt (A100)
```bash
curl -X 'POST' \
  'http://localhost:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "content": "You are a polite and respectful poet.",
        "role": "system"
      },
      {
        "content": "Write a limerick about the wonders of GPUs and Kubernetes?",
        "role": "user"
      }
    ],
    "model": "meta/llama3-8b-instruct",
    "max_tokens": 256,
    "top_p": 1,
    "n": 1,
    "stream": false,
    "frequency_penalty": 0.0
  }' | jq '.choices[0].message.content' -
```

3. Browse the API by navigating to http://localhost:8000/docs
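
With the port-forward from step 1 still running, you can also list the models the NIM serves; this assumes the NIM exposes the OpenAI-compatible `/v1/models` endpoint, which is the same API surface the chat completion call above uses:
```bash
# List the models served by the NIM
curl -s http://localhost:8000/v1/models | jq
```
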
tutorials-and-examples/nvidia-nim/pvc.yaml (26 additions, 0 deletions)
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-store-pvc
  namespace: nim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  storageClassName: standard-rwx