LeapfrogAI VLLM Backend

⚠️This repo is archived in favor of the LeapfrogAI monorepo: https://github.com/defenseunicorns/leapfrogai⚠️

Description

A LeapfrogAI API-compatible VLLM wrapper for quantized and unquantized model inferencing across GPU infrastructures.

Usage

See the Instructions section below to get the backend up and running. Then use the LeapfrogAI API server to interact with the backend.

Instructions

The instructions in this section assume the following:

  1. Properly installed and configured Python 3.11.x, including its development tools, and uv
  2. The LeapfrogAI API server is deployed and running

The following are additional assumptions for GPU inferencing (a quick verification sketch follows this list):

  1. You have properly installed one or more NVIDIA GPUs and GPU drivers
  2. You have properly installed and configured the cuda-toolkit and nvidia-container-toolkit
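
These optional checks can help confirm the assumptions above; they are general-purpose commands on a typical Linux host, not part of this backend:

# Verify Python, uv, GPU drivers, and the CUDA toolkit are present
python3 --version   # expect 3.11.x
uv --version
nvidia-smi          # lists detected NVIDIA GPUs and the driver version
nvcc --version      # confirms the cuda-toolkit install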

Model Selection

The default model shipped in this repository's officially released images is a 4-bit quantization of the Synthia-7b model.
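
A different model can be pulled by overriding the make variables referenced in the steps below. The repository, filename, and revision shown here are illustrative examples only, not project defaults:

# Example only: fetch a different model than the default
make fetch-model REPO_ID=TheBloke/SynthIA-7B-v2.0-GPTQ FILENAME=model.safetensors REVISION=main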

Run Locally

# Setup Virtual Environment
make create-venv
source .venv/bin/activate

# Install with GPU compilation and deps
make requirements-dev

# Copy the environment variable file, change this if different params are needed
cp .env.example .env

# Make sure environment variables are set
source .env

# Clone Model
# Supply a REPO_ID, FILENAME and REVISION if a different model is desired
make fetch-model

# Copy the config file, change this if different params are needed
cp config.example.yaml config.yaml

# Start Model Backend
make dev
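
Once make dev is running, the backend should be listening for gRPC traffic on port 50051 (the port exposed in the Docker run command below). As an optional sketch, assuming grpcurl is installed and server reflection is enabled, the service can be checked with:

# Optional: confirm the gRPC server responds
grpcurl -plaintext localhost:50051 list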

Run in Docker

Local Image Build and Run

For local image building and running.

# Supply a REPO_ID, FILENAME and REVISION if a different model is desired
make docker-build
make docker-run
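
The same REPO_ID, FILENAME, and REVISION variables can be supplied on the make command line; the values below are illustrative examples, assuming the Makefile forwards them to the image build:

# Example only: build and run an image with a different model baked in
make docker-build REPO_ID=TheBloke/SynthIA-7B-v2.0-GPTQ FILENAME=model.safetensors REVISION=main
make docker-run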

Remote Image Build and Run

For pulling and running a tagged image from the main release repository.

Where <IMAGE_TAG> is one of the released package versions for this repository.

docker build -t ghcr.io/defenseunicorns/leapfrogai/vllm:<IMAGE_TAG> .
docker run --gpus device=0 -e GPU_ENABLED=true -p 50051:50051 -d --name vllm ghcr.io/defenseunicorns/leapfrogai/vllm:<IMAGE_TAG>
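
To confirm the container started cleanly, standard Docker commands can be used (the container name vllm comes from the run command above):

# Check that the container is running and inspect its startup logs
docker ps --filter name=vllm
docker logs -f vllm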

VLLM Specific Packaging

When compiled specifically for GPU inferencing, VLLM requires access to the host system's GPU drivers in order to operate; if the drivers are unavailable, it throws an unrecoverable exception even when no layers are offloaded to the GPU at runtime.

Zarf package creation:

zarf package create --set IMAGE_REPOSITORY=ghcr.io/defenseunicorns/leapfrogai/vllm --set IMAGE_VERSION=<IMAGE_TAG> --set NAME=vllm --insecure
zarf package publish zarf-package-vllm-amd64-<IMAGE_TAG>.tar.zst oci://ghcr.io/defenseunicorns/packages/leapfrogai
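
The resulting local tarball can be deployed to a cluster with a standard zarf deploy. This is a minimal sketch and assumes the current Kubernetes context is the intended target (--confirm skips the interactive prompt):

# Example only: deploy the locally created package
zarf package deploy zarf-package-vllm-amd64-<IMAGE_TAG>.tar.zst --confirm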
