diff --git a/README.md b/README.md
index af3a04b22..318b1a908 100644
--- a/README.md
+++ b/README.md
@@ -12,24 +12,15 @@
 - [Components](#components)
   - [API](#api)
   - [Backends](#backends)
-  - [Image Hardening](#image-hardening)
   - [SDK](#sdk)
   - [User Interface](#user-interface)
   - [Repeater](#repeater)
+  - [Image Hardening](#image-hardening)
 - [Usage](#usage)
-  - [UDS (Latest)](#uds-latest)
-  - [UDS (Dev)](#uds-dev)
-    - [CPU](#cpu)
-    - [GPU](#gpu)
-  - [Accessing the UI](#accessing-the-ui)
-  - [Cleanup](#cleanup)
+  - [UDS](#uds)
+    - [UDS Latest](#uds-latest)
+    - [UDS Dev](#uds-dev)
   - [Local Dev](#local-dev)
-    - [API](#api-1)
-    - [Repeater](#repeater-1)
-    - [Backend: llama-cpp-python](#backend-llama-cpp-python)
-    - [Backend: text-embeddings](#backend-text-embeddings)
-    - [Backend: vllm](#backend-vllm)
-    - [Backend: whisper](#backend-whisper)
 - [Community](#community)

 ## Overview
@@ -55,20 +46,21 @@ The LeapfrogAI repository follows a monorepo structure based around an [API](#ap

 ```shell
 leapfrogai/
 ├── src/
-│   ├── leapfrogai_api/
-│   │   ├── main.py
-│   │   └── ...
-│   ├── leapfrogai_sdk/
-│   └── leapfrogai_ui/
+│   ├── leapfrogai_api/   # source code for the API
+│   ├── leapfrogai_sdk/   # source code for the SDK
+│   └── leapfrogai_ui/    # source code for the UI
 ├── packages/
-│   ├── api/
-│   ├── llama-cpp-python/
-│   ├── text-embeddings/
-│   ├── vllm/
-│   └── whisper/
+│   ├── api/              # deployment infrastructure for the API
+│   ├── llama-cpp-python/ # source code & deployment infrastructure for the llama-cpp-python backend
+│   ├── repeater/         # source code & deployment infrastructure for the repeater model backend
+│   ├── supabase/         # deployment infrastructure for the Supabase backend and postgres database
+│   ├── text-embeddings/  # source code & deployment infrastructure for the text-embeddings backend
+│   ├── ui/               # deployment infrastructure for the UI
+│   ├── vllm/             # source code & deployment infrastructure for the vllm backend
+│   └── whisper/          # source code & deployment infrastructure for the whisper backend
 ├── uds-bundles/
-│   ├── dev/
-│   └── latest/
+│   ├── dev/              # uds bundles for local uds dev deployments
+│   └── latest/           # uds bundles for the most current uds deployments
 ├── Makefile
 ├── pyproject.toml
 ├── README.md
@@ -87,6 +79,8 @@ LeapfrogAI provides an API that closely matches that of OpenAI's. This feature a

 ### Backends

+LeapfrogAI provides several backends for a variety of use cases.
+
 > Available Backends:
 > | Backend | AMD64 Support | ARM64 Support | Cuda Support | Docker Ready | K8s Ready | Zarf Ready |
 > | --- | --- | --- | --- | --- | --- | --- |
 > | [whisper](packages/whisper/) | ✅ | 🚧 | ✅ | ✅ | ✅ | ✅ |
 > | [text-embeddings](packages/text-embeddings/) | ✅ | 🚧 | ✅ | ✅ | ✅ | ✅ |
 > | [vllm](packages/vllm/) | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
-> | [rag](https://github.com/defenseunicorns/leapfrogai-backend-rag) (repo integration soon) | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
-
-LeapfrogAI provides several backends for a variety of use cases.
-
-### Image Hardening
-
-> GitHub Repo:
->
-> - [leapfrogai-images](https://github.com/defenseunicorns/leapfrogai-images)
-
-LeapfrogAI leverages Chainguard's [apko](https://github.com/chainguard-dev/apko) to harden base python images - pinning Python versions to the latest supported version by the other components of the LeapfrogAI stack.
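+
+Whichever backend is deployed, it is consumed through the same OpenAI-style [API](#api). The request below is an illustrative sketch only: the port assumes a local API instance run per the API package's local development instructions, and the route prefix and model name are assumptions to adapt to your deployment.
+
+```bash
+# Hypothetical chat completion request against a locally running LeapfrogAI API
+curl -s http://localhost:3000/openai/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "llama-cpp-python", "messages": [{"role": "user", "content": "Hello!"}]}'
+```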

 ### SDK
@@ -118,183 +101,49 @@ LeapfrogAI provides a [User Interface](src/leapfrogai_ui/) with support for comm

 The [repeater](packages/repeater/) "model" is a basic "backend" that parrots all inputs it receives back to the user. It is built out the same way as all of the actual backends and is primarily used for testing the API.

-## Usage
-
-### UDS (Latest)
-
-LeapfrogAI can be deployed and run locally via UDS and Kubernetes, built out using [Zarf](https://zarf.dev) packages. This pulls the most recent package images and is the most stable way of running a local LeapfrogAI deployment. These instructions can be found on the [LeapfrogAI Docs](https://docs.leapfrog.ai/docs/) site.
-
-### UDS (Dev)
-
-If you want to make some changes to LeapfrogAI before deploying via UDS (for example in a dev environment), you can follow these instructions:
-
-Make sure your system has the [required dependencies](https://docs.leapfrog.ai/docs/local-deploy-guide/quick_start/#prerequisites).
-
-For ease, it's best to create a virtual environment:
-
-```shell
-python -m venv .venv
-source .venv/bin/activate
-```
-
-Each component is built into its own Zarf package. You can build all of the packages you need at once with the following `Make` targets:
+### Image Hardening

-```shell
-make build-cpu # api, llama-cpp-python, text-embeddings, whisper
-make build-gpu # api, vllm, text-embeddings, whisper
-make build-all # all of the backends
-```
+> GitHub Repo:
+>
+> - [leapfrogai-images](https://github.com/defenseunicorns/leapfrogai-images)

-**OR**
+LeapfrogAI leverages Chainguard's [apko](https://github.com/chainguard-dev/apko) to harden base Python images, pinning Python versions to the latest version supported by the other components of the LeapfrogAI stack.

-You can build components individually using the following `Make` targets:
+## Usage

-```shell
-make build-api
-make build-vllm # if you have GPUs
-make build-llama-cpp-python # if you have CPU only
-make build-text-embeddings
-make build-whisper
-```
+### UDS

-Once the packages are created, you can deploy either a CPU or GPU-enabled deployment via one of the UDS bundles:
+LeapfrogAI can be deployed and run locally via UDS and Kubernetes, built out using [Zarf](https://zarf.dev) packages. See the [Quick Start](https://docs.leapfrog.ai/docs/local-deploy-guide/quick_start/#prerequisites) for a list of prerequisite packages that must be installed first.

-#### CPU
+Prior to deploying any LeapfrogAI packages, a UDS Kubernetes cluster must be deployed using the most recent k3d bundle:

-```shell
-cd uds-bundles/dev/cpu
-uds create .
+```shell
 uds deploy k3d-core-slim-dev:0.22.2
-uds deploy uds-bundle-leapfrogai*.tar.zst
-```
-
-#### GPU
-
-```shell
-cd uds-bundles/dev/gpu
-uds create .
-uds deploy k3d-core-slim-dev:0.22.2 --set K3D_EXTRA_ARGS="--gpus=all --image=ghcr.io/justinthelaw/k3d-gpu-support:v1.27.4-k3s1-cuda" # be sure to check if a newer version exists
-uds deploy uds-bundle-leapfrogai-*.tar.zst --confirm
 ```

-### Accessing the UI
-
-LeapfrogAI is integrated with the UDS Core KeyCloak service, which provides authentication via SSO. Below are general instructions for accessing the LeapfrogAI UI after a successful UDS deployment of UDS Core and LeapfrogAI.
-
-1. Connect to the KeyCloak admin panel
-   - Run the following to get a port-forwarded tunnel: `uds zarf connect keycloak`
-   - Go to the resulting localhost URL and create an admin account
-
-2. Go to ai.uds.dev and press "Login using SSO"
-
-3. Register a new user by pressing "Register Here"
+#### UDS Latest
+
-4. Fill-in all of the information
-   - The bot detection requires you to scroll and click around in a natural way, so if the Register button is not activated despite correct information, try moving around the page until the bot detection says 100% verified
+This type of deployment pulls the most recent package images and is the most stable way of running a local LeapfrogAI deployment. These instructions can be found on the [LeapfrogAI Docs](https://docs.leapfrog.ai/docs/) site.

-5. Using an authenticator, follow the MFA steps
+#### UDS Dev

-6. Go to sso.uds.dev
-   - Login using the admin account you created earlier
+If you want to make some changes to LeapfrogAI before deploying via UDS (for example in a dev environment), follow the [UDS Dev Instructions](/uds-bundles/dev/README.md).

-7. Approve the newly registered user
-   - Click on the hamburger menu in the top left to open/close the sidebar
-   - Go to the dropdown that likely says "Keycloak" and switch to the "uds" context
-   - Click "Users" in the sidebar
-   - Click on the newly registered user's username
-   - Go to the "Email Verified" switch and toggle it to be "Yes"
-   - Scroll to the bottom and press "Save"
-
-8. Go back to ai.uds.dev and login as the registered user to access the UI
-
-### Cleanup
-
-To clean-up or perform a fresh install, run the following commands in the context in which you had previously installed UDS Core and LeapfrogAI:
-
-```bash
-k3d cluster delete uds # kills a running uds cluster
-uds zarf tools clear-cache # clears the Zarf tool cache
-rm -rf ~/.uds-cache # clears the UDS cache
-docker system prune -a -f # removes all hanging containers and images
-docker volume prune -f # removes all hanging container volumes
-```

 ### Local Dev

-The following instructions are for running each of the LFAI components for local development. This is useful when testing changes to a specific component, but will not assist in a full deployment of LeapfrogAI. Please refer to the above sections for deployment instructions.
-
-It is highly recommended to make a virtual environment to keep the development environment clean:
-
-```shell
-python -m venv .venv
-source .venv/bin/activate
-```
-
-#### API
-
-To run the LeapfrogAI API locally (starting from the root directory of the repository):
-
-```shell
-python -m pip install src/leapfrogai_sdk
-cd src/leapfrogai_api
-python -m pip install .
-uvicorn leapfrogai_api.main:app --port 3000 --log-level debug --reload
-```
-
-#### Repeater
+Each of the LFAI components can also be run individually outside of a Kubernetes environment for local development. This is useful when testing changes to a specific component, but will not assist in a full deployment of LeapfrogAI. Please refer to the above sections for deployment instructions.

-The instructions for running the basic repeater model (used for testing the API) can be found in the package [README](packages/repeater/README.md).
+Please refer to the linked READMEs for each individual package's local development instructions:

-#### Backend: llama-cpp-python
-
-To run the llama-cpp-python backend locally (starting from the root directory of the repository):
-
-```shell
-python -m pip install src/leapfrogai_sdk
-cd packages/llama-cpp-python
-python -m pip install .[dev]
-python scripts/model_download.py
-mv .model/*.gguf .model/model.gguf
-cp config.example.yaml config.yaml # Make any necessary updates
-lfai-cli --app-dir=. main:Model
-```
-
-#### Backend: text-embeddings
-
-To run the text-embeddings backend locally (starting from the root directory of the repository):
-
-```shell
-python -m pip install src/leapfrogai_sdk
-cd packages/text-embeddings
-python -m pip install .[dev]
-python scripts/model_download.py
-python -u main.py
-```
-
-#### Backend: vllm
-
-To run the vllm backend locally (starting from the root directory of the repository):
-
-```shell
-python -m pip install src/leapfrogai_sdk
-cd packages/vllm
-python -m pip install .[dev]
-python packages/vllm/src/model_download.py
-export QUANTIZATION=gptq
-python -u src/main.py
-```
-
-#### Backend: whisper
-
-To run the vllm backend locally (starting from the root directory of the repository):
-
-```shell
-python -m pip install src/leapfrogai_sdk
-cd packages/whisper
-python -m pip install ".[dev]"
-ct2-transformers-converter --model openai/whisper-base --output_dir .model --copy_files tokenizer.json --quantization float32
-python -u main.py
-```
+- [API](/src/leapfrogai_api/README.md)
+- [llama-cpp-python](/packages/llama-cpp-python/README.md)
+- [repeater](/packages/repeater/README.md)
+- [supabase](/packages/supabase/README.md)
+- [text-embeddings](/packages/text-embeddings/README.md)
+- [ui](/src/leapfrogai_ui/README.md)
+- [vllm](/packages/vllm/README.md)
+- [whisper](/packages/whisper/README.md)

 ## Community
diff --git a/packages/llama-cpp-python/README.md b/packages/llama-cpp-python/README.md
index 4d7033a13..313649ed3 100644
--- a/packages/llama-cpp-python/README.md
+++ b/packages/llama-cpp-python/README.md
@@ -22,30 +22,45 @@ The following are additional assumptions for GPU inferencing:

 The default model that comes with this backend in this repository's officially released images is a [4-bit quantization of the Synthia-7b model](https://huggingface.co/TheBloke/SynthIA-7B-v2.0-GPTQ).

-### Run Locally
+Models are pulled from [HuggingFace Hub](https://huggingface.co/models) via the [model_download.py](/packages/llama-cpp-python/scripts/model_download.py) script. To change what model comes with the llama-cpp-python backend, set the following environment variables:
+
+```bash
+REPO_ID   # eg: "TheBloke/SynthIA-7B-v2.0-GGUF"
+FILENAME  # eg: "synthia-7b-v2.0.Q4_K_M.gguf"
+REVISION  # eg: "3f65d882253d1f15a113dabf473a7c02a004d2b5"
+```
+
+## Zarf Package Deployment
+
+To build and deploy just the llama-cpp-python Zarf package (from the root of the repository):
+
+> Deploy a [UDS cluster](/README.md#uds) if one isn't deployed already
+
+```shell
+make build-llama-cpp-python LOCAL_VERSION=dev
+uds zarf package deploy packages/llama-cpp-python/zarf-package-llama-cpp-python-*-dev.tar.zst --confirm
+```
+
+## Run Locally
+
+To run the llama-cpp-python backend locally (starting from the root directory of the repository):

-From this directory:

 ```bash
 # Setup Virtual Environment
 python -m venv .venv
 source .venv/bin/activate
-
-python -m pip install ../../src/leapfrogai_sdk
-python -m pip install .
 ```

 ```bash
-# To support Huggingface Hub model downloads
+# Install dependencies
+python -m pip install src/leapfrogai_sdk
+cd packages/llama-cpp-python
 python -m pip install ".[dev]"
 ```

 ```bash
-# Copy the environment variable file, change this if different params are needed
-cp .env.example .env
-
-# Make sure environment variables are set
-source .env
-
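+# (Optional) Point the download script at a different model by exporting the
+# variables documented above; the values shown are the examples from this
+# README, not required settings
+# export REPO_ID="TheBloke/SynthIA-7B-v2.0-GGUF"
+# export FILENAME="synthia-7b-v2.0.Q4_K_M.gguf"
+# export REVISION="3f65d882253d1f15a113dabf473a7c02a004d2b5"
+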
 # Clone Model
 # Supply a REPO_ID, FILENAME and REVISION if a different model is desired
 python scripts/model_download.py
@@ -53,5 +68,5 @@
 mv .model/*.gguf .model/model.gguf

 # Start Model Backend
-python -m leapfrogai_sdk.cli --app-dir=. main:Model
+lfai-cli --app-dir=. main:Model
 ```
diff --git a/packages/repeater/README.md b/packages/repeater/README.md
index c4c14eb91..86f362ea5 100644
--- a/packages/repeater/README.md
+++ b/packages/repeater/README.md
@@ -7,30 +7,41 @@ A LeapfrogAI API-compatible repeater model that simply parrots the input it is p

 The repeater model is used to verify that the API is able to both load configs for and send inputs to a very simple model. The repeater model fulfills this role by returning the input it receives as output.

+## Zarf Package Deployment
+
+To build and deploy just the repeater Zarf package (from the root of the repository):
+
+> Deploy a [UDS cluster](/README.md#uds) if one isn't deployed already
+
+```shell
+make build-repeater LOCAL_VERSION=dev
+uds zarf package deploy packages/repeater/zarf-package-repeater-*-dev.tar.zst --confirm
+```
+
 ## Local Usage

 Here is how to run the repeater model locally to test the API:

 It's easiest to set up a virtual environment to keep things clean:
-```
+```bash
 python -m venv .venv
 source .venv/bin/activate
 ```

 First, install the lfai-repeater project and dependencies. From the root of the project repository:
-```
+```bash
 pip install src/leapfrogai_sdk
 cd packages/repeater
 pip install .
 ```

 Next, launch the repeater model:
-```
+```bash
 python repeater.py
 ```

 Now the basic API tests can be run in full. In a new terminal, starting from the root of the project repository:
-```
+```bash
 export LFAI_RUN_REPEATER_TESTS=true # this is needed to run the tests that require the repeater model, otherwise they get skipped
 pytest tests/pytest/test_api_auth.py
 ```
diff --git a/packages/text-embeddings/README.md b/packages/text-embeddings/README.md
index a74236806..33482867d 100644
--- a/packages/text-embeddings/README.md
+++ b/packages/text-embeddings/README.md
@@ -7,6 +7,34 @@ A LeapfrogAI API-compatible [instructor-xl](https://huggingface.co/hkunlp/instru

 # Usage

-:construction_worker: This documentation is still under construction. :construction_worker:
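+Once deployed, the backend is not called directly; it serves embeddings through the LeapfrogAI [API](/src/leapfrogai_api/README.md). The request below is a rough sketch only: the port assumes a local API instance run per the API package's local development instructions, and the route prefix and model name are assumptions to adapt to your deployment.
+
+```bash
+curl -s http://localhost:3000/openai/v1/embeddings \
+  -H "Content-Type: application/json" \
+  -d '{"model": "text-embeddings", "input": "LeapfrogAI is a self-hosted AI platform."}'
+```
+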
+## Zarf Package Deployment
+
+To build and deploy just the text-embeddings Zarf package (from the root of the repository):
+
+> Deploy a [UDS cluster](/README.md#uds) if one isn't deployed already
+
+```shell
+make build-text-embeddings LOCAL_VERSION=dev
+uds zarf package deploy packages/text-embeddings/zarf-package-text-embeddings-*-dev.tar.zst --confirm
+```
+
+## Local Development
+
+To run the text-embeddings backend locally (starting from the root directory of the repository):
+
+```shell
+# Setup Virtual Environment if you haven't done so already
+python -m venv .venv
+source .venv/bin/activate
+
+# Install dependencies
+python -m pip install src/leapfrogai_sdk
+cd packages/text-embeddings
+python -m pip install ".[dev]"
+
+# Download the model
+python scripts/model_download.py
+
+# Start the model backend
+python -u main.py
+```
diff --git a/packages/vllm/README.md b/packages/vllm/README.md
index 4392205a0..59d705f48 100644
--- a/packages/vllm/README.md
+++ b/packages/vllm/README.md
@@ -31,19 +31,30 @@ You can optionally specify different models or quantization types using the foll
 - `--build-arg QUANTIZATION="gptq"`: Quantization type (e.g., gptq, awq, or empty for un-quantized)
 - `--build-arg TENSOR_PARALLEL_SIZE="1"`: The number of GPUs to spread the tensor processing across

-### Run Locally
+## Zarf Package Deployment

-From this directory:
+To build and deploy just the vllm Zarf package (from the root of the repository):
+
+> Deploy a [UDS cluster](/README.md#uds) if one isn't deployed already
+
+```shell
+make build-vllm LOCAL_VERSION=dev
+uds zarf package deploy packages/vllm/zarf-package-vllm-*-dev.tar.zst --confirm
+```
+
+## Run Locally
+
+To run the vllm backend locally (starting from the root directory of the repository):

 ```bash
-# Setup Virtual Environment
+# Setup Virtual Environment if you haven't done so already
 python -m venv .venv
 source .venv/bin/activate
-
-python -m pip install ../../src/leapfrogai_sdk
-python -m pip install .
 ```

 ```bash
+# Install dependencies
+python -m pip install src/leapfrogai_sdk
+cd packages/vllm
 # To support Huggingface Hub model downloads
 python -m pip install ".[dev]"
 ```
@@ -62,5 +73,5 @@
 python src/model_download.py

 mv .model/*.gguf .model/model.gguf

 # Start Model Backend
-python -m leapfrogai_sdk.cli --app-dir=src/ main:Model
+lfai-cli --app-dir=src/ main:Model
 ```
diff --git a/packages/whisper/README.md b/packages/whisper/README.md
index 78e00ecc1..327d5a810 100644
--- a/packages/whisper/README.md
+++ b/packages/whisper/README.md
@@ -5,5 +5,25 @@ A LeapfrogAI API-compatible [whisper](https://huggingface.co/openai/whisper-base

 # Usage

-:construction_worker: This documentation is still under construction. :construction_worker:
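+Once deployed, audio is transcribed through the LeapfrogAI [API](/src/leapfrogai_api/README.md) rather than by calling the backend directly. The request below is a rough sketch only: the port, route prefix, model name, and audio file are all assumptions to adapt to your deployment.
+
+```bash
+curl -s http://localhost:3000/openai/v1/audio/transcriptions \
+  -F model="whisper" \
+  -F file="@sample.wav"
+```
+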
+## Zarf Package Deployment
+
+To build and deploy just the whisper Zarf package (from the root of the repository):
+
+> Deploy a [UDS cluster](/README.md#uds) if one isn't deployed already
+
+```shell
+make build-whisper LOCAL_VERSION=dev
+uds zarf package deploy packages/whisper/zarf-package-whisper-*-dev.tar.zst --confirm
+```
+
+## Local Development
+
+To run the whisper backend locally without K8s (starting from the root directory of the repository):
+
+```shell
+python -m pip install src/leapfrogai_sdk
+cd packages/whisper
+python -m pip install ".[dev]"
+ct2-transformers-converter --model openai/whisper-base --output_dir .model --copy_files tokenizer.json --quantization float32
+python -u main.py
+```
diff --git a/src/leapfrogai_api/README.md b/src/leapfrogai_api/README.md
index 82590e635..383f271cb 100644
--- a/src/leapfrogai_api/README.md
+++ b/src/leapfrogai_api/README.md
@@ -2,6 +2,17 @@

 A mostly OpenAI-compliant API surface.

+## Zarf Package Deployment
+
+To build and deploy just the API Zarf package (from the root of the repository):
+
+> Deploy a [UDS cluster](/README.md#uds) if one isn't deployed already
+
+```shell
+make build-api LOCAL_VERSION=dev
+uds zarf package deploy packages/api/zarf-package-leapfrogai-api-*-dev.tar.zst --confirm
+```
+
 ## Local Development Setup

 1. Install dependencies
diff --git a/tests/e2e/README.md b/tests/e2e/README.md
index d9e59e324..372b89f42 100644
--- a/tests/e2e/README.md
+++ b/tests/e2e/README.md
@@ -16,12 +16,10 @@ The tests in this directory are also able to be run locally! We are currently op

 There are several ways you can set up and run these tests. Here is one such way:

-```bash
-# Setup the UDS cluster
-# NOTE: This stands up a k3d cluster and installs istio & pepr
-# NOTE: Be sure to use the latest released version at the time you're reading this!
-uds deploy oci://ghcr.io/defenseunicorns/packages/uds/bundles/k3d-core-slim-dev:0.22.2 --confirm
+> Deploy the [UDS cluster](/README.md#uds) \
+> NOTE: This stands up a k3d cluster and installs istio & pepr

+```bash
 # Build and Deploy the LFAI API
 make build-api
 uds zarf package deploy zarf-package-leapfrogai-api-*.tar.zst
diff --git a/uds-bundles/dev/README.md b/uds-bundles/dev/README.md
new file mode 100644
index 000000000..f894ca60d
--- /dev/null
+++ b/uds-bundles/dev/README.md
@@ -0,0 +1,73 @@
+# LeapfrogAI UDS Dev Deployment Instructions
+
+Follow these instructions to create a local development deployment of LeapfrogAI using [UDS](https://github.com/defenseunicorns/uds-core).
+
+Make sure your system has the [required dependencies](https://docs.leapfrog.ai/docs/local-deploy-guide/quick_start/#prerequisites).
+
+For ease, it's best to create a virtual environment:
+
+```shell
+python -m venv .venv
+source .venv/bin/activate
+```
+
+Each component is built into its own Zarf package. You can build all of the packages you need at once with the following `Make` targets:
+
+```shell
+make build-cpu # api, llama-cpp-python, text-embeddings, whisper, supabase
+make build-gpu # api, vllm, text-embeddings, whisper, supabase
+make build-all # all of the backends
+```
+
+**OR**
+
+You can build components individually using the following `Make` targets:
+
+```shell
+make build-api
+make build-supabase
+make build-vllm # if you have GPUs
+make build-llama-cpp-python # if you have CPU only
+make build-text-embeddings
+make build-whisper
+```
+
+Once the packages are created, you can deploy either a CPU or GPU-enabled deployment via one of the UDS bundles:
+
+## CPU
+
+Create the uds CPU bundle:
+
+```shell
+cd uds-bundles/dev/cpu
+uds create .
+```
+
+Deploy a [UDS cluster](/README.md#uds) if one isn't deployed already.
+
+Deploy the LeapfrogAI bundle:
+
+```shell
+uds deploy uds-bundle-leapfrogai*.tar.zst
+```
+
+## GPU
+
+Create the uds GPU bundle:
+
+```shell
+cd uds-bundles/dev/gpu
+uds create .
+```
+
+Deploy a [UDS cluster](/README.md#uds) with the following flags:
+
+```shell
+uds deploy {k3d-cluster-name} --set K3D_EXTRA_ARGS="--gpus=all --image=ghcr.io/justinthelaw/k3d-gpu-support:v1.27.4-k3s1-cuda"
+```
+
+Deploy the LeapfrogAI bundle:
+
+```shell
+uds deploy uds-bundle-leapfrogai-*.tar.zst --confirm
+```
+
+## Checking and Managing the Deployment
+
+For tips on monitoring the deployment, accessing the UI, and cleaning up, please reference the [Quick Start](https://docs.leapfrog.ai/docs/local-deploy-guide/quick_start/#checking-deployment) guide in the LeapfrogAI docs.
diff --git a/website/content/en/docs/local deploy guide/quick_start.md b/website/content/en/docs/local deploy guide/quick_start.md
index f4c5f6575..3f45809d6 100644
--- a/website/content/en/docs/local deploy guide/quick_start.md
+++ b/website/content/en/docs/local deploy guide/quick_start.md
@@ -12,6 +12,7 @@ The fastest and easiest way to get started with a deployment of LeapfrogAI is by

 - [Docker](https://docs.docker.com/engine/install/)
 - [K3D](https://k3d.io/)
+- [Zarf](https://docs.zarf.dev/getting-started/install/)
 - [UDS CLI](https://github.com/defenseunicorns/uds-cli)

 GPU considerations (NVIDIA GPUs only):
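+
+As an optional sanity check before a GPU deployment, the following sketch assumes the NVIDIA drivers and the NVIDIA Container Toolkit are already installed; the CUDA image tag is only an example:
+
+```bash
+# Verify the driver is visible on the host
+nvidia-smi
+
+# Verify Docker can pass the GPU through to containers
+docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
+```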