-
Notifications
You must be signed in to change notification settings - Fork 546
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add cerebrium as deployment option in documentation
- Loading branch information
Showing
5 changed files
with
192 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
# Run Outlines using Cerebrium | ||
|
||
[Cerebrium](https://www.cerebrium.ai/) is a serverless AI infrastructure platform that makes it easier for companies to build and deploy AI based applications. They offer Serverless GPU's with low cold start times with over 12 varieties of GPU chips that auto scale and you only pay for the compute you use. | ||
|
||
In this guide we will show you how you can use Cerebrium to run programs written with Outlines on GPUs in the cloud. | ||
|
||
# Setup Cerebrium | ||
|
||
First, we install Cerebrium and login to get authenticated. | ||
|
||
```bash | ||
pip install cerebrium | ||
cerebrium login | ||
``` | ||
|
||
Then let us create our first project | ||
|
||
```bash | ||
cerebrium init outlines-project | ||
``` | ||
|
||
## Setup Environment and Hardware | ||
|
||
You set up your environment and hardware in the cerebrium.toml file that was created using the init function above. | ||
|
||
```toml | ||
[cerebrium.hardware] | ||
cpu = 2 | ||
memory = 14.0 | ||
gpu = "AMPERE A10" | ||
gpu_count = 1 | ||
provider = "aws" | ||
region = "us-east-1" | ||
|
||
[cerebrium.dependencies.pip] | ||
outline = "==0.0.37" | ||
transformers = "==4.38.2" | ||
datasets = "==2.18.0" | ||
accelerate = "==0.27.2" | ||
``` | ||
|
||
## Setup inference | ||
|
||
Running code in Cerebrium is like writing normal python with no special syntax. In a `main.py` file specify the following: | ||
|
||
```python | ||
import outlines | ||
|
||
|
||
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2") | ||
|
||
schema = """{ | ||
"title": "Character", | ||
"type": "object", | ||
"properties": { | ||
"name": { | ||
"title": "Name", | ||
"maxLength": 10, | ||
"type": "string" | ||
}, | ||
"age": { | ||
"title": "Age", | ||
"type": "integer" | ||
}, | ||
"armor": {"$ref": "#/definitions/Armor"}, | ||
"weapon": {"$ref": "#/definitions/Weapon"}, | ||
"strength": { | ||
"title": "Strength", | ||
"type": "integer" | ||
} | ||
}, | ||
"required": ["name", "age", "armor", "weapon", "strength"], | ||
"definitions": { | ||
"Armor": { | ||
"title": "Armor", | ||
"description": "An enumeration.", | ||
"enum": ["leather", "chainmail", "plate"], | ||
"type": "string" | ||
}, | ||
"Weapon": { | ||
"title": "Weapon", | ||
"description": "An enumeration.", | ||
"enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"], | ||
"type": "string" | ||
} | ||
} | ||
}""" | ||
|
||
generator = outlines.generate.json(model, schema) | ||
``` | ||
|
||
On first deploy, it will download the model and store it on disk therefore for subsequent calls it will load the model from disk. | ||
|
||
Every function in Cerebrium is callable through an API endpoint. Code at the top most layer (ie: not in a function) is instantiated only when the container is spun up the first time so for subsequent calls, it will simply run the code defined in the function you call. | ||
|
||
To deploy an API that creates a new character when called with a prompt you can add the following code to `main.py`: | ||
|
||
```python | ||
def generate( | ||
prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.", | ||
): | ||
|
||
character = generator( | ||
f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]" | ||
) | ||
|
||
return character | ||
``` | ||
|
||
|
||
## Run on the cloud | ||
|
||
```bash | ||
cerebrium deploy | ||
``` | ||
|
||
You will see your application deploy, install pip packages and download the model. Once completed it will output a CURL request you can use to call your endpoint. Just remember to end | ||
the url with the function you would like to call - in this case /generate. You should see your response returned! |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
[cerebrium.deployment] | ||
name = "cerebrium" | ||
python_version = "3.11" | ||
cuda_version = "12" | ||
include = "[./*, main.py, cerebrium.toml]" | ||
exclude = "[.*]" | ||
shell_commands = [] | ||
|
||
[cerebrium.hardware] | ||
cpu = 2 | ||
memory = 14.0 | ||
gpu = "AMPERE A10" | ||
gpu_count = 1 | ||
provider = "aws" | ||
region = "us-east-1" | ||
|
||
[cerebrium.scaling] | ||
min_replicas = 0 | ||
max_replicas = 5 | ||
cooldown = 60 | ||
|
||
[cerebrium.dependencies.pip] | ||
outline = "==0.0.37" | ||
transformers = "==4.38.2" | ||
datasets = "==2.18.0" | ||
accelerate = "==0.27.2" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
import outlines | ||
|
||
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2") | ||
|
||
schema = { | ||
"title": "Character", | ||
"type": "object", | ||
"properties": { | ||
"name": {"title": "Name", "maxLength": 10, "type": "string"}, | ||
"age": {"title": "Age", "type": "integer"}, | ||
"armor": {"$ref": "#/definitions/Armor"}, | ||
"weapon": {"$ref": "#/definitions/Weapon"}, | ||
"strength": {"title": "Strength", "type": "integer"}, | ||
}, | ||
"required": ["name", "age", "armor", "weapon", "strength"], | ||
"definitions": { | ||
"Armor": { | ||
"title": "Armor", | ||
"description": "An enumeration.", | ||
"enum": ["leather", "chainmail", "plate"], | ||
"type": "string", | ||
}, | ||
"Weapon": { | ||
"title": "Weapon", | ||
"description": "An enumeration.", | ||
"enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"], | ||
"type": "string", | ||
}, | ||
}, | ||
} | ||
|
||
generator = outlines.generate.json(model, schema) | ||
|
||
|
||
def generate( | ||
prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.", | ||
): | ||
character = generator( | ||
f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]" | ||
) | ||
|
||
print(character) | ||
return character |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters