Adding new guides for developing and deploying an ML application in S… #36700

Merged: 1 commit, Jun 29, 2023
2 changes: 2 additions & 0 deletions doc/source/_toc.yml
@@ -256,11 +256,13 @@ parts:
sections:
- file: serve/getting_started
- file: serve/key-concepts
- file: serve/develop-and-deploy
- file: serve/model_composition
- file: serve/deploy-many-models/index
sections:
- file: serve/deploy-many-models/multi-app
- file: serve/deploy-many-models/model-multiplexing
- file: serve/configure-serve-deployment
- file: serve/http-guide
- file: serve/production-guide/index
title: Production Guide
110 changes: 8 additions & 102 deletions doc/source/serve/advanced-guides/deploy-vm.md
@@ -36,27 +36,7 @@ The message `Sent deploy request successfully!` means:
* It will start a new Serve application if one hasn't already started.
* The Serve application will deploy the deployments from your deployment graph, updated with the configurations from your config file.

It does **not** mean that your Serve application, including your deployments, has already started running successfully. This happens asynchronously as the Ray cluster attempts to update itself to match the settings from your config file. Check out the [next section](serve-in-production-inspecting) to learn more about how to get the current status.

## Adding a runtime environment

The import path (e.g., `fruit:deployment_graph`) must be importable by Serve at runtime.
When running locally, this might be in your current working directory.
However, when running on a cluster you also need to make sure the path is importable.
You can achieve this either by building the code into the cluster's container image (see [Cluster Configuration](kuberay-config) for more details) or by using a `runtime_env` with a [remote URI](remote-uris) that hosts the code in remote storage.

As an example, we have [pushed a copy of the FruitStand deployment graph to GitHub](https://github.com/ray-project/test_dag/blob/40d61c141b9c37853a7014b8659fc7f23c1d04f6/fruit.py). You can use this config file to deploy the `FruitStand` deployment graph to your own Ray cluster even if you don't have the code locally:

```yaml
import_path: fruit:deployment_graph

runtime_env:
working_dir: "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
```

:::{note}
As a side note, you could also package your deployment graph into a standalone Python package that can be imported using a [PYTHONPATH](https://docs.python.org/3.10/using/cmdline.html#envvar-PYTHONPATH) to provide location independence on your local machine. However, it's still best practice to use a `runtime_env`, to ensure consistency across all machines in your cluster.
:::
It does **not** mean that your Serve application, including your deployments, has already started running successfully. This happens asynchronously as the Ray cluster attempts to update itself to match the settings from your config file. See [Inspect an application](serve-in-production-inspecting) for how to get the current status.

(serve-in-production-remote-cluster)=

@@ -74,7 +54,11 @@ As an example, the address for the local cluster started by `ray start --head` i
$ serve deploy config_file.yaml -a http://127.0.0.1:52365
```

The Ray dashboard agent's default port is 52365. You can set it to a different value using the `--dashboard-agent-listen-port` argument when running `ray start`."
The Ray Dashboard agent's default port is 52365. To set it to a different value, use the `--dashboard-agent-listen-port` argument when running `ray start`.

:::{note}
When running on a remote cluster, you need to ensure that the import path is accessible. See [Handle Dependencies](serve-handling-dependencies) for how to add a runtime environment.
:::

:::{note}
If the port 52365 (or whichever port you specify with `--dashboard-agent-listen-port`) is unavailable when Ray starts, the dashboard agent’s HTTP server will fail. However, the dashboard agent and Ray will continue to run.
@@ -107,84 +91,6 @@
$ unset RAY_AGENT_ADDRESS
Check for this variable in your environment to make sure you're using your desired Ray agent address.
:::

(serve-in-production-inspecting)=

## Inspecting the application with `serve config` and `serve status`

The Serve CLI also offers two commands to help you inspect your Serve application in production: `serve config` and `serve status`.
If you're working with a remote cluster, `serve config` and `serve status` also offer an `--address/-a` argument to access your cluster. Check out [the previous section](serve-in-production-remote-cluster) for more info on this argument.

`serve config` gets the latest config file the Ray cluster received. This config file represents the Serve application's goal state. The Ray cluster will constantly attempt to reach and maintain this state by deploying deployments, recovering failed replicas, and more.

Using the `fruit_config.yaml` example from [an earlier section](fruit-config-yaml):

```console
$ ray start --head
$ serve deploy fruit_config.yaml
...

$ serve config
import_path: fruit:deployment_graph

runtime_env: {}

deployments:

- name: MangoStand
  num_replicas: 2
  route_prefix: null
...
```

`serve status` gets your Serve application's current status. It's divided into two parts: the `app_status` and the `deployment_statuses`.

The `app_status` contains three fields:
* `status`: a Serve application has four possible statuses:
* `"NOT_STARTED"`: no application has been deployed on this cluster.
* `"DEPLOYING"`: the application is currently carrying out a `serve deploy` request. It is deploying new deployments or updating existing ones.
* `"RUNNING"`: the application is at steady-state. It has finished executing any previous `serve deploy` requests, and it is attempting to maintain the goal state set by the latest `serve deploy` request.
* `"DEPLOY_FAILED"`: the latest `serve deploy` request has failed.
* `message`: provides context on the current status.
* `deployment_timestamp`: a unix timestamp of when Serve received the last `serve deploy` request. This is calculated using the `ServeController`'s local clock.

The `deployment_statuses` contains a list of dictionaries representing each deployment's status. Each dictionary has three fields:
* `name`: the deployment's name.
* `status`: a Serve deployment has three possible statuses:
* `"UPDATING"`: the deployment is updating to meet the goal state set by a previous `deploy` request.
* `"HEALTHY"`: the deployment has reached the goal state set by the latest request.
* `"UNHEALTHY"`: the deployment has either failed to update, or it has updated and has become unhealthy afterwards. This may be due to an error in the deployment's constructor, a crashed replica, or a general system or machine error.
* `message`: provides context on the current status.

You can use the `serve status` command to inspect your deployments after they are deployed and throughout their lifetime.

Using the `fruit_config.yaml` example from [an earlier section](fruit-config-yaml):

```console
$ ray start --head
$ serve deploy fruit_config.yaml
...

$ serve status
app_status:
  status: RUNNING
  message: ''
  deployment_timestamp: 1655771534.835145
deployment_statuses:
- name: MangoStand
  status: HEALTHY
  message: ''
- name: OrangeStand
  status: HEALTHY
  message: ''
- name: PearStand
  status: HEALTHY
  message: ''
- name: FruitMarket
  status: HEALTHY
  message: ''
- name: DAGDriver
  status: HEALTHY
  message: ''
```
To inspect the status of the Serve application in production, see [Inspect an application](serve-in-production-inspecting).

You can also use `serve status` with KubeRay ({ref}`kuberay-index`), a Kubernetes operator for Ray Serve, to deploy your Serve applications with Kubernetes. Work is also in progress to integrate features from this document, like `serve status`, more closely with Kubernetes to provide a clearer Serve deployment story.

Make heavyweight code updates (like `runtime_env` changes) by starting a new Ray cluster, updating your Serve config file, and deploying the file with `serve deploy` to the new cluster. Once the new deployment is finished, switch your traffic to the new cluster.
2 changes: 1 addition & 1 deletion doc/source/serve/advanced-guides/dyn-req-batch.md
@@ -44,7 +44,7 @@ end-before: __batch_params_update_end__
---
```

Use these methods in the `reconfigure` [method](serve-in-production-reconfigure) to control the `@serve.batch` parameters through your Serve configuration file.
Use these methods in the `reconfigure` [method](serve-user-config) to control the `@serve.batch` parameters through your Serve configuration file.
:::

## Streaming batched requests
4 changes: 3 additions & 1 deletion doc/source/serve/advanced-guides/inplace-updates.md
@@ -14,8 +14,10 @@ Lightweight config updates modify running deployment replicas without tearing th
Lightweight config updates are only possible for deployments that are included as entries under `deployments` in the config file. If a deployment is not included in the config file, replicas of that deployment will be torn down and brought up again each time you redeploy with `serve deploy`.
:::

(serve-updating-user-config)=

## Updating User Config
Let's use the `FruitStand` deployment graph [from an earlier section](fruit-config-yaml) as an example. All the individual fruit deployments contain a `reconfigure()` method. This method allows us to issue lightweight updates to our deployments by updating the `user_config`.
Let's use the `FruitStand` deployment graph [from the production guide](fruit-config-yaml) as an example. All the individual fruit deployments contain a `reconfigure()` method. This method allows us to issue lightweight updates to our deployments by updating the `user_config`.

First let's deploy the graph. Make sure to stop any previous Ray cluster using the CLI command `ray stop` for this example:

139 changes: 139 additions & 0 deletions doc/source/serve/configure-serve-deployment.md
@@ -0,0 +1,139 @@
(serve-configure-deployment)=

# Configure Ray Serve deployments

The following parameters are configurable on a Ray Serve deployment. See also the [API reference](../serve/api/doc/ray.serve.deployment_decorator.rst).

Configure the following parameters either in the Serve config file, or on the `@serve.deployment` decorator:

- `name` - Name that uniquely identifies this deployment within the application. If not provided, the name of the class or function is used.
- `num_replicas` - Number of replicas that serve requests to this deployment. Defaults to 1.
- `route_prefix` - Requests to paths under this HTTP path prefix are routed to this deployment. Defaults to `/{name}`. This can only be set for the ingress (top-level) deployment of an application.
- `ray_actor_options` - Options to pass to the Ray Actor decorator, such as resource requirements. Valid options are `accelerator_type`, `memory`, `num_cpus`, `num_gpus`, `object_store_memory`, `resources`, and `runtime_env`. For more details, see [Resource management in Serve](serve-cpus-gpus).
- `max_concurrent_queries` - Maximum number of queries that are sent to a replica of this deployment without receiving a response. Defaults to 100. This may be an important parameter to configure for [performance tuning](serve-perf-tuning).
- `autoscaling_config` - Parameters to configure autoscaling behavior. If this is set, `num_replicas` cannot be set. For details on the configurable parameters, see [Ray Serve Autoscaling](ray-serve-autoscaling).
- `user_config` - Config to pass to the `reconfigure` method of the deployment. This can be updated dynamically without restarting the replicas of the deployment. The `user_config` must be fully JSON-serializable. For more details, see [Serve User Config](serve-user-config).
- `health_check_period_s` - Duration between health check calls for the replica. Defaults to 10s. The health check is by default a no-op Actor call to the replica, but you can define your own health check by adding a `check_health` method to your deployment that raises an exception when the replica is unhealthy.
- `health_check_timeout_s` - Duration in seconds that replicas wait for a health check method to return before considering it failed. Defaults to 30s.
- `graceful_shutdown_wait_loop_s` - Duration that replicas wait until there is no more work to be done before shutting down. Defaults to 2s.
- `graceful_shutdown_timeout_s` - Duration to wait for a replica to gracefully shut down before it is forcefully killed. Defaults to 20s.
- `is_driver_deployment` - [EXPERIMENTAL] When set, exactly one replica of this deployment runs on every node (like a daemon set).
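The custom health check mentioned for `health_check_period_s` can be sketched in plain Python. This is a minimal illustration of the contract, not real Serve code: `FakeConnection` and `MyModelReplica` are hypothetical stand-ins, and in a real deployment Serve itself calls `check_health` periodically.

```python
# Hypothetical stand-in for a dependency the replica relies on.
class FakeConnection:
    def __init__(self):
        self.connected = True

    def is_connected(self):
        return self.connected


# Sketch of a deployment class implementing the check_health contract:
# Serve periodically calls check_health(); raising any exception marks
# the replica unhealthy, while returning normally means healthy.
class MyModelReplica:
    def __init__(self):
        self.db = FakeConnection()

    def check_health(self):
        if not self.db.is_connected():
            raise RuntimeError("Lost connection to the database")


replica = MyModelReplica()
replica.check_health()  # healthy: returns without raising

replica.db.connected = False
try:
    replica.check_health()  # raises, so Serve would mark the replica unhealthy
except RuntimeError as err:
    print(err)
```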

There are three ways to specify parameters:

- In the `@serve.deployment` decorator:

```{literalinclude} ../serve/doc_code/configure_serve_deployment/model_deployment.py
:start-after: __deployment_start__
:end-before: __deployment_end__
:language: python
```

- Through `.options()`:

```{literalinclude} ../serve/doc_code/configure_serve_deployment/model_deployment.py
:start-after: __deployment_end__
:end-before: __options_end__
:language: python
```

- Using the YAML [Serve Config file](serve-in-production-config-file):

```yaml
applications:
- name: app1
  route_prefix: /
  import_path: configure_serve:translator_app
  runtime_env: {}
  deployments:
  - name: Translator
    num_replicas: 2
    max_concurrent_queries: 100
    graceful_shutdown_wait_loop_s: 2.0
    graceful_shutdown_timeout_s: 20.0
    health_check_period_s: 10.0
    health_check_timeout_s: 30.0
    ray_actor_options:
      num_cpus: 0.2
      num_gpus: 0.0
```

## Overriding deployment settings

The order of priority is (from highest to lowest):

1. Serve Config file
2. `.options()` call in Python code
3. `@serve.deployment` decorator in Python code
4. Serve defaults

For example, if a deployment's `num_replicas` is specified in both the config file and the graph code, Serve uses the config file's value. If it's only specified in the code, Serve uses the code value. If the user doesn't specify it anywhere, Serve uses the default, `num_replicas=1`.
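One way to picture this resolution is the following plain-Python sketch. It is illustrative only, not Serve internals; `resolve_param` and the `_UNSET` sentinel are made-up names for this example.

```python
_UNSET = object()  # sentinel meaning "the user never specified this"


def resolve_param(config_file=_UNSET, options_call=_UNSET,
                  decorator=_UNSET, default=None):
    """Return the effective value of one deployment parameter,
    checking sources from highest to lowest priority."""
    for value in (config_file, options_call, decorator):
        if value is not _UNSET:
            return value
    return default


# num_replicas set in both the config file and the decorator: config wins.
print(resolve_param(config_file=5, decorator=2, default=1))  # 5

# max_concurrent_queries set only in the decorator: the code value is used.
print(resolve_param(decorator=15, default=100))  # 15

# Unspecified everywhere: the Serve default applies.
print(resolve_param(default=1))  # 1
```

Note that the resolution runs once per parameter, which is why a value set only in the decorator survives even when other parameters are overridden by the config file.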

Keep in mind that this override order is applied separately to each individual parameter.
For example, if a user has a deployment `ExampleDeployment` with the following decorator:

```python
@serve.deployment(
num_replicas=2,
max_concurrent_queries=15,
)
class ExampleDeployment:
...
```

and the following config file:

```yaml
...
deployments:
- name: ExampleDeployment
  num_replicas: 5
...
```

Serve sets `num_replicas=5`, using the config file value, and `max_concurrent_queries=15`, using the code value (because `max_concurrent_queries` wasn't specified in the config file). All other deployment settings use Serve defaults because the user didn't specify them in the code or the config.

:::{tip}
Remember that `ray_actor_options` counts as a single setting. The entire `ray_actor_options` dictionary in the config file overrides the entire `ray_actor_options` dictionary from the graph code. If there are individual options within `ray_actor_options` (e.g. `runtime_env`, `num_gpus`, `memory`) that are set in the code but not in the config, Serve still won't use the code settings if the config has a `ray_actor_options` dictionary. It treats these missing options as though the user never set them and uses defaults instead. This dictionary overriding behavior also applies to `user_config` and `autoscaling_config`.
:::
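The wholesale replacement described in the tip can be illustrated in plain Python. The dictionaries below are hypothetical example values, not a real deployment's options.

```python
# Options set in the @serve.deployment decorator (hypothetical values).
decorator_opts = {"num_cpus": 2, "num_gpus": 1}

# ray_actor_options from the config file, which only sets num_cpus.
config_file_opts = {"num_cpus": 0.5}

# What Serve does: the config file's dictionary replaces the code's
# dictionary entirely, so num_gpus falls back to its default.
effective = config_file_opts
print(effective)  # {'num_cpus': 0.5}

# What Serve does NOT do: merge the dictionaries key by key, which
# would have kept num_gpus from the code.
merged = {**decorator_opts, **config_file_opts}
print(merged)
```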

(serve-user-config)=
## Dynamically changing parameters without restarting your replicas (`user_config`)

You can use the `user_config` field to supply structured configuration for your deployment. You can pass arbitrary JSON-serializable objects to the YAML configuration. Serve then applies the configuration to all running and future replicas of the deployment. Applying the user configuration *does not* restart the replica, so you can use this field to dynamically:
- adjust model weights and versions without restarting the cluster.
- adjust the traffic splitting percentage for your model composition graph.
- configure feature flags, A/B tests, and hyperparameters for your deployments.

To enable the `user_config` feature, you need to implement a `reconfigure` method that takes a JSON-serializable object (e.g., a dictionary, list, or string) as its only argument:

```python
from typing import Any, Dict

from ray import serve


@serve.deployment
class Model:
    def reconfigure(self, config: Dict[str, Any]):
        self.threshold = config["threshold"]
```

If the `user_config` is set when the deployment is created (e.g., in the decorator or the Serve config file), this `reconfigure` method is called right after the deployment's `__init__` method, and the `user_config` is passed in as an argument. You can also trigger the `reconfigure` method by updating your Serve config file with a new `user_config` and reapplying it to your Ray cluster. See [In-place Updates](serve-inplace-updates) for more information.

The corresponding YAML snippet is:

```yaml
...
deployments:
- name: Model
  user_config:
    threshold: 1.5
```
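The lifecycle described above can be mimicked in plain Python. This is a sketch with no Serve APIs: in a real deployment, Serve itself calls `reconfigure` right after `__init__` and again on each `user_config` update, without recreating the replica.

```python
from typing import Any, Dict


class Model:
    def __init__(self, user_config: Dict[str, Any]):
        self.threshold = None
        # Serve calls reconfigure right after __init__ when
        # user_config is set; this stand-in does it explicitly.
        self.reconfigure(user_config)

    def reconfigure(self, config: Dict[str, Any]):
        self.threshold = config["threshold"]


replica = Model({"threshold": 1.5})
print(replica.threshold)  # 1.5

# Reapplying the config file with a new user_config updates the
# same replica object in place -- no restart.
replica.reconfigure({"threshold": 2.0})
print(replica.threshold)  # 2.0
```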


