[Docs] Update Endpoint Deployment guide to specify advanced config options (#1740)
nikhil-sk authored Apr 4, 2024
1 parent 40369da commit eeefc2e
Showing 1 changed file with 29 additions and 3 deletions: serving/docs/lmi/deployment_guide/deploying-your-endpoint.md
@@ -9,7 +9,7 @@ You will need the following to deploy your model with LMI on SageMaker:
* Container URI
* Configuration File or Environment Variables

- ## Configuration - serving.properties
+ ## Option 1: Configuration - serving.properties

If your configuration is specified in a `serving.properties` file, we will need to upload this file to S3.
If your model artifacts are also stored in S3, you can upload the `serving.properties` file either under the same object prefix or to a separate location.
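As a minimal sketch of the upload step (the bucket and key names here are placeholders; substitute your own), this can be done with boto3:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket/key: upload serving.properties either alongside the
# model artifacts or under a separate prefix.
s3.upload_file(
    Filename="serving.properties",
    Bucket="my-bucket",
    Key="lmi/my-model/serving.properties",
)
```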
@@ -91,7 +91,7 @@ outputs = predictor.predict({
})
```

- ## Configuration - environment variables
+ ## Option 2: Configuration - environment variables

If you are using environment variables for configuration, you will need to pass those configurations to the SageMaker Model object.
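As a minimal sketch (the image URI, role ARN, and instance type are placeholders, and the env keys shown are illustrative LMI configuration options; consult the configuration docs for the keys that apply to your backend), the configurations can be passed via the `env` argument:

```python
from sagemaker.model import Model

# Placeholder values: replace the image URI and role ARN with your own.
model = Model(
    image_uri="<lmi-container-uri>",
    role="<execution-role-arn>",
    env={
        "HF_MODEL_ID": "TheBloke/Llama-2-7B-fp16",
        "OPTION_TENSOR_PARALLEL_DEGREE": "4",
        "OPTION_ROLLING_BATCH": "vllm",
    },
)

model.deploy(initial_instance_count=1, instance_type="ml.g5.12xlarge")
```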

@@ -149,6 +149,32 @@ outputs = predictor.predict({
Depending on which backend you are deploying with, you will have access to different generation parameters.
To learn more about the API schema (Request/Response structure), please see [this document](../user_guides/lmi_input_output_schema.md).
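As a small illustration of that schema (the parameters shown, such as `max_new_tokens` and `temperature`, are common examples; availability varies by backend), a request might look like:

```python
# Assumes `predictor` was created by one of the deployments above.
outputs = predictor.predict({
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 256, "temperature": 0.7},
})
print(outputs)
```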

# Deploying your model on a SageMaker Endpoint with boto3

As an alternative to the SageMaker Python SDK, you can use the SageMaker client from boto3, which currently provides some additional options for configuring your SageMaker endpoint.

## Using ProductionVariants to customize the endpoint
The following options may be added to the `ProductionVariants` field to support LLM deployment (a boto3 sketch follows the list):

1. `VolumeSizeInGB`: attach a larger EBS volume if necessary.
2. `ModelDataDownloadTimeoutInSeconds`: some models, due to their large size, take longer to be downloaded from S3 and untarred; this option increases the timeout for the download.
3. `ContainerStartupHealthCheckTimeoutInSeconds`: since LLMs may take longer to load and be ready to serve, the default `/ping` timeout of 5 minutes may not be sufficient; this option increases the health-check timeout.

Follow this link for a step-by-step guide to creating an endpoint with the above options: https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-hosting.html
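As a minimal sketch (the model, config, and endpoint names are placeholders, and the timeout values are examples to tune for your model), these options fit into `create_endpoint_config` as follows:

```python
import boto3

sm_client = boto3.client("sagemaker")

sm_client.create_endpoint_config(
    EndpointConfigName="my-lmi-endpoint-config",  # placeholder name
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-lmi-model",  # must already exist via create_model
            "InstanceType": "ml.p3.8xlarge",
            "InitialInstanceCount": 1,
            # Larger EBS volume; only supported on instance types that rely on
            # EBS storage (instance types with local NVMe manage disk themselves).
            "VolumeSizeInGB": 256,
            # Extra time to download and extract large artifacts from S3.
            "ModelDataDownloadTimeoutInSeconds": 1800,
            # Extra time before the first /ping health check must succeed.
            "ContainerStartupHealthCheckTimeoutInSeconds": 1800,
        }
    ],
)

sm_client.create_endpoint(
    EndpointName="my-lmi-endpoint",  # placeholder name
    EndpointConfigName="my-lmi-endpoint-config",
)
```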

## Using ModelDataSource to deploy uncompressed model artifacts
The `ModelDataSource` field may be configured as follows to serve uncompressed artifacts directly from an S3 prefix:
```json
"ModelDataSource": {
    "S3DataSource": {
        "S3Uri": "s3://my-bucket/prefix/to/model/data/",
        "S3DataType": "S3Prefix",
        "CompressionType": "None"
    }
}
```

Follow this link for a detailed overview of this option: https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-uncompressed.html
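As a minimal sketch (the model name, image URI, role ARN, and S3 location are placeholders), this plugs into the container definition of `create_model`:

```python
import boto3

sm_client = boto3.client("sagemaker")

sm_client.create_model(
    ModelName="my-lmi-model",  # placeholder name
    ExecutionRoleArn="<execution-role-arn>",
    PrimaryContainer={
        "Image": "<lmi-container-uri>",
        # Serve uncompressed artifacts directly from the S3 prefix, avoiding
        # the tarball download-and-extract step.
        "ModelDataSource": {
            "S3DataSource": {
                "S3Uri": "s3://my-bucket/prefix/to/model/data/",
                "S3DataType": "S3Prefix",
                "CompressionType": "None",
            }
        },
    },
)
```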

Next: [Benchmark your endpoint](benchmarking-your-endpoint.md)

Previous: [Container Configurations](configurations.md)
