From eeefc2ec8670f3fa3c5f06759684c87c79bfe9d3 Mon Sep 17 00:00:00 2001
From: Nikhil Kulkarni
Date: Thu, 4 Apr 2024 14:13:11 -0700
Subject: [PATCH] [Docs] Update Endpoint Deployment guide to specify advanced
 config options (#1740)

---
 .../deploying-your-endpoint.md | 33 ++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/serving/docs/lmi/deployment_guide/deploying-your-endpoint.md b/serving/docs/lmi/deployment_guide/deploying-your-endpoint.md
index ec61a460d..b63b43228 100644
--- a/serving/docs/lmi/deployment_guide/deploying-your-endpoint.md
+++ b/serving/docs/lmi/deployment_guide/deploying-your-endpoint.md
@@ -9,7 +9,7 @@ You will need the following to deploy your model with LMI on SageMaker:
 * Container URI
 * Configuration File or Environment Variables
 
-## Configuration - serving.properties
+## Option 1: Configuration - serving.properties
 
 If your configuration is specified in a `serving.properties` file, we will need to upload this file to S3.
 If your model artifacts are also stored in S3, then you can either upload the `serving.properties` file under the same object prefix, or in a separate location.
@@ -91,7 +91,7 @@ outputs = predictor.predict({
 })
 ```
 
-## Configuration - environment variables
+## Option 2: Configuration - environment variables
 
 If you are using environment variables for configuration, you will need to pass those configurations to the SageMaker Model object.
 
@@ -149,6 +149,33 @@ outputs = predictor.predict({
 Depending on which backend you are deploying with, you will have access to different generation parameters.
 To learn more about the API schema (Request/Response structure), please see [this document](../user_guides/lmi_input_output_schema.md).
 
+# Deploying your model on a SageMaker Endpoint with boto3
+
+As an alternative to using the SageMaker Python SDK, you can also use the SageMaker client from boto3, which currently provides some additional options to configure your SageMaker endpoint.
+
+## Using ProductionVariants to customize the endpoint
+The following options may be added to the `ProductionVariants` field to support LLM deployment:
+
+1. `VolumeSizeInGB`: attach a larger EBS volume if necessary.
+2. `ModelDataDownloadTimeoutInSeconds`: some models, due to their large size, take longer to download from S3 and untar. With this option, you can increase the timeout for the download.
+3. `ContainerStartupHealthCheckTimeoutInSeconds`: since LLMs may take longer to load and be ready to serve, the default `/ping` timeout of 5 minutes may not be sufficient. With this option, you can increase the health check timeout.
+
+Follow this link for a step-by-step guide to creating an endpoint with the above options: https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-hosting.html
+
+## Using ModelDataSource to deploy uncompressed model artifacts
+The following options may be added to the `ModelDataSource` field to support uncompressed artifacts from S3:
+```
+"ModelDataSource": {
+    "S3DataSource": {
+        "S3Uri": "s3://my-bucket/prefix/to/model/data/",
+        "S3DataType": "S3Prefix",
+        "CompressionType": "None"
+    }
+}
+```
+
+Follow this link for a detailed overview of this option: https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-uncompressed.html
+
 Next: [Benchmark your endpoint](benchmarking-your-endpoint.md)
 
-Previous: [Container Configurations](configurations.md)
+Previous: [Container Configurations](configurations.md)
\ No newline at end of file
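The boto3 options this patch documents can be sketched as the request payloads the SageMaker `create_model` / `create_endpoint_config` calls expect. The sketch below only constructs those payloads; the model name, role ARN, bucket, instance type, timeout values, and container URI are all placeholder assumptions, and the actual boto3 calls are shown in comments since they require AWS credentials. Note that `VolumeSizeInGB` is only honored on EBS-backed instance types.

```python
# Sketch of request payloads for deploying an LLM with the boto3 SageMaker
# client. Every name, ARN, URI, and numeric value here is a placeholder
# assumption -- substitute your own.

# Uncompressed model artifacts streamed straight from an S3 prefix
# (no tarball), using ModelDataSource as described above.
model_request = {
    "ModelName": "my-lmi-model",  # hypothetical model name
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
    "PrimaryContainer": {
        "Image": "<lmi-container-uri>",  # placeholder LMI container URI
        "ModelDataSource": {
            "S3DataSource": {
                "S3Uri": "s3://my-bucket/prefix/to/model/data/",
                "S3DataType": "S3Prefix",
                "CompressionType": "None",
            }
        },
    },
}

# Endpoint config combining the three ProductionVariants options from the
# guide: larger volume, longer model download, longer startup health check.
endpoint_config_request = {
    "EndpointConfigName": "my-lmi-endpoint-config",  # hypothetical name
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": model_request["ModelName"],
            "InstanceType": "ml.g5.12xlarge",  # example instance type
            "InitialInstanceCount": 1,
            "VolumeSizeInGB": 256,  # only applies to EBS-backed instance types
            "ModelDataDownloadTimeoutInSeconds": 1800,  # slow S3 download of large weights
            "ContainerStartupHealthCheckTimeoutInSeconds": 1800,  # extend /ping timeout past 5 min
        }
    ],
}

# With credentials configured, the payloads would be passed to boto3 like so:
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_model(**model_request)
#   sm.create_endpoint_config(**endpoint_config_request)
#   sm.create_endpoint(
#       EndpointName="my-lmi-endpoint",
#       EndpointConfigName=endpoint_config_request["EndpointConfigName"],
#   )
```

Keeping the payloads as plain dictionaries makes the three timeout/volume knobs easy to review before any API call is made.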