Integrate webapp into the pipeline
ananth-sankar94 committed Apr 1, 2019
1 parent 006adf5 commit 847f76e
Showing 252 changed files with 3,162 additions and 1,775 deletions.
38 changes: 13 additions & 25 deletions samples/nvidia-resnet/LICENSE
@@ -1,25 +1,13 @@
Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of NVIDIA CORPORATION nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
52 changes: 27 additions & 25 deletions samples/nvidia-resnet/README.md
@@ -1,8 +1,9 @@
# A simple NVIDIA-accelerated ResNet Kubeflow pipeline
### This example demonstrates a simple end-to-end training & deployment of a Keras Resnet model on the CIFAR10 dataset utilizing the following NVIDIA technologies:
# A simple GPU-accelerated ResNet Kubeflow pipeline
## Overview
This example demonstrates a simple end-to-end training & deployment of a Keras Resnet model on the CIFAR10 dataset utilizing the following technologies:
* [NVIDIA-Docker2](https://github.com/NVIDIA/nvidia-docker) to make the Docker containers GPU aware.
* [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin) to allow Kubernetes to access GPU nodes.
* [TensorFlow-19.02](https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow) containers from NVIDIA GPU Cloud container registry.
* [TensorFlow-19.03](https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow) containers from NVIDIA GPU Cloud container registry.
* [TensorRT](https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html) for optimizing the Inference Graph in FP16 for leveraging the dedicated use of Tensor Cores for Inference.
* [TensorRT Inference Server](https://github.com/NVIDIA/tensorrt-inference-server) for serving the trained model.

@@ -11,28 +12,29 @@
* NVIDIA GPU

## Quickstart
* Install NVIDIA Docker, Kubernetes and Kubeflow on your local machine:
* Install NVIDIA Docker, Kubernetes and Kubeflow on your local machine (on your first run):
* `sudo ./install_kubeflow_and_dependencies.sh`
* Mount persistent volume to Kubeflow:
* `sudo ./mount_persistent_volume.sh`
* Build the Preprocessing, Training, Serving, and Pipeline containers using the following script:
* First, modify `build.sh` in `preprocess`, `train`, and `serve` directories to point to a container registry that is accessible to you
* Build the Docker image of each pipeline component and compile the Kubeflow pipeline:
* First, make sure the `IMAGE` variable in `build.sh` in each component directory under `components` points to a public container registry
* Then, make sure the `image` used in each `ContainerOp` in `pipeline/src/pipeline.py` matches `IMAGE` in the step above
* Then, make sure the `image` of the webapp Deployment in `components/webapp_launcher/src/webapp-service-template.yaml` matches `IMAGE` in `components/webapp/build.sh`
* Then, `sudo ./build_pipeline.sh`
* Note the `pipeline.py.tar.gz` file that appears on your working directory
* Determine the ambassador port using this command:
* Note the `pipeline.py.tar.gz` file that appears in your working directory
* Determine the ambassador port:
* `sudo kubectl get svc -n kubeflow ambassador`
* Open the Kubeflow Dashboard on:
* https://local-machine-ip-address:port-determined-from-previous-step
* E.g. https://10.110.210.99:31342
* Click on the Pipeline Dashboard tab, upload the `pipeline.py.tar.gz` file under your working directory and create a run
* Once the training has completed (should take about 20 minutes for 50 epochs) and the model is being served, port forward the port of the serving pod (8000) to the local system:
* Determine the name of the serving pod by selecting it on the Kubeflow Dashboard
* Modify the `SERVING_POD` variable within `portforward_serving_port.sh` accordingly
* Then, `sudo ./portforward_serving_port.sh`
* Build the client container and start a local server for the demo web UI on the host machine (about 15 mins):
* `sudo ./test_trtis_client.sh`
* Now you have successfully set up the client through which you can ping the server with an image URL and obtain predictions:
* Open the demo client UI on a web browser with the following IP address:
https://local-machine-ip-address:8050
* The port is specified in `demo_client_ui.py` and can be changed as needed
* Copy an image URL (for one of the 10 CIFAR classes) and paste it in the UI
* Open the Kubeflow UI on:
* https://[local-machine-ip-address]:[ambassador-port]/
* E.g. https://10.110.210.99:31342/
* Click on the Pipeline Dashboard tab, upload the `pipeline.py.tar.gz` file you just compiled, and create a run
* Training takes about 20 minutes for 50 epochs, and a web UI is deployed as part of the pipeline so users can interact with the served model
* Access the client web UI:
* https://[local-machine-ip-address]:[kubeflow-ambassador-port]/[webapp-prefix]/
* E.g. https://10.110.210.99:31342/webapp/
* Now you can test the trained model with random images and obtain the class prediction and probability distribution (a consolidated command sketch follows this list)
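
For reference, the whole quickstart condenses to roughly the following shell session (a sketch, assuming the scripts are run from the `samples/nvidia-resnet` directory of your checkout and that the image names have already been set as described above):

```bash
# Hypothetical end-to-end quickstart session; adjust registry, paths, and ports to your setup.
cd samples/nvidia-resnet

# One-time setup: NVIDIA Docker, Kubernetes (Minikube) and Kubeflow, plus the persistent volume.
sudo ./install_kubeflow_and_dependencies.sh
sudo ./mount_persistent_volume.sh

# Build the component images and compile the pipeline.
# (First edit IMAGE in components/*/build.sh and the matching image names in
# pipeline/src/pipeline.py and components/webapp_launcher/src/webapp-service-template.yaml.)
sudo ./build_pipeline.sh          # produces pipeline.py.tar.gz in the working directory

# Find the ambassador port, then open the Kubeflow UI at https://<local-ip>:<ambassador-port>/
sudo kubectl get svc -n kubeflow ambassador

# After uploading pipeline.py.tar.gz and running the pipeline, the demo web UI is served at
# https://<local-ip>:<ambassador-port>/webapp/
```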

## Cleanup
The following optional scripts clean up your cluster (useful for debugging):
* Delete deployments & services from previous runs:
* `sudo ./clean_utils/delete_all_previous_resources.sh`
* Uninstall Minikube and Kubeflow:
* `sudo ./clean_utils/remove_minikube_and_kubeflow.sh`
53 changes: 21 additions & 32 deletions samples/nvidia-resnet/build_pipeline.sh
@@ -1,39 +1,28 @@
#!/bin/bash
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

WORK_DIR=$(pwd)
base_dir=$(pwd)
components_dir=$base_dir/components

# Build and push images of kubeflow pipeline components
cd $WORK_DIR/preprocess && ./build.sh && \
cd $WORK_DIR/train && ./build.sh && \
cd $WORK_DIR/serve && ./build.sh && \
# Build and push images of Kubeflow Pipelines components
for component in $components_dir/*/; do
    cd $component && ./build.sh
done

# Compile kubeflow pipeline tar file
cd $WORK_DIR/pipeline && ./build.sh


cd $base_dir/pipeline && ./build.sh
(mv -f src/*.tar.gz $base_dir && \
echo "Pipeline compiled sucessfully!") || \
echo "Pipeline compilation failed!"
33 changes: 33 additions & 0 deletions samples/nvidia-resnet/clean_utils/delete_all_previous_resources.sh
@@ -0,0 +1,33 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

TRTIS_NAME=trtis
WEBAPP_NAME=webapp
PIPELINE_NAME=resnet-cifar10-pipeline
KF_NAMESPACE=kubeflow

kubectl delete service/$TRTIS_NAME -n $KF_NAMESPACE
kubectl delete deployment.apps/$TRTIS_NAME -n $KF_NAMESPACE

for service in $( kubectl get svc -n $KF_NAMESPACE | grep $WEBAPP_NAME | cut -d' ' -f1 ); do
kubectl delete -n $KF_NAMESPACE service/$service
done

for deployment in $( kubectl get deployment -n $KF_NAMESPACE | grep $WEBAPP_NAME | cut -d' ' -f1 ); do
kubectl delete -n $KF_NAMESPACE deployment.apps/$deployment
done

for pod in $(kubectl get pod -n $KF_NAMESPACE | grep $PIPELINE_NAME | cut -d' ' -f1); do
kubectl delete -n $KF_NAMESPACE pod/$pod
done
22 changes: 22 additions & 0 deletions samples/nvidia-resnet/clean_utils/remove_minikube_and_kubeflow.sh
@@ -0,0 +1,22 @@
#!/bin/bash
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Remove KubeFlow
cd ${KUBEFLOW_SRC}/${KFAPP}
${KUBEFLOW_SRC}/scripts/kfctl.sh delete k8s

# Remove Minikube
minikube stop
minikube delete
@@ -0,0 +1,30 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM ubuntu:16.04

RUN apt-get update -y && \
apt-get install --no-install-recommends -y -q ca-certificates curl python-dev python-setuptools wget unzip
RUN easy_install pip && \
pip install pyyaml six requests

# Install kubectl
RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
RUN chmod +x ./kubectl
RUN mv ./kubectl /usr/local/bin

ADD src /workspace
WORKDIR /workspace

ENTRYPOINT ["python", "deploy_trtis.py"]
@@ -0,0 +1,19 @@
#!/bin/bash
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

IMAGE=<inference-server-launcher-image>

docker build -t $IMAGE .
docker push $IMAGE
@@ -0,0 +1,59 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os
import logging
import subprocess
import requests


KUBEFLOW_NAMESPACE = 'kubeflow'
YAML_TEMPLATE = 'trtis-service-template.yaml'
YAML_FILE = 'trtis-service.yaml'


def main():
    parser = argparse.ArgumentParser(description='Inference server launcher')
    parser.add_argument('--trtserver_name', help='Name of the TRTIS service')
    parser.add_argument('--model_path', help='Path to the trained model used by the inference server')

    args = parser.parse_args()

    logging.getLogger().setLevel(logging.INFO)
    logging.info('Generating TRTIS service template')

    template_file = os.path.join(os.path.dirname(
        os.path.realpath(__file__)), YAML_TEMPLATE)
    target_file = os.path.join(os.path.dirname(
        os.path.realpath(__file__)), YAML_FILE)

    # Fill in the service template with the runtime values and write the final manifest.
    with open(template_file, 'r') as template:
        with open(target_file, 'w') as target:
            data = template.read()
            changed = data.replace('TRTSERVER_NAME', args.trtserver_name)
            changed = changed.replace('KUBEFLOW_NAMESPACE', KUBEFLOW_NAMESPACE)
            changed = changed.replace('MODEL_PATH', args.model_path)
            target.write(changed)

    logging.info('Deploying TRTIS service')
    subprocess.call(['kubectl', 'apply', '-f', YAML_FILE])

    # Expose the service name as the component's output for downstream pipeline steps.
    with open('/output.txt', 'w') as f:
        f.write(args.trtserver_name)


if __name__ == "__main__":
    main()
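
As a quick sanity check outside the pipeline, the launcher can also be invoked directly (a sketch, assuming `kubectl` is already configured against the Kubeflow cluster; the service name matches the `trtis` default used by the cleanup scripts, and the model path is a hypothetical example):

```bash
# Hypothetical standalone run of the launcher; inside the pipeline this is executed by the
# component image whose entrypoint is deploy_trtis.py.
python deploy_trtis.py \
    --trtserver_name trtis \
    --model_path /mnt/workspace/saved_model   # hypothetical model location on the shared volume
# Note: the script also writes the service name to /output.txt for downstream pipeline steps,
# which may require root when run outside the component container.
```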