Integrate webapp into the pipeline
ananth-sankar94 committed Apr 1, 2019
1 parent 006adf5 commit 847f76e
Showing 252 changed files with 3,162 additions and 1,775 deletions.
38 changes: 13 additions & 25 deletions samples/nvidia-resnet/LICENSE
@@ -1,25 +1,13 @@
Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of NVIDIA CORPORATION nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
52 changes: 27 additions & 25 deletions samples/nvidia-resnet/README.md
@@ -1,8 +1,9 @@
# A simple NVIDIA-accelerated ResNet Kubeflow pipeline
### This example demonstrates a simple end-to-end training & deployment of a Keras Resnet model on the CIFAR10 dataset utilizing the following NVIDIA technologies:
# A simple GPU-accelerated ResNet Kubeflow pipeline
## Overview
This example demonstrates a simple end-to-end training & deployment of a Keras Resnet model on the CIFAR10 dataset utilizing the following technologies:
* [NVIDIA-Docker2](https://github.com/NVIDIA/nvidia-docker) to make the Docker containers GPU aware.
* [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin) to allow Kubernetes to access GPU nodes.
* [TensorFlow-19.02](https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow) containers from NVIDIA GPU Cloud container registry.
* [TensorFlow-19.03](https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow) containers from NVIDIA GPU Cloud container registry.
* [TensorRT](https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html) for optimizing the Inference Graph in FP16 for leveraging the dedicated use of Tensor Cores for Inference.
* [TensorRT Inference Server](https://github.com/NVIDIA/tensorrt-inference-server) for serving the trained model.

@@ -11,28 +12,29 @@
* NVIDIA GPU

## Quickstart
* Install NVIDIA Docker, Kubernetes and Kubeflow on your local machine:
* Install NVIDIA Docker, Kubernetes and Kubeflow on your local machine (on your first run):
* `sudo ./install_kubeflow_and_dependencies.sh`
* Mount persistent volume to Kubeflow:
* `sudo ./mount_persistent_volume.sh`
* Build the Preprocessing, Training, Serving, and Pipeline containers using the following script:
* First, modify `build.sh` in `preprocess`, `train`, and `serve` directories to point to a container registry that is accessible to you
* Build the Docker image of each pipeline component and compile the Kubeflow pipeline:
* First, make sure the `IMAGE` variable in `build.sh` in each component directory under `components` points to a public container registry
* Then, make sure the `image` used in each `ContainerOp` in `pipeline/src/pipeline.py` matches `IMAGE` in the step above
* Then, make sure the `image` of the webapp Deployment in `components/webapp_launcher/src/webapp-service-template.yaml` matches `IMAGE` in `components/webapp/build.sh`
* Then, `sudo ./build_pipeline.sh`
* Note the `pipeline.py.tar.gz` file that appears on your working directory
* Determine the ambassador port using this command:
* Note the `pipeline.py.tar.gz` file that appears in your working directory
* Determine the ambassador port:
* `sudo kubectl get svc -n kubeflow ambassador`
* Open the Kubeflow Dashboard on:
* https://local-machine-ip-address:port-determined-from-previous-step
* E.g. https://10.110.210.99:31342
* Click on the Pipeline Dashboard tab, upload the `pipeline.py.tar.gz` file under your working directory and create a run
* Once the training has completed (should take about 20 minutes for 50 epochs) and the model is being served, port forward the port of the serving pod (8000) to the local system:
* Determine the name of the serving pod by selecting it on the Kubeflow Dashboard
* Modify the `SERVING_POD` variable within `portforward_serving_port.sh` accordingly
* Then, `sudo ./portforward_serving_port.sh`
* Build the client container and start a local server for the demo web UI on the host machine (about 15 mins):
* `sudo ./test_trtis_client.sh`
* Now you have successfully set up the client through which you can ping the server with an image URL and obtain predictions:
* Open the demo client UI on a web browser with the following IP address:
https://local-machine-ip-address:8050
* The port is specified in `demo_client_ui.py` and can be changed as needed
* Copy an image URL (for one of the 10 CIFAR classes) and paste it in the UI
* Open the Kubeflow UI on:
* https://[local-machine-ip-address]:[ambassador-port]/
* E.g. https://10.110.210.99:31342/
* Click on the Pipeline Dashboard tab, upload the `pipeline.py.tar.gz` file you just compiled, and create a run
* Training takes about 20 minutes for 50 epochs, and a web UI is deployed as part of the pipeline so users can interact with the served model
* Access the client web UI:
* https://[local-machine-ip-address]:[kubeflow-ambassador-port]/[webapp-prefix]/
* E.g. https://10.110.210.99:31342/webapp/
* Now you can test the trained model with random images and obtain the class prediction and probability distribution (a consolidated command sketch follows this list)
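
For reference, the whole quickstart condenses to roughly the following shell session (a sketch, assuming the scripts are run from the `samples/nvidia-resnet` directory of your checkout and that the image names have already been set as described above):

```bash
# Hypothetical end-to-end quickstart session; adjust registry, paths, and ports to your setup.
cd samples/nvidia-resnet

# One-time setup: NVIDIA Docker, Kubernetes (Minikube) and Kubeflow, plus the persistent volume.
sudo ./install_kubeflow_and_dependencies.sh
sudo ./mount_persistent_volume.sh

# Build the component images and compile the pipeline.
# (First edit IMAGE in components/*/build.sh and the matching image names in
# pipeline/src/pipeline.py and components/webapp_launcher/src/webapp-service-template.yaml.)
sudo ./build_pipeline.sh          # produces pipeline.py.tar.gz in the working directory

# Find the ambassador port, then open the Kubeflow UI at https://<local-ip>:<ambassador-port>/
sudo kubectl get svc -n kubeflow ambassador

# After uploading pipeline.py.tar.gz and running the pipeline, the demo web UI is served at
# https://<local-ip>:<ambassador-port>/webapp/
```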

## Cleanup
The following optional scripts clean up your cluster (useful for debugging):
* Delete deployments & services from previous runs:
* `sudo ./clean_utils/delete_all_previous_resources.sh`
* Uninstall Minikube and Kubeflow:
* `sudo ./clean_utils/remove_minikube_and_kubeflow.sh`
53 changes: 21 additions & 32 deletions samples/nvidia-resnet/build_pipeline.sh
@@ -1,39 +1,28 @@
#!/bin/bash
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

WORK_DIR=$(pwd)
base_dir=$(pwd)
components_dir=$base_dir/components

# Build and push images of kubeflow pipeline components
cd $WORK_DIR/preprocess && ./build.sh && \
cd $WORK_DIR/train && ./build.sh && \
cd $WORK_DIR/serve && ./build.sh && \
# Build and push images of Kubeflow Pipelines components
for component in $components_dir/*/; do
    cd $component && ./build.sh
done

# Compile kubeflow pipeline tar file
cd $WORK_DIR/pipeline && ./build.sh


cd $base_dir/pipeline && ./build.sh
(mv -f src/*.tar.gz $base_dir && \
echo "Pipeline compiled sucessfully!") || \
echo "Pipeline compilation failed!"
33 changes: 33 additions & 0 deletions samples/nvidia-resnet/clean_utils/delete_all_previous_resources.sh
@@ -0,0 +1,33 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

TRTIS_NAME=trtis
WEBAPP_NAME=webapp
PIPELINE_NAME=resnet-cifar10-pipeline
KF_NAMESPACE=kubeflow

kubectl delete service/$TRTIS_NAME -n $KF_NAMESPACE
kubectl delete deployment.apps/$TRTIS_NAME -n $KF_NAMESPACE

for service in $( kubectl get svc -n $KF_NAMESPACE | grep $WEBAPP_NAME | cut -d' ' -f1 ); do
kubectl delete -n $KF_NAMESPACE service/$service
done

for deployment in $( kubectl get deployment -n $KF_NAMESPACE | grep $WEBAPP_NAME | cut -d' ' -f1 ); do
kubectl delete -n $KF_NAMESPACE deployment.apps/$deployment
done

for pod in $(kubectl get pod -n $KF_NAMESPACE | grep $PIPELINE_NAME | cut -d' ' -f1); do
kubectl delete -n $KF_NAMESPACE pod/$pod
done
22 changes: 22 additions & 0 deletions samples/nvidia-resnet/clean_utils/remove_minikube_and_kubeflow.sh
@@ -0,0 +1,22 @@
#!/bin/bash
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Remove KubeFlow
cd ${KUBEFLOW_SRC}/${KFAPP}
${KUBEFLOW_SRC}/scripts/kfctl.sh delete k8s

# Remove Minikube
minikube stop
minikube delete
@@ -0,0 +1,30 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM ubuntu:16.04

RUN apt-get update -y && \
apt-get install --no-install-recommends -y -q ca-certificates curl python-dev python-setuptools wget unzip
RUN easy_install pip && \
pip install pyyaml six requests

# Install kubectl
RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
RUN chmod +x ./kubectl
RUN mv ./kubectl /usr/local/bin

ADD src /workspace
WORKDIR /workspace

ENTRYPOINT ["python", "deploy_trtis.py"]
@@ -0,0 +1,19 @@
#!/bin/bash
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

IMAGE=<inference-server-launcher-image>

docker build -t $IMAGE .
docker push $IMAGE
@@ -0,0 +1,59 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os
import logging
import subprocess
import requests


KUBEFLOW_NAMESPACE = 'kubeflow'
YAML_TEMPLATE = 'trtis-service-template.yaml'
YAML_FILE = 'trtis-service.yaml'


def main():
    parser = argparse.ArgumentParser(description='Inference server launcher')
    parser.add_argument('--trtserver_name', help='Name of the TRTIS service')
    parser.add_argument('--model_path', help='Path to the trained model used by the inference server')

    args = parser.parse_args()

    logging.getLogger().setLevel(logging.INFO)
    logging.info('Generating TRTIS service template')

    template_file = os.path.join(os.path.dirname(
        os.path.realpath(__file__)), YAML_TEMPLATE)
    target_file = os.path.join(os.path.dirname(
        os.path.realpath(__file__)), YAML_FILE)

    # Fill in the service template with the runtime values and write the final manifest.
    with open(template_file, 'r') as template:
        with open(target_file, 'w') as target:
            data = template.read()
            changed = data.replace('TRTSERVER_NAME', args.trtserver_name)
            changed = changed.replace('KUBEFLOW_NAMESPACE', KUBEFLOW_NAMESPACE)
            changed = changed.replace('MODEL_PATH', args.model_path)
            target.write(changed)

    logging.info('Deploying TRTIS service')
    subprocess.call(['kubectl', 'apply', '-f', YAML_FILE])

    # Expose the service name as the component's output for downstream pipeline steps.
    with open('/output.txt', 'w') as f:
        f.write(args.trtserver_name)


if __name__ == "__main__":
    main()
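
As a quick sanity check outside the pipeline, the launcher can also be invoked directly (a sketch, assuming `kubectl` is already configured against the Kubeflow cluster; the service name matches the `trtis` default used by the cleanup scripts, and the model path is a hypothetical example):

```bash
# Hypothetical standalone run of the launcher; inside the pipeline this is executed by the
# component image whose entrypoint is deploy_trtis.py.
python deploy_trtis.py \
    --trtserver_name trtis \
    --model_path /mnt/workspace/saved_model   # hypothetical model location on the shared volume
# Note: the script also writes the service name to /output.txt for downstream pipeline steps,
# which may require root when run outside the component container.
```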