From 25756c34e599df75b23ca151a61e869faa69688b Mon Sep 17 00:00:00 2001 From: Christian Kadner Date: Wed, 29 Apr 2020 20:20:17 -0700 Subject: [PATCH] Add developer guide (#124) Closes #52 --- CONTRIBUTING.md | 6 + README.md | 29 ++++- sdk/README.md | 9 +- sdk/python/README.md | 258 +++++++++++++++++++++++++++++++++++++++++++ tools/mdtoc.sh | 33 ++++++ 5 files changed, 322 insertions(+), 13 deletions(-) create mode 100644 sdk/python/README.md create mode 100755 tools/mdtoc.sh diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a9d0390982f..43e4cc47fc3 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -4,6 +4,7 @@ - [How to Contribute](#how-to-contribute) - [Contributor License Agreement](#contributor-license-agreement) + - [Development Guidelines](#development-guidelines) - [Code reviews](#code-reviews) - [Get involved](#get-involved) @@ -26,6 +27,11 @@ You generally only need to submit a CLA once, so if you've already submitted one (even if it was for a different project), you probably don't need to do it again. +## Development Guidelines + +Please take a look at the [KFP-Tekton Developer Guide](sdk/python/README.md) for details about how to make code +contributions to the KFP-Tekton project. + ## Code reviews All submissions, including submissions by project members, require review. We diff --git a/README.md b/README.md index c1e6d8c3731..e616283fae7 100644 --- a/README.md +++ b/README.md @@ -1,24 +1,43 @@ # Kubeflow Pipelines and Tekton -Experimental project to bring Kubeflow Pipelines and Tekton together. The work is being driven in accordance with this evolving [design doc specifications](http://bit.ly/kfp-tekton). Since this will evolve from experimental towards a more mature solution, we are keeping it currently separate from [Kubeflow Pipeline repo](https://github.com/kubeflow/pipelines) + +Experimental project to bring Kubeflow Pipelines and Tekton together. The work is being driven in accordance with this +evolving [design doc specifications](http://bit.ly/kfp-tekton). Since this will evolve from experimental towards a more +mature solution, we are keeping it currently separate from [Kubeflow Pipeline repo](https://github.com/kubeflow/pipelines). ## Tekton -The Tekton Pipelines project provides Kubernetes-style resources for declaring CI/CD-style pipelines. Tekton introduces several new CRDs including Task, Pipeline, TaskRun, and PipelineRun. A PipelineRun represents a single running instance of a Pipeline and is responsible for creating a Pod for each of its Tasks and as many containers within each Pod as it has Steps. Some tasks here will invariably require contributions back to Tekton. Please follow the community guidelines in [Tekton repo](https://github.com/tektoncd/pipeline). + +The Tekton Pipelines project provides Kubernetes-style resources for declaring CI/CD-style pipelines. Tekton introduces +several new CRDs including Task, Pipeline, TaskRun, and PipelineRun. A PipelineRun represents a single running instance +of a Pipeline and is responsible for creating a Pod for each of its Tasks and as many containers within each Pod as it +has Steps. Some tasks here will invariably require contributions back to Tekton. Please follow the community guidelines +in [Tekton repo](https://github.com/tektoncd/pipeline). ## Development: Kubeflow Pipeline DSL to Tekton Compiler -The work will be split in three phases. 
While the details of the phases are listed in the [design doc](http://bit.ly/kfp-tekton), the current effort in this repository is focussed on creating a Kubeflow Pipeline compiler for Tekton, which can take KFP DSL, and compile it to Tekton yaml. We will update the details as we move into other phases, in concurrence with design decisions. +The work will be split in three phases. While the details of the phases are listed in the [design doc](http://bit.ly/kfp-tekton), +the current effort in this repository is focused on creating a Kubeflow Pipeline compiler for Tekton, which can take +KFP DSL, and compile it to Tekton YAML. We will update the details as we move into other phases, in concurrence with +design decisions. ![kfp-tekton](images/kfp-tekton-phase-one.png) -To get started with contributing to KFP Tekton Compiler, please [follow these instructions](sdk/README.md), as well as look at [open issues on the repo](https://github.com/kubeflow/kfp-tekton/issues) +To get started experimenting with the KFP Tekton Compiler, please [follow these instructions](sdk/README.md). -We are using Kubeflow Pipelines v0.2.2 and Tekton v0.11.3 for the project currently. You may also be interested in [KFP, Argo and Tekton Features Comparision](https://docs.google.com/spreadsheets/d/1LFUy86MhVrU2cRhXNsDU-OBzB4BlkT9C0ASD3hoXqpo/edit#gid=979402121) which the team has compiled, and it goes in fine-grained details. +If you would like to make code contributions take a look at the [Developer Guide](sdk/python/README.md) and go through +the list of [open issues](https://github.com/kubeflow/kfp-tekton/issues). +We are currently using [Kubeflow Pipelines 0.2.2](https://github.com/kubeflow/pipelines/releases/tag/0.2.2) and +[Tekton 0.11.3](https://github.com/tektoncd/pipeline/releases/tag/v0.11.3) for this project. + +The [KFP, Argo and Tekton Feature Comparison](https://docs.google.com/spreadsheets/d/1LFUy86MhVrU2cRhXNsDU-OBzB4BlkT9C0ASD3hoXqpo/edit#gid=979402121) +provides a detailed analysis of the KFP features and a comparison of their respective implementations in Argo and Tekton. + ## CD Foundation The work here is being tracked under the [CD Foundation MLOps Sig](https://cd.foundation/blog/2020/02/11/announcing-the-cd-foundation-mlops-sig/). If you are interested in joining, please see the [instructions here](https://github.com/cdfoundation/sig-mlops) ## Additional Reference Materials: KFP and TFX + 1. [Kubeflow Pipelines-TFX Pipelines](/samples/kfp-tfx) 2. [Kubeflow Pipelines-TFX Pipelines Talk at Tensorflow World](https://www.slideshare.net/AnimeshSingh/hybrid-cloud-kubeflow-and-tensorflow-extended-tfx) 3. [Kubeflow Pipelines-TFX Pipelines RFC](https://docs.google.com/document/d/1_n3q0mNOr7gUSM04yaA0e5BO9RrS0Vkh1cNCyrB07WM/edit) diff --git a/sdk/README.md b/sdk/README.md index 6cfdb536260..e0d4bfd4d12 100644 --- a/sdk/README.md +++ b/sdk/README.md @@ -6,14 +6,7 @@ The output of the KFP SDK compiler is YAML for [Argo](https://github.com/argopro We are updating the `Compiler` of the KFP SDK to generate `Tekton` YAML. Please go through these steps to ensure you are setup properly to use the updated compiler. -## Development Prerequisites - -1. [`Python`](https://www.python.org/downloads/): Python 3.5 or later -2. 
[`Conda`](https://docs.conda.io/en/latest/) or Python
-   [virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/):
-   Package, dependency and environment management for Python
-
-## Tested Versions
+## Project Prerequisites
 
 - Python: `3.7.5`
 - Kubeflow Pipelines: [`0.2.2`](https://github.com/kubeflow/pipelines/releases/tag/0.2.2)
diff --git a/sdk/python/README.md b/sdk/python/README.md
new file mode 100644
index 00000000000..db30a6469da
--- /dev/null
+++ b/sdk/python/README.md
@@ -0,0 +1,258 @@
+# KFP-Tekton Developer Guide
+
+This document describes the development guidelines for contributing to the KFP-Tekton project.
+Details about the required contributor license agreement (CLA) and the code review process can be found in the
+[CONTRIBUTING.md](/CONTRIBUTING.md) document.
+A quick-start guide with general setup instructions, a troubleshooting guide, and technical limitations can be found
+in the [SDK README](/sdk/README.md).
+
+## Table of Contents
+
+  - [Development Prerequisites](#development-prerequisites)
+    - [Installing Tekton](#installing-tekton)
+      - [Tekton Cluster](#tekton-cluster)
+      - [Tekton CLI](#tekton-cli)
+      - [Tekton Dashboard](#tekton-dashboard)
+  - [Origins of the KFP-Tekton Compiler Code](#origins-of-the-kfp-tekton-compiler-code)
+  - [Adding New Code](#adding-new-code)
+    - [Overriding KFP Compiler Methods](#overriding-kfp-compiler-methods)
+    - [Monkey-Patching Static KFP Compiler Methods](#monkey-patching-static-kfp-compiler-methods)
+  - [Coding Style](#coding-style)
+  - [Testing](#testing)
+    - [Unit Tests](#unit-tests)
+    - [End-to-End Tests with Tekton](#end-to-end-tests-with-tekton)
+    - [Compiler Test Report](#compiler-test-report)
+  - [License Headers](#license-headers)
+
+
+## Development Prerequisites
+
+1. [`Python`](https://www.python.org/downloads/): version `3.5` or later
+2. [`Kubernetes` Cluster](https://v1-15.docs.kubernetes.io/docs/setup/): version `1.15` ([required by Kubeflow](https://www.kubeflow.org/docs/started/k8s/overview/) and Tekton 0.11)
+3. [`kubectl` CLI](https://kubernetes.io/docs/tasks/tools/install-kubectl/): required to deploy Tekton pipelines to a Kubernetes cluster
+4. [`Tekton` Deployment](https://github.com/tektoncd/pipeline/releases/tag/v0.11.3/): version `0.11.3` (or greater, to support Tekton API version `v1beta1`), required for end-to-end testing
+5. [`tkn` CLI](https://github.com/tektoncd/cli#installing-tkn): required to work with Tekton pipelines
+6. [`Kubeflow Pipelines` Deployment](https://www.kubeflow.org/docs/pipelines/installation/overview/): required for some end-to-end tests
+
+
+### Installing Tekton
+
+A working Tekton cluster deployment is required to perform end-to-end tests of the pipelines generated by the
+`kfp_tekton` compiler. The Tekton CLI is useful to start a pipeline and analyze the pipeline logs.
+
+#### Tekton Cluster
+
+Follow the instructions listed [here](https://github.com/tektoncd/pipeline/blob/v0.11.3/docs/install.md#installing-tekton-pipelines-on-kubernetes)
+or simply run:
+
+    kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.11.3/release.yaml
+
+**Note**: if your container runtime does not support `image-reference:tag@digest` (like cri-o used in OpenShift 4.x),
+use `release.notags.yaml` instead.
+
+Optionally, for convenience, set the default namespace to `tekton-pipelines`:
+
+    kubectl config set-context --current --namespace=tekton-pipelines
+
+#### Tekton CLI
+
+Follow the instructions [here](https://github.com/tektoncd/cli#installing-tkn).
+
+macOS users can install the Tekton CLI using the `homebrew` formula:
+
+    brew tap tektoncd/tools
+    brew install tektoncd/tools/tektoncd-cli
+
+#### Tekton Dashboard
+
+Follow the installation instructions [here](https://github.com/tektoncd/dashboard#installing-the-latest-release), i.e.:
+
+    kubectl apply --filename https://github.com/tektoncd/dashboard/releases/download/v0.6.1/tekton-dashboard-release.yaml
+
+The Tekton Dashboard can be accessed through its `ClusterIP` service by running `kubectl proxy`, or the service can
+be patched to expose a public `NodePort` IP:
+
+    kubectl patch svc tekton-dashboard -n tekton-pipelines --type='json' -p '[{"op":"replace","path":"/spec/type","value":"NodePort"}]'
+
+To open the dashboard, run:
+
+    TKN_DASHBOARD_SVC_PORT=$(kubectl -n tekton-pipelines get service tekton-dashboard -o jsonpath='{.spec.ports[0].nodePort}')
+    PUBLIC_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}')
+    open "http://${PUBLIC_IP}:${TKN_DASHBOARD_SVC_PORT}/#/pipelineruns"
+
+
+## Origins of the KFP-Tekton Compiler Code
+
+The source code of the `kfp-tekton` compiler was created as an extension of the [`Kubeflow Pipelines SDK Compiler`](https://github.com/kubeflow/pipelines/tree/master/sdk/python/kfp/compiler).
+This approach allowed us to leverage much of the existing [Kubeflow Pipelines Python SDK code](https://www.kubeflow.org/docs/pipelines/sdk/sdk-overview/#sdk-packages),
+like the DSL and components packages, but "override" or "replace" those parts of the compiler code required to generate
+Tekton YAML instead of Argo YAML. Since the KFP SDK was not designed to be easily extended,
+_monkey-patching_ was used to replace non-class methods and functions at runtime.
+
+In order for the _monkey patch_ to work properly, the `kfp-tekton` compiler source code has to be aligned with a
+specific version of the `kfp` SDK compiler. As of now, that version is [`0.2.2`](https://github.com/kubeflow/pipelines/releases/tag/0.2.2).
+
+
+## Adding New Code
+
+The Python package structure as well as the module names and method signatures closely mirror those of the
+[`Kubeflow Pipelines Python SDK`](https://github.com/kubeflow/pipelines/tree/master/sdk/python).
+This helps keep track of all the code that had to be modified, and it will make it easier to merge (some of) the code
+back into KFP or to identify pieces of code that need to be refactored in KFP in order to accommodate various
+execution platforms.
+When it is necessary to bring further methods from the `kfp` compiler package into the `kfp-tekton` compiler package,
+keep the original method names and signatures as well as their position inside their respective Python modules.
+
+### Overriding KFP Compiler Methods
+
+Most of the functions provided by `kfp.compiler.compiler.Compiler` are instance-based and can be overridden in
+[`kfp_tekton.compiler.compiler.TektonCompiler`](/sdk/python/kfp_tekton/compiler/compiler.py).
+
+Static `Compiler` methods may also need to be added to the _monkey patch_ described in the next section, unless they
+are only used by other methods that are already overridden in `TektonCompiler`. Be careful not to mix inheritance and
+monkey patching.
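+
+The following minimal sketch uses made-up class and method names (it is not actual KFP or KFP-Tekton code) to show
+what goes wrong when an overriding method that relies on `super()` is also monkey-patched into its base class:
+
+```Python
+class Base:
+    def render(self):
+        return "base"
+
+
+class Derived(Base):
+    def render(self):
+        # relies on super() still resolving to the original Base.render
+        return "derived + " + super().render()
+
+
+# monkey patch: Base.render now *is* Derived.render ...
+Base.render = Derived.render
+
+# ... so the super().render() call inside Derived.render finds the patched
+# Base.render (i.e. itself) and recurses forever:
+# Derived().render()  # raises RecursionError
+```
+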
+A method which in its body calls its `super()` implementation must not be added to the list of
+methods that get dynamically replaced via the _monkey patch_.
+
+### Monkey-Patching Static KFP Compiler Methods
+
+When code changes are required to static helper methods in `kfp.compiler`, the "overridden" methods should be added to
+their respective modules in `kfp_tekton.compiler` and added to the _monkey patch_ which dynamically replaces the code
+in `kfp` at runtime.
+
+`sdk/python/kfp_tekton/compiler/__init__.py`:
+```Python
+import sys
+import traceback
+
+
+def monkey_patch():
+    """
+    Overriding (replacing) selected methods/functions in the KFP SDK compiler package.
+    This is a temporary hack during early development of the KFP-Tekton compiler.
+    """
+    import kfp
+    from kfp.compiler._data_passing_rewriter import fix_big_data_passing
+    from kfp.compiler._k8s_helper import convert_k8s_obj_to_json
+    from kfp.compiler._op_to_template import _op_to_template, _process_base_ops
+    from kfp.compiler.compiler import Compiler as KFPCompiler
+
+    from ._data_passing_rewriter import fix_big_data_passing as tekton_fix_big_data_passing
+    from ._k8s_helper import convert_k8s_obj_to_json as tekton_convert_k8s_obj_to_json
+    from ._op_to_template import _op_to_template as tekton_op_to_template
+    from ._op_to_template import _process_base_ops as tekton_process_base_ops
+    from .compiler import TektonCompiler
+
+    kfp.compiler._data_passing_rewriter.fix_big_data_passing = tekton_fix_big_data_passing
+    kfp.compiler._k8s_helper.convert_k8s_obj_to_json = tekton_convert_k8s_obj_to_json
+    kfp.compiler._op_to_template._op_to_template = tekton_op_to_template
+    kfp.compiler._op_to_template._process_base_ops = tekton_process_base_ops
+    KFPCompiler._resolve_value_or_reference = TektonCompiler._resolve_value_or_reference
+    KFPCompiler._create_dag_templates = TektonCompiler._create_dag_templates
+    KFPCompiler._create_and_write_workflow = TektonCompiler._create_and_write_workflow
+    KFPCompiler._create_pipeline_workflow = TektonCompiler._create_pipeline_workflow
+    KFPCompiler._create_workflow = TektonCompiler._create_workflow
+    KFPCompiler._group_to_dag_template = TektonCompiler._group_to_dag_template
+    KFPCompiler._write_workflow = TektonCompiler._write_workflow
+
+
+try:
+    print("Applying KFP-Tekton compiler patch")
+    monkey_patch()
+
+except Exception as error:
+    traceback.print_exc()
+    print("Failed to apply KFP-Tekton compiler patch")
+    sys.exit(1)
+```
+
+**Note**: Since the _monkey patch_ gets triggered by importing any member of the `kfp_tekton.compiler` module, we try
+to avoid top-level imports of `kfp_tekton.compiler` members in pipeline DSL scripts.
+Instead, use a local import so that the _monkey patch_ is not triggered when the original KFP compiler is used to
+compile a pipeline DSL script with KFP's `dsl-compile --py <pipeline.py>` command:
+
+```Python
+if __name__ == '__main__':
+    # don't use top-level import of TektonCompiler to prevent monkey-patching KFP compiler when using KFP's dsl-compile
+    from kfp_tekton.compiler import TektonCompiler
+    TektonCompiler().compile(pipeline, __file__.replace('.py', '.yaml'))
+```
+
+
+## Coding Style
+
+The Python code in this project follows the [Google Python style guide](http://google.github.io/styleguide/pyguide.html).
+You can make use of a [yapf](https://github.com/google/yapf) configuration file to auto-format Python code and adopt the
+Google Python style. We encourage linting Python docstrings with [docformatter](https://github.com/myint/docformatter).
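+
+For reference, this is roughly what a docstring following the Google Python style looks like; the function itself is
+only a made-up illustration:
+
+```Python
+def normalize_name(name: str, separator: str = "-") -> str:
+    """Normalizes a resource name for display purposes.
+
+    Args:
+        name: The raw, user-provided name.
+        separator: The character used to replace whitespace.
+
+    Returns:
+        The lower-cased name with whitespace replaced by `separator`.
+    """
+    return separator.join(name.lower().split())
+```
+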
+Our CI/CD integration with Travis uses [Flake8](https://pypi.org/project/flake8/); the current set of enforced rules
+can be found in [.travis.yml](/.travis.yml).
+
+## Testing
+
+Before committing code changes to the compiler, make sure to run the [compiler unit tests](/sdk/python/tests/compiler/compiler_tests.py).
+Ideally, whenever a code change to the compiler results in modified YAML, an end-to-end test should also be run on a Tekton cluster.
+
+### Unit Tests
+
+Any new functionality added to `kfp_tekton.compiler` should be accompanied by a new unit test in `sdk/python/tests/compiler/compiler_tests.py`.
+Typically a test case comes with a minimal Python DSL script and a "golden" YAML file in `sdk/python/tests/compiler/testdata`.
+The "golden" YAML file contains the expected compiler output. The unit tests use the "golden" YAML files to compare
+the current compiler output with the previously expected compiler output. A minimal sketch of this test pattern can be
+found in the appendix at the end of this guide. To run the unit tests:
+
+    make test
+
+If the pipeline script compiles but does not match the "golden" YAML, then the unit test should fail. If the change in
+the output YAML is desired, then the "golden" YAML needs to be regenerated, e.g. by temporarily enabling the
+`GENERATE_GOLDEN_YAML` flag in `compiler_tests.py`.
+
+
+### End-to-End Tests with Tekton
+
+The unit tests are designed to verify that the YAML produced by the compiler matches the expected, previously generated
+"golden" YAML. End-to-end (E2E) tests are necessary to verify that the generated Tekton YAML is syntactically valid and
+that the pipeline can be executed successfully on a Tekton cluster.
+
+A manual E2E test can be performed in the following manner:
+
+    kubectl apply -f <pipeline.yaml>
+    tkn pipeline start <pipeline-name> --showlog
+
+Some E2E tests require a Kubernetes cluster with Kubeflow Pipelines installed in order to make use of the
+artifact storage provided by [Minio](https://docs.minio.io/), and they need to run in the `kubeflow` namespace in order
+to access secrets:
+
+    kubectl apply -f <pipeline.yaml> -n kubeflow
+    tkn pipeline start <pipeline-name> --showlog -n kubeflow
+
+
+### Compiler Test Report
+
+The goal of the first phase of the KFP-Tekton project was to ensure that most or all of the KFP compiler features are
+working for Tekton. That is, the `kfp_tekton` compiler can compile all Python DSL test scripts in the KFP compiler
+[`testdata`](https://github.com/kubeflow/pipelines/tree/master/sdk/python/tests/compiler/testdata) folder.
+
+To update the ["Compiler Status Report"](/sdk/python/tests/README.md), use the output of this command:
+
+    make report
+
+
+## License Headers
+
+All source files should have the following license header. Adjust the years accordingly to reflect the year the file
+was added and the last year it was modified:
+
+    # Copyright 2019-2020 kubeflow.org
+    #
+    # Licensed under the Apache License, Version 2.0 (the "License");
+    # you may not use this file except in compliance with the License.
+    # You may obtain a copy of the License at
+    #
+    #     http://www.apache.org/licenses/LICENSE-2.0
+    #
+    # Unless required by applicable law or agreed to in writing, software
+    # distributed under the License is distributed on an "AS IS" BASIS,
+    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    # See the License for the specific language governing permissions and
+    # limitations under the License.
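+
+
+## Appendix: Sketch of a "Golden" YAML Unit Test
+
+The following self-contained sketch illustrates the "golden" YAML test pattern described in the
+[Unit Tests](#unit-tests) section. The pipeline, the file names, and the test class are made up for illustration only;
+the actual test helpers live in `sdk/python/tests/compiler/compiler_tests.py`. The sketch assumes the `kfp`,
+`kfp-tekton`, and `PyYAML` packages are installed:
+
+```Python
+import os
+import tempfile
+import unittest
+
+import yaml
+
+from kfp import dsl
+from kfp_tekton.compiler import TektonCompiler
+
+# hypothetical location of the previously reviewed "golden" YAML files
+GOLDEN_DIR = os.path.join(os.path.dirname(__file__), 'testdata')
+
+
+@dsl.pipeline(name='echo', description='Minimal pipeline used only for this illustration')
+def echo_pipeline():
+    dsl.ContainerOp(name='echo', image='busybox', command=['echo', 'Hello Tekton'])
+
+
+class GoldenYamlTestSketch(unittest.TestCase):
+
+    def test_echo_pipeline_matches_golden(self):
+        # compile the DSL pipeline to a temporary file and load the produced YAML
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            compiled_file = os.path.join(tmp_dir, 'echo.yaml')
+            TektonCompiler().compile(echo_pipeline, compiled_file)
+            with open(compiled_file) as f:
+                compiled = list(yaml.safe_load_all(f))
+
+        # load the "golden" YAML and compare it against the current compiler output
+        with open(os.path.join(GOLDEN_DIR, 'echo.yaml')) as f:
+            golden = list(yaml.safe_load_all(f))
+
+        self.assertEqual(golden, compiled)
+
+
+if __name__ == '__main__':
+    unittest.main()
+```
+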
+
diff --git a/tools/mdtoc.sh b/tools/mdtoc.sh
new file mode 100755
index 00000000000..dda0729c3ab
--- /dev/null
+++ b/tools/mdtoc.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+
+# Copyright 2020 kubeflow.org
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This script generates a table of contents (ToC) for Markdown (MD) files:
+#
+# 1. Find the paragraph headings with grep (2nd and 3rd level headings starting with "##" and "###")
+# 2. Extract the heading's text with sed and transform it into '|' separated records of the form '###|Full Text|Full Text'
+# 3. Generate the ToC lines with awk by replacing '#' with '  ', converting spaces to dashes '-' and lower-casing caps
+# 4. Strip the leading 2 spaces with sed, since our ToC does not include 1st level headings
+#
+# Inspired by https://medium.com/@acrodriguez/one-liner-to-generate-a-markdown-toc-f5292112fd14
+
+SEP="|"
+
+[ -z "${1}" ] && echo -e "Usage:\n\n   $BASH_SOURCE <markdown-file>\n" && exit 1
+
+grep -E "^#{2,3}" "${1}" | grep -v "Table of Contents" | \
+sed -E "s/(#+) (.+)/\1${SEP}\2${SEP}\2/g" | \
+awk -F "${SEP}" '{ gsub(/#/,"  ",$1); gsub(/[ ]/,"-",$3); print $1 "- [" $2 "](#" tolower($3) ")" }' | \
+sed -e 's/^  //g'
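+
+# Example (assuming the script is run from the repository root):
+#
+#   tools/mdtoc.sh sdk/python/README.md
+#
+# The nested, linked bullet list is printed to stdout and can be pasted into
+# the "Table of Contents" section of the Markdown file.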