Commit

Move jenkins/ dir into ci/jenkins and spread docs around.
areusch committed Jun 28, 2022
1 parent 1115fd9 commit eebab38
Showing 17 changed files with 287 additions and 34 deletions.
2 changes: 1 addition & 1 deletion Jenkinsfile

Some generated files are not rendered by default.

97 changes: 97 additions & 0 deletions ci/README.md
@@ -0,0 +1,97 @@
<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements. See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership. The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License. You may obtain a copy of the License at -->

<!--- http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied. See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

# Apache TVM Continuous Integration (CI)

## Overview

TVM's Continuous Integration verifies the code in `apache/tvm` and tests PRs before they merge, so
that TVM contributors and committers get signal on their changes. These jobs are essential to keeping
the TVM project in a healthy state and preventing breakages. CI in TVM is broken into these pieces:
- Lint scripts in [`tests/lint`](../tests/lint).
- The tests themselves, all of which live underneath [`tests`](../tests).
- Definitions of test suites, with each suite defined as a separate `task_` script in
[`tests/scripts`](../tests/scripts).
- The Linux test sequence (in [`Jenkinsfile`](../Jenkinsfile)), which lints and builds TVM and runs test
suites using Docker on Linux.
- The Windows and macOS test sequences (in [`.github/workflows`](../.github/workflows)).
- GitHub Actions workflows that support the code review process (in [`.github/workflows`](../.github/workflows)).
- Tools to reproduce the CI locally (in `tests/scripts`).
- Infrastructure-as-Code that configures the cloud services that provide Jenkins for the TVM CI (in the
[`tlc-pack/ci`](https://github.com/tlc-pack/ci) repo).
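Because each suite is a separate `task_` script, the available suites can be listed by file name alone. A minimal sketch, assuming the scripts sit directly in `tests/scripts` and that the `task_` prefix is the only naming rule; `find_task_scripts` is a hypothetical helper, not part of the repo:

```python
from pathlib import Path


def find_task_scripts(scripts_dir):
    """Hypothetical helper: sorted names of suite scripts using the `task_` prefix."""
    scripts_dir = Path(scripts_dir)
    # Each test suite is a separate file whose name starts with "task_";
    # directories that happen to match the prefix are skipped.
    return sorted(p.name for p in scripts_dir.glob("task_*") if p.is_file())
```

Called from the repository root as `find_task_scripts("tests/scripts")`, it returns the script names in alphabetical order; it only lists files and does not run any suite.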

## CI Documentation Index

The CI documentation belongs with the implementation it describes. To make that concrete, the
documentation is split like so:
1. An overview of the CI is in this file.
2. User-facing documentation lives in `apache/tvm`'s `docs/contribute` sub-directory and is served on the
[TVM docs site](https://tvm.apache.org/docs/contribute/ci.html).
3. Documentation of the tools that run TVM's various regression tests locally, and of the test suites,
is in this sub-directory.
4. Documentation of the cloud services and their configuration lives in the
[`tlc-pack/ci`](https://github.com/tlc-pack/ci) repo.

## Jenkins

Jenkins runs all of the Linux-based TVM CI-enabled regression tests. This includes tests against accelerated hardware such as GPUs. It excludes those regression tests that run against hardware not available in the cloud (those tests aren't currently exercised in TVM CI). The tests run by Jenkins represent most of the merge-blocking tests (and passing Jenkins should mostly correlate with passing the remaining Windows/Mac builds).

## GitHub Actions

GitHub Actions is used to run Windows jobs, macOS jobs, and various on-GitHub automations. These are defined in [`.github/workflows`](../.github/workflows/). These automations include bots to:
* [cc people based on subscribed teams/topics](https://github.com/apache/tvm/issues/10317)
* [allow non-committers to merge approved / CI passing PRs](https://discuss.tvm.apache.org/t/rfc-allow-merging-via-pr-comments/12220)
* [add cc-ed people as reviewers on GitHub](https://discuss.tvm.apache.org/t/rfc-remove-codeowners/12095)
* [ping languishing PRs after no activity for a week (currently opt-in only)](https://github.com/apache/tvm/issues/9983)
* [push a `last-successful` branch to GitHub with the last `main` commit that passed CI](https://github.com/apache/tvm/tree/last-successful)

https://github.com/apache/tvm/actions has the logs for each of these workflows. Note that when debugging these workflows, changes to them made in a PR from a forked repository won't be reflected in that PR's runs. Test such changes in the forked repository first and link the results in the PR body.
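The "ping languishing PRs" bot listed above boils down to a date filter. A minimal sketch, assuming a hypothetical `PullRequest` record with a last-activity timestamp and an opt-in flag; the real bot's data source and field names are not described here:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PullRequest:
    # Hypothetical record; a real bot would populate it from the GitHub API.
    number: int
    last_activity: datetime
    opted_in: bool


def prs_to_ping(prs, now, idle=timedelta(days=7)):
    """Return the numbers of opted-in PRs that have been idle for at least `idle`."""
    return [
        pr.number
        for pr in prs
        if pr.opted_in and now - pr.last_activity >= idle
    ]
```

Keeping the opt-in check inside the filter matches the "currently opt-in only" note: PRs that have not opted in are never pinged, however long they sit.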

## Docker Images

Each CI job runs most of its work inside a Docker container, built from files
in the [`docker/`](../docker) folder. These
images are built nightly in Jenkins via the [docker-images-ci](https://ci.tlcpack.ai/job/docker-images-ci/) job.
The images are hosted in the [tlcpack Docker Hub](https://hub.docker.com/u/tlcpack)
and referenced in [`ci/jenkins/Jenkinsfile.j2`](jenkins/Jenkinsfile.j2). They can be inspected and run
locally via standard Docker commands.
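The image tags live in the template as plain assignments, for example `ci_lint = 'tlcpack/ci-lint:20220513-055910-fa834f67e'`, and a NOTE in the template says `docker/dev_common.sh` scans these lines with a regex. A minimal Python sketch of such a scan, assuming every image variable starts with `ci_` and holds a single-quoted tag; the actual regex in `dev_common.sh` may differ, and `parse_ci_images` and `pull_command` are hypothetical names:

```python
import re

# Assumed line shape, based on entries such as
#   ci_lint = 'tlcpack/ci-lint:20220513-055910-fa834f67e'
# The regex used by docker/dev_common.sh may be different.
IMAGE_LINE = re.compile(r"^(ci_[a-z0-9_]+)\s*=\s*'([^']+)'\s*$")


def parse_ci_images(template_text):
    """Map each ci_* variable in the template text to its Docker image tag."""
    images = {}
    for line in template_text.splitlines():
        match = IMAGE_LINE.match(line.strip())
        if match:
            images[match.group(1)] = match.group(2)
    return images


def pull_command(image):
    """Build the argv for a standard `docker pull` of one image."""
    return ["docker", "pull", image]
```

Passing each returned tag to `subprocess.run(pull_command(tag), check=True)` would fetch the same images CI uses, after which they can be started with `docker run` as usual.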

### `ci-docker-staging`

The [ci-docker-staging](https://github.com/apache/tvm/tree/ci-docker-staging)
branch is used to test updates to Docker images and `Jenkinsfile` changes. When
running a build for a normal PR from a forked repository, Jenkins uses the code
from the PR except for the `Jenkinsfile` itself, which comes from the base branch.
When branches are built, the `Jenkinsfile` in the branch is used, so a committer
with write access must push PRs to a branch in apache/tvm to properly test
`Jenkinsfile` changes. If your PR makes changes to the `Jenkinsfile`, make sure
to @-mention a [committer](../CONTRIBUTORS.md)
and ask them to push your PR as a branch to test the changes.
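The rule for which `Jenkinsfile` a build uses can be written as a small decision function. This is a toy model of the behavior described above, not Jenkins code; `jenkinsfile_branch` and the `build_kind` values are hypothetical, and PRs opened from branches inside apache/tvm are deliberately left out because the text does not cover them:

```python
def jenkinsfile_branch(build_kind, head_branch, base_branch):
    """Toy model: name the branch whose Jenkinsfile a build runs with."""
    if build_kind == "fork_pr":
        # PR from a fork: the PR's code is built, but the Jenkinsfile
        # is taken from the base branch.
        return base_branch
    if build_kind == "branch":
        # Branch build: the Jenkinsfile committed on that branch is used.
        return head_branch
    raise ValueError(f"unknown build kind: {build_kind!r}")
```

Under this model, a fork PR that edits the `Jenkinsfile` is always built with the base branch's copy, which is why such PRs have to be pushed to a branch such as `ci-docker-staging` to exercise the change.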

# Jenkins CI

TVM uses Jenkins for running Linux continuous integration (CI) tests on
[branches](https://ci.tlcpack.ai/job/tvm/) and
[pull requests](https://ci.tlcpack.ai/job/tvm/view/change-requests/) through a
build configuration specified in a [`Jenkinsfile`](../Jenkinsfile).
Windows and macOS jobs run in GitHub Actions instead.

## `Jenkinsfile`

The template files in [`ci/jenkins`](jenkins) are used to generate the [`Jenkinsfile`](../Jenkinsfile) used by Jenkins to run CI jobs for each commit to PRs and branches.

To regenerate the `Jenkinsfile`, run `make` in the `ci/jenkins` directory.
1 change: 1 addition & 0 deletions ci/jenkins/.gitignore
@@ -0,0 +1 @@
/_venv
File renamed without changes.
File renamed without changes.
File renamed without changes.
14 changes: 7 additions & 7 deletions jenkins/Jenkinsfile.j2 → ci/jenkins/Jenkinsfile.j2
@@ -48,7 +48,7 @@
// Generated at {{ generated_time }}

import org.jenkinsci.plugins.pipeline.modeldefinition.Utils
{% import 'jenkins/macros.j2' as m with context -%}
{% import 'ci/jenkins/macros.j2' as m with context -%}

// NOTE: these lines are scanned by docker/dev_common.sh. Please update the regex as needed. -->
ci_lint = 'tlcpack/ci-lint:20220513-055910-fa834f67e'
@@ -106,12 +106,12 @@ s3_prefix = "tvm-jenkins-artifacts-prod/tvm/${env.BRANCH_NAME}/${env.BUILD_NUMBE
// General note: Jenkins has limits on the size of a method (or top level code)
// that are pretty strict, so most usage of groovy methods in these templates
// are purely to satisfy the JVM
{% include "jenkins/Prepare.groovy.j2" %}
{% include "jenkins/DockerBuild.groovy.j2" %}
{% include "jenkins/Lint.groovy.j2" %}
{% include "jenkins/Build.groovy.j2" %}
{% include "jenkins/Test.groovy.j2" %}
{% include "jenkins/Deploy.groovy.j2" %}
{% include "ci/jenkins/Prepare.groovy.j2" %}
{% include "ci/jenkins/DockerBuild.groovy.j2" %}
{% include "ci/jenkins/Lint.groovy.j2" %}
{% include "ci/jenkins/Build.groovy.j2" %}
{% include "ci/jenkins/Test.groovy.j2" %}
{% include "ci/jenkins/Deploy.groovy.j2" %}


cancel_previous_build()
File renamed without changes.
27 changes: 27 additions & 0 deletions ci/jenkins/Makefile
@@ -0,0 +1,27 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

_venv: requirements.txt
	rm -rf _venv
	python3 -mvenv _venv
	_venv/bin/pip3 install -r requirements.txt

all: _venv
	_venv/bin/python3 generate.py

.PHONY: all
.DEFAULT_GOAL=all
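The `_venv: requirements.txt` rule means make recreates the virtualenv only when `_venv` is missing or older than `requirements.txt`. A rough Python approximation of that timestamp check, simplified (real make also handles prerequisite chains, phony targets and more); `needs_rebuild` is a hypothetical name:

```python
import os


def needs_rebuild(target, prerequisites):
    """Approximate make's rule: rebuild if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

Because `_venv` is a directory, make compares the directory's own mtime, so in practice the venv is recreated when `requirements.txt` is edited after the venv was created.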
File renamed without changes.
36 changes: 22 additions & 14 deletions jenkins/README.md → ci/jenkins/README.md
@@ -17,7 +17,11 @@

# TVM CI

TVM runs CI jobs on every commit to an open pull request and to branches in the apache/tvm repo (such as `main`). These jobs are essential to keeping the TVM project in a healthy state and preventing breakages. Jenkins does most of the work in running the TVM tests, though some smaller jobs are also run on GitHub Actions.
TVM runs CI jobs on every commit to an open pull request and to branches in the apache/tvm repo (such as `main`). These jobs are essential to keeping the TVM project in a healthy state and preventing breakages.

## Jenkins

Jenkins runs all of the Linux-based TVM CI-enabled regression tests. This includes tests against accelerated hardware such as GPUs. It excludes those regression tests that run against hardware not available in the cloud (those tests aren't currently exercised in TVM CI). The tests run by Jenkins represent most of the merge-blocking tests (and passing Jenkins should mostly correlate with passing the remaining Windows/Mac builds).

## GitHub Actions

@@ -33,17 +37,20 @@ https://github.com/apache/tvm/actions has the logs for each of these workflows.

## Keeping CI Green

Developers rely on the TVM CI to get signal on their PRs before merging.
Occasionally breakages slip through and break `main`, which in turn causes
the same error to show up on an PR that is based on the broken commit(s). Broken
commits can be identified [through GitHub](https://github.com/apache/tvm/commits/main>)
via the commit status icon or via [Jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/activity?branch=main>).
In these situations it is possible to either revert the offending commit or
submit a forward fix to address the issue. It is up to the committer and commit
author which option to choose, keeping in mind that a broken CI affects all TVM
developers and should be fixed as soon as possible.
Developers rely on the TVM CI to get signal on their PRs before merging. Occasionally breakages
slip through and break `main`, which in turn causes the same error to show up on an unrelated PR
that is based on the broken commit(s). Broken commits can be identified [through
GitHub](https://github.com/apache/tvm/commits/main) via the commit status icon or via
[Jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/activity?branch=main). In these
situations it is possible to either revert the offending commit or submit a forward fix to address
the issue. It is up to the committer and commit author which option to choose. A broken CI affects
all TVM developers and should be fixed as soon as possible, while a revert may be especially painful
for the author of the offending PR when that PR is large.

Some tests are also flaky and fail for reasons unrelated to the PR. The [CI monitoring rotation](https://github.com/apache/tvm/wiki/CI-Monitoring-Runbook) watches for these failures and disables tests as necessary. It is the responsibility of those who wrote the test to ultimately fix and re-enable the test.
Some tests are also flaky and occasionally fail for reasons unrelated to the PR. The [CI monitoring
rotation](https://github.com/apache/tvm/wiki/CI-Monitoring-Runbook) watches for these failures and
disables tests as necessary. It is the responsibility of those who wrote the test to ultimately fix
and re-enable the test.


## Dealing with Flakiness
@@ -85,7 +92,7 @@ a name, hash, and path in S3, using the `workflow_dispatch` event on
The sha256 must match the file or it will not be uploaded. The upload path is
user-defined, so it can be any path (no leading or trailing slashes allowed), but
be careful not to collide with existing resources by accident.
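Before triggering the upload workflow, both rules can be checked locally. A minimal sketch, assuming a local copy of the file; `sha256_of` and `validate_upload` are hypothetical helpers, not part of the TVM workflow, and they only mirror the two constraints stated above (matching sha256, no leading or trailing slash):

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_upload(path, expected_sha256, upload_path):
    """Hypothetical pre-check: raise ValueError if either upload rule is broken."""
    if sha256_of(path) != expected_sha256.lower():
        raise ValueError("sha256 does not match the file; it would not be uploaded")
    if not upload_path or upload_path.startswith("/") or upload_path.endswith("/"):
        raise ValueError("upload path must be non-empty, without leading or trailing slashes")
```

The check cannot detect collisions with existing objects in S3; that still has to be confirmed by hand.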

## Skipping CI

For reverts and trivial forward fixes, adding `[skip ci]` to the revert's
@@ -153,6 +160,7 @@ _venv/bin/python3 jenkins/generate.py

# Infrastructure

While all TVM tests are contained within the apache/tvm repository, the infrastructure used to run the tests is donated by the TVM Community. To encourage collaboration,
Jenkins runs in AWS on an EC2 instance fronted by an ELB, which makes it available at https://ci.tlcpack.ai. These definitions are declared via Terraform in the [tlc-pack/ci-terraform](https://github.com/tlc-pack/ci-terraform) repository. The Terraform code references custom AMIs built in [tlc-pack/ci-packer](https://github.com/tlc-pack/ci-packer). [tlc-pack/ci](https://github.com/tlc-pack/ci) contains Ansible scripts to deploy the Jenkins head node and set it up to interact with AWS.

The Jenkins head node has a number of autoscaling groups with labels that are used to run jobs (e.g. `CPU`, `GPU` or `ARM`) via the [EC2 Fleet](https://plugins.jenkins.io/ec2-fleet/) plugin.
@@ -199,7 +207,7 @@ graph TD
Monitoring_Scrapers --> |fetch job data| JenkinsUI
Developers --> |git push| Commit
Developers --> |create PR| GitHub
subgraph Jenkins Head Node
WebhookServer
JobExecutor
@@ -223,7 +231,7 @@ graph TD
Grafana
Monitoring_Scrapers
end
subgraph AWS
subgraph Jenkins Workers
WorkerEC2Instance(Worker EC2 Instance)
File renamed without changes.
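The `generate.py` diff that follows changes `REPO_ROOT` from two `.parent` hops to three, because the script now sits one directory deeper. A sketch with a hypothetical checkout path, using `PurePosixPath` so it does not touch the filesystem (the real script calls `Path(__file__).resolve()`):

```python
from pathlib import PurePosixPath

# Hypothetical checkout location; only the layout relative to the repo root matters.
old_script = PurePosixPath("/src/tvm/jenkins/generate.py")     # before this commit
new_script = PurePosixPath("/src/tvm/ci/jenkins/generate.py")  # after this commit

# Two hops reached the repo root while the script lived in jenkins/ ...
assert old_script.parent.parent == PurePosixPath("/src/tvm")
# ... but from ci/jenkins/ they stop at ci/, so a third hop is needed.
assert new_script.parent.parent == PurePosixPath("/src/tvm/ci")
repo_root = new_script.parent.parent.parent
assert repo_root == PurePosixPath("/src/tvm")

# Template paths are then built from the repo root, as in the new JENKINSFILE_TEMPLATE.
print(repo_root / "ci" / "jenkins" / "Jenkinsfile.j2")  # /src/tvm/ci/jenkins/Jenkinsfile.j2
```

`new_script.parents[2]` is an equivalent spelling of the three-hop expression.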
8 changes: 4 additions & 4 deletions jenkins/generate.py → ci/jenkins/generate.py
@@ -25,8 +25,8 @@
from pathlib import Path


REPO_ROOT = Path(__file__).resolve().parent.parent
JENKINSFILE_TEMPLATE = REPO_ROOT / "jenkins" / "Jenkinsfile.j2"
REPO_ROOT = Path(__file__).resolve().parent.parent.parent
JENKINSFILE_TEMPLATE = REPO_ROOT / "ci" / "jenkins" / "Jenkinsfile.j2"
JENKINSFILE = REPO_ROOT / "Jenkinsfile"


@@ -111,10 +111,10 @@ def lines_without_generated_tag(content):
Newly generated Jenkinsfile did not match the one on disk! If you have made
edits to the Jenkinsfile, move them to 'ci/jenkins/Jenkinsfile.j2' and
regenerate the Jenkinsfile from the template with
python3 -m pip install -r ci/jenkins/requirements.txt
python3 ci/jenkins/generate.py
Diffed changes:
"""
).strip()
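The hunk header above names `lines_without_generated_tag(content)`, but its body is not shown on this page. Since the template stamps each render with a `// Generated at {{ generated_time }}` line, a plausible purpose is to drop that line so that regenerating at a different time does not count as a change. A minimal sketch under that assumption, with a hypothetical `jenkinsfile_diff` helper producing the "Diffed changes" output; this is not the actual implementation in `generate.py`:

```python
import difflib

# Assumed prefix of the timestamp line rendered from Jenkinsfile.j2.
GENERATED_TAG = "// Generated at"


def lines_without_generated_tag(content):
    """Sketch: drop the timestamp line so renders from different times compare equal."""
    return [line for line in content.splitlines() if not line.startswith(GENERATED_TAG)]


def jenkinsfile_diff(on_disk, regenerated):
    """Hypothetical helper: unified diff lines, empty if only the timestamp differs."""
    return list(
        difflib.unified_diff(
            lines_without_generated_tag(on_disk),
            lines_without_generated_tag(regenerated),
            fromfile="Jenkinsfile (on disk)",
            tofile="Jenkinsfile (regenerated)",
            lineterm="",
        )
    )
```

An empty result means the checked-in `Jenkinsfile` is up to date; anything else would be printed under "Diffed changes:" along with the regeneration instructions.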
File renamed without changes.
File renamed without changes.

0 comments on commit eebab38
