Move jenkins/ dir into ci/jenkins and spread docs around #11927

Merged: 1 commit, Jun 28, 2022
2 changes: 1 addition & 1 deletion Jenkinsfile

97 changes: 97 additions & 0 deletions ci/README.md
@@ -0,0 +1,97 @@
<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements. See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership. The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License. You may obtain a copy of the License at -->

<!--- http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied. See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

# Apache TVM Continuous Integration (CI)

## Overview

TVM's Continuous Integration is responsible for verifying the code in `apache/tvm` and testing PRs
before they merge to inform TVM contributors and committers. These jobs are essential to keeping the
TVM project in a healthy state and preventing breakages. CI in TVM is broken into these pieces:
- Lint scripts in [`tests/lint`](../tests/lint).
- The tests themselves, all of which live underneath [`tests`](../tests).
- Definitions of test suites, with each suite defined as a separate `task_` script in
[`tests/scripts`](../tests/scripts).
- The Linux test sequence (in [`Jenkinsfile`](../Jenkinsfile)), which lints and builds TVM and runs test
suites using Docker on Linux.
- The Windows and Mac test sequences (in [`.github/workflows`](../.github/workflows)).
- GitHub Actions that support the code review process (in [`.github/workflows`](../.github/workflows)).
- Tools to reproduce the CI locally (in `tests/scripts`).
- Infrastructure-as-Code that configures the cloud services that provide Jenkins for the TVM CI (in the
[`tlc-pack/ci`](https://github.com/tlc-pack/ci) repo).

## CI Documentation Index

The CI documentation belongs with the implementation it describes. To make that concrete, the
documentation is split like so:
1. An overview of the CI is in this file.
2. User-facing documentation lives in `apache/tvm`'s `docs/contribute` sub-directory and is served on the
[TVM docs site](https://tvm.apache.org/docs/contribute/ci.html).
3. Documentation of the test suites, and of the tools that run TVM's various regression tests locally,
is in this sub-directory.
4. Documentation of the cloud services and their configuration lives in the
[`tlc-pack/ci`](https://github.com/tlc-pack/ci) repo.

## Jenkins

Jenkins runs all of the Linux-based TVM CI-enabled regression tests. This includes tests against accelerated hardware such as GPUs. It excludes those regression tests that run against hardware not available in the cloud (those tests aren't currently exercised in TVM CI). The tests run by Jenkins represent most of the merge-blocking tests (and passing Jenkins should mostly correlate with passing the remaining Windows/Mac builds).
Review comment (Member): suggested change, capitalize `linux-based` as `Linux-based` in the Jenkins paragraph.

Reply (Contributor Author): done


## GitHub Actions

GitHub Actions is used to run Windows jobs, macOS jobs, and various on-GitHub automations. These are defined in [`.github/workflows`](../.github/workflows/). These automations include bots to:
* [cc people based on subscribed teams/topics](https://github.com/apache/tvm/issues/10317)
* [allow non-committers to merge approved / CI passing PRs](https://discuss.tvm.apache.org/t/rfc-allow-merging-via-pr-comments/12220)
* [add cc-ed people as reviewers on GitHub](https://discuss.tvm.apache.org/t/rfc-remove-codeowners/12095)
* [ping languishing PRs after no activity for a week (currently opt-in only)](https://github.com/apache/tvm/issues/9983)
* [push a `last-successful` branch to GitHub with the last `main` commit that passed CI](https://github.com/apache/tvm/tree/last-successful)

https://github.com/apache/tvm/actions has the logs for each of these workflows. Note that when debugging these workflows, changes made in PRs from forked repositories won't be reflected in the PR itself. Test such changes in the forked repository first and link to the results in the PR body.

## Docker Images

Each CI job runs most of its work inside a Docker container, built from files
in the [`docker/`](../docker) folder. The images are
rebuilt nightly in Jenkins via the [docker-images-ci](https://ci.tlcpack.ai/job/docker-images-ci/) job.
They are hosted in the [tlcpack Docker Hub](https://hub.docker.com/u/tlcpack)
and referenced in the [`Jenkinsfile.j2`](jenkins/Jenkinsfile.j2). These can be inspected and run
locally via standard Docker commands.
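
As a rough illustration of inspecting an image locally, the sketch below pulls an image tag out of a `Jenkinsfile`-style assignment and builds (but does not execute) the matching `docker` command lines. The helper names `parse_image_tags` and `docker_commands` and the regex are assumptions made for this example, not part of TVM's tooling; the sample tag is the `ci_lint` value from `Jenkinsfile.j2`, and the example assumes the image provides `bash`.

```python
import re
import shlex

# Matches lines such as: ci_lint = 'tlcpack/ci-lint:20220513-055910-fa834f67e'
# (an illustrative pattern, not the regex used by TVM's own scripts)
IMAGE_LINE = re.compile(r"^(ci_\w+)\s*=\s*'([^']+)'\s*$")


def parse_image_tags(jenkinsfile_text):
    """Return a {variable_name: image_tag} mapping for every ci_* assignment."""
    tags = {}
    for line in jenkinsfile_text.splitlines():
        match = IMAGE_LINE.match(line.strip())
        if match:
            tags[match.group(1)] = match.group(2)
    return tags


def docker_commands(image):
    """Build shell command strings to pull an image and open a shell in it."""
    pull = ["docker", "pull", image]
    run = ["docker", "run", "--rm", "-it", image, "bash"]
    return [shlex.join(pull), shlex.join(run)]


sample = "ci_lint = 'tlcpack/ci-lint:20220513-055910-fa834f67e'"
tags = parse_image_tags(sample)
for command in docker_commands(tags["ci_lint"]):
    print(command)
```

Printing the commands rather than running them keeps the sketch usable on machines without Docker; the printed lines can be pasted into a shell when Docker is available.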

### `ci-docker-staging`

The [ci-docker-staging](https://github.com/apache/tvm/tree/ci-docker-staging)
branch is used to test updates to Docker images and `Jenkinsfile` changes. When
running a build for a normal PR from a forked repository, Jenkins uses the code
from the PR except for the `Jenkinsfile` itself, which comes from the base branch.
When branches are built, the `Jenkinsfile` in the branch is used, so a committer
with write access must push PRs to a branch in apache/tvm to properly test
`Jenkinsfile` changes. If your PR makes changes to the `Jenkinsfile`, make sure
to @ a [committer](../CONTRIBUTORS.md)
and ask them to push your PR as a branch to test the changes.

# Jenkins CI

TVM uses Jenkins for running Linux continuous integration (CI) tests on
[branches](https://ci.tlcpack.ai/job/tvm/) and
[pull requests](https://ci.tlcpack.ai/job/tvm/view/change-requests/) through a
build configuration specified in a [`Jenkinsfile`](../Jenkinsfile).
Windows and macOS jobs run in GitHub Actions.

## `Jenkinsfile`

The template files in the `ci/jenkins` directory are used to generate the [`Jenkinsfile`](../Jenkinsfile) used by Jenkins to run CI jobs for each commit to PRs and branches.

To regenerate the `Jenkinsfile`, run `make` in the `ci/jenkins` dir.
1 change: 1 addition & 0 deletions ci/jenkins/.gitignore
@@ -0,0 +1 @@
/_venv
File renamed without changes.
File renamed without changes.
File renamed without changes.
14 changes: 7 additions & 7 deletions jenkins/Jenkinsfile.j2 → ci/jenkins/Jenkinsfile.j2
@@ -48,7 +48,7 @@
// Generated at {{ generated_time }}

import org.jenkinsci.plugins.pipeline.modeldefinition.Utils
-{% import 'jenkins/macros.j2' as m with context -%}
+{% import 'ci/jenkins/macros.j2' as m with context -%}

// NOTE: these lines are scanned by docker/dev_common.sh. Please update the regex as needed. -->
ci_lint = 'tlcpack/ci-lint:20220513-055910-fa834f67e'
@@ -106,12 +106,12 @@ s3_prefix = "tvm-jenkins-artifacts-prod/tvm/${env.BRANCH_NAME}/${env.BUILD_NUMBE
// General note: Jenkins has limits on the size of a method (or top level code)
// that are pretty strict, so most usage of groovy methods in these templates
// are purely to satisfy the JVM
-{% include "jenkins/Prepare.groovy.j2" %}
-{% include "jenkins/DockerBuild.groovy.j2" %}
-{% include "jenkins/Lint.groovy.j2" %}
-{% include "jenkins/Build.groovy.j2" %}
-{% include "jenkins/Test.groovy.j2" %}
-{% include "jenkins/Deploy.groovy.j2" %}
+{% include "ci/jenkins/Prepare.groovy.j2" %}
+{% include "ci/jenkins/DockerBuild.groovy.j2" %}
+{% include "ci/jenkins/Lint.groovy.j2" %}
+{% include "ci/jenkins/Build.groovy.j2" %}
+{% include "ci/jenkins/Test.groovy.j2" %}
+{% include "ci/jenkins/Deploy.groovy.j2" %}


cancel_previous_build()
File renamed without changes.
27 changes: 27 additions & 0 deletions ci/jenkins/Makefile
@@ -0,0 +1,27 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

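# Recreate the virtualenv whenever requirements.txt changes
_venv: requirements.txt
	rm -rf _venv
	python3 -m venv _venv
	_venv/bin/pip3 install -r requirements.txt

# Regenerate the Jenkinsfile from the templates
all: _venv
	_venv/bin/python3 generate.py

# _venv is a real directory, so only `all` is declared phony
.PHONY: all
.DEFAULT_GOAL=all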
File renamed without changes.
117 changes: 20 additions & 97 deletions jenkins/README.md → ci/jenkins/README.md
@@ -17,7 +17,11 @@

# TVM CI

-TVM runs CI jobs on every commit to an open pull request and to branches in the apache/tvm repo (such as `main`). These jobs are essential to keeping the TVM project in a healthy state and preventing breakages. Jenkins does most of the work in running the TVM tests, though some smaller jobs are also run on GitHub Actions.
+TVM runs CI jobs on every commit to an open pull request and to branches in the apache/tvm repo (such as `main`). These jobs are essential to keeping the TVM project in a healthy state and preventing breakages.
+
+## Jenkins
+
+Jenkins runs all of the linux-based TVM CI-enabled regression tests. This includes tests against accelerated hardware such as GPUs. It excludes those regression tests that run against hardware not available in the cloud (those tests aren't currently exercised in TVM CI). The tests run by Jenkins represent most of the merge-blocking tests (and passing Jenkins should mostly correlate with passing the remaining Windows/Mac builds).

## GitHub Actions

@@ -33,17 +37,20 @@ https://github.com/apache/tvm/actions has the logs for each of these workflows.

## Keeping CI Green

-Developers rely on the TVM CI to get signal on their PRs before merging.
-Occasionally breakages slip through and break `main`, which in turn causes
-the same error to show up on an PR that is based on the broken commit(s). Broken
-commits can be identified [through GitHub](https://github.com/apache/tvm/commits/main>)
-via the commit status icon or via [Jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/activity?branch=main>).
-In these situations it is possible to either revert the offending commit or
-submit a forward fix to address the issue. It is up to the committer and commit
-author which option to choose, keeping in mind that a broken CI affects all TVM
-developers and should be fixed as soon as possible.
+Developers rely on the TVM CI to get signal on their PRs before merging. Occasionally breakages
+slip through and break `main`, which in turn causes the same error to show up on an unrelated PR
+that is based on the broken commit(s). Broken commits can be identified [through
+GitHub](https://github.com/apache/tvm/commits/main>) via the commit status icon or via
+[Jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/activity?branch=main>). In these
+situations it is possible to either revert the offending commit or submit a forward fix to address
+the issue. It is up to the committer and commit author which option to choose. A broken CI affects
+all TVM developers and should be fixed as soon as possible, while a revert may be especially painful
+for the author of the offending PR when that PR is large.

-Some tests are also flaky and fail for reasons unrelated to the PR. The [CI monitoring rotation](https://github.com/apache/tvm/wiki/CI-Monitoring-Runbook) watches for these failures and disables tests as necessary. It is the responsibility of those who wrote the test to ultimately fix and re-enable the test.
+Some tests are also flaky and occasionally fail for reasons unrelated to the PR. The [CI monitoring
+rotation](https://github.com/apache/tvm/wiki/CI-Monitoring-Runbook) watches for these failures and
+disables tests as necessary. It is the responsibility of those who wrote the test to ultimately fix
+and re-enable the test.


## Dealing with Flakiness
Review comment (Member): This stuff should be either in .rst or in .md, can you delete one or the other?

Reply (Contributor Author): oh whoops, this was pretty sloppy. done.

@@ -85,7 +92,7 @@ a name, hash, and path in S3, using the `workflow_dispatch` event on
The sha256 must match the file or it will not be uploaded. The upload path is
user-defined so it can be any path (no trailing or leading slashes allowed) but
be careful not to collide with existing resources on accident.
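
The two constraints in this paragraph (the sha256 must match the file, and the upload path may not start or end with a slash) can be checked locally before triggering an upload. The following is a minimal sketch under stated assumptions: `sha256_matches` and `is_valid_upload_path` are hypothetical helpers written for this example and are not the checks the upload workflow itself runs.

```python
import hashlib


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """True when the SHA-256 digest of data equals the expected hex string."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


def is_valid_upload_path(path: str) -> bool:
    """True for a non-empty path with no leading or trailing slash."""
    return bool(path) and not path.startswith("/") and not path.endswith("/")


# Illustrative payload and paths only; real resources would be read from disk.
payload = b"example test resource"
digest = hashlib.sha256(payload).hexdigest()
print(sha256_matches(payload, digest))
print(is_valid_upload_path("models/example.bin"))
print(is_valid_upload_path("/models/example.bin"))
```

Neither helper can detect a collision with an existing resource; that still has to be checked against the bucket contents by hand.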

## Skipping CI

For reverts and trivial forward fixes, adding `[skip ci]` to the revert's
@@ -153,88 +160,4 @@ _venv/bin/python3 jenkins/generate.py

# Infrastructure

-Jenkins runs in AWS on an EC2 instance fronted by an ELB which makes it available at https://ci.tlcpack.ai. These definitions are declared via Terraform in the [tlc-pack/ci-terraform](https://github.com/tlc-pack/ci-terraform) repository. The Terraform code references custom AMIs built in [tlc-pack/ci-packer](https://github.com/tlc-pack/ci-packer). [tlc-pack/ci](https://github.com/tlc-pack/ci) contains Ansible scripts to deploy the Jenkins head node and set it up to interact with AWS.
-
-The Jenkins head node has a number of autoscaling groups with labels that are used to run jobs (e.g. `CPU`, `GPU` or `ARM`) via the [EC2 Fleet](https://plugins.jenkins.io/ec2-fleet/) plugin.
-
-## Deploying
-
-Deploying Jenkins can disrupt developers so it must be done with care. Jobs that are in-flight will be cancelled and must be manually restarted. Follow the instructions [here](https://github.com/tlc-pack/ci/issues/10) to run a deploy.
-
-## Monitoring
-
-Dashboards of CI data can be found:
-* within Jenkins at https://ci.tlcpack.ai/monitoring (HTTP / JVM stats)
-* at https://monitoring.tlcpack.ai (job status, worker status)
-
-## CI Diagram
-
-This details the individual parts that interact in TVM's CI. For details on operations, see https://github.com/tlc-pack/ci.
-
-```mermaid
-graph TD
-Commit --> GitHub
-GitHub --> |`push` webhook| WebhookServer(Webhook Server)
-JobExecutor(Job Executor)
-WebhookServer --> JobExecutor
-JobExecutor --> EC2Fleet(EC2 Fleet Plugin)
-EC2Fleet --> |capacity request| EC2(EC2 Autoscaler)
-JobExecutor --> WorkerEC2Instance
-Docker --> |build cache, artifacts| S3
-WorkerEC2Instance --> Docker
-Docker --> |docker pull| G(Docker Hub)
-Docker --> |docker push / pull| ECR
-Docker --> |Execute jobs| CIScripts(CI Scripts)
-RepoCITerraform(ci-terraform repo) --> |terraform| ECR
-RepoCITerraform(ci-terraform repo) --> |terraform| EC2
-RepoCITerraform(ci-terraform repo) --> |terraform| S3
-RepoCI(ci repo) --> |configuration via Ansible| WorkerEC2Instance
-RepoCIPacker(ci-packer) --> |AMIs| EC2
-Monitoring_Scrapers(Jenkins Scraper) --> Monitoring_DB(Postrgres)
-Grafana --> Monitoring_DB
-GitHub --> Windows
-GitHub --> MacOS
-
-Developers --> |check PR status|JenkinsUI(Jenkins Web UI)
-Monitoring_Scrapers --> |fetch job data| JenkinsUI
-Developers --> |git push| Commit
-Developers --> |create PR| GitHub
-
-subgraph Jenkins Head Node
-WebhookServer
-JobExecutor
-EC2Fleet
-JenkinsUI
-end
-
-subgraph GitHub Actions
-Windows
-MacOS
-end
-
-subgraph Configuration / Terraform
-RepoCITerraform
-RepoCI
-RepoCIPacker
-end
-
-subgraph Monitoring
-Monitoring_DB
-Grafana
-Monitoring_Scrapers
-end
-
-subgraph AWS
-subgraph Jenkins Workers
-WorkerEC2Instance(Worker EC2 Instance)
-subgraph "Worker EC2 Instance"
-Docker
-CIScripts
-end
-end
-EC2
-ECR
-S3
-end
-
-```
+While all TVM tests are contained within the apache/tvm repository, the infrastructure used to run the tests is donated by the TVM Community. To encourage collaboration, the configuration for TVM's CI infrastructure is stored in a public GitHub repository. TVM community members are encouraged to contribute improvements. The configuration, along with documentation of TVM's CI infrastructure, is in the [tlc-pack/ci](https://github.com/tlc-pack/ci) repo.
File renamed without changes.
8 changes: 4 additions & 4 deletions jenkins/generate.py → ci/jenkins/generate.py
@@ -25,8 +25,8 @@
from pathlib import Path


-REPO_ROOT = Path(__file__).resolve().parent.parent
-JENKINSFILE_TEMPLATE = REPO_ROOT / "jenkins" / "Jenkinsfile.j2"
+REPO_ROOT = Path(__file__).resolve().parent.parent.parent
+JENKINSFILE_TEMPLATE = REPO_ROOT / "ci" / "jenkins" / "Jenkinsfile.j2"
JENKINSFILE = REPO_ROOT / "Jenkinsfile"
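
The extra `.parent` follows from the script now sitting one directory deeper. A small sketch of the path arithmetic, assuming a hypothetical checkout at `/work/tvm` (that location is made up for illustration):

```python
from pathlib import PurePosixPath

# Hypothetical checkout location, used only to make the arithmetic visible.
script = PurePosixPath("/work/tvm/ci/jenkins/generate.py")

two_up = script.parent.parent           # what the old expression yields from the new location
three_up = script.parent.parent.parent  # what the updated expression yields

print(two_up)
print(three_up)
print(three_up / "ci" / "jenkins" / "Jenkinsfile.j2")
```

With only two `.parent` steps the "root" would be the `ci` directory, so the generated `Jenkinsfile` would be looked up in the wrong place.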


@@ -111,10 +111,10 @@ def lines_without_generated_tag(content):
Newly generated Jenkinsfile did not match the one on disk! If you have made
edits to the Jenkinsfile, move them to 'jenkins/Jenkinsfile.j2' and
regenerate the Jenkinsfile from the template with

python3 -m pip install -r jenkins/requirements.txt
python3 jenkins/generate.py

Diffed changes:
"""
).strip()
File renamed without changes.
File renamed without changes.
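
The error message in the `generate.py` hunk implies a comparison between the freshly rendered `Jenkinsfile` and the copy on disk that ignores the `// Generated at` timestamp line. The body of `lines_without_generated_tag` is not shown in this diff, so the version below is an assumed reimplementation for illustration, and `jenkinsfile_diff` is a hypothetical helper name.

```python
import difflib


def lines_without_generated_tag(content):
    """Drop the timestamp line so two renders can be compared fairly."""
    return [
        line
        for line in content.splitlines(keepends=True)
        if not line.startswith("// Generated at")
    ]


def jenkinsfile_diff(on_disk, generated):
    """Return a unified diff string, empty when the two texts agree."""
    diff = difflib.unified_diff(
        lines_without_generated_tag(on_disk),
        lines_without_generated_tag(generated),
        fromfile="Jenkinsfile (on disk)",
        tofile="Jenkinsfile (generated)",
    )
    return "".join(diff)


# Two renders that differ only in their timestamp compare as equal.
old = "// Generated at 2022-06-27\nstage('Lint')\n"
new = "// Generated at 2022-06-28\nstage('Lint')\n"
print(repr(jenkinsfile_diff(old, new)))
```

Filtering the timestamp before diffing means that regenerating an unchanged template does not register as a mismatch.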