Move jenkins/ dir into ci/jenkins and spread docs around #11927

@@ -0,0 +1,97 @@
<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements. See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership. The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License. You may obtain a copy of the License at -->

<!--- http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied. See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

# Apache TVM Continuous Integration (CI)

## Overview

TVM's Continuous Integration verifies the code in `apache/tvm` and tests PRs before they are merged,
reporting the results to TVM contributors and committers. These jobs are essential to keeping the
TVM project in a healthy state and preventing breakages. CI in TVM is broken into these pieces:
- Lint scripts in [`tests/lint`](../tests/lint).
- The tests themselves, all of which live underneath [`tests`](../tests).
- Definitions of test suites, with each suite defined as a separate `task_` script in
  [`tests/scripts`](../tests/scripts).
- The Linux test sequence (in [`Jenkinsfile`](../Jenkinsfile)), which lints and builds TVM and runs test
  suites using Docker on Linux.
- The Windows and macOS test sequences (in [`.github/workflows`](../.github/workflows)).
- GitHub Actions that support the code review process (in [`.github/actions`](../.github/actions)).
- Tools to reproduce the CI locally (in `tests/scripts`).
- Infrastructure-as-Code that configures the cloud services that provide Jenkins for the TVM CI (in the
  [`tlc-pack/ci`](https://github.com/tlc-pack/ci) repo).

## CI Documentation Index

The CI documentation belongs with the implementation it describes. To make that concrete, the
documentation is split like so:
1. An overview of the CI is in this file.
2. User-facing documentation lives in `apache/tvm`'s `docs/contribute` sub-directory and is served on the
   [TVM docs site](https://tvm.apache.org/docs/contribute/ci.html).
3. Documentation of the tools that run TVM's various regression tests locally, and of the test suites,
   is in this sub-directory.
4. Documentation of the cloud services and their configuration lives in the
   [`tlc-pack/ci`](https://github.com/tlc-pack/ci) repo.

## Jenkins

Jenkins runs all of the Linux-based TVM CI-enabled regression tests, including tests against accelerated hardware such as GPUs. It excludes regression tests that require hardware not available in the cloud (those tests aren't currently exercised in TVM CI). The tests run by Jenkins make up most of the merge-blocking tests, and passing Jenkins should mostly correlate with passing the remaining Windows/macOS builds.

## GitHub Actions

GitHub Actions is used to run Windows jobs, macOS jobs, and various on-GitHub automations. These are defined in [`.github/workflows`](../.github/workflows/). These automations include bots to:
* [cc people based on subscribed teams/topics](https://github.com/apache/tvm/issues/10317)
* [allow non-committers to merge approved / CI passing PRs](https://discuss.tvm.apache.org/t/rfc-allow-merging-via-pr-comments/12220)
* [add cc-ed people as reviewers on GitHub](https://discuss.tvm.apache.org/t/rfc-remove-codeowners/12095)
* [ping languishing PRs after no activity for a week (currently opt-in only)](https://github.com/apache/tvm/issues/9983)
* [push a `last-successful` branch to GitHub with the last `main` commit that passed CI](https://github.com/apache/tvm/tree/last-successful)
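
At its core, the "ping languishing PRs" bot is a date comparison. The sketch below is a hypothetical illustration, not the bot's actual code: it assumes each PR is described by its number and a GitHub-style UTC `updated_at` timestamp (as returned by the GitHub REST API), and it uses the one-week threshold from the list above. The `PullRequest` class and `languishing` function are made-up names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Inactivity threshold from the bot description ("no activity for a week").
STALE_AFTER = timedelta(days=7)


@dataclass
class PullRequest:
    number: int
    updated_at: str  # GitHub-style UTC timestamp, e.g. "2022-06-01T12:00:00Z" (assumed format)


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp ending in 'Z' into a timezone-aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def languishing(prs: list[PullRequest], now: datetime) -> list[int]:
    """Return the numbers of PRs that have had no activity for at least STALE_AFTER."""
    return [pr.number for pr in prs if now - parse_timestamp(pr.updated_at) >= STALE_AFTER]


if __name__ == "__main__":
    now = datetime(2022, 7, 1, tzinfo=timezone.utc)
    prs = [
        PullRequest(1, "2022-06-20T00:00:00Z"),  # 11 days without activity: stale
        PullRequest(2, "2022-06-29T18:00:00Z"),  # under 2 days: still active
    ]
    print(languishing(prs, now))  # [1]
```

A real bot would also need to skip PRs that have not opted in and avoid pinging the same PR repeatedly; both are left out here.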

https://github.com/apache/tvm/actions has the logs for each of these workflows. Note that when debugging these workflows, changes made to them in a PR from a forked repository won't be reflected in that PR's workflow runs. Test such changes in the forked repository first and link the results in the PR body.

## Docker Images

Each CI job runs most of its work inside a Docker container, built from files
in the [`docker/`](../docker) folder. These files are built nightly in Jenkins
via the [docker-images-ci](https://ci.tlcpack.ai/job/docker-images-ci/) job.
The images for these containers are hosted in the [tlcpack Docker Hub](https://hub.docker.com/u/tlcpack)
and referenced in [`Jenkinsfile.j2`](Jenkinsfile.j2). They can be inspected and run
locally via standard Docker commands.
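
As an example of "standard Docker commands", the hypothetical helper below pulls an image and opens a shell in it with a TVM checkout mounted. The image name `tlcpack/ci-cpu` and the `<tag>` placeholder are illustrative only (take the real image and tag from `Jenkinsfile.j2`), and mounting the checkout at `/workspace` is an arbitrary choice, not what the CI scripts do. TVM's own tools for reproducing CI runs live in `tests/scripts`.

```python
import shlex
import subprocess
from pathlib import Path


def docker_run_command(image: str, repo_root: Path, command: str = "bash") -> list[str]:
    """Build a `docker run` invocation that mounts a TVM checkout into the container.

    Mounting at /workspace is an illustrative choice, not a CI convention.
    """
    return [
        "docker", "run",
        "--rm",  # delete the container when it exits
        "-it",   # interactive session with a TTY
        "-v", f"{repo_root.resolve()}:/workspace",  # bind-mount the checkout
        "-w", "/workspace",  # start in the mounted checkout
        image,
        command,
    ]


def run_image(image: str, repo_root: Path) -> None:
    """Pull the image (refreshing any stale local copy), then open a shell in it."""
    subprocess.run(["docker", "pull", image], check=True)
    subprocess.run(docker_run_command(image, repo_root), check=True)


if __name__ == "__main__":
    # Placeholder tag: substitute the image referenced in Jenkinsfile.j2.
    print(shlex.join(docker_run_command("tlcpack/ci-cpu:<tag>", Path("."))))
```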

### `ci-docker-staging`

The [ci-docker-staging](https://github.com/apache/tvm/tree/ci-docker-staging)
branch is used to test updates to Docker images and `Jenkinsfile` changes. When
running a build for a normal PR from a forked repository, Jenkins uses the code
from the PR except for the `Jenkinsfile` itself, which comes from the base branch.
When branches are built, the `Jenkinsfile` in the branch is used, so a committer
with write access must push PRs to a branch in apache/tvm to properly test
`Jenkinsfile` changes. If your PR makes changes to the `Jenkinsfile`, make sure
to @-mention a [committer](../CONTRIBUTORS.md)
and ask them to push your PR as a branch to test the changes.

# Jenkins CI

TVM uses Jenkins for running Linux continuous integration (CI) tests on
[branches](https://ci.tlcpack.ai/job/tvm/) and
[pull requests](https://ci.tlcpack.ai/job/tvm/view/change-requests/) through a
build configuration specified in a [`Jenkinsfile`](../Jenkinsfile).
Windows and macOS jobs run in GitHub Actions instead.

## `Jenkinsfile`

The template files in this directory are used to generate the [`Jenkinsfile`](../Jenkinsfile) used by Jenkins to run CI jobs for each commit to PRs and branches.

To regenerate the `Jenkinsfile`, run `make` in the `ci/jenkins` dir.

@@ -0,0 +1 @@
/_venv

@@ -0,0 +1,27 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

_venv: requirements.txt
	rm -rf _venv
	python3 -m venv _venv
	_venv/bin/pip3 install -r requirements.txt

all: _venv
	_venv/bin/python3 generate.py

.PHONY: all
.DEFAULT_GOAL=all
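
In the Makefile, `all` depends on `_venv`, which in turn depends on `requirements.txt`, so the virtualenv is recreated only when it is missing or `requirements.txt` has been modified since `_venv` was last built, while `generate.py` runs on every `make` because `all` is not a real file. Make decides whether `_venv` is stale by comparing modification times. The sketch below restates that basic rule in Python for illustration only; `needs_rebuild` is a made-up name and it ignores make's other features (phony targets, pattern rules, order-only prerequisites).

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(target: Path, prerequisites: list[Path]) -> bool:
    """Mirror make's basic staleness rule: rebuild if the target is missing or
    any prerequisite was modified more recently than the target."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(p.stat().st_mtime > target_mtime for p in prerequisites)
```

For example, run from `ci/jenkins`, `needs_rebuild(Path("_venv"), [Path("requirements.txt")])` returns `True` right after `requirements.txt` is edited and `False` once `make` has rebuilt the virtualenv.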

@@ -17,7 +17,11 @@

# TVM CI

TVM runs CI jobs on every commit to an open pull request and to branches in the apache/tvm repo (such as `main`). These jobs are essential to keeping the TVM project in a healthy state and preventing breakages.

## Jenkins

Jenkins runs all of the Linux-based TVM CI-enabled regression tests, including tests against accelerated hardware such as GPUs. It excludes regression tests that require hardware not available in the cloud (those tests aren't currently exercised in TVM CI). The tests run by Jenkins make up most of the merge-blocking tests, and passing Jenkins should mostly correlate with passing the remaining Windows/macOS builds.

## GitHub Actions

@@ -33,17 +37,20 @@ https://github.com/apache/tvm/actions has the logs for each of these workflows.

## Keeping CI Green

Developers rely on the TVM CI to get signal on their PRs before merging. Occasionally breakages
slip through and break `main`, which in turn causes the same error to show up on an unrelated PR
that is based on the broken commit(s). Broken commits can be identified [through
GitHub](https://github.com/apache/tvm/commits/main) via the commit status icon or via
[Jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/activity?branch=main). In these
situations it is possible to either revert the offending commit or submit a forward fix to address
the issue. It is up to the committer and commit author which option to choose. A broken CI affects
all TVM developers and should be fixed as soon as possible, while a revert may be especially painful
for the author of the offending PR when that PR is large.
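
Finding the offending commit can also be scripted. The sketch below is a hypothetical helper, not part of TVM's tooling. It assumes a local clone whose `origin` remote is `apache/tvm`, lists recent `main` commits newest first with `git rev-list`, queries GitHub's combined-status REST endpoint (`GET /repos/{owner}/{repo}/commits/{ref}/status`) for each, and reports the oldest failing commit newer than the most recent passing one. The combined status aggregates commit statuses (how Jenkins reports results); GitHub Actions check runs are not included.

```python
import json
import subprocess
import urllib.request
from typing import Optional

# GitHub's combined-status endpoint for a single commit of apache/tvm.
STATUS_URL = "https://api.github.com/repos/apache/tvm/commits/{sha}/status"


def recent_main_commits(count: int = 20, ref: str = "origin/main") -> list[str]:
    """SHAs of the newest `count` commits on `ref`, newest first (assumes origin is apache/tvm)."""
    out = subprocess.run(
        ["git", "rev-list", f"--max-count={count}", ref],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def combined_state(sha: str, token: Optional[str] = None) -> str:
    """Return the combined state ("success", "pending" or "failure") of one commit."""
    request = urllib.request.Request(
        STATUS_URL.format(sha=sha), headers={"Accept": "application/vnd.github+json"}
    )
    if token:  # unauthenticated requests are heavily rate-limited
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)["state"]


def first_broken(statuses: list[tuple[str, str]]) -> Optional[str]:
    """Given (sha, state) pairs ordered newest first, return the oldest failing
    commit that is newer than the most recent success, i.e. the likely culprit."""
    culprit = None
    for sha, state in statuses:  # walk from newest towards oldest
        if state == "success":
            break  # stop at the most recent commit that passed CI
        if state == "failure":
            culprit = sha  # keep overwriting so the oldest failure wins
    return culprit
```

For example, `first_broken([(sha, combined_state(sha)) for sha in recent_main_commits()])` checks the last 20 commits on `main`. A `failure` can also come from a flaky test, so treat the result as a starting point for investigation rather than a verdict.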

Some tests are also flaky and occasionally fail for reasons unrelated to the PR. The [CI monitoring
rotation](https://github.com/apache/tvm/wiki/CI-Monitoring-Runbook) watches for these failures and
disables tests as necessary. It is the responsibility of those who wrote the test to ultimately fix
and re-enable the test.

## Dealing with Flakiness

@@ -85,7 +92,7 @@ a name, hash, and path in S3, using the `workflow_dispatch` event on
The sha256 must match the file or it will not be uploaded. The upload path is
user-defined, so it can be any path (no leading or trailing slashes allowed), but
be careful not to collide with existing resources by accident.
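
Computing the hash and checking the path locally before triggering the upload can save a failed run. The sketch below is a hypothetical pre-flight check, not part of the upload workflow: it assumes the workflow expects the hex-encoded SHA-256 digest of the file, and it enforces only the path rule stated above. It cannot detect collisions with existing S3 resources, since that needs access to the bucket.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file (assumed format), read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_upload_path(upload_path: str) -> None:
    """Reject paths the upload would refuse: empty, or with a leading/trailing slash."""
    if not upload_path:
        raise ValueError("upload path must not be empty")
    if upload_path.startswith("/") or upload_path.endswith("/"):
        raise ValueError(f"no leading or trailing slashes allowed: {upload_path!r}")
```

Reading the file in fixed-size chunks keeps memory use flat, which matters for large model files; the resulting digest is the same as hashing the whole file at once.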

## Skipping CI

For reverts and trivial forward fixes, adding `[skip ci]` to the revert's

@@ -153,88 +160,4 @@ _venv/bin/python3 jenkins/generate.py

# Infrastructure

Jenkins runs in AWS on an EC2 instance fronted by an ELB, which makes it available at https://ci.tlcpack.ai. These definitions are declared via Terraform in the [tlc-pack/ci-terraform](https://github.com/tlc-pack/ci-terraform) repository. The Terraform code references custom AMIs built in [tlc-pack/ci-packer](https://github.com/tlc-pack/ci-packer). [tlc-pack/ci](https://github.com/tlc-pack/ci) contains Ansible scripts to deploy the Jenkins head node and set it up to interact with AWS.

The Jenkins head node has a number of autoscaling groups with labels (e.g. `CPU`, `GPU` or `ARM`) that are used to run jobs via the [EC2 Fleet](https://plugins.jenkins.io/ec2-fleet/) plugin.

## Deploying

Deploying Jenkins can disrupt developers, so it must be done with care. Jobs that are in flight will be cancelled and must be restarted manually. Follow [these instructions](https://github.com/tlc-pack/ci/issues/10) to run a deploy.

## Monitoring

Dashboards of CI data can be found:
* within Jenkins at https://ci.tlcpack.ai/monitoring (HTTP / JVM stats)
* at https://monitoring.tlcpack.ai (job status, worker status)

## CI Diagram

This diagram shows the individual parts that interact in TVM's CI. For details on operations, see https://github.com/tlc-pack/ci.

```mermaid
graph TD
    Commit --> GitHub
    GitHub --> |`push` webhook| WebhookServer(Webhook Server)
    JobExecutor(Job Executor)
    WebhookServer --> JobExecutor
    JobExecutor --> EC2Fleet(EC2 Fleet Plugin)
    EC2Fleet --> |capacity request| EC2(EC2 Autoscaler)
    JobExecutor --> WorkerEC2Instance
    Docker --> |build cache, artifacts| S3
    WorkerEC2Instance --> Docker
    Docker --> |docker pull| G(Docker Hub)
    Docker --> |docker push / pull| ECR
    Docker --> |Execute jobs| CIScripts(CI Scripts)
    RepoCITerraform(ci-terraform repo) --> |terraform| ECR
    RepoCITerraform(ci-terraform repo) --> |terraform| EC2
    RepoCITerraform(ci-terraform repo) --> |terraform| S3
    RepoCI(ci repo) --> |configuration via Ansible| WorkerEC2Instance
    RepoCIPacker(ci-packer) --> |AMIs| EC2
    Monitoring_Scrapers(Jenkins Scraper) --> Monitoring_DB(Postgres)
    Grafana --> Monitoring_DB
    GitHub --> Windows
    GitHub --> MacOS

    Developers --> |check PR status|JenkinsUI(Jenkins Web UI)
    Monitoring_Scrapers --> |fetch job data| JenkinsUI
    Developers --> |git push| Commit
    Developers --> |create PR| GitHub

    subgraph Jenkins Head Node
        WebhookServer
        JobExecutor
        EC2Fleet
        JenkinsUI
    end

    subgraph GitHub Actions
        Windows
        MacOS
    end

    subgraph Configuration / Terraform
        RepoCITerraform
        RepoCI
        RepoCIPacker
    end

    subgraph Monitoring
        Monitoring_DB
        Grafana
        Monitoring_Scrapers
    end

    subgraph AWS
        subgraph Jenkins Workers
            WorkerEC2Instance(Worker EC2 Instance)
            subgraph "Worker EC2 Instance"
                Docker
                CIScripts
            end
        end
        EC2
        ECR
        S3
    end

```

While all TVM tests are contained within the apache/tvm repository, the infrastructure used to run the tests is donated by the TVM Community. To encourage collaboration, the configuration for TVM's CI infrastructure is stored in a public GitHub repository, and TVM community members are encouraged to contribute improvements. The configuration, along with documentation of TVM's CI infrastructure, is in the [tlc-pack/ci](https://github.com/tlc-pack/ci) repo.