Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a workflow to update the viable/strict branch #7435

Merged
merged 4 commits into from
Mar 22, 2023

Conversation

DanilBaibak
Copy link
Contributor

Add a workflow to update the viable/strict branch. The current changes will not affect building nightly builds for now.

@pytorch-bot
Copy link

pytorch-bot bot commented Mar 20, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7435

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Copy link
Member

@osalpekar osalpekar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great from my end! Will defer stamping to the vision maintainers.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamping

@DanilBaibak DanilBaibak requested a review from pmeier March 21, 2023 08:51
Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the behavior this adds is to run a workflow every 30 minutes that checks whether all workflows are green and if so advances the viable/strict branch to the current main commit, right?

uses: pytorch/test-infra/.github/workflows/update-viablestrict.yml@main
with:
repository: pytorch/vision
required_check: "build,cmake,lint,Build Linux,Build M1,Build Macos,Tests on Linux,Tests on macOS,Docs,Lint"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seems to be the name fields of our other workflows, right? For example:

name: Tests on Linux

What happens if we rename one of them and forget to update here? Will this just basically disable the workflow or will we get notified somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The job will failed with an error message "missing required workflow". You can find all the logic in the test-infra repo.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we surface this failure? IIUC, this will only visible under the "Actions" tab on GitHub, which we aren't using.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very good question! The main health indicator of the "Update viable/strict" workflow is "the time lag" - the difference between the last commit in the main and the viable/strict branch. You can find this information for PyTorch. The plan is to have a separate dashboard for domains with "time lag" and number of failed nightly builds (not done yet).

The automated process will look like this:

  1. Calculate the time lag between main and viable/strict.
  2. If the time lag is higher some threshold, send a notification to the team.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the time lag is higher some threshold, send a notification to the team.

Does "team" here mean releng or domain maintainers?

In any case, we need something like that setup before we actually use the viable/strict branch for the nightlies. I'm aware this does not happen in this PR, but just making sure this doesn't get lost in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "team" here mean releng or domain maintainers? - Yes

@pmeier
Copy link
Collaborator

pmeier commented Mar 21, 2023

The current changes will not affect building nightly builds for now.

What is the plan here? Build the nightlies from viable/strict? If so we need to solve all flaky build issues first (flaky tests as well, but that is on us). Looking at the HUD, we rarely have a commit on main where all build workflows are green due to flakiness. Meaning, if we use that as requirement, our nightlies will lag behind main for multiple days until we hit a lucky commit.

@DanilBaibak
Copy link
Contributor Author

This PR will not affect nightly builds. There is an internal cron job that triggers nightly builds. The plan is to switch nightly to viable/strict only when we are sure that viable/strict is continuously updated. But yes, unstable build workflows are a big problem.

Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DanilBaibak! Not for this PR, but before we use this system to run our nightly builds, we need to address #7435 (comment).

@DanilBaibak DanilBaibak merged commit f0a1df3 into main Mar 22, 2023
@github-actions
Copy link

Hey @DanilBaibak!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Mar 30, 2023
Reviewed By: vmoens

Differential Revision: D44416611

fbshipit-source-id: dc48e48b25225ea8fa32a8ec0611d3f7bf16dd20
@pmeier pmeier mentioned this pull request May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants