Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci(pr-test): use GitHub API to get changed files + diff #8952

Merged
merged 9 commits into from
Dec 6, 2022

Conversation

yin1999
Copy link
Member

@yin1999 yin1999 commented Oct 1, 2022

Description

get diff file and changed files' name from GitHub API is much faster.

Motivation

as gh is an built-in GitHub cli in Action runner, we can use this tool to get changed files, which is much faster than git diff command (must fetch the history commits).

Additional details

using gh cli

Only used 0 seconds.

https://github.com/yin1999/translated-content/actions/runs/3580409184/jobs/6022509363#step:6:4

image

using git diff (used by technote-space/get-diff-action)

used 40 seconds.

https://github.com/mdn/translated-content/actions/runs/3163384214/jobs/5150888786#step:7:1

image

conclusion

Using gh cli could speed up the PR test workflow. But when we use this, we must get the diff file from https://github.com/ower/repo/compare/{base}...{head}.diff and this diff file just show the diff in changed files (which may differ from git diff which will collected the changes in the main branch).

@github-actions github-actions bot added the system Infrastructure and configuration for the project label Oct 1, 2022
.github/workflows/pr-test.yml Show resolved Hide resolved
.github/workflows/pr-test.yml Show resolved Hide resolved
.github/workflows/pr-test.yml Outdated Show resolved Hide resolved
@yin1999
Copy link
Member Author

yin1999 commented Oct 1, 2022

If this is ok, I'd like to reflect the changes in mdn/content

@yin1999 yin1999 marked this pull request as ready for review October 1, 2022 08:51
@yin1999 yin1999 requested a review from a team as a code owner October 1, 2022 08:52
Copy link
Contributor

@caugner caugner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and promising, thanks! 🎉

Just some comments:

.github/workflows/pr-test.yml Outdated Show resolved Hide resolved
.github/workflows/pr-test.yml Outdated Show resolved Hide resolved
.github/workflows/pr-test.yml Outdated Show resolved Hide resolved
@yin1999 yin1999 requested a review from caugner October 20, 2022 11:14
Comment on lines 150 to 174
# note that we should get the absolute path of the changed images
run: |
DIFF_IMAGES=$(cat ${{ env.CHANGED_FILES }} | egrep -i "^files/.*/.*\.(png|jpeg|jpg|gif|svg|webp)$" | xargs readlink -e | tr "\n" " ")
echo "GIT_DIFF_IMAGES=$DIFF_IMAGES" >> $GITHUB_ENV
Copy link
Member Author

@yin1999 yin1999 Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I've ignored to export absolute path here, see testing pr (yin1999#8):

Now, this seems ok.

@yin1999
Copy link
Member Author

yin1999 commented Oct 21, 2022

marked as draft until I figure out the workflow run (https://github.com/yin1999/translated-content/actions/runs/3293826528/jobs/5430706523). It seems to be not right.

@yin1999 yin1999 marked this pull request as draft October 21, 2022 00:54
@yin1999

This comment was marked as outdated.

@yin1999

This comment was marked as outdated.

get diff file and changed files' name from GitHub API is much faster.
@yin1999 yin1999 marked this pull request as ready for review November 30, 2022 04:49
Copy link
Member Author

@yin1999 yin1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 50 to 52
gh api repos/{owner}/{repo}/compare/${{ env.BASE_SHA }}...${{ env.HEAD_SHA }} \
--paginate \
--jq '.files | .[] | select(.status|IN("added", "modified", "renamed", "copied", "changed")) | .filename' > ${{ env.CHANGED_FILES }}
Copy link
Member Author

@yin1999 yin1999 Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For there maybe too many files has been changed in a single PR, the gh pr view command used before can only fetch up to 100 changed files. But for almost all the PR, the commit in it would be less than 30 commits. So we can use github rest API to get changed files. (See https://docs.github.com/rest/commits/commits?apiVersion=2022-11-28#compare-two-commits). This API will limit 30 commits per page. So for those must few PRs, we can use --paginate (see options in https://cli.github.com/manual/gh_api) to fetch all the commits.

Addition: The API limitation from GitHub Actions with GitHub Token would be 1000 requests per hour. for a single repo This should be enough for most of cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is ok. Should we try it in Markdown lint (PR) first? There are some failed workflows (cannot get changed files), and it is less important than the current CI.

@@ -132,7 +124,7 @@ jobs:
# directory.
# The purpose of this is for the PR Review Companion to later
# be able to use this raw diff file for the benefit of analyzing.
git diff --diff-filter=ACMRT ${{ github.event.pull_request.base.sha }} ${{ github.sha }} > $BUILD_OUT_ROOT/DIFF
wget https://github.com/${{ github.repository }}/compare/${{ env.BASE_SHA }}...${{ env.HEAD_SHA }}.diff -O $BUILD_OUT_ROOT/DIFF
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can see the diff url in https://docs.github.com/rest/commits/commits#compare-two-commits

This may avoid an additional api call (making it harder to reach the limitation)

@caugner
Copy link
Contributor

caugner commented Dec 6, 2022

@yin1999 This has probably been discussed before, but why can we actually not use git ls-files? My expectation would be that the workflow run in the context of the PR has the history of all commits between BASE_SHA and HEAD_SHA. Is this not so?

@yin1999
Copy link
Member Author

yin1999 commented Dec 6, 2022

My expectation would be that the workflow run in the context of the PR has the history of all commits between BASE_SHA and HEAD_SHA. Is this not so?

Actually not (see: actions/checkout#520), the actions/checkout@v3 action could only fetch specified n depth (with fetch-depth input). By default, the param would be 1 (equivalent to git clone --depth=1), so the commit history between BASE_SHA to HEAD_SHA will not be included in the workflow context.

image

If we need to fetch commits that are in a single PR (branch), we need to use git fetch with enough depth (technote-space/get-diff-action's implement) which may take too much time.

And I've created another PR for markdown-lint (PR), to deal with the workflow failure. If the solution seems ok, could we try on markdown-lint. Because this workflow may be more important for us.

@yin1999 yin1999 self-assigned this Dec 6, 2022
@caugner
Copy link
Contributor

caugner commented Dec 6, 2022

@yin1999 Understood, thanks for clarifying!

Could you update this PR to align with the changes in #10390 (i.e. without the temporary file)? And then it should be good to go. 🙏

@yin1999
Copy link
Member Author

yin1999 commented Dec 6, 2022

@caugner caugner changed the title perf: use GitHub API to get DIFF ci(pr-test): use GitHub API to get changed files + diff Dec 6, 2022
@yin1999 yin1999 requested a review from caugner December 6, 2022 15:33
@caugner caugner merged commit abf6e70 into mdn:main Dec 6, 2022
@caugner
Copy link
Contributor

caugner commented Dec 6, 2022

Thank you @yin1999, this is awesome! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system Infrastructure and configuration for the project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants