Options like sparse mode #172

Open
jjangga0214 opened this issue Mar 2, 2020 · 26 comments
Labels
enhancement New feature or request

Comments

@jjangga0214

jjangga0214 commented Mar 2, 2020

Hi.

GitHub is internally evaluating partial clone (GitLab already provides the feature!).

This can be useful for a large monorepo when combined with the sparse-checkout feature (see git sparse-checkout --help).

I would therefore like to request support for these features.

Thanks.

Workaround

Here is a suggested workaround for folks who want to customize the whole process, though I haven't tested it.

$ REPO="https://${GITHUB_ACTOR}:${GITHUB_TOKEN}@github.com/${GITHUB_REPOSITORY}.git"
$ git clone --your-options "$REPO"
$ git checkout --your-options <your-subcommand>
@annulen

annulen commented Mar 3, 2020

FWIW, Travis CI also supports sparse checkout, described at https://docs.travis-ci.com/user/customizing-the-build#git-sparse-checkout

It would be even better to have full-fledged partial-clone though.

@ericsciple
Contributor

actions/checkout@v2 is optimized to fetch only a single commit by default, so you should get some of the same benefits (no unnecessary history)

However, that doesn't solve the problem of files in the tree that lie outside the user's work area.

@annulen

annulen commented Mar 10, 2020

It might be a good idea to add a no-checkout option to the action, so that people could customize the checkout command directly without reimplementing the initial setup.

@annulen

annulen commented Mar 10, 2020

For now I'm using this:

    - name: Sparse checkout
      shell: bash
      run: |
        # Authenticated URL (unused below; the remote is added with the anonymous URL)
        REPO="https://${GITHUB_ACTOR}:${GITHUB_TOKEN}@github.com/${GITHUB_REPOSITORY}.git"
        BRANCH="${GITHUB_REF/#refs\/heads\//}"

        # The following code is based on the logs of actions/checkout@v, with the sparseCheckout steps inserted in the middle
        echo "Syncing repository: $GITHUB_REPOSITORY"
        echo "Working directory is '$(pwd)' GITHUB_WORKSPACE=$GITHUB_WORKSPACE BRANCH=$BRANCH"
        git version
        git init "$GITHUB_WORKSPACE"
        git remote add origin "https://github.com/$GITHUB_REPOSITORY"
        git config --local gc.auto 0
        # Now the interesting part
        git config core.sparseCheckout true
        # Add the contents of sparse-checkout here, line by line
        echo "..." >> .git/info/sparse-checkout
        # echo ...
        git -c protocol.version=2 fetch --no-tags --prune --progress --depth=10 origin "+${GITHUB_SHA}:refs/remotes/origin/${BRANCH}"
        git checkout --progress --force -B "$BRANCH" "refs/remotes/origin/$BRANCH"

Not ideal, but it still saves quite a bit of time in my case.

@dscho
Contributor

dscho commented Apr 11, 2020

Not sure whether you are aware of the partial clone feature? It is still somewhat experimental, and I would be loath to risk overloading our servers (especially during COVID-19, when we're running at or over capacity, human-wise), but maybe it would be worth playing with it for a few moments (in coordination with @github/git-core and @github/git-systems, maybe)?

@annulen

annulen commented Apr 11, 2020

In git 2.26, partial clone does not play well with --depth, causing a massive regression.

@dscho
Contributor

dscho commented Apr 11, 2020

> In git 2.26 partial clone does not play well with --depth, causing massive regression

Right, I saw some discussion on the Git mailing list, but I wasn't able to monitor that (COVID-19 🌧️).

The thing is: typically the bulk of the payload consists of blobs. The trees and commits are usually pretty light-weight. In other words, a non-shallow partial clone (that is then populated via a sparse checkout) can be a lot faster than a shallow clone. At least that's what our friends over at Google reported internally.
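
To make the comparison concrete, here is a minimal sketch of the two strategies as shell functions. The function names, the `run` wrapper and the `DRY_RUN` variable are hypothetical and exist only for illustration; the git options themselves (`--depth`, `--filter=blob:none`, `--sparse`, `sparse-checkout init --cone`, `sparse-checkout set`) are standard, with `sparse-checkout` assumed to be available (git 2.25 or later). Which strategy is faster depends on the repository, so treat this as something to measure rather than a recommendation.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Strategy A: shallow clone. History is cut off, but every blob of the
# checked-out commit is downloaded.
shallow_clone() {
  # $1 = repository URL, $2 = target directory
  run git clone --depth=1 "$1" "$2"
}

# Strategy B: non-shallow, blobless partial clone populated via sparse checkout.
# All commits and trees are downloaded; blobs are fetched on demand, and only
# for the directories that end up in the working tree.
partial_sparse_clone() {
  # $1 = repository URL, $2 = target directory, remaining args = directories
  url=$1
  dir=$2
  shift 2
  run git clone --filter=blob:none --sparse "$url" "$dir"
  run git -C "$dir" sparse-checkout init --cone
  run git -C "$dir" sparse-checkout set "$@"
}
```

Setting `DRY_RUN=1` before calling either function prints the git commands instead of running them, which is a cheap way to review them before timing both strategies on a real repository.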

@annulen

annulen commented Apr 11, 2020

> Right, I saw some discussion on the Git mailing list

And I was the one who brought up the topic.

> non-shallow partial clone (that is then populated via a sparse checkout) can be a lot faster than a shallow clone

I guess that may be true for some classes of repositories, but it definitely doesn't hold in my case.

@annulen

annulen commented Apr 11, 2020

In the context of actions/checkout, I think it would be better to provide settings that control the checkout patterns and --filter, just as it allows controlling the depth now.

@dscho
Contributor

dscho commented Apr 11, 2020

> Right, I saw some discussion on the Git mailing list
>
> And I was the one who brought the topic

Whoops, I did not realize that you were the one, sorry ;-)

@fawazahmed0

fawazahmed0 commented Sep 8, 2020

This is what I did to get partial clone:

      - name: Partial Clone
        run: |
          REPO="https://${GITHUB_ACTOR}:${{ secrets.GITHUB_TOKEN }}@github.com/${GITHUB_REPOSITORY}.git"
          git clone --filter=blob:none --no-checkout --depth 1 --sparse "$REPO" .
          git sparse-checkout init --cone
          git sparse-checkout add "folder1" "folder2/folder3"
          git checkout

In case you want more control over what gets cloned, you might want to avoid the --cone option.

Refer to this and this for more details.

@dscho
Contributor

dscho commented Sep 9, 2020

>     git config --global user.email github-actions@github.com
>     git config --global user.name github-actions

You don't really need to set the user.email or user.name, right?

@fawazahmed0

fawazahmed0 commented Sep 9, 2020

>     git config --global user.email github-actions@github.com
>     git config --global user.name github-actions
>
> You don't really need to set the user.email or user.name, right?

Yeah, it's optional; it's only required if you want to commit and push the changes.

@dscho
Contributor

dscho commented Sep 9, 2020

> Yeah it's optional, only required if you want to commit and push the changes

But shouldn't that be configured if/when a commit is to be created, rather than already during the checkout?

@fawazahmed0

> Yeah it's optional, only required if you want to commit and push the changes
>
> But shouldn't that be configured if/when a commit is to be created, rather than already during the checkout?

Yep, you are right, and I have updated the text. It was only relevant in my case, as I wanted to push the changes as well.

@fawazahmed0

fawazahmed0 commented Sep 10, 2020

Does anyone know how to check out the PR in Actions? This step only seems to work on push events, not on pull request events.

@dscho
Contributor

dscho commented Sep 14, 2020

> Does anyone know how to checkout the PR in actions?

I think you will have to use ${{github.head_ref}} (see https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions), probably guarding the step behind an if: github.event_name == 'pull_request' condition.

@fawazahmed0

> Does anyone know how to checkout the PR in actions?
>
> I think you will have to use ${{github.head_ref}} (see https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions), probably guarding the step behind an if: github.event_name == 'pull_request' condition.

I looked into what those variables store. This is what they seem to hold for this PR:

echo ${{ github.head_ref }}     ->  renovate/playwright-1.x
echo ${{ github.base_ref }}     ->  master
echo ${{ github.event.pull_request.head.sha }}    ->  8795a56fa8a91017e212c0311c17b4e6df1df512
echo ${{ github.event.pull_request.head.ref }}      ->  renovate/playwright-1.x

I can check out the PR using the SHA, but pushing changes to the PR seems to be a problem.
I guess the right way of cloning the PR is by using the PR number, as shown here, which none of the context variables above seems to store.

And yeah thanks for helping out
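
For reference, on pull_request events the PR number is available as `github.event.pull_request.number`, and it is also encoded in `GITHUB_REF`, which has the form `refs/pull/<number>/merge`. A minimal sketch of using it to fetch the PR head follows; the function names and the local branch naming (`pr-<number>`) are hypothetical, and this only covers checking out, not pushing.

```shell
#!/bin/sh
# pr_number_from_ref: print the PR number encoded in a ref such as
# refs/pull/123/merge; return non-zero for any other kind of ref.
pr_number_from_ref() {
  case "$1" in
    refs/pull/*/merge|refs/pull/*/head)
      n=${1#refs/pull/}   # strip the leading "refs/pull/"
      echo "${n%%/*}"     # keep everything before the next "/"
      ;;
    *)
      return 1
      ;;
  esac
}

# fetch_pr_head: fetch the head commit of a PR from origin and check it out
# on a local branch (hypothetical naming scheme: pr-<number>).
fetch_pr_head() {
  # $1 = PR number
  git fetch --no-tags --depth=1 origin "+refs/pull/$1/head:refs/remotes/origin/pr/$1" &&
    git checkout --force -B "pr-$1" "refs/remotes/origin/pr/$1"
}

# Example use inside a workflow step (not executed here):
#   if n=$(pr_number_from_ref "$GITHUB_REF"); then fetch_pr_head "$n"; fi
```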

@dscho
Contributor

dscho commented Sep 14, 2020

If you want to push to a PR, things get really awkward and sometimes impossible. To push successfully to a PR, you will have to know the URL of the originating repository (there might be a workflow variable to help you with that), but the contributor will also need to have checked the checkbox "Allow edits by maintainers", and then I am still uncertain that secrets.GITHUB_TOKEN would be enough to push there. And no, you cannot update the branch by pushing to refs/pull/<number>/head, that's prohibited.

@fawazahmed0

> If you want to push to a PR, things get really awkward and sometimes impossible. To push successfully to a PR, you will have to know the URL of the originating repository (there might be a workflow variable to help you with that), but the contributor will also need to have checked the checkbox "Allow edits by maintainers", and then I am still uncertain that secrets.GITHUB_TOKEN would be enough to push there. And no, you cannot update the branch by pushing to refs/pull/<number>/head, that's prohibited.

I hadn't really thought of the "Allow edits by maintainers" checkbox. I just tried committing to this PR with the checkbox unchecked, and it gave a permission denied error in GitHub Desktop.

You are correct, this will get complicated. I think it's better to wait until sparse mode is added to the checkout action. I don't really need this right now anyway, but it would have been good to have.

Thanks

@mambax

mambax commented Feb 16, 2021

Any update here?

@dscho
Contributor

dscho commented Feb 18, 2021

@mambax how about giving it a try yourself? Just

  1. add a new parameter to the inputs: section in https://github.com/actions/checkout/blob/main/action.yml
  2. parse the parameter in https://github.com/actions/checkout/blob/main/src/input-helper.ts
  3. add the parameter to the IGitSourceSettings interface: https://github.com/actions/checkout/blob/main/src/git-source-settings.ts
  4. adjust the fetch() method according to the new parameter:
      async fetch(refSpec: string[], fetchDepth?: number): Promise<void> {
        const args = ['-c', 'protocol.version=2', 'fetch']
        if (!refSpec.some(x => x === refHelper.tagsRefSpec)) {
          args.push('--no-tags')
        }
        args.push('--prune', '--progress', '--no-recurse-submodules')
        if (fetchDepth && fetchDepth > 0) {
          args.push(`--depth=${fetchDepth}`)
        } else if (
          fshelper.fileExistsSync(
            path.join(this.workingDirectory, '.git', 'shallow')
          )
        ) {
          args.push('--unshallow')
        }
        args.push('origin')
        for (const arg of refSpec) {
          args.push(arg)
        }
        const that = this
        await retryHelper.execute(async () => {
          await that.execGit(args)
        })
      }
  5. add documentation to the README: https://github.com/actions/checkout/blob/main/README.md
  6. test, test, test in your fork
  7. open a PR

@mambax

mambax commented Feb 18, 2021

🤣 Can do, but since the last update was 5 months ago I thought I'd ask 😭
@dscho Are you sure it is that straightforward? At first glance, async fetch() uses git fetch. When I read the documentation of git-fetch, I do not see a parameter (with my limited git knowledge) that would allow specifying a folder.
I do see https://git-scm.com/docs/git-sparse-checkout though, which seems to support this case.

@dscho
Contributor

dscho commented Feb 19, 2021

> Are you sure it is that straightforward?

It should be pretty straight-forward.

> My first glance says that async fetch() uses git fetch. When I go and read the documentation of git-fetch I do not see a parameter (with my limited git knowledge) that would allow specifying a folder.

Right. I only pointed to the part where git fetch is executed, and obviously it would need to be called with those --filter=blob:none --no-checkout --depth 1 options mentioned here: #172 (comment). (Even if git fetch's documentation does not talk about --filter, that command does support the option.)

What I forgot to say is that the git checkout part also needs to be adjusted accordingly. That is, if input parameters for the sparse mode are provided, these commands still need to be run before git checkout (you can probably just add a sister function to checkout() and call it just before checkout() is called):

git sparse-checkout init --cone
git sparse-checkout add "folder1" "folder2/folder3"

Feel free to point me to your code if you get stuck with this.
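
The change to the fetch arguments described above can be sketched in shell form. This is not the action's code (which is TypeScript): `fetch_args` and its two parameters are hypothetical, and the sketch leaves out the tags and `--unshallow` handling of the real fetch() method. It only shows where an optional `--filter` would slot in next to the existing depth option.

```shell
#!/bin/sh
# fetch_args: print the arguments for git fetch, given a depth and an
# optional partial-clone filter (hypothetical helper, for illustration only).
#   $1 = fetch depth (0 or empty means no --depth)
#   $2 = filter spec, e.g. blob:none (empty means no --filter)
fetch_args() {
  args="-c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules"
  if [ "${1:-0}" -gt 0 ]; then
    args="$args --depth=$1"
  fi
  if [ -n "${2:-}" ]; then
    args="$args --filter=$2"
  fi
  echo "$args origin"
}

# Example use (not executed here); the unquoted expansion is intentional so
# that the printed arguments are split into separate words:
#   git $(fetch_args 1 blob:none) "+${GITHUB_SHA}:refs/remotes/origin/main"
```

Keeping the filter as a separate input from the depth lets users pick either one or both, which matters given the reported trouble with combining `--depth` and partial clone in git 2.26.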

@viceice

viceice commented Nov 17, 2022

I would also like to see --filter=blob:none, because most automated release tools need the git tags to compute the next version, but we only need the blobs for the current branch.
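
A minimal sketch of that use case, assuming a plain git client rather than the action: a blobless clone downloads the full commit history and all tags, but only the blobs needed for the checked-out branch, and `git describe` can then find the most recent tag. The function name, the `run` wrapper and the `DRY_RUN` variable are hypothetical.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# latest_tag_blobless: clone with full history and tags but without
# historical blobs, then print the most recent tag reachable from HEAD.
latest_tag_blobless() {
  # $1 = repository URL, $2 = target directory
  run git clone --filter=blob:none "$1" "$2" &&
    run git -C "$2" describe --tags --abbrev=0
}
```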

vanZeben added the enhancement label Dec 20, 2022
@jakub-g

jakub-g commented Feb 15, 2023

Related issue about support for filter: #663


9 participants