fast check skipping repo download #425

Merged
merged 1 commit into from
Mar 30, 2024

david-baylibre
Contributor

We are probably not the only ones working with big repositories, where it takes a long time just to check for new versions.
Downloading the whole repo consumes CPU, memory, disk and bandwidth.

If the check runs with no filter and only one version is requested (the default as of today), the latest commit can be retrieved with a simple git ls-remote, which avoids downloading the repository.
It's a huge gain: checks run almost instantly and use hardly any resources!
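
As a rough illustration of the idea above, here is a minimal shell sketch of resolving a branch tip with git ls-remote and printing it in the same shape as the versions that check emits. The function names (`ls_remote_to_version`, `fast_check`) are hypothetical and not taken from the resource's code; the actual patch may well differ.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not the git-resource implementation.

# Read `git ls-remote` output ("<sha><TAB><refname>" per line) on stdin and
# print the SHA from the first line as a one-element JSON array of versions.
# Prints an empty array when there is no input (branch not found).
ls_remote_to_version() {
  awk 'NR == 1 { printf "[{\"ref\": \"%s\"}]\n", $1; found = 1 }
       END { if (!found) print "[]" }'
}

# Ask the remote for the tip of a single branch; no objects are fetched.
fast_check() {
  uri=$1
  branch=$2
  git ls-remote "$uri" "refs/heads/$branch" | ls_remote_to_version
}
```

For example, `fast_check https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git linux-6.1.y` would only read the remote's ref listing, so nothing is cloned.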

It'd be nice to change the disable_ci_skip default value to true - people would need to request explicitly to skip commit messages containing "ci skip" - so the fast check would become the default behaviour. What do you guys think?
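
To make the conditions concrete: git ls-remote returns only commit IDs and ref names, not commit messages, so it cannot look for "ci skip" markers. A guard along the following lines is one way to picture when the fast path can apply. This is a sketch under assumptions: `can_fast_check` and its arguments are hypothetical names, not the resource's real variables.

```shell
#!/bin/sh
# Hypothetical guard; argument names are illustrative only.
#   $1: number of filters configured on the resource
#   $2: requested version depth
#   $3: disable_ci_skip setting ("true" or "false")
# Succeeds (exit status 0) only when the fast path is applicable.
can_fast_check() {
  filters=$1
  version_depth=$2
  disable_ci_skip=$3
  [ "$filters" -eq 0 ] && [ "$version_depth" -eq 1 ] && [ "$disable_ci_skip" = "true" ]
}
```

With such a guard, `can_fast_check 0 1 true` succeeds, while any filter, a larger depth, or `disable_ci_skip` set to false falls back to the clone-based check.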

If the check runs with no filter and 1 version only is requested,
the latest commit can be retrieved with a simple git ls-remote.
this prevents the repository from being downloaded.

Signed-off-by: David Rozé <droze@baylibre.com>
@taylorsilva
Member

@david-baylibre This PR that was just merged might solve this for you since they also deal with large repos: #418

@david-baylibre
Contributor Author

@taylorsilva The bare clone is a great improvement regardless of the parameters, and that is awesome.
git ls-remote goes one step further: it returns the version almost instantly when only the latest version is wanted and no filters are applied, which I believe covers many pipelines.

Here are some numbers from running check against the kernel repo (2.5 GB), to give a better picture.
I ran the following command on each version:
time echo '{ "source": { "uri": "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git", "branch": "linux-6.1.y", "version_depth":1, "disable_ci_skip":true }}' | check

Legacy version (v1.15.0):

Cloning into '/tmp/git-resource-repo-cache'...
remote: Enumerating objects: 9202576, done.
remote: Counting objects: 100% (6741/6741), done.
remote: Compressing objects: 100% (2339/2339), done.
remote: Total 9202576 (delta 4609), reused 5709 (delta 4401), pack-reused 9195835
Receiving objects: 100% (9202576/9202576), 2.49 GiB | 46.67 MiB/s, done.
Resolving deltas: 100% (7541062/7541062), done.
Updating files: 100% (78746/78746), done.
[
  {
    "ref": "e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1"
  }
]

-> real 3m10.822s

git clone --bare (v1.16.0):

Cloning into bare repository '/tmp/git-resource-repo-cache'...
warning: filtering not recognized by server, ignoring
remote: Enumerating objects: 9202576, done.
remote: Counting objects: 100% (6741/6741), done.
remote: Compressing objects: 100% (2339/2339), done.
remote: Total 9202576 (delta 4609), reused 5709 (delta 4401), pack-reused 9195835
Receiving objects: 100% (9202576/9202576), 2.49 GiB | 47.52 MiB/s, done.
Resolving deltas: 100% (7541062/7541062), done.
[
  {
    "ref": "e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1"
  }
]

-> real 1m39.176s

With git ls-remote (Proposal):

[
  {
    "ref": "e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1"
  }
]

-> real 0m0.665s

Some repos are much bigger than 2.5 GB, which means a lot of time wasted before builds can even start.
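
For a quick comparison of the timings above, the `real` values can be converted to seconds and divided. The helpers below (`to_seconds`, `speedup`) are hypothetical and only illustrate the arithmetic.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Convert a `time`-style value such as "3m10.822s" to seconds.
to_seconds() {
  echo "$1" | awk '{ split($0, a, "m"); sub(/s$/, "", a[2]); print a[1] * 60 + a[2] }'
}

# Print how many times faster the second timing is than the first,
# rounded to the nearest integer.
speedup() {
  awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
    'BEGIN { printf "%.0f\n", slow / fast }'
}
```

`speedup 3m10.822s 0m0.665s` prints 287 and `speedup 1m39.176s 0m0.665s` prints 149, so on this run the ls-remote check was roughly 287 times faster than v1.15.0 and 149 times faster than the bare clone.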

Member

@taylorsilva taylorsilva left a comment

Thanks for the explanation! I'm happy to merge this since all the tests passed and I can't think of any scenario where this might break someone's pipeline.

@taylorsilva taylorsilva merged commit abb0629 into concourse:master Mar 30, 2024
2 checks passed
@david-baylibre
Contributor Author

Thanks @taylorsilva for the merge.

What do you think about the following proposal, now that the patch has been merged?
"It'd be nice to change the disable_ci_skip default value to true - people would need to request explicitly to skip commit messages containing "ci skip" - so the fast check would become the default behaviour. What do you guys think?"

@taylorsilva
Member

That's worth discussing, but let's not have that discussion in this closed PR which will be hard for people to find. Feel free to open an issue about this for everyone to discuss.

On the face of it (so I haven't thought deeply about it), I'm hesitant to change defaults on people, but I could be convinced if we can guarantee that the version-finding behaviour stays the same.

@marco-m-pix4d

@taylorsilva If I am not mistaken, this PR was merged after the current latest tag, v1.16.
Would it be possible to issue a new tag v1.17 so that we can pick this up easily, or would this have unintended consequences? Merci :-)

@taylorsilva
Member

@marco-m-pix4d Better late than never! https://github.com/concourse/git-resource/releases/tag/v1.17.0
