-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use sparse checkout for generate matrix job #1452
Conversation
- pwsh: | | ||
git clone --no-checkout --filter=tree:0 git://github.com/$(Build.Repository.Name).git . | ||
git sparse-checkout init | ||
if ("${{ parameters.AdditionalParameters.ServiceDirectory }}") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can AdditionalParameters.ServiceDirectory
be a list?
This is a lead to a wider question on whether a more selective checkout could be used across other pipelines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, ServiceDirectory as it is passed in through the pipelines is a string value. In the java repo, we've started to add another parameter TestResourceDirectories
to accommodate a list. My plan is to eventually support a list of directories (for this and ARM templates) across the repositories, so my thinking is when that change comes, we can make the updates here based on what it looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, I think supporting an opt-in to selective checkout for ALL live test jobs is going to be nice to have, but there are more potential edge cases there. I'd like to start investigating that as a follow-on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to include a list or a service directory default.
eng/common/pipelines/templates/jobs/archetype-sdk-tests-generate.yml
Outdated
Show resolved
Hide resolved
eng/common/pipelines/templates/jobs/archetype-sdk-tests-generate.yml
Outdated
Show resolved
Hide resolved
eng/common/pipelines/templates/jobs/archetype-sdk-tests-generate.yml
Outdated
Show resolved
Hide resolved
The following pipelines have been queued for testing: |
Pipeline example using this new mode: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=758564&view=logs&j=6865b188-e449-5409-40f9-44b712c50e56&t=48ee3de9-3dc4-5425-2d15-671a2278bddc |
The following pipelines have been queued for testing: |
Given things aren't on fire right now, I wonder if it is worth taking this opportunity to try and generalize this into a template that takes a list of directories. So that we can do something like this: - template: eng/common/pipelines/templates/steps/sparse-checkout.yml
parameters:
Paths:
- eng/
- eng/${{parameters.ServiceDirectory}} You're probably already thinking this, but I was thinking you might be able to jump straight to the end-game here at the expense of a few more days of longer job generation (which isn't going to hurt too much). |
eng/common/pipelines/templates/jobs/archetype-sdk-tests-generate.yml
Outdated
Show resolved
Hide resolved
@mitchdenny unfortunately I can't use a template, because in order to load templates, we need to have a repository checked out. So it's kind of a chicken or egg problem. I could add a directory parameters argument as a placeholder now that we can extend into going forward? |
Templates are evaluated server side. You shouldn't need to checkout in order to use them. |
@mitchdenny ahh, ok I was confusing script paths with templates. I'll try the latter. |
@weshaggard @mitchdenny I updated this to be included as a template, include a list of paths, and to handle private + |
This pull request is protected by Check Enforcer. What is Check Enforcer?Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass. Why am I getting this message?You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged. What should I do now?If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows: |
The following pipelines have been queued for testing: |
displayName: Generate Job Matrix | ||
steps: | ||
# Skip sparse checkout for the `azure-sdk-for-<lang>-pr` private mirrored repositories | ||
# as we require the github service connection to be loaded. | ||
- ${{ if and(parameters.SparseCheckout, not(contains(variables['Build.DefinitionName'], '-pr - '))) }}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it is better to check against the pipeline or repo name? I have a slight preference for the repo name so we aren't as dependent on someone failing to name their pipeline correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repo name isn't available at template build time :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
booo... I guess we will have to see how this works out. I don't like how loosely coupled this is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this is going to come back to bite us once we start using it in more places in the private repos, but for now I guess we can start here and then reconsider other options if we hit an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not a fan of this approach but can't find an alternative. My thinking is also that the pr
repos are right now our only standardized name private repos, and people that need to do temporary feature/pre-release testing can just keep SparseCheckout: false
in their branch until they need to go live.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if they didn't have to have a code commit that would need to be undone once the merge to public. We could consider a pipeline variable that someone could set to opt-in/out of this but lets see if any when this becomes an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable would make the template harder to define since we'd have to use a condition. I was using a SparseCheckout
top-level parameter myself for testing, but if I had to check-enable it every time for private repo testing I think I would end up making a commit anyway as part of my workflow.
eng/common/pipelines/templates/jobs/archetype-sdk-tests-generate.yml
Outdated
Show resolved
Hide resolved
The following pipelines have been queued for testing: |
|
||
jobs: | ||
- job: generate_matrix | ||
variables: | ||
displayNameFilter: $[ coalesce(variables.jobMatrixFilter, '.*') ] | ||
pool: | ||
name: Azure Pipelines | ||
vmImage: ubuntu-18.04 | ||
name: ${{ parameters.Pool }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to parameterize this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I have it this way is I want the /eng/common/scripts/job-matrix/samples/matrix-test.yml
to be a working sample (partly because non-working samples are bad, and because I use it for testing changes like these). Since I test this sample pipeline in the playground project, it doesn't have access to the 1es pool, hence why I need to override the pool/image.
- checkout: none | ||
|
||
- pwsh: | | ||
git clone --no-checkout --filter=tree:0 git://github.com/$(Build.Repository.Name) . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably worth adding the repository as a template parameter as I believe this template will be interesting for pipelines we clone multiple pipelines.
I also suggest adding a template parameter for the working directory for these steps so folks can control where this clone happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a repositories object, since with multiple repos we need to pair the working directory with the repo name and commit-ish. Also, I set the default the way it is so that when no repo is specified, we just checkout into the default location, this way any calling template doesn't need to manually specify WorkingDirectory
when there's only one repo.
The following pipelines have been queued for testing: |
type: object | ||
default: | ||
- Name: $(Build.Repository.Name) | ||
Commitish: $(Build.SourceVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL... what is commitish? Why not just CommitSha?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an official git term, though I knew I'd raise some eyebrows :) Technically it means you can pass a commit or a branch name, or any other valid object ref. I'm open to a better known noun that still explains what valid values are available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I knew we could pass anything here that resolves to a commit but I think this is the first time I've seen this used. I'm fine keeping it.
- ${{ each repo in parameters.Repositories }}: | ||
- pwsh: | | ||
$dir = "${{ coalesce(repo.WorkingDirectory, format('{0}/{1}', '$(System.DefaultWorkingDirectory)', repo.Name)) }}" | ||
New-Item $dir -ItemType Directory -Force |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't the clone handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it doesn't handle it because you are expecting the directory to already exist. We could probably handle this by passing the path to the clone method instead of the .
but I'm indifferent about which approach to take.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with passing a path is then I can't use System.DefaultWorkingDirectory
in any parent pipeline without having to update every subsequent step with a workingDirectory
override for $(System.Default.WorkingDirectory)/repo
. I wanted to keep it simple for single-repo scenarios, since devops has the same behavior (single repo clone to .../1/s
, multi-repo clone to .../1/s/repo
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see you mean the default parameter being .
instead of the variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with that is we need to construct the path to the cloned repo with expressions we can't use in the scope of the parameter definition default, and so we'd have the path reference inline to the code in multiple places, as opposed to just the one place for directory init and then as workingDirectory
everywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with what you have but I would expect you only need to have it inline in the clone command. All the other git steps can be independent and have the working directory set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple comments but otherwise looks reasonable.
Hello @azure-sdk! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
From a functional standpoint, the generate matrix job only takes a few seconds to run. Depending on the repository, the job can take a total time of 4 minutes before the test jobs are able to start, because azure pipelines does a clone of the entire repository (among other things). This PR makes a change to skip the azure pipelines checkout step, and add a lightweight git clone along with a sparse checkout of just the
eng
and service directory locations (which contain the matrix scripts and/or matrix configs).This change reduces the total job runtime to around 20 seconds from 4 minutes (repo checkouts + auto-injected policy steps).