Use sparse checkout for generate matrix job #1452

Merged
2 commits merged into from
Mar 9, 2021

Conversation

Member

@benbp benbp commented Mar 1, 2021

From a functional standpoint, the generate matrix job only takes a few seconds to run. Depending on the repository, however, the job can take a total of 4 minutes before the test jobs are able to start, because Azure Pipelines clones the entire repository (among other things). This PR skips the Azure Pipelines checkout step and adds a lightweight git clone along with a sparse checkout of just the eng and service directory locations (which contain the matrix scripts and/or matrix configs).

This change reduces the total job runtime to around 20 seconds from 4 minutes (repo checkouts + auto-injected policy steps).
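
A minimal sketch of the approach described above, not the exact step from this PR. It assumes an https clone URL, an empty default working directory, and illustrative sparse-checkout paths (`eng` and a hypothetical `sdk/example-service`). It also assumes the build commit is reachable from the cloned refs, which may not hold for PR merge commits.

```yaml
steps:
  # Skip the built-in Azure Pipelines checkout, which clones the full repository.
  - checkout: none

  - pwsh: |
      # Partial clone: trees and blobs are fetched on demand instead of up front.
      git clone --no-checkout --filter=tree:0 https://github.com/$(Build.Repository.Name) .
      # Restrict the working tree to the directories the matrix scripts need.
      git sparse-checkout init
      git sparse-checkout set eng sdk/example-service   # hypothetical paths
      # Assumes this commit is reachable from the refs fetched by the clone.
      git checkout $(Build.SourceVersion)
    displayName: Sparse checkout (sketch)
    workingDirectory: $(System.DefaultWorkingDirectory)
```

Only the files under the listed paths (plus top-level files) are materialized, which is where the time saving over a full checkout would come from.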

@benbp benbp requested a review from a team as a code owner March 1, 2021 17:58
- pwsh: |
    git clone --no-checkout --filter=tree:0 git://github.com/$(Build.Repository.Name).git .
    git sparse-checkout init
    if ("${{ parameters.AdditionalParameters.ServiceDirectory }}") {
Member

@alzimmermsft alzimmermsft Mar 1, 2021

Can AdditionalParameters.ServiceDirectory be a list?

This leads to a wider question: could a more selective checkout be used across other pipelines?

Member Author

@benbp benbp Mar 1, 2021

At the moment, ServiceDirectory as it is passed in through the pipelines is a string value. In the java repo, we've started to add another parameter TestResourceDirectories to accommodate a list. My plan is to eventually support a list of directories (for this and ARM templates) across the repositories, so my thinking is when that change comes, we can make the updates here based on what it looks like.

Member Author

And yes, I think supporting an opt-in to selective checkout for ALL live test jobs is going to be nice to have, but there are more potential edge cases there. I'd like to start investigating that as a follow-on.

Member Author

Updated to include a list or a service directory default.

@azure-sdk
Collaborator

The following pipelines have been queued for testing:
java - template
java - template - tests
js - template
net - template
net - template - tests
python - template
python - template - tests
You can sign off on the approval gate to test the release stage of each pipeline.
See eng/common workflow

@benbp benbp requested a review from weshaggard March 1, 2021 18:26

@mitchdenny
Contributor

Given things aren't on fire right now, I wonder if it is worth taking this opportunity to generalize this into a template that takes a list of directories, so that we can do something like this:

- template: eng/common/pipelines/templates/steps/sparse-checkout.yml
  parameters:
    Paths:
      - eng/
      - eng/${{parameters.ServiceDirectory}}

You're probably already thinking this, but I was thinking you might be able to jump straight to the end-game here at the expense of a few more days of longer job generation (which isn't going to hurt too much).

@benbp
Member Author

benbp commented Mar 1, 2021

@mitchdenny unfortunately I can't use a template, because in order to load templates we need to have a repository checked out, so it's a chicken-and-egg problem. I could add a directories parameter as a placeholder now, which we can extend going forward?

@mitchdenny
Contributor

Templates are evaluated server side. You shouldn't need to checkout in order to use them.

@benbp
Member Author

benbp commented Mar 1, 2021

> Templates are evaluated server side. You shouldn't need to checkout in order to use them.

@mitchdenny ahh, ok I was confusing script paths with templates. I'll try the latter.

@benbp benbp self-assigned this Mar 2, 2021
@benbp benbp added the Central-EngSys label Mar 2, 2021
@benbp
Member Author

benbp commented Mar 4, 2021

@weshaggard @mitchdenny I updated this to be a template that takes a list of paths and handles private and -pr repositories. For -pr repos, I had to filter by build definition name, since the repo name variable isn't available at template compile time. If we wanted to filter by repo name instead, I'd have to add a condition to all the steps in the sparse checkout template.
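
For illustration, a template that accepts a list of paths could look roughly like the sketch below. This is an assumption-laden outline rather than the template merged in this PR: the file layout, the https clone URL, the default value of `Paths`, and the display names are all hypothetical.

```yaml
# Hypothetical steps template, e.g. sparse-checkout.yml (sketch only).
parameters:
  - name: Paths
    type: object
    default:
      - eng

steps:
  # Replace the built-in checkout with a manual partial clone.
  - checkout: none

  - pwsh: |
      git clone --no-checkout --filter=tree:0 https://github.com/$(Build.Repository.Name) .
      git sparse-checkout init
    displayName: Init sparse checkout

  # Expanded at template compile time: one step per entry in Paths.
  - ${{ each path in parameters.Paths }}:
      - pwsh: git sparse-checkout add "${{ path }}"
        displayName: Add sparse checkout path ${{ path }}

  # Assumes the commit is reachable from the refs fetched by the clone.
  - pwsh: git checkout $(Build.SourceVersion)
    displayName: Check out sources
```

Because the `each` loop is a template expression, the list of paths must be known when the pipeline is compiled; runtime variables cannot drive it.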

@check-enforcer-staging

This pull request is protected by Check Enforcer.

What is Check Enforcer?

Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass.

Why am I getting this message?

You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged.

What should I do now?

If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows:
/check-enforcer evaluate
Typically evaluation only takes a few seconds. If you know that your pull request is not covered by a pipeline and this is expected you can override Check Enforcer using the following command:
/check-enforcer override
Note that using the override command triggers alerts so that follow-up investigations can occur (PRs still need to be approved as normal).

displayName: Generate Job Matrix
steps:
# Skip sparse checkout for the `azure-sdk-for-<lang>-pr` private mirrored repositories
# as we require the github service connection to be loaded.
- ${{ if and(parameters.SparseCheckout, not(contains(variables['Build.DefinitionName'], '-pr - '))) }}:
Member

I'm not sure if it is better to check against the pipeline or repo name? I have a slight preference for the repo name so we aren't as dependent on someone failing to name their pipeline correctly.

Member Author

Repo name isn't available at template build time :(

Member

booo... I guess we will have to see how this works out. I don't like how loosely coupled this is.

Member

I suspect this is going to come back to bite us once we start using it in more places in the private repos, but for now I guess we can start here and then reconsider other options if we hit an issue.

Member Author

Yeah, I'm not a fan of this approach but can't find an alternative. My thinking is also that the pr repos are currently our only private repos with standardized names, and people who need to do temporary feature/pre-release testing can just keep SparseCheckout: false in their branch until they need to go live.

Member

It would be nice if they didn't have to make a code commit that would need to be undone once they merge to public. We could consider a pipeline variable that someone could set to opt in or out of this, but let's see if and when this becomes an issue.

Member Author

The variable would make the template harder to define since we'd have to use a condition. I was using a SparseCheckout top-level parameter myself for testing, but if I had to check-enable it every time for private repo testing I think I would end up making a commit anyway as part of my workflow.
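
The trade-off discussed in this thread can be sketched as follows. The parameter name `SparseCheckout` comes from the discussion; the variable name `SkipSparseCheckout` and the placeholder step bodies are hypothetical.

```yaml
parameters:
  - name: SparseCheckout
    type: boolean
    default: true

steps:
  # Compile-time gate: the step is included or dropped when the template is
  # expanded, so only values known at that point (such as parameters) can be used.
  - ${{ if eq(parameters.SparseCheckout, true) }}:
      - pwsh: Write-Host "sparse checkout steps would go here"
        displayName: Sparse checkout (compile-time gate)

  # Runtime gate: the step is always present in the expanded pipeline and is
  # skipped during execution, so every gated step needs its own condition.
  - pwsh: Write-Host "sparse checkout steps would go here"
    displayName: Sparse checkout (runtime gate)
    condition: and(succeeded(), ne(variables['SkipSparseCheckout'], 'true'))
```

A pipeline variable set at queue time would only work with the second form, which is why it would require a condition on each step in the template.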

@benbp benbp requested a review from weshaggard March 5, 2021 17:48

jobs:
- job: generate_matrix
  variables:
    displayNameFilter: $[ coalesce(variables.jobMatrixFilter, '.*') ]
  pool:
    name: Azure Pipelines
    vmImage: ubuntu-18.04
    name: ${{ parameters.Pool }}
Contributor

Is there a reason to parameterize this?

Member Author

The reason I have it this way is that I want /eng/common/scripts/job-matrix/samples/matrix-test.yml to be a working sample (partly because non-working samples are bad, and partly because I use it for testing changes like these). Since I test this sample pipeline in the playground project, it doesn't have access to the 1es pool, which is why I need to override the pool/image.

- checkout: none

- pwsh: |
    git clone --no-checkout --filter=tree:0 git://github.com/$(Build.Repository.Name) .
Member

It is probably worth adding the repository as a template parameter, as I believe this template will be interesting for pipelines where we clone multiple repositories.

I also suggest adding a template parameter for the working directory for these steps so folks can control where this clone happens.

Member Author

I added a repositories object, since with multiple repos we need to pair the working directory with the repo name and commit-ish. Also, I set the default the way it is so that when no repo is specified, we just check out into the default location. This way, a calling template doesn't need to manually specify WorkingDirectory when there's only one repo.

@benbp benbp requested a review from weshaggard March 8, 2021 19:10

type: object
default:
  - Name: $(Build.Repository.Name)
    Commitish: $(Build.SourceVersion)
Member

LOL... what is commitish? Why not just CommitSha?

Member Author

This is an official git term, though I knew I'd raise some eyebrows :) Technically it means you can pass a commit or a branch name, or any other valid object ref. I'm open to a better known noun that still explains what valid values are available.

Member

I knew we could pass anything here that resolves to a commit but I think this is the first time I've seen this used. I'm fine keeping it.
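
As a rough illustration of what "commit-ish" allows, a caller might pass a branch or tag name instead of a SHA. The field names `Name`, `Commitish`, and `WorkingDirectory` and the template path appear in this conversation; the repository names and ref values below are hypothetical, and whether the merged template accepts `Paths` and `Repositories` together in exactly this shape is an assumption.

```yaml
steps:
  - template: /eng/common/pipelines/templates/steps/sparse-checkout.yml
    parameters:
      Paths:
        - eng
      Repositories:
        # Commitish can be anything git resolves to a commit.
        - Name: example-org/example-repo       # hypothetical repository
          Commitish: main                       # a branch name
          WorkingDirectory: $(System.DefaultWorkingDirectory)/example-repo
        - Name: example-org/other-repo         # hypothetical repository
          Commitish: v1.2.3                     # a tag (hypothetical)
          WorkingDirectory: $(System.DefaultWorkingDirectory)/other-repo
```

Pairing each repository with its own working directory keeps the clones from colliding when more than one repository is listed.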

- ${{ each repo in parameters.Repositories }}:
  - pwsh: |
      $dir = "${{ coalesce(repo.WorkingDirectory, format('{0}/{1}', '$(System.DefaultWorkingDirectory)', repo.Name)) }}"
      New-Item $dir -ItemType Directory -Force
Member

Won't the clone handle this?

Member

Oh it doesn't handle it because you are expecting the directory to already exist. We could probably handle this by passing the path to the clone method instead of the . but I'm indifferent about which approach to take.

Member Author

@benbp benbp Mar 8, 2021

The problem with passing a path is that I then can't use System.DefaultWorkingDirectory in any parent pipeline without having to update every subsequent step with a workingDirectory override for $(System.DefaultWorkingDirectory)/repo. I wanted to keep it simple for single-repo scenarios, since devops has the same behavior (single repo clone to .../1/s, multi-repo clone to .../1/s/repo).

Member Author

Oh I see you mean the default parameter being . instead of the variable.

Member Author

The problem with that is we need to construct the path to the cloned repo with expressions we can't use in the scope of the parameter definition default, and so we'd have the path reference inline to the code in multiple places, as opposed to just the one place for directory init and then as workingDirectory everywhere else.

Member

I'm good with what you have but I would expect you only need to have it inline in the clone command. All the other git steps can be independent and have the working directory set.

Member

@weshaggard weshaggard left a comment

A couple comments but otherwise looks reasonable.

@ghost

ghost commented Mar 9, 2021

Hello @azure-sdk!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 0e24ca5 into Azure:master Mar 9, 2021
This pull request was closed.
Labels
Central-EngSys This issue is owned by the Engineering System team.
5 participants