
Added ability to run linux workflows on large runners #6273

Merged: @albi3ro merged 19 commits into master from sc-73711-onboard-to-large-runners on Oct 18, 2024


@rashidnhm commented on Sep 13, 2024

Context:

Currently the CI gets congested when large numbers of pull requests are being updated simultaneously. This pull request gives PRs an escape hatch: they can use large runners, which draw from a different queue, so their CI jobs get picked up sooner.

Description of the Change:

This pull request adds two new features:

  • The ability to add the `urgent` label to any pull request to switch it over to large runners
  • Automatic switch of rc branches to large runners
    • This assumes the rc branch follows the format `vX.Y.Z-rcN`

Large runners are slightly more powerful than standard runners and can be spawned in much higher volume, because we pay for them per minute rather than having them included in our GitHub plan.

If a PR needs CI to run without waiting for a runner, add the `urgent` label to the pull request.
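The selection rule described above can be sketched as a small Python function. This is a hypothetical illustration, not the workflow code from this PR (which lives in the workflows' `runs-on` expressions): the runner labels `ubuntu-large` and `ubuntu-latest` are placeholders, the rc check is assumed to apply to the branch the PR targets, and the scope restriction (only `pull_request` jobs on `ubuntu` runners) is taken from the PR's Important Note.

```python
import re

# Placeholder runner labels: the real large-runner group name is not given in the PR.
STANDARD_RUNNER = "ubuntu-latest"
LARGE_RUNNER = "ubuntu-large"  # hypothetical label

# rc branches are assumed to follow vX.Y.Z-rcN, e.g. v0.39.0-rc0.
RC_BRANCH = re.compile(r"v\d+\.\d+\.\d+-rc\d+")


def select_runner(event_name: str, base_ref: str, labels: list[str],
                  runs_on: str = STANDARD_RUNNER) -> str:
    """Pick the runner for one job (illustrative sketch, not the PR's workflow code)."""
    # Only pull_request jobs that already target ubuntu runners are swapped.
    if event_name != "pull_request" or not runs_on.startswith("ubuntu"):
        return runs_on
    # Explicit opt-in through the `urgent` label.
    if "urgent" in labels:
        return LARGE_RUNNER
    # Automatic opt-in when the PR targets a release-candidate branch (assumed: base branch).
    if RC_BRANCH.fullmatch(base_ref):
        return LARGE_RUNNER
    return runs_on


print(select_runner("pull_request", "master", ["urgent"]))  # ubuntu-large
print(select_runner("pull_request", "v0.39.0-rc0", []))     # ubuntu-large
print(select_runner("pull_request", "master", []))          # ubuntu-latest
print(select_runner("push", "master", ["urgent"]))          # ubuntu-latest
```

Using `fullmatch` rather than `match` keeps branches such as `v0.39.0-rc0-hotfix` on standard runners, since the whole branch name must fit the `vX.Y.Z-rcN` pattern.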

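Important Note:
  • This only affects jobs that run on `pull_request` and use `ubuntu` runners.
  • This change is already in place in lightning and catalyst.
    • PennyLaneAI/pennylane-lightning#774
    • PennyLaneAI/catalyst#846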

Benefits:
Large runners shorten the time it takes for a runner to pick up a job.

Possible Drawbacks:
Although we control the size of the large-runner pool, it can still be saturated.

Related GitHub Issues:
None. sc-73711


Hello. You may have forgotten to update the changelog!
Please edit doc/releases/changelog-dev.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

@rashidnhm added the `urgent` label (Mark a pull request as high priority) on Sep 13, 2024
@mudit2812 left a comment:

Looks quite good! Just a couple of questions about where it might not make sense to use large runners.

@rashidnhm commented:

There is also a syntax error in the `runs-on` expression; I will fix it Monday morning!

codecov bot commented on Sep 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.40%. Comparing base (056bb92) to head (51754c7).
Report is 326 commits behind head on master.

@@            Coverage Diff             @@
##           master    #6273      +/-   ##
==========================================
- Coverage   99.71%   99.40%   -0.32%     
==========================================
  Files         447      447              
  Lines       42418    42418              
==========================================
- Hits        42299    42164     -135     
- Misses        119      254     +135     
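The percentages in this diff are hits divided by lines. The short check below reproduces them, assuming two-decimal precision with rounding down (an assumption about the Codecov settings, which are not shown here; it is consistent with 99.71 being reported rather than 99.72). The function name `coverage_pct` is hypothetical.

```python
import math


def coverage_pct(hits: int, lines: int, precision: int = 2) -> float:
    """Line coverage in percent, rounded *down* to `precision` decimal places."""
    scale = 10 ** precision
    return math.floor(hits / lines * 100 * scale) / scale


LINES = 42418
base = coverage_pct(42299, LINES)  # master
head = coverage_pct(42164, LINES)  # this PR's head
print(base, head)                    # 99.71 99.4
print(LINES - 42299, LINES - 42164)  # misses: 119 254
# Unrounded change in coverage, in percentage points.
print(round((42164 - 42299) / LINES * 100, 3))  # -0.318
```

The unrounded change of about -0.318 percentage points matches the -0.32% in the report, and the 135 extra misses are exactly the 135 lost hits.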


@mudit2812 left a comment:

Are the test runtimes expected to stay the same even when using the large runners? I don't see noticeable improvements in the runtimes in the most recent CI run, especially in the core tests. Additionally, it looks like the jax tests are quite imbalanced when using the large runners. We might need different `durations.json` files for the case where large runners are used.

If that's expected, once the conversation about using large runners with the docs and format workflows is resolved, I'm happy to approve.
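To illustrate the imbalance point: if test groups are formed from recorded per-test timings such as those in `durations.json` (an assumption about how these files are consumed), timings recorded on one machine type can produce unbalanced groups on another. The sketch below uses a greedy longest-first split, which is one common way to balance groups and not necessarily the algorithm PennyLane's CI uses; the test names and timings are hypothetical.

```python
import heapq


def split_by_duration(durations: dict[str, float], n_groups: int) -> list[list[str]]:
    """Greedy split: longest test first, always into the group with the smallest total."""
    heap = [(0.0, i) for i in range(n_groups)]  # (total seconds, group index)
    groups: list[list[str]] = [[] for _ in range(n_groups)]
    # Longest first; ties broken by name so the split is deterministic.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    return groups


def group_totals(groups: list[list[str]], durations: dict[str, float]) -> list[float]:
    """Wall time of each group under a given set of per-test timings."""
    return [sum(durations[t] for t in g) for g in groups]


# Hypothetical timings (seconds): recorded on standard runners vs. observed on large runners.
recorded = {"test_a": 40.0, "test_b": 30.0, "test_c": 20.0, "test_d": 10.0}
observed = {"test_a": 20.0, "test_b": 30.0, "test_c": 20.0, "test_d": 10.0}

groups = split_by_duration(recorded, 2)
print(groups)                          # [['test_a', 'test_d'], ['test_b', 'test_c']]
print(group_totals(groups, recorded))  # [50.0, 50.0]
print(group_totals(groups, observed))  # [30.0, 50.0]
```

The split is balanced under the timings it was built from, but when only some tests speed up on the new hardware, one group finishes well before the other, which is why separate timing files per runner type could help.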

@mudit2812 left a comment:

It's probably a good idea to unconditionally set the `upload.yml` workflow to use the large runners as well.

@Alex-Preciado commented:

> Automatic swap of rc branch to large runner

Hey @rashidnhm, how big is the large-runner pool? I don't know a lot about the specifics, but I'm concerned that having all codebases use the large-runner pool during feature freeze (when multiple rc branches will be created across the ecosystem) might lead to the exact situation we're trying to avoid: competing for resources. For example, I'm worried this could severely impact Lightning during an already busy period.

@rashidnhm replied:

Great question, Alex!

The current pool that is shared across the PL org is 60 runners; this is what GitHub provides as standard for our billing plan. The large-runner pool is already set to 150, which is 2.5x the capacity of the standard pool. If we are worried about congestion, we can scale the pool all the way up to 1000, which is more than enough for PL needs!

Since we pay per build minute on large runners, there is a lot of flexibility with scaling.

We can also increase the pool size just during feature freeze and then scale it back down after the release.
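Expressed as concurrent-runner capacity relative to the 60-runner standard pool (counting runners only, not the per-runner speed difference), the pool sizes above work out to:

$$\frac{150}{60} = 2.5\times, \qquad \frac{1000}{60} \approx 16.7\times$$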

@Alex-Preciado replied:

Nice, this is music to my ears 🚀 ... Thank you so much for the details, @rashidnhm!!

@mudit2812 left a comment:

Thanks Rashid. Just a couple of comments, but very close to approval ready :)

Review thread on `.github/workflows/upload.yml` (outdated, resolved)
@rashidnhm requested a review from @mudit2812 on October 17, 2024 17:37
@mudit2812 left a comment:

Thanks! There's still one unresolved conversation, but that won't impact how we use the workflow, so approving :)

@PietropaoloFrisoni left a comment:

Thank you @rashidnhm, it seems very good to me!

I just have a non-blocking observation, but I don't see any problem at this stage.

Review thread on `.github/workflows/upload.yml` (resolved)
@mudit2812 commented:

Since @rashidnhm is away today, I'm going to merge this once all checks pass.

@mudit2812 enabled auto-merge (squash) on October 18, 2024 14:07
@albi3ro disabled auto-merge on October 18, 2024 14:34
@albi3ro merged commit 1bc346a into master on Oct 18, 2024 (39 of 40 checks passed)
@albi3ro deleted the sc-73711-onboard-to-large-runners branch on October 18, 2024 14:34
austingmhuang pushed a commit that referenced this pull request Oct 23, 2024
**Context:**

Currently the CI gets congested when large numbers of pull requests are
being updated simultaneously. This pull request gives PRs an escape
hatch: they can use large runners, which draw from a different queue, so
their CI jobs get picked up sooner.

**Description of the Change:**

This pull request adds two new features:
- Ability to add the `urgent` label to any pull request and switch it
over to large runners
- Automatic swap of rc branch to large runner
    - This assumes the rc branch is of the format `vX.Y.Z-rcN`

Large runners are slightly more powerful than standard runners and can
be spawned in much higher volume, because we pay for them per minute
rather than having them included in our GitHub plan.

If a PR needs CI to run without waiting for a runner, **add the `urgent`
label to the pull request**.

Important Note:
- This only affects jobs that run on `pull_request` and use `ubuntu`
runners.
- This change is already in place in lightning and catalyst.
    - PennyLaneAI/pennylane-lightning#774
    - PennyLaneAI/catalyst#846


**Benefits:**
Large runners shorten the time it takes for a runner to pick up a
job.

**Possible Drawbacks:**
Although we control the size of the large-runner pool, it can still be
saturated.

**Related GitHub Issues:**
None.
[sc-73711](https://app.shortcut.com/xanaduai/story/73711/update-pennylane-ci-to-use-large-runner-group)

---------

Co-authored-by: Mudit Pandey <mudit.pandey@xanadu.ai>
mudit2812 added a commit that referenced this pull request Nov 11, 2024
Labels: urgent (Mark a pull request as high priority)

5 participants