pin pytorch source builds nightly to speed up CI #1328

Closed

powderluv opened this issue Aug 31, 2022 · 17 comments

@powderluv
Collaborator

Pin the PyTorch bump to a nightly builder -- that also populates the ccache for the day.

@silvasean
Contributor

The other option here is to do what we do for LLVM: a deliberate ~weekly bump of the PyTorch dependency.

@powderluv
Collaborator Author

I am hoping the infra to bump will be the same -- we can adjust how frequently it runs (nightly / weekly, etc.). When it fails it will still need someone to fix it up, but hopefully we keep the green commit moving without hand-holding.

We will also have to stop the CI jobs from uploading their own caches.

@powderluv
Collaborator Author

@ashay I have a WIP on how I think we could pin PyTorch and roll forward only when our tests pass. That way we keep the benefit of automatically staying very close to PyTorch like we do today -- but we don't roll into a broken master.

I have a pseudo-code branch here: https://github.com/llvm/torch-mlir/tree/RollPyTorch if you want to give it a crack.

Basically we add a nightly job that runs our current OOT + PyTorch source build against top-of-master PyTorch. If it passes, we commit the torch_mlir_pytorch_version file.

On the CI side we then load the pinned SHA. Happy to discuss any other suggestions -- I just wanted to dump what I had in mind.
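A rough sketch of that flow (hypothetical script and file names, assuming a scheduled GitHub Actions job; this is not the actual content of the RollPyTorch branch):

    #!/usr/bin/env bash
    # Sketch of the nightly roll: build against top-of-master PyTorch and only
    # advance the pin when our own tests pass. Script names are placeholders.
    set -euo pipefail

    # 1. Resolve the current top-of-master PyTorch commit.
    PYTORCH_SHA=$(git ls-remote https://github.com/pytorch/pytorch.git HEAD | cut -f1)

    # 2. Run the OOT + PyTorch source build and tests against that commit
    #    (stand-in for the real build/test invocation).
    ./build_tools/build_and_test_against_pytorch.sh "${PYTORCH_SHA}"

    # 3. Only on success, commit the new pin; regular CI reads this file
    #    instead of chasing PyTorch master directly.
    echo "${PYTORCH_SHA}" > torch_mlir_pytorch_version
    git add torch_mlir_pytorch_version
    git commit -m "Roll PyTorch to ${PYTORCH_SHA}"
    git push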

FYI @sjain-stanford @makslevental

@ashay
Collaborator

ashay commented Sep 23, 2022

Building on top of what you had, I pushed a new commit to the RollPyTorch branch to fetch the binary release so that we don't need to build from source. Let me know if I'm veering too far off from what you had in mind! Thanks much!

This commit uses the pip index versions command and some bash commands, but we could change it to use Python if necessary. The fetched PyTorch version is written to pytorch-requirements.txt, which I've kept separate from the top-level requirements.txt so that CI can write to just this one file. Of course, pytorch-requirements.txt is referenced by requirements.txt, so end users can continue to run pip install -r requirements.txt and they'll get all the modules required to build Torch-MLIR. I also changed the GitHub Actions script to do a Release build instead of an out-of-tree build to get better confidence.
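For reference, the split could look roughly like this (illustrative contents; the exact query and file layout in the branch may differ):

    # Resolve the newest nightly torch version from the CPU nightly index.
    VERSION=$(python -m pip index versions \
        -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html --pre torch \
      | grep "Available versions" | tr ' ' '\n' | sort -nr | head -n1 | tr -d ',')

    # Pin it in the one file that CI is allowed to rewrite...
    printf '%s\n' \
      "-f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html --pre" \
      "torch==${VERSION}" > pytorch-requirements.txt

    # ...while the top-level requirements.txt simply includes it, so end users
    # keep running `pip install -r requirements.txt` unchanged:
    #   -r pytorch-requirements.txt
    #   <other build and test dependencies>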

I haven't tested a full CI run, but if this approach is generally along the lines of what we want, I'll go ahead and try this out in CI.

@powderluv
Collaborator Author

Oh thank you for the commit. I think pinning the binary is good too, but I think we still need pure source builds for downstream customers. They will have long-supported PyTorch forks (long after the PyTorch nightlies are removed) to enable a hermetic build environment without an external binary.

Do you know any way we can find the SHA of the particular binary you get with
python -m pip index versions -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html --pre torch | grep "Available versions" | tr ' ' '\n' | sort -nr | head -n1 | tr -d ',' ?

We could use that SHA for the source builds CI and also pin the exact same binary for binary builds.
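One way to recover that SHA (an assumption about the wheel layout, though torch nightly wheels do ship a torch/version.py with a git_version field) is to download the pinned wheel and read it directly:

    # Sketch: fetch the matching nightly wheel and read the PyTorch commit it
    # was built from, without installing anything. The version below is a
    # hypothetical pin.
    VERSION="1.13.0.dev20220926+cpu"
    python -m pip download --no-deps --pre -d /tmp/torch-wheel \
        -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html \
        "torch==${VERSION}"

    # Wheels are zip archives; print torch/version.py and pull out the commit.
    unzip -p /tmp/torch-wheel/torch-*.whl torch/version.py | grep git_version
    # -> git_version = '<40-character PyTorch commit SHA>'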

Feel free to add any commits to the branch. We can clean up / review / commit once we figure it out.

@ashay
Collaborator

ashay commented Sep 26, 2022

Here's a quick summary of our discussion from the Developer Hour meeting, just so that it's easier to access later.

I have updated the branch to fetch the commit hash for the nightly release from the whl file. Assuming that the GitHub Actions script in RollPyTorch runs once a day, it should be able to pick up the most recent whl file, write the pytorch-requirements.txt file (which is referenced by the top-level requirements.txt file), and validate whether an out-of-tree build works.

Both the Torch-MLIR release version and the PyTorch commit hash are written to a file called release-info at the root of the repository (although, if we want to write this information to separate files, we should consider putting it into a separate directory).

The part that remains, however, is pushing the updated pytorch-requirements.txt file and the release-info file to the main branch. Since the main branch is protected, I presume we'll need to create a PR from within the GitHub Actions script, which one of us will have to approve. Is there another way that doesn't require creating a PR?
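One possible shape for that last step, assuming the workflow is given a token that may bypass the protection (option A) or that a PR turns out to be acceptable (option B); this is a sketch, not the code in the branch:

    # Hypothetical final step of the RollPyTorch workflow.
    git config user.name  "torch-mlir rollbot"
    git config user.email "rollbot@users.noreply.github.com"
    git add pytorch-requirements.txt release-info
    git commit -m "Bump PyTorch pin" || exit 0   # nothing to push if unchanged

    # Option A: direct push (needs bypass rights on the protected branch).
    git push origin HEAD:main

    # Option B: open a PR instead, so presubmits still run.
    # git push origin HEAD:rollpytorch-update
    # gh pr create --title "Bump PyTorch pin" --body "Automated nightly roll."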

@powderluv
Collaborator Author

Thanks @ashay

I don't think we want to save our Torch-MLIR version in the GitHub repo. That is calculated by the release build and named accordingly when the release is created, and we don't want the RollPyTorch.yml builder to conflict with that. So just the PyTorch SHA is sufficient, I would think. I would also suggest pytorch-version.txt, just like https://github.com/pytorch/pytorch/blob/master/version.txt, since it is not "release info" yet -- just a PyTorch pin, like a git submodule.

BTW, curious if we can drop the +cpu suffix in pytorch-requirements.txt, since we point to the CPU nightly page anyway? That would fix #1381.
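If we do drop it, the change could be as small as stripping the local-version suffix from the generated pin (illustrative; the actual edit in the branch may differ):

    # Turn "torch==1.13.0.dev20220926+cpu" into "torch==1.13.0.dev20220926";
    # pip then resolves the CPU build from the nightly index given via -f.
    sed -i 's/+cpu$//' pytorch-requirements.txt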

@silvasean
Contributor

@GMNGeoffrey @stellaraccident -- any idea how to push an updated SHA1 hash directly into the repo? Do we need to disable branch protection somehow? We currently have branch protection for CI and unresolved conversations.

@GMNGeoffrey
Member

In IREE we have it set to require pull request reviews, but admins can override. That means that admins can make direct pushes to the repo:

[screenshot of the branch protection settings showing the admin-override option]

This is nice because it gives us a break-glass option when CI is stuck or something, but also a bit terrifying because running the wrong git push command can break things, which is why I have a little git command to set my upstream push URL to something invalid except when I explicitly activate it: https://gist.github.com/42dd9a9792390094a43bdb69659320c0
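The gist itself isn't reproduced in this thread, but the usual shape of that trick is roughly the following (an assumed sketch, with a placeholder repo URL):

    # Make accidental pushes to the upstream remote fail by default.
    git remote set-url --push upstream DISABLED_INTENTIONALLY

    # When a direct push is really intended: restore, push, disable again.
    git remote set-url --push upstream git@github.com:<org>/<repo>.git
    git push upstream main
    git remote set-url --push upstream DISABLED_INTENTIONALLY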

@GMNGeoffrey
Member

GMNGeoffrey commented Sep 26, 2022

(Also note that from a security perspective, allowing admins to do this makes zero difference because if we check the box saying they can't, then as an admin they can always go uncheck it)

@ashay
Collaborator

ashay commented Sep 26, 2022

Thanks all! I have included a rudimentary change to push the updated files to the main branch (see commit 22eb82c), but I have not tried it at my end, for fear that it might do bad things (like recursive commits). If someone can vet it and/or run this in a controlled environment, that'd be very nice. Thanks!

I dropped the +cpu suffix, and everything seems to work well, although I'm wary of the change because regexes often create more problems than they solve. If something breaks, I'll be sure to revert the change. I've also dropped the release-info file. Instead, the hash is written to pytorch-version.txt.

@GMNGeoffrey
Member

Ah, I've gone back and read the context here. A PR might not be a bad idea anyway, since it will run presubmits. You could even set up a bot to auto-approve it so it just needs to pass required checks :-) I think the latter would require a second bot account. If you want to push directly, you'll need to use a bot account or app with the ability to bypass checks (you can configure which users can do that in settings):

[screenshot of the branch protection settings for bypassing required checks]

@powderluv
Collaborator Author

@ashay for testing we can push to the same RollPyTorch branch ref. Once it is working we can open a new PR.

@ashay
Collaborator

ashay commented Sep 30, 2022

PRs #1419 and #1438 address the pinning and periodic update of the PyTorch release, but we don't seem to have resolved the caching issue. We have had the auto-update in place for the last two days, and the LLVM builds hit the cache, but the libtorch builds don't, causing feature PRs in Torch-MLIR to spend about an hour and a half in CI.

@powderluv
Collaborator Author

I think now we turn off cache saving from the CI builds and have the nightly roll create the cache. I have another bug open with details on the cache issue (AFK, so I can't link it right now).
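A minimal sketch of what "turn off cache saving from the CI builds" could look like with ccache (assuming the jobs can tell whether they were triggered by the nightly schedule):

    # In per-PR CI jobs, treat ccache as read-only so they consume the cache
    # but never overwrite what the nightly roll populated.
    if [ "${GITHUB_EVENT_NAME:-}" != "schedule" ]; then
      export CCACHE_READONLY=1
    fi
    # The scheduled RollPyTorch job leaves ccache writable and uploads the
    # refreshed cache for the next day's builds.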

@powderluv
Collaborator Author

Is the cache issue #1323?

@powderluv
Collaborator Author

Let's close this and track the cache issue separately (since that happens with / without pinning).
