pin pytorch source builds nightly to speed up CI #1328

Pin the PyTorch bump to a nightly builder -- that also populates the ccache for the day.
Comments
The other option here is to do like we do for LLVM, with a deliberate ~weekly bump of the PyTorch dependency.
I am hoping the infra to bump will be the same -- and we can adjust how frequently we run it (nightly / weekly, etc.). When it fails, it will still need someone to fix it up, but hopefully we get the green commit moving without hand-holding. We will also have to stop the CI jobs from uploading their own caches.
@ashay I have a WIP on how I think we could pin PyTorch and roll only when our tests pass. That way we keep the benefit of tracking PyTorch closely and automatically, like we do today, but we don't roll onto a broken master. I have a pseudo-code branch here: https://github.com/llvm/torch-mlir/tree/RollPyTorch if you want to give it a crack. Basically, we add a nightly job that runs our current out-of-tree + PyTorch source build against top-of-master PyTorch. If it passes, we commit the torch_mlir_pytorch_version file, and on the CI side we load the pinned SHA. Happy to discuss any other suggestions; I just wanted to dump what I had in mind.
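For illustration, here is a minimal Python sketch of what such a nightly roll could look like. It is not the actual RollPyTorch tooling; the pin-file name and the build_tools scripts are hypothetical stand-ins.

```python
#!/usr/bin/env python3
# Sketch of a nightly "roll" job: build and test against top-of-tree
# PyTorch, and only commit a new pin when everything passes.
import subprocess
import sys
from pathlib import Path

PIN_FILE = Path("pytorch-version.txt")  # hypothetical name for the pinned-SHA file


def run(cmd: list) -> int:
    print("+", " ".join(cmd))
    return subprocess.call(cmd)


def latest_pytorch_sha() -> str:
    # Ask the PyTorch repo for the current HEAD commit.
    out = subprocess.check_output(
        ["git", "ls-remote", "https://github.com/pytorch/pytorch", "HEAD"], text=True
    )
    return out.split()[0]


def main() -> int:
    sha = latest_pytorch_sha()
    # Build and test against top-of-master PyTorch (commands are stand-ins).
    if run(["./build_tools/build_oot_with_pytorch.sh", sha]) != 0:
        return 1
    if run(["./build_tools/run_tests.sh"]) != 0:
        return 1
    # Tests passed: record the green commit so regular CI can pin to it.
    PIN_FILE.write_text(sha + "\n")
    run(["git", "commit", "-am", f"Roll PyTorch to {sha[:12]}"])
    return 0


if __name__ == "__main__":
    sys.exit(main())
```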
Building on top of what you had, I pushed a new commit to the RollPyTorch branch to fetch the binary release so that we don't need to build from source. Let me know if I'm veering too far off from what you had in mind! Thanks much! This commit uses the […]. I haven't tested a full CI run, but if this approach is generally along the lines of what we want, I'll go ahead and try it out in CI.
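As a rough sketch of fetching the nightly binary instead of building from source (assuming the public PyTorch nightly CPU wheel index; the download directory is arbitrary):

```python
# Download the latest PyTorch nightly wheel into ./wheels without deps.
import subprocess

subprocess.check_call([
    "python", "-m", "pip", "download", "--pre", "torch",
    "--no-deps", "-d", "wheels",
    "--index-url", "https://download.pytorch.org/whl/nightly/cpu",
])
```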
Oh, thank you for the commit. I think pinning the binary is good too, but I think we still need pure source builds for downstream customers. They will have long-supported PyTorch forks (long past when the PyTorch nightlies are removed) to enable a hermetic build environment without an external binary. Do you know any way we can find the SHA of the particular binary you got with […]? We could use that SHA for the source-build CI and also pin the exact same binary for the binary builds. Feel free to add any commits to the branch. We can clean up / review / commit once we figure it out.
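One possible way to recover the commit SHA from a downloaded wheel, sketched in Python: PyTorch wheels ship a torch/version.py that records the source commit in a git_version variable (the exact field name is an assumption worth verifying against a real wheel; the wheel filename below is a placeholder).

```python
# Extract the PyTorch git commit SHA from a nightly wheel's metadata.
import re
import zipfile


def git_sha_from_wheel(wheel_path: str) -> str:
    with zipfile.ZipFile(wheel_path) as whl:
        version_py = whl.read("torch/version.py").decode("utf-8")
    match = re.search(r"git_version\s*=\s*['\"]([0-9a-f]{40})['\"]", version_py)
    if match is None:
        raise ValueError("no git_version found in torch/version.py")
    return match.group(1)


if __name__ == "__main__":
    # Placeholder filename; use whatever pip download fetched.
    print(git_sha_from_wheel("wheels/torch-nightly-cp310-cp310-linux_x86_64.whl"))
```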
Here's a quick summary of our discussion from the Developer Hour meeting, just so that it's easier to access later. I have updated the branch to fetch the commit hash for the nightly release from the whl file. Assuming that the GitHub Actions script in RollPyTorch runs once a day, it should be able to pick up the most recent whl file and write the […]. Both the Torch-MLIR release version and the PyTorch commit hash are written to a file called […]. The part that remains, however, is pushing the updated […].
Thanks @ashay. I don't think we want to save our Torch-MLIR version in the GitHub repo. That is calculated by the release build and named accordingly when creating it, and we don't want to conflict with that in the RollPyTorch.yml builder. So just the PyTorch SHA is sufficient, I would think. I would also suggest […]. BTW, curious if we can drop the […].
@GMNGeoffrey @stellaraccident -- any idea how to push an updated SHA1 hash directly into the repo? Do we need to disable branch protection somehow? We currently have branch protection for CI and unresolved conversations.
In IREE we have it set to require pull request reviews, but admins can override. That means that admins can make direct pushes to the repo. This is nice because it gives us a break-glass option when CI is stuck or something, but it is also a bit terrifying, because running the wrong […].
(Also note that from a security perspective, allowing admins to do this makes zero difference, because if we check the box saying they can't, then as an admin they can always go uncheck it.)
Thanks all! I have included a rudimentary change to push the updated files to the main branch (see commit 22eb82c), but I have not tried it at my end, for fear that it might do bad things (like recursive commits). If someone can vet it and/or run it in a controlled environment, that'd be very nice. Thanks! I dropped the […].
Ah, I've gone back and read the context here. A PR might not be a bad idea anyway, since it will run presubmits. You could even set up a bot to auto-approve it so it just needs to pass required checks :-) I think the latter would require a second bot account. If you want to push directly, you'll need to use a bot account or app with the ability to bypass checks (you can configure which users can do that in the settings).
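A minimal sketch of the PR-based option, assuming a bot account/token with push rights and the GitHub CLI (gh) available on the runner; the branch and pin-file names are illustrative, not the actual torch-mlir setup.

```python
# Commit the updated pin on a branch and open a pull request via gh.
import subprocess


def run(cmd: list) -> None:
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)


def open_roll_pr(sha: str) -> None:
    branch = f"roll-pytorch-{sha[:12]}"
    run(["git", "checkout", "-b", branch])
    run(["git", "add", "pytorch-version.txt"])  # hypothetical pin file
    run(["git", "commit", "-m", f"Roll PyTorch to {sha[:12]}"])
    run(["git", "push", "origin", branch])
    run(["gh", "pr", "create",
         "--title", f"Roll PyTorch to {sha[:12]}",
         "--body", "Automated nightly PyTorch roll."])
```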
@ashay for testing, we can push to the same RollPyTorch branch ref. Once it is working, we can open a new PR.
PRs #1419 and #1438 address the pinning and periodic update of the PyTorch release, but we don't seem to have resolved the caching issue. We have had the auto-update in place for the last two days, and the LLVM builds hit the cache, but the libtorch builds don't, causing feature PRs in Torch-MLIR to spend about an hour and a half in CI.
I think now we turn off cache saving from the regular CI builds and have the nightly roll create the cache. I have another bug open for the cache issue with details (afk, so I can't link the bug right now).
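A minimal sketch of that cache policy, assuming GitHub Actions: only the scheduled nightly roll writes the ccache, while PR and push builds restore it read-only. GITHUB_EVENT_NAME is set by Actions and is "schedule" for cron-triggered runs.

```python
# Decide whether this CI run is allowed to save (not just restore) the ccache.
import os


def should_save_ccache() -> bool:
    return os.environ.get("GITHUB_EVENT_NAME") == "schedule"


if __name__ == "__main__":
    print("save ccache" if should_save_ccache() else "restore-only")
```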
Is the cache issue #1323?
Let's close this and track the cache issue separately (since that happens with / without pinning).