Speed up the algorithm analysis workflow #53
Conversation
In short, the caching seems good and probably should be applied to the other workflow. The splitting doesn't seem to achieve anything, though; it only appears to run faster because the tests are being skipped.
# with:
#   key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
#   path: .venv
- name: Cache pip
Can we also add this caching to the unit test pipeline?
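For reference, a minimal sketch of what the cache step might look like with the commented lines restored, assuming the analysis workflow builds a .venv from requirements.txt and has an earlier setup-python step with id `setup_python`. The step id, the install guard, and the install commands below are illustrative, not the exact contents of analysis.yml:

```yaml
- name: Cache virtual environment
  id: cache_venv          # hypothetical id, used below to detect a cache hit
  uses: actions/cache@v4
  with:
    path: .venv
    key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}

- name: Install dependencies
  if: steps.cache_venv.outputs.cache-hit != 'true'   # skip the install when the venv was restored
  run: |
    python -m venv .venv
    .venv/bin/pip install -r requirements.txt
```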
.github/workflows/analysis.yml
@@ -85,7 +118,7 @@ jobs:
   merge:
     runs-on: ubuntu-latest
-    needs: build
+    needs: [build, long]
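For context, the split this diff implies looks roughly like the sketch below. The job names build, long, and merge come from the diff, but the pytest markers and step contents are assumptions:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest -m "not slow"    # default algorithm set (marker name assumed)

  long:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest -m "slow"        # slow algorithms, run in parallel with build

  merge:
    runs-on: ubuntu-latest
    needs: [build, long]             # merge waits for both parallel jobs to finish
    steps:
      - run: echo "combine results from build and long here"
```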
Unfortunately I don't think this splitting approach will work. The caching seems to have brought the startup time down to 1 minute, but the long tests aren't being run. I see the runs on your branch and the long tests are taking just the 1-minute startup time. From the logs it looks like they're being skipped, but I don't see why.
But I don't think this approach will work anyway because, as an open source org, we have 20 (I believe) concurrent runners shared across all workflows, so moving jobs around like this won't have any effect.
Force-pushed from 6824352 to f3df2c6
@etpeterson I kept only the caching commit on this PR. I tried adding caching to the unit tests, but I'm not sure it's possible: we have a matrix of different Python versions and operating systems. I think the installation step is necessary in this case and the cache will not help speed up the process. What do you think? Also, if you have any thoughts on further improving the analysis workflow, let me know.
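One way caching could still help in a matrix is setup-python's built-in pip cache, which keys on the OS, Python version, and requirements hash for each matrix entry. The matrix values below are illustrative, not the repository's actual test grid:

```yaml
strategy:
  matrix:
    os: [ubuntu-latest, windows-latest, macos-latest]
    python-version: ["3.9", "3.10", "3.11"]
runs-on: ${{ matrix.os }}
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version: ${{ matrix.python-version }}
      cache: 'pip'        # caches pip's download cache per OS/Python/requirements
  - run: pip install -r requirements.txt   # still runs, but pulls packages from the cache
```

The install step still runs in every matrix cell, but pip should resolve from its local cache instead of downloading, so it ought to be faster without a separate actions/cache step.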
Now that I look into it more, I realize I'm less sure what we really want. To start, it seems to work and it speeds up the setup time, so that's good. I also see that we're already hitting the 10GB cache limit on GitHub, so maybe cache optimization could be a future project.
That said, I notice you're double caching: the cache step and the python setup step. I found a post that suggests the opposite order is the way to go, I think because the cache restores the binaries and then python setup doesn't need to do anything anymore. I'm not sure if caching the binary is good or not. It seems potentially fragile - if anything changes then it could fail in strange ways. Because of this, I think the key should be as specific as possible to try to avoid matching incorrectly: things like OS, Python version, requirements file, and anything else you can think of.
I think caching both workflows is good - at least in theory. The caches are saved across runs so no problem there, even if it is the testing grid that's running. Cache size is an issue though, but maybe we can figure that out once we optimize caching itself.
Hi @etpeterson, the large cache size is because we create a pip cache 12 times on the unit test matrix, which is not used. For the …
Force-pushed from a897471 to 90887f4
I noticed that we don't need the caching on the python-setup step for the analysis workflow: since we already cache the environment, the extra cache restoration was just taking around 25 seconds. It is now removed. P.S. I realized the importance of the cache step in the unit test workflow (the one with the large size), so I put it back.
I think this bit is still missing: being more specific about what we're caching. Right now, if the OS gets upgraded we won't see that in the key and we're going to end up with some strange errors.
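If the concern is OS image upgrades, one option is to fold the runner image identifiers into the key. GitHub-hosted runner images expose ImageOS and ImageVersion as environment variables; the step and output names below are illustrative, and this sketch assumes a Linux/macOS shell:

```yaml
- name: Read runner image info
  id: image
  run: echo "id=${ImageOS}-${ImageVersion}" >> "$GITHUB_OUTPUT"

- name: Cache virtual environment
  uses: actions/cache@v4
  with:
    path: .venv
    key: venv-${{ steps.image.outputs.id }}-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
```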
One other note: I see you're force pushing all the time. That's a big red flag because it can rewrite history and do all kinds of other nastiness. You should almost never need to force push, so try to avoid it if at all possible.
I added the OS type, and I see that …
I am using it because I had a lot of debugging commits with names like fix, debug, etc. When I reach a solution, I rebase those commits into one commit with a descriptive name to have a clean history, and then force push it.
Usually there's no need to push, so locally you can do basically anything you want: restarting, recreating branches, etc. In this case though you needed to push to run, so it makes sense that you had many commits. It's probably the worst case scenario for git, actually. I don't mind having other commits in there, but that's a personal preference and some people/repositories feel strongly one way or another.
The workflow you describe might be solved with a squash commit, which we can do at merge time on GitHub. It's an option I can select when merging. Probably the cleanest but slightly more effort solution is to have a development branch and a final branch: check out a testing branch and iterate on that until it's good, then squash those commits and create a final branch from that. Then any more changes start in the development branch until they're ready for the squash and move to the final branch. That way the final branch is clean of all the intermediate modifications.
I know git can sometimes be a hassle but hopefully that helped. And if force pushes are truly necessary, that's fine, especially on your own repository. Just keep in mind that it's not always an option for every repository.
Fixes #43
Created dependency caching for the Python environment.
Separated the slow algorithms into another job that runs in parallel with the default algorithm set.
These changes speed up the workflow from ~25 minutes to ~15 minutes.
See: https://github.com/AhmedBasem20/TF2.4_IVIM-MRI_CodeCollection/actions/runs/8149764438