Speed up the algorithm analysis workflow #53

Merged: 4 commits merged into OSIPI:main from analysis/optimize-runtime on Mar 7, 2024

Conversation

AhmedBasem20 (Contributor):

Fixes #43

  • Created dependency caching for the Python environment.

  • Separated the slow algorithms into a separate job that runs in parallel with the default algorithm set.

These techniques speed up the workflow from ~25 minutes to ~15 minutes.
See: https://github.com/AhmedBasem20/TF2.4_IVIM-MRI_CodeCollection/actions/runs/8149764438
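
For reference, a minimal sketch of the dependency-caching idea; the key and path mirror the commented-out snippet visible in the review below, while the action versions, step ids, and install commands are assumptions rather than the PR's exact code:

```yaml
# Assumes a prior actions/setup-python step with id: setup_python (Linux runner paths assumed).
- name: Cache the virtual environment
  id: cache-venv
  uses: actions/cache@v4
  with:
    path: .venv
    key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}

- name: Install dependencies only on a cache miss
  if: steps.cache-venv.outputs.cache-hit != 'true'
  run: |
    python -m venv .venv
    .venv/bin/pip install -r requirements.txt
```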

etpeterson (Contributor) left a comment:


In short, the caching seems good and should probably be applied to the other workflow as well. The splitting doesn't seem to achieve anything, though; it only appears to run faster because the tests are being skipped.

# with:
# key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version}}-${{ hashFiles('requirements.txt') }}
# path: .venv
- name: Cache pip

etpeterson (Contributor):

Can we also add this caching to the unit test pipeline?

@@ -85,7 +118,7 @@ jobs:

   merge:
     runs-on: ubuntu-latest
-    needs: build
+    needs: [build, long]
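
For context, the change above makes the final merge step depend on two jobs that run in parallel. A rough sketch of the intended shape; the job bodies are placeholders, and only the needs relationship is taken from the diff:

```yaml
name: analysis (illustrative)
on: push

jobs:
  build:                   # default algorithm set
    runs-on: ubuntu-latest
    steps:
      - run: echo "run the default algorithms"

  long:                    # slow algorithms, intended to run in parallel with build
    runs-on: ubuntu-latest
    steps:
      - run: echo "run the slow algorithms"

  merge:
    runs-on: ubuntu-latest
    needs: [build, long]   # starts only after both parallel jobs finish
    steps:
      - run: echo "merge the results"
```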

etpeterson (Contributor):

Unfortunately, I don't think this splitting approach will work. The caching seems to have brought the startup time down to 1 minute, but the long tests aren't being run. I see the runs on your branch, and the long tests are taking just the 1-minute startup time. From the logs it looks like they're being skipped, but I don't see why.

Regardless, I don't think this approach will work because, as an open-source org, we have 20 (I believe) concurrent runners shared across all workflows, so moving jobs around like this won't have any effect.

AhmedBasem20 force-pushed the analysis/optimize-runtime branch 2 times, most recently from 6824352 to f3df2c6 on March 5, 2024 at 13:39
AhmedBasem20 (Contributor, Author):

@etpeterson I kept only the caching commit on this PR. I tried adding caching to the unit tests, but I'm not sure it's possible: we have a matrix of different Python versions and operating systems. I think the installation step is necessary in that case and the cache will not help speed up the process. What do you think? Also, if you have any thoughts on further improving the analysis workflow, let me know.
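
For what it's worth, per-cell caching in a test matrix is possible in principle by folding the matrix values into the cache key. A minimal sketch; the matrix values, paths, and action versions are assumptions, not the repository's actual unit-test workflow:

```yaml
name: unit tests (illustrative)
on: push

jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        python-version: ["3.9", "3.10", "3.11"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Cache pip downloads per matrix cell
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip   # pip's cache dir on Linux; the path differs on Windows/macOS
          key: pip-${{ matrix.os }}-${{ matrix.python-version }}-${{ hashFiles('requirements.txt') }}
      - run: pip install -r requirements.txt
```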

AhmedBasem20 requested a review from etpeterson on March 5, 2024 at 15:23
etpeterson (Contributor):

Now that I look into it more, I realize I'm less sure what we really want. To start, it seems to work and it speeds up the setup time, so that's good. I also see that we're already hitting GitHub's 10 GB cache limit, so cache optimization could be a future project.

That said, I notice you're double caching: the cache step and the Python setup step. I found a post that suggests the opposite order is the way to go, I think because the cache restores the binaries and then the Python setup doesn't need to do anything anymore.

I'm not sure whether caching the binaries is a good idea. It seems potentially fragile: if anything changes, it could fail in strange ways. Because of this, I think the key should be as specific as possible to avoid matching incorrectly: things like the OS, Python version, requirements file, and anything else you can think of.

I think caching both workflows is good, at least in theory. The caches are saved across runs, so no problem there, even if it's the testing grid that's running. Cache size is an issue, though, but maybe we can figure that out once we optimize the caching itself.
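
To illustrate the double caching being described (a hypothetical sketch, not the PR's exact steps): actions/setup-python has its own built-in pip cache, so combining it with an explicit actions/cache step stores two overlapping caches for the same environment.

```yaml
# Hypothetical: two caching mechanisms stacked in one job.
- uses: actions/setup-python@v5
  with:
    python-version: "3.11"
    cache: "pip"             # setup-python's built-in cache of pip downloads

- uses: actions/cache@v4     # a second, explicit cache of the installed environment
  with:
    path: .venv
    key: venv-${{ runner.os }}-${{ hashFiles('requirements.txt') }}
```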

AhmedBasem20 (Contributor, Author) commented Mar 6, 2024:

Hi @etpeterson, the large cache size is because we create a pip cache 12 times across the unit test matrix, and it is not being used.
I see that the setup-python cache is not used on the analysis.yml workflow either, so I'll try your proposed solution (caching the binaries) and let you know how it goes.

For the unit_test.yml cache, I still can't see how this could be done; if you have further thoughts or approaches, please let me know.

AhmedBasem20 force-pushed the analysis/optimize-runtime branch 2 times, most recently from a897471 to 90887f4 on March 6, 2024 at 17:46
AhmedBasem20 (Contributor, Author):

I noticed that we don't need the caching on setup-python for the analysis workflow, because we already cache the environment; the extra cache restoration was taking around 25 seconds. It is now removed.

P.S. I realized the importance of the cache step in the unit test workflow (the one with the large size), so I put it back.

etpeterson (Contributor):

> I'm not sure whether caching the binaries is a good idea. It seems potentially fragile: if anything changes, it could fail in strange ways. Because of this, I think the key should be as specific as possible to avoid matching incorrectly: things like the OS, Python version, requirements file, and anything else you can think of.

I think this bit is still missing: being more specific about what we're caching. Right now, if the OS gets upgraded we won't see that in the key and will end up with some strange errors.

etpeterson (Contributor):

One other note: I see you're force-pushing all the time. That's a big red flag because it can rewrite history and do all kinds of other nastiness. You should almost never need to force-push, so try to avoid it if at all possible.

AhmedBasem20 (Contributor, Author):

> I think this bit is still missing: being more specific about what we're caching. Right now, if the OS gets upgraded we won't see that in the key and will end up with some strange errors.

I added the OS type, and I see that env.pythonLocation already contains the Python version. For the OS version, I don't see a straightforward way to get it; even the setup-python action displays the OS version using a script, and only for Linux.
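
A rough sketch of what the resulting binaries cache might look like, following the pattern discussed above; the exact key, path, and versions in the PR may differ (env.pythonLocation is set by actions/setup-python, so the cache step comes after it):

```yaml
- name: Set up Python
  uses: actions/setup-python@v5
  with:
    python-version: "3.11"   # illustrative version

- name: Cache the Python installation (binaries and site-packages)
  id: cache-python
  uses: actions/cache@v4
  with:
    path: ${{ env.pythonLocation }}
    key: python-${{ runner.os }}-${{ env.pythonLocation }}-${{ hashFiles('requirements.txt') }}

- name: Install dependencies only on a cache miss
  if: steps.cache-python.outputs.cache-hit != 'true'
  run: pip install -r requirements.txt
```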

AhmedBasem20 (Contributor, Author):

> One other note: I see you're force-pushing all the time. That's a big red flag because it can rewrite history and do all kinds of other nastiness. You should almost never need to force-push, so try to avoid it if at all possible.

I am using it because I had a lot of debugging commits with names like fix, debug, etc. When I reach a solution, I rebase those commits into one commit with a descriptive name to keep a clean history, and then force-push it.
I thought this was good practice (except for the last step, for sure).
What do you think is an alternative for keeping a clean history while avoiding force pushes?

etpeterson (Contributor):

> > One other note: I see you're force-pushing all the time. That's a big red flag because it can rewrite history and do all kinds of other nastiness. You should almost never need to force-push, so try to avoid it if at all possible.
>
> I am using it because I had a lot of debugging commits with names like fix, debug, etc. When I reach a solution, I rebase those commits into one commit with a descriptive name to keep a clean history, and then force-push it. I thought this was good practice (except for the last step, for sure). What do you think is an alternative for keeping a clean history while avoiding force pushes?

Usually there's no need to push, so locally you can do basically anything you want: restarting, recreating branches, etc. In this case, though, you needed to push in order to run the workflow, so it makes sense that you had many commits. It's probably the worst-case scenario for git, actually.

I don't mind having other commits in there, but that's a personal preference, and some people/repositories feel strongly one way or the other.

The workflow you describe might be solved with a squash commit, which we can do at merge time on GitHub. It's an option I can select when merging.

Probably the cleanest, but slightly more effortful, solution is to have a development branch and a final branch. Check out a testing branch and iterate on it until it's good, then squash those commits and create a final branch from that. Any further changes start in the development branch until they're ready for the squash and the move to the final branch. That way the final branch stays clean of all the intermediate modifications.

I know git can sometimes be a hassle, but hopefully that helped. And if force pushes are truly necessary, that's fine, especially on your own repository. Just keep in mind that it's not always an option for every repository.

etpeterson merged commit 1a8785a into OSIPI:main on Mar 7, 2024
12 checks passed
Linked issue: Improve runtime of Algorithm Analysis workflow (#43)

2 participants