Speed up the algorithm analysis workflow #53

Merged: 4 commits merged into OSIPI:main from analysis/optimize-runtime on Mar 7, 2024

Conversation

AhmedBasem20 (Contributor):

Fixes #43

  • Created dependency caching for the Python environment.

  • Separated the slow algorithms into a separate job that runs in parallel with the default algorithm set.

These techniques speed up the workflow from ~25 minutes to ~15 minutes.
See: https://github.com/AhmedBasem20/TF2.4_IVIM-MRI_CodeCollection/actions/runs/8149764438
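
For reference, a minimal sketch of the dependency-caching idea; the key and path mirror the commented-out snippet visible in the review below, while the action versions, step ids, and install commands are assumptions rather than the PR's exact code:

```yaml
# Assumes a prior actions/setup-python step with id: setup_python (Linux runner paths assumed).
- name: Cache the virtual environment
  id: cache-venv
  uses: actions/cache@v4
  with:
    path: .venv
    key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}

- name: Install dependencies only on a cache miss
  if: steps.cache-venv.outputs.cache-hit != 'true'
  run: |
    python -m venv .venv
    .venv/bin/pip install -r requirements.txt
```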

etpeterson (Contributor) left a comment:


In short, the caching seems good and should probably be applied to the other workflow as well. The splitting doesn't seem to achieve anything, though; it only appears to run faster because the tests are being skipped.

# with:
# key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version}}-${{ hashFiles('requirements.txt') }}
# path: .venv
- name: Cache pip

etpeterson (Contributor):

Can we also add this caching to the unit test pipeline?

@@ -85,7 +118,7 @@ jobs:

   merge:
     runs-on: ubuntu-latest
-    needs: build
+    needs: [build, long]
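
For context, the change above makes the final merge step depend on two jobs that run in parallel. A rough sketch of the intended shape; the job bodies are placeholders, and only the needs relationship is taken from the diff:

```yaml
name: analysis (illustrative)
on: push

jobs:
  build:                   # default algorithm set
    runs-on: ubuntu-latest
    steps:
      - run: echo "run the default algorithms"

  long:                    # slow algorithms, intended to run in parallel with build
    runs-on: ubuntu-latest
    steps:
      - run: echo "run the slow algorithms"

  merge:
    runs-on: ubuntu-latest
    needs: [build, long]   # starts only after both parallel jobs finish
    steps:
      - run: echo "merge the results"
```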

etpeterson (Contributor):

Unfortunately, I don't think this splitting approach will work. The caching seems to have brought the startup time down to 1 minute, but the long tests aren't being run. I see the runs on your branch, and the long tests are taking just the 1-minute startup time. From the logs it looks like they're being skipped, but I don't see why.

Regardless, I don't think this approach will work because, as an open-source org, we have 20 (I believe) concurrent runners shared across all workflows, so moving jobs around like this won't have any effect.

AhmedBasem20 force-pushed the analysis/optimize-runtime branch 2 times, most recently from 6824352 to f3df2c6 on March 5, 2024 at 13:39
AhmedBasem20 (Contributor, Author):

@etpeterson I kept only the caching commit on this PR. I tried adding caching to the unit tests, but I'm not sure it's possible: we have a matrix of different Python versions and operating systems. I think the installation step is necessary in that case and the cache will not help speed up the process. What do you think? Also, if you have any thoughts on further improving the analysis workflow, let me know.
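
For what it's worth, per-cell caching in a test matrix is possible in principle by folding the matrix values into the cache key. A minimal sketch; the matrix values, paths, and action versions are assumptions, not the repository's actual unit-test workflow:

```yaml
name: unit tests (illustrative)
on: push

jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        python-version: ["3.9", "3.10", "3.11"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Cache pip downloads per matrix cell
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip   # pip's cache dir on Linux; the path differs on Windows/macOS
          key: pip-${{ matrix.os }}-${{ matrix.python-version }}-${{ hashFiles('requirements.txt') }}
      - run: pip install -r requirements.txt
```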

AhmedBasem20 requested a review from etpeterson on March 5, 2024 at 15:23
etpeterson (Contributor):

Now that I look into it more, I realize I'm less sure what we really want. To start, it seems to work and it speeds up the setup time, so that's good. I also see that we're already hitting GitHub's 10 GB cache limit, so cache optimization could be a future project.

That said, I notice you're double caching: the cache step and the Python setup step. I found a post that suggests the opposite order is the way to go, I think because the cache restores the binaries and then the Python setup doesn't need to do anything anymore.

I'm not sure whether caching the binaries is a good idea. It seems potentially fragile: if anything changes, it could fail in strange ways. Because of this, I think the key should be as specific as possible to avoid matching incorrectly: things like the OS, Python version, requirements file, and anything else you can think of.

I think caching both workflows is good, at least in theory. The caches are saved across runs, so no problem there, even if it's the testing grid that's running. Cache size is an issue, though, but maybe we can figure that out once we optimize the caching itself.
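
To illustrate the double caching being described (a hypothetical sketch, not the PR's exact steps): actions/setup-python has its own built-in pip cache, so combining it with an explicit actions/cache step stores two overlapping caches for the same environment.

```yaml
# Hypothetical: two caching mechanisms stacked in one job.
- uses: actions/setup-python@v5
  with:
    python-version: "3.11"
    cache: "pip"             # setup-python's built-in cache of pip downloads

- uses: actions/cache@v4     # a second, explicit cache of the installed environment
  with:
    path: .venv
    key: venv-${{ runner.os }}-${{ hashFiles('requirements.txt') }}
```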

AhmedBasem20 (Contributor, Author) commented Mar 6, 2024:

Hi @etpeterson, the large cache size is because we create a pip cache 12 times across the unit test matrix, and it is not being used.
I see that the setup-python cache is not used on the analysis.yml workflow either, so I'll try your proposed solution (caching the binaries) and let you know how it goes.

For the unit_test.yml cache, I still can't see how this could be done; if you have further thoughts or approaches, please let me know.

AhmedBasem20 force-pushed the analysis/optimize-runtime branch 2 times, most recently from a897471 to 90887f4 on March 6, 2024 at 17:46
AhmedBasem20 (Contributor, Author):

I noticed that we don't need the caching on setup-python for the analysis workflow, because we already cache the environment; the extra cache restoration was taking around 25 seconds. It is now removed.

P.S. I realized the importance of the cache step in the unit test workflow (the one with the large size), so I put it back.

etpeterson (Contributor):

> I'm not sure whether caching the binaries is a good idea. It seems potentially fragile: if anything changes, it could fail in strange ways. Because of this, I think the key should be as specific as possible to avoid matching incorrectly: things like the OS, Python version, requirements file, and anything else you can think of.

I think this bit is still missing: being more specific about what we're caching. Right now, if the OS gets upgraded we won't see that in the key and will end up with some strange errors.

etpeterson (Contributor):

One other note: I see you're force-pushing all the time. That's a big red flag because it can rewrite history and do all kinds of other nastiness. You should almost never need to force-push, so try to avoid it if at all possible.

AhmedBasem20 (Contributor, Author):

> I think this bit is still missing: being more specific about what we're caching. Right now, if the OS gets upgraded we won't see that in the key and will end up with some strange errors.

I added the OS type, and I see that env.pythonLocation already contains the Python version. For the OS version, I don't see a straightforward way to get it; even the setup-python action displays the OS version using a script, and only for Linux.
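
A rough sketch of what the resulting binaries cache might look like, following the pattern discussed above; the exact key, path, and versions in the PR may differ (env.pythonLocation is set by actions/setup-python, so the cache step comes after it):

```yaml
- name: Set up Python
  uses: actions/setup-python@v5
  with:
    python-version: "3.11"   # illustrative version

- name: Cache the Python installation (binaries and site-packages)
  id: cache-python
  uses: actions/cache@v4
  with:
    path: ${{ env.pythonLocation }}
    key: python-${{ runner.os }}-${{ env.pythonLocation }}-${{ hashFiles('requirements.txt') }}

- name: Install dependencies only on a cache miss
  if: steps.cache-python.outputs.cache-hit != 'true'
  run: pip install -r requirements.txt
```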

AhmedBasem20 (Contributor, Author):

> One other note: I see you're force-pushing all the time. That's a big red flag because it can rewrite history and do all kinds of other nastiness. You should almost never need to force-push, so try to avoid it if at all possible.

I am using it because I had a lot of debugging commits with names like fix, debug, etc. When I reach a solution, I rebase those commits into one commit with a descriptive name to keep a clean history, and then force-push it.
I thought this was good practice (except for the last step, for sure).
What do you think is an alternative for keeping a clean history while avoiding force pushes?

etpeterson (Contributor):

> > One other note: I see you're force-pushing all the time. That's a big red flag because it can rewrite history and do all kinds of other nastiness. You should almost never need to force-push, so try to avoid it if at all possible.
>
> I am using it because I had a lot of debugging commits with names like fix, debug, etc. When I reach a solution, I rebase those commits into one commit with a descriptive name to keep a clean history, and then force-push it. I thought this was good practice (except for the last step, for sure). What do you think is an alternative for keeping a clean history while avoiding force pushes?

Usually there's no need to push, so locally you can do basically anything you want: restarting, recreating branches, etc. In this case, though, you needed to push in order to run the workflow, so it makes sense that you had many commits. It's probably the worst-case scenario for git, actually.

I don't mind having other commits in there, but that's a personal preference, and some people/repositories feel strongly one way or the other.

The workflow you describe might be solved with a squash commit, which we can do at merge time on GitHub. It's an option I can select when merging.

Probably the cleanest, but slightly more effortful, solution is to have a development branch and a final branch. Check out a testing branch and iterate on it until it's good, then squash those commits and create a final branch from that. Any further changes start in the development branch until they're ready for the squash and the move to the final branch. That way the final branch stays clean of all the intermediate modifications.

I know git can sometimes be a hassle, but hopefully that helped. And if force pushes are truly necessary, that's fine, especially on your own repository. Just keep in mind that it's not always an option for every repository.

etpeterson merged commit 1a8785a into OSIPI:main on Mar 7, 2024
12 checks passed
Linked issue: Improve runtime of Algorithm Analysis workflow (#43)

2 participants