
Improve runtime of Algorithm Analysis workflow #43

Closed
etpeterson opened this issue Mar 1, 2024 · 3 comments · Fixed by #53
Assignees
Labels
enhancement New feature or request

Comments

@etpeterson
Contributor

Feature description

The Algorithm Analysis workflow parallelizes its jobs in a matrix, but an imbalance in processing duration across jobs leads to significant wasted time.

Describe the solution

The Algorithm Analysis workflow uses matrix job parallelization. This is easy to use, but the longest jobs take 15 minutes while the shortest take seconds, and every job spends more than a minute setting up its environment. For many of the shorter runs, the vast majority of the time is therefore spent on environment setup.

The solution requires some investigation, but it probably involves timing the algorithms and finding groups with reasonably similar execution times. From there, a test array that references the manually curated groups should allow for faster wall-time test execution. Other options may work as well; this issue involves some research.
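As a rough illustration of the grouping idea, here is a minimal Python sketch. It uses a greedy longest-processing-time heuristic, which is one standard way to pack items into similarly sized groups; the algorithm names and timings below are made up for illustration and are not from the actual workflow output.

```python
import heapq

def group_by_runtime(timings: dict[str, float], n_groups: int) -> list[list[str]]:
    """Assign each algorithm (slowest first) to the currently lightest group."""
    # Min-heap of (total_seconds, group_index) so the lightest group pops first.
    heap = [(0.0, i) for i in range(n_groups)]
    heapq.heapify(heap)
    groups: list[list[str]] = [[] for _ in range(n_groups)]
    for name, t in sorted(timings.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + t, idx))
    return groups

# Hypothetical timings in seconds.
timings = {"alg_a": 900, "alg_b": 30, "alg_c": 620, "alg_d": 5, "alg_e": 450}
print(group_by_runtime(timings, 2))
# → [['alg_a', 'alg_b', 'alg_d'], ['alg_c', 'alg_e']]
```

The resulting group lists could then feed a matrix axis, so each CI job runs one group rather than one algorithm, amortizing the setup cost over several short algorithms.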

Describe alternatives

No response

Additional context

No response

Are you working on this?

None

@etpeterson etpeterson added the enhancement New feature or request label Mar 1, 2024
@AhmedBasem20
Contributor

Hey @etpeterson, Since I'm already exploring this workflow, may I tackle this one? Thanks.

@etpeterson
Contributor Author

@AhmedBasem20 you're welcome to take a look. The workflow currently works, so this is an enhancement that should be possible. That said, if it turns out to be much too complicated, it's probably not worth changing. I also don't have a concrete idea of how the result should look.

Here are strategies I've considered but not followed through on.

  • As I mentioned in the original post, better dividing up the jobs. The matrix is currently algorithm and signal to noise (SNR). That could change to SNR and anatomic area, perhaps, or even group a few algorithms together in a single run. This is surely possible but might also get complicated. For example, if more algorithms are added, how do we re-group them into similarly timed tranches?
  • Reusing runners. The startup time is more than a minute, so if we can reuse them, we save that startup cost. This does mean the tests aren't truly run from a clean slate, so there are pluses and minuses.
  • Starting long-running algorithms first. We know which ones are slow, so if we start those first they'll be running the entire time while the short ones run in parallel alongside them.
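The "slowest first" idea above can be sketched with a small scheduling simulation. This is purely illustrative: the job durations, the 60-second setup cost, and the runner count are all assumptions, not measurements from the actual workflow.

```python
import heapq

SETUP = 60.0  # assumed per-job environment setup cost, in seconds

def wall_time(durations: list[float], n_runners: int) -> float:
    """Greedy simulation: each job goes to the runner that frees up first."""
    runners = [0.0] * n_runners
    heapq.heapify(runners)
    for d in durations:
        free_at = heapq.heappop(runners)
        heapq.heappush(runners, free_at + SETUP + d)
    return max(runners)

# Hypothetical job durations in seconds.
jobs = [900, 5, 30, 620, 10, 450, 20, 15]
print(wall_time(jobs, 2))                        # submission order → 1415.0
print(wall_time(sorted(jobs, reverse=True), 2))  # slowest first    → 1270.0
```

With only two runners in this toy setup, dispatching the slow jobs first shortens the simulated wall time; with enough runners for every job, the single slowest job dominates either way.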

I think these could all be tried together, or just one. Keep in mind that something is better than nothing: if you end up making some enhancements but have a plan for more, you can continue yourself or write another issue for someone else.

Another note: the workflow output already saves timings, so there's existing information on how fast each algorithm is. The tests could also change a little; nothing holds us to exactly these parameters if there's some optimization to be done there.
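Since timings are already being recorded, a first step could be as simple as parsing them and splitting algorithms into a slow tier and a fast tier. The sketch below assumes a CSV with `algorithm,seconds` columns; the file format, column names, and 120-second threshold are all assumptions for illustration.

```python
import csv
import io

def split_tiers(csv_text: str, threshold: float = 120.0) -> tuple[list[str], list[str]]:
    """Split algorithms into (slow, fast) lists based on recorded runtimes."""
    slow: list[str] = []
    fast: list[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        (slow if float(row["seconds"]) >= threshold else fast).append(row["algorithm"])
    return slow, fast

# Hypothetical recorded timings.
sample = "algorithm,seconds\nalg_a,900\nalg_b,30\nalg_c,620\nalg_d,5\n"
print(split_tiers(sample))
# → (['alg_a', 'alg_c'], ['alg_b', 'alg_d'])
```

The slow tier could then be scheduled first (or given its own matrix axis), while the fast tier is batched to amortize setup time.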

@AhmedBasem20
Contributor

Thanks @etpeterson for breaking this down! I'll open a pull request with the specified subtasks and work on each one individually.
