Tasks with the same workloads sometimes finish in much longer time than others #53269
Comments
This seems like a better fit for https://discourse.julialang.org/.
Sorry, why? Doesn't this point at a problem with the task library? I can think of at least one reason: scheduling of tasks may be at fault (?)...
Irrespective of whether this belongs here or not, can you try to provide a more minimal example?
@carstenbauer That is hard. I tried, but for some reason much simpler setups did not reproduce this behaviour. Even this example shows random variation: sometimes all tasks run equally fast. But start the simulation one more time and some tasks will again lag behind the others (there are always two groups: fast tasks and slow tasks). It is almost as if some tasks had to wait for a thread to run on, even though there should be enough threads that none of them has to wait.
It already is on discourse: https://discourse.julialang.org/t/parallel-assembly-of-a-finite-element-sparse-matrix/95947/
It seems I answered my own question. Strategy A: when the computation is started with N threads and the number of tasks is N-1, at some point one or more tasks cannot find a thread to run on, wait for one, and consequently finish late. Strategy B: the following approach works much better: start as many threads as the machine has, and use however many tasks you wish, leaving a substantial number of threads unused. For instance, start Julia with 24 threads and use up to 16 tasks. Then all tasks finish at the same time. I wonder why some of the tasks cannot find a thread to run on in Strategy A. What else does Julia run on those threads?
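For readers who want to try the two strategies themselves, here is a minimal sketch of how per-task wall times could be recorded. It is not the MWE from this issue: `busywork` and `time_tasks` are hypothetical names, and the loop is only a stand-in for the real assembly kernel.

```julia
using Base.Threads

# Hypothetical CPU-bound workload; a stand-in for the real kernel.
function busywork(n)
    s = 0.0
    for i in 1:n
        s += sin(i) * cos(i)
    end
    return s
end

# Spawn `ntasks` tasks with identical workloads and
# return the wall time (in seconds) measured inside each task.
function time_tasks(ntasks, n)
    tasks = map(1:ntasks) do _
        Threads.@spawn begin
            t0 = time()
            s = busywork(n)
            (time() - t0, s)  # return s too, so the work cannot be optimized away
        end
    end
    return first.(fetch.(tasks))
end

time_tasks(2, 1_000)  # warm-up run, so compilation is not measured

# Strategy A as described above: N threads, N-1 tasks.
elapsed = time_tasks(max(1, nthreads() - 1), 10_000_000)
println("threads = ", nthreads(),
        ", min = ", minimum(elapsed), " s",
        ", max = ", maximum(elapsed), " s")
```

Starting Julia with `julia -t 24` and changing the first argument of `time_tasks` to 16 would correspond to Strategy B. A large gap between the minimum and the maximum is the symptom discussed in this thread.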
Sounds like this is better suited to continued discussion on the discourse thread then. GitHub does not handle threads well.
The problem appears to be real, wouldn't you say?
Edit: Updated MWE:
Many runs repeated. Often the max run time is twice the min, indicating scheduling problems.
It does not sound real based on the current description. I think the usual approach is to divide the problem up into about
@vtjnash I'm afraid I'm not following you. The workload in the MWE is perfectly distributed. I start enough threads so that each task can run on its own thread, on a machine that has enough hardware threads not to be oversubscribed. Yet apparently some tasks have to wait for a thread to run on. Creating more chunks means creating more tasks, which means spending more time in the setup of the parallel loop. Is that a good strategy?
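To make the trade-off in this question concrete, here is a minimal sketch of splitting a loop into more chunks than there are threads, with one task per chunk. The helpers `chunk_ranges` and `chunked_sum` are hypothetical and are not taken from the MWE or from any comment in this thread. The number of chunks is left as a free parameter.

```julia
# Split 1:n into `nchunks` contiguous ranges of nearly equal length.
function chunk_ranges(n, nchunks)
    base, extra = divrem(n, nchunks)
    ranges = UnitRange{Int}[]
    start = 1
    for c in 1:nchunks
        len = base + (c <= extra ? 1 : 0)  # the first `extra` chunks get one more item
        push!(ranges, start:(start + len - 1))
        start += len
    end
    return ranges
end

# One task per chunk; the scheduler runs each task on whichever thread is free.
function chunked_sum(f, n, nchunks)
    tasks = map(chunk_ranges(n, nchunks)) do r
        Threads.@spawn sum(f, r)
    end
    return sum(fetch.(tasks))
end

total = chunked_sum(abs2, 1_000, 8)  # sum of squares of 1..1000, computed in 8 chunks
```

With more chunks than threads, a task that starts late delays only its own small chunk. The cost is the extra task creation that the question above is concerned with, so the right chunk count depends on how expensive each chunk is relative to spawning a task.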
I think the
This batch file is run on an unloaded machine (single user):
As shown in the batch file, a computation is run twice with the same parameters:
So, some tasks take practically double the time of the others. The number of tasks that take longer varies from run to run.