
feat: sched: Assigner experiments #10356

Merged: 3 commits merged into master from feat/assigner-experiments on Mar 9, 2023

Conversation

@magik6k (Contributor) commented Feb 27, 2023

Related Issues

Proposed Changes

Currently lotus-miner can be configured with one of two assigners:

  • utilization (default) - For each task, pick the worker with the lowest 'utilization factor', which is based on ratios of compute resource usage (a hedged sketch of this idea follows this list)
  • spread - In each assign loop, try to assign the task to the worker with the fewest tasks assigned so far
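
To make the 'utilization factor' idea concrete, here is a hedged, minimal sketch of one plausible way such a factor could be computed from resource-usage ratios. This is illustrative only; the Resources type and field names below are hypothetical, and the real lotus formula may weight resources differently.

```go
package main

import "fmt"

// Resources is a hypothetical summary of a worker's compute resources.
type Resources struct {
	CPUs   float64
	MemGiB float64
	GPUs   float64
}

// utilization averages per-resource usage ratios; a lower factor means the
// worker is less busy and would be preferred for the next task.
func utilization(used, total Resources) float64 {
	sum, n := 0.0, 0.0
	add := func(u, t float64) {
		if t > 0 {
			sum += u / t
			n++
		}
	}
	add(used.CPUs, total.CPUs)
	add(used.MemGiB, total.MemGiB)
	add(used.GPUs, total.GPUs)
	if n == 0 {
		return 0
	}
	return sum / n
}

func main() {
	factor := utilization(
		Resources{CPUs: 8, MemGiB: 64, GPUs: 1},
		Resources{CPUs: 32, MemGiB: 256, GPUs: 2},
	)
	fmt.Printf("utilization factor: %.2f\n", factor) // 0.33; a less busy worker wins
}
```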

This PR adds a few experimental assigners:

  • experiment-spread-qcount - Like spread, but also takes into account counts of tasks which are running/preparing/queued (see the sketch after this list)
  • experiment-spread-tasks - Like spread, but counts running tasks on a per-task-type basis
  • experiment-spread-tasks-qcount - The two above combined: count tasks grouped by task type, also taking into account tasks which are running/preparing/queued
  • experiment-random - In each schedule loop, figure out the set of all workers which can handle the task, then pick a random one
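
As a rough illustration of the spread-style selection, here is a minimal sketch of picking a worker by running + queued task counts, roughly in the spirit of experiment-spread-qcount. The workerStats type and its fields are hypothetical and do not correspond to the actual lotus scheduler types.

```go
package main

import "fmt"

// workerStats is a hypothetical per-worker snapshot used only for this sketch.
type workerStats struct {
	ID           string
	TasksRunning int // tasks currently executing
	TasksQueued  int // tasks preparing or waiting (what the *-qcount variants add)
}

// pickSpreadQCount chooses the candidate with the fewest running + queued tasks.
func pickSpreadQCount(candidates []workerStats) (workerStats, bool) {
	if len(candidates) == 0 {
		return workerStats{}, false
	}
	best := candidates[0]
	for _, w := range candidates[1:] {
		if w.TasksRunning+w.TasksQueued < best.TasksRunning+best.TasksQueued {
			best = w
		}
	}
	return best, true
}

func main() {
	workers := []workerStats{
		{ID: "worker-a", TasksRunning: 2, TasksQueued: 1},
		{ID: "worker-b", TasksRunning: 1, TasksQueued: 0},
	}
	if w, ok := pickSpreadQCount(workers); ok {
		fmt.Println("assign to", w.ID) // worker-b: lowest running+queued total
	}
}
```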

Additional Info

  • It would be good to try all of these and see how they behave
  • Interested to hear whether experiment-random / experiment-spread-tasks-qcount work better for --no-default storage workers
  • Interested to hear additional ideas

Checklist

Before you mark the PR ready for review, please make sure that:

  • Commits have a clear commit message.
  • PR title is in the form of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
    • area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
  • New features have usage guidelines and / or documentation updates in
  • Tests exist for new functionality or change in behavior
  • CI is green

@rjan90 (Contributor) commented Feb 27, 2023

Interested to hear if experiment-random / experiment-spread-tasks-qcount work better for --no-default storage workers

Ran some tests for these here, and they definitely make things much better for --no-default workers (not every sector gets assigned to a single storage worker).

Interested to hear additional ideas

The only other idea I have heard floating around is a round robin through the list of workers, but I am unsure whether that would be possible with the current limitations of the built-in scheduler. I do, however, think that the random assigner, with a possible fix for #9407 (comment), will be sufficient for spreading these sectors around.

@magik6k (Contributor, Author) commented Feb 28, 2023

Pushed a commit which implements the fix for #9407 (comment).

@magik6k magik6k force-pushed the feat/assigner-experiments branch from 13d9b3d to 2316363 on February 28, 2023 08:08
@RobQuistNL (Contributor) commented Mar 3, 2023

The one and only scheduler implementation I'd love to have would be "round-robin". For example, say our setup has this layout:

MachineA
       |- AP_1
       |- AP_2
       \- PC1_1

MachineB
       |- AP_3
       |- AP_4
       \- PC1_2
       
MachineC
       |- PC2_1
       \- C2_1

MachineD
       |- PC2_2
       \- C2_2

With the current scheduler, AP tasks would be assigned to the AP workers, decently spread; AP workers take work as they can. When the PC1 workers prefer work from their local disk, you end up with PC1_1 getting a lot of work. If we pledge every 4 minutes:

  • AP_1 gets an AP job (Sector0), finishes it in 1 minute.

  • PC1_1 gets Sector0 because its file is local

  • AP_1 gets an AP job (Sector1), finishes it in 1 minute.

  • PC1_1 gets Sector1 because its file is local

and so on.

Only once both AP_1 and AP_2 are busy will PC1_2 receive jobs, because then the AP work will be happening on MachineB.

Eventually:
PC1_1 is finally full with as many PC1 jobs as it can handle;
AP_1 still gets the AP jobs;
PC1_2 (MachineB) has to fetch its data from AP_1 (MachineA).

This generates a lot of network strain.

Best solution: round-robin the workers.

Keep a list of workers and track an "assigned job count" per worker:

AP_1 | 0
AP_2 | 0
PC1_1 | 0
PC1_2 | 0

Sort the list by ascending job count, assign the job to the first worker, and increment its job counter.

You will end up with (no matter the timing / spread) all AP workers getting an even number of jobs, and thus the PC1 workers getting an even number of jobs, as do the PC2 / C2 workers.

This will hold up whether you pledge every 10 seconds or every 50 minutes.
This will also prevent any worker from being overloaded, and it allows PC1 to prefer local AP work (the same holds when PC1 / AP / PC2 / C2 all happen on a single machine).

In the case where a machine has been selected by round-robin but doesn't have the resources (which should never happen with similar hardware / timings), you can always skip over it and take the next one in the list. You will end up with a skewed list where the "overloaded" worker always comes first due to its low job count, but this shouldn't matter in my opinion.
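
A minimal sketch of the round-robin idea described above, assuming a plain in-memory worker list. All names here (worker, nextWorker, canRun) are hypothetical and not part of the lotus scheduler; this is only meant to show the sort-by-count-and-skip behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// worker is a hypothetical bookkeeping entry used only for this sketch.
type worker struct {
	name     string
	assigned int  // running total of jobs handed to this worker
	canRun   bool // whether the worker currently has resources for the task
}

// nextWorker orders workers by ascending assigned-job count (name as a
// tiebreak) and returns the first one that can take the task, skipping
// workers that can't, as described above.
func nextWorker(workers []*worker) *worker {
	sort.Slice(workers, func(i, j int) bool {
		if workers[i].assigned != workers[j].assigned {
			return workers[i].assigned < workers[j].assigned
		}
		return workers[i].name < workers[j].name
	})
	for _, w := range workers {
		if w.canRun {
			w.assigned++
			return w
		}
	}
	return nil // no worker can take the task right now
}

func main() {
	pool := []*worker{
		{name: "AP_1", canRun: true},
		{name: "AP_2", canRun: true},
	}
	for i := 0; i < 4; i++ {
		if w := nextWorker(pool); w != nil {
			fmt.Println("job", i, "->", w.name) // alternates AP_1 / AP_2 evenly
		}
	}
}
```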

@ZenGround0 (Contributor) left a comment

I skimmed this, though this part of the system is probably the most foreign to me, so not more than that. I can go over this with you in more depth later this week if you want a more extensive review. Approving now in the interest of getting it in by code freeze.

@magik6k magik6k merged commit 80ccd14 into master Mar 9, 2023
@magik6k magik6k deleted the feat/assigner-experiments branch March 9, 2023 00:28
@rjan90 rjan90 mentioned this pull request Mar 9, 2023