[SPARK-2193] Improve tasks preferred locality by sorting tasks partial or... #1131

Closed
wants to merge 1 commit into from

Conversation

li-zhihui
Contributor

Currently, the last executor(s) to be offered tasks may not get their preferred task(s), even though those tasks have been added to the pendingTasksForHosts map. Because executors pick up tasks sequentially, an executor's preferred task(s) may already have been picked up by other executors.
This can be avoided by sorting tasks with a partial ordering. Each executor picks up tasks according to the position of its host in the task's preferredLocations; that is, an executor first picks up all tasks for which task.preferredLocations.1 = executor.hostName, then secondly…
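
To illustrate the ordering described above, here is a minimal sketch, not the code from this patch. The `Task` case class, `rank`, and `sortForHost` are hypothetical names introduced only for this example, and the host names and task ids are made up. Tasks that do not list the host at all are assumed to sort last.

```scala
object PreferredLocationOrdering {
  // Hypothetical, simplified stand-in for a Spark task; not the real Task class.
  final case class Task(id: String, preferredLocations: Seq[String])

  /** Position of `host` in the task's preference list; tasks that do not list it rank last. */
  def rank(task: Task, host: String): Int = {
    val i = task.preferredLocations.indexOf(host)
    if (i >= 0) i else Int.MaxValue
  }

  /** Pending tasks ordered so that tasks naming `host` earliest come first (stable sort). */
  def sortForHost(pending: Seq[Task], host: String): Seq[Task] =
    pending.sortBy(rank(_, host))

  def main(args: Array[String]): Unit = {
    // Illustrative data only.
    val pending = Seq(
      Task("t1", Seq("hostB", "hostA")),
      Task("t2", Seq("hostA", "hostC")),
      Task("t3", Seq("hostC")))
    println(sortForHost(pending, "hostA").map(_.id))
  }
}
```

With this ordering, an executor on hostA would take t2 (which names hostA first) before t1, leaving t1 available to an executor on hostB.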

@AmplabJenkins

Can one of the admins verify this patch?

@mridulm
Contributor

mridulm commented Jun 19, 2014

A preferred task for one worker can be picked up by another worker (at the process, node, or rack level) only if both workers are at the same locality level for that task, in which case it is irrelevant which worker picks it up. I am probably missing why this is required?

@li-zhihui
Contributor Author

@mridulm
For example:
2 tasks (task_x, task_y), 2 executors (on host1 and host2)
task_x.preferredLocations = [host2, host3, host1]
task_y.preferredLocations = [host1, host3, host4]
task_x is in the pendingTasks array of host1, host2, and host3
task_y is in the pendingTasks array of host1, host3, and host4
host1 picks up a task first.
If host1 picks up task_x, and task_x is the last pending task for host2,
then host2 can't get any host-preferred task (and it must wait 3 seconds before it can get task_y).
If host1 picks up task_y and leaves task_x to host2,
then both host1 and host2 get a preferred task.
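
The scenario above can be replayed with a small simulation. This is only a sketch under simplifying assumptions: each host takes one task per offer, only host-level preferences are modelled, and the 3-second locality wait is not simulated. `Task`, `assign`, `firstPending`, and `byPreferenceRank` are hypothetical names, not Spark APIs.

```scala
object LocalityPickSimulation {
  // Hypothetical, simplified stand-in for a Spark task.
  final case class Task(id: String, preferredLocations: Seq[String])

  /**
   * Offers go to `hosts` in order. Each host takes one task, selected by `choose`,
   * from the pending tasks that list it as a preferred location.
   * Returns host -> task id for hosts that obtained a host-preferred task.
   */
  def assign(hosts: Seq[String], tasks: Seq[Task],
             choose: (Seq[Task], String) => Task): Map[String, String] = {
    val (_, assigned) = hosts.foldLeft((tasks, Map.empty[String, String])) {
      case ((pending, acc), host) =>
        val candidates = pending.filter(_.preferredLocations.contains(host))
        if (candidates.isEmpty) (pending, acc)
        else {
          val picked = choose(candidates, host)
          (pending.filterNot(_ == picked), acc + (host -> picked.id))
        }
    }
    assigned
  }

  // Takes whichever candidate happens to come first in the pending list.
  val firstPending: (Seq[Task], String) => Task = (cs, _) => cs.head

  // Prefers the candidate that lists the host earliest in its preferredLocations.
  val byPreferenceRank: (Seq[Task], String) => Task =
    (cs, host) => cs.minBy(_.preferredLocations.indexOf(host))

  // Data taken from the example in this comment.
  val tasks = Seq(
    Task("task_x", Seq("host2", "host3", "host1")),
    Task("task_y", Seq("host1", "host3", "host4")))
  val hosts = Seq("host1", "host2")

  def main(args: Array[String]): Unit = {
    println(assign(hosts, tasks, firstPending))     // host1 takes task_x; host2 gets nothing preferred
    println(assign(hosts, tasks, byPreferenceRank)) // host1 takes task_y; host2 takes task_x
  }
}
```

In the first run host1 takes task_x simply because it is first in the list, so host2 is left without a host-preferred task. In the second run host1 takes task_y, which names host1 first, and task_x remains for host2.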

@JoshRosen
Contributor

Hi @li-zhihui,

Sorry for allowing this to sit unreviewed for so long.

To check my understanding, the original issue was that an individual executor's pending task queue might have non-preferred tasks that appear ahead of preferred ones in the queue? It looks like this might have been partially addressed by #1313, which modified TaskSetManager to maintain a separate list of pending tasks without locality preferences: 63bdb1f#diff-bad3987c83bd22d46416d3dd9d208e76R193.

Since we now maintain separate lists to track pending tasks for executors, hosts, and racks, I don't think that we need this sorting. If you agree, do you mind closing this pull request? Thanks!
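
As a rough illustration of the "separate lists" idea mentioned above, the sketch below keeps one pending list per executor, host, and rack, plus a list for tasks without locality preferences. It is a simplified, hypothetical model: the `Pending` fields, `findTask`, and the lookup order are assumptions made for this example and are not claimed to match TaskSetManager's actual members or behaviour. Locality wait and task removal are not modelled.

```scala
object PendingTaskLists {
  // Hypothetical model: task ids grouped by the locality key they prefer.
  final case class Pending(
      forExecutor: Map[String, List[String]],
      forHost: Map[String, List[String]],
      forRack: Map[String, List[String]],
      noPrefs: List[String])

  /**
   * First pending task id for an offer, trying the most specific list first.
   * The order (executor, host, rack, then no-preference tasks) is an assumption.
   */
  def findTask(p: Pending, execId: String, host: String, rack: String): Option[String] =
    p.forExecutor.getOrElse(execId, Nil).headOption
      .orElse(p.forHost.getOrElse(host, Nil).headOption)
      .orElse(p.forRack.getOrElse(rack, Nil).headOption)
      .orElse(p.noPrefs.headOption)

  def main(args: Array[String]): Unit = {
    // Illustrative data only.
    val p = Pending(
      forExecutor = Map.empty,
      forHost = Map("host1" -> List("task_y"), "host2" -> List("task_x")),
      forRack = Map.empty,
      noPrefs = List("task_z"))
    println(findTask(p, "exec1", "host1", "rack1"))
    println(findTask(p, "exec9", "host9", "rack9"))
  }
}
```

Because tasks without preferences live in their own list in this model, they cannot sit ahead of a preferred task in a host's queue.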

@SparkQA

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@asfgit asfgit closed this in eae81b0 Sep 12, 2014
udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024