[SPARK-17304] Fix perf. issue caused by TaskSetManager.abortIfCompletelyBlacklisted #14871
launchedTask = resourceOfferSingleTaskSet(
for (taskSet <- sortedTaskSets) {
var launchedAnyTask = false
var launchedTaskAtMaxLocality = false
I know it's super verbose, but to minimize how confusing this code is, what about calling this launchedTaskAtCurrentMaxLocality, and renaming maxLocality below to currentMaxLocality?
Good idea; updated.
Test build #64609 has finished for PR 14871 at commit
Test build #64610 has finished for PR 14871 at commit
LGTM. Josh, how long does your microbenchmark take if you comment out the call to abortIfCompletelyBlacklisted? Wondering how much that continues to affect performance.
@kayousterhout, after this patch's changes commenting out
Ok cool, thanks @JoshRosen. I just wanted to make sure that the blacklisting wasn't still hurting performance. Thanks for fixing this and sorry about the oversight originally!
Merging this to master. Thanks!
Thanks for finding this and for the quick fix, @JoshRosen!
This patch addresses a minor scheduler performance issue that was introduced in #13603. If you run
then most of the time ends up being spent in `TaskSetManager.abortIfCompletelyBlacklisted()`:

When processing resource offers, the scheduler uses a nested loop which considers every task set at multiple locality levels:

In order to prevent jobs with globally blacklisted tasks from hanging, #13603 added a `taskSet.abortIfCompletelyBlacklisted` call inside of `resourceOfferSingleTaskSet`; if a call to `resourceOfferSingleTaskSet` fails to schedule any tasks, then `abortIfCompletelyBlacklisted` checks whether the tasks are completely blacklisted in order to figure out whether they will ever be schedulable. The problem with this placement of the call is that the last call to `resourceOfferSingleTaskSet` in the `while` loop will return `false`, implying that `resourceOfferSingleTaskSet` will call `abortIfCompletelyBlacklisted`, so almost every call to `resourceOffers` will trigger the `abortIfCompletelyBlacklisted` check for every task set.

Instead, I think that this call should be moved out of the innermost loop and should be called at most once per task set in case none of the task set's tasks can be scheduled at any locality level.
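To make the effect of the placement concrete, here is a minimal, self-contained sketch of the two loop shapes. It is a simplified model and not Spark's actual scheduler code: `FakeTaskSet`, `Slots`, `offerSingleTaskSet`, `resourceOffersOld`, and `resourceOffersNew` are hypothetical names, the "check" only increments a counter, and each offer launches at most one task. Only `abortIfCompletelyBlacklisted`, `launchedAnyTask`, and `launchedTaskAtCurrentMaxLocality` are borrowed from the discussion in this PR.

```scala
// Simplified model for illustration only; these are NOT Spark's real classes.
object BlacklistCheckPlacement {
  // Hypothetical stand-in for a task set manager: it only counts how often the check runs.
  final class FakeTaskSet(var pendingTasks: Int, val localityLevels: Seq[String]) {
    var blacklistChecks: Int = 0
    def abortIfCompletelyBlacklisted(): Unit = blacklistChecks += 1
  }

  // Hypothetical pool of free slots shared by all task sets in one round of offers.
  final class Slots(var free: Int)

  // Launches at most one task and returns whether a task was launched.
  // With checkInside = true, a failed offer runs the check (the old placement).
  def offerSingleTaskSet(ts: FakeTaskSet, slots: Slots, checkInside: Boolean): Boolean = {
    val launched = ts.pendingTasks > 0 && slots.free > 0
    if (launched) {
      ts.pendingTasks -= 1
      slots.free -= 1
    } else if (checkInside) {
      ts.abortIfCompletelyBlacklisted()
    }
    launched
  }

  // Old shape: the final, failing offer at every locality level triggers the check.
  def resourceOffersOld(taskSets: Seq[FakeTaskSet], slots: Slots): Unit =
    for (ts <- taskSets; _ <- ts.localityLevels) {
      var launchedTask = true
      while (launchedTask) {
        launchedTask = offerSingleTaskSet(ts, slots, checkInside = true)
      }
    }

  // New shape: the check runs at most once per task set, and only if nothing
  // was launched at any locality level.
  def resourceOffersNew(taskSets: Seq[FakeTaskSet], slots: Slots): Unit =
    for (ts <- taskSets) {
      var launchedAnyTask = false
      for (_ <- ts.localityLevels) {
        var launchedTaskAtCurrentMaxLocality = true
        while (launchedTaskAtCurrentMaxLocality) {
          launchedTaskAtCurrentMaxLocality = offerSingleTaskSet(ts, slots, checkInside = false)
          launchedAnyTask |= launchedTaskAtCurrentMaxLocality
        }
      }
      if (!launchedAnyTask) ts.abortIfCompletelyBlacklisted()
    }
}
```

In this model, a task set with three locality levels pays for three checks per round of offers under the old shape, even when all of its tasks launch. Under the new shape it pays for none when anything launches, and for one when nothing does.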
Before this patch's changes, the microbenchmark example that I posted above took 35 seconds to run; with this change it takes only 15 seconds.
/cc @squito and @kayousterhout for review.