Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARK-2533 - Add locality levels on stage summary view #9487

Closed
wants to merge 7 commits into from

Conversation

jbonofre
Copy link
Member

@jbonofre jbonofre commented Nov 5, 2015

No description provided.

@jbonofre
Copy link
Member Author

jbonofre commented Nov 5, 2015

@kayousterhout suggestions:

  1. can we change the format of this to say "Process local: 2 tasks; Rack local: 4 tasks; Any: 15 tasks"? I think that's easier to read and more clear what it is.

  2. I wonder if we should put this under the "Show additional metrics" drop down. Right now, it's one of the first things someone will see when they open the UI, and it's not clear to me that that's the right prioritization.

I'm updating there.

@jbonofre
Copy link
Member Author

jbonofre commented Nov 5, 2015

Reference to the "old" PR: #9117

@SparkQA
Copy link

SparkQA commented Nov 5, 2015

Test build #1985 has finished for PR 9487 at commit f7cf968.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val locality = taskUIData.taskInfo.taskLocality.toString
localityCounts.put(locality, localityCounts.getOrElse(locality, 0L) + 1)
})
return localityCounts.map { _ match {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can omit return and the _ match since you have just one case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct Sean. Thanks ! Actually, I'm implementing second Kay's suggestion (using "Show Additional Metrics" checkbox). I will update the PR.

@jbonofre
Copy link
Member Author

jbonofre commented Nov 5, 2015

Updated PR implementing Kay's suggestions: formatting the locality level summary string, and moving under the "Show Additional Metrics" drop down.

case (localityLevel, count) => s"$localityLevel: $count task(s)"
}}.mkString("; ")
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might result in inconsistent order of output and always shows 0 counts, which may or may not be desirable. How about this which avoids some of the repetition:

val localities = stageData.taskData.values.map(_.taskInfo.taskLocality)
val localityCounts = localities.groupBy(identity).mapValues(_.size)
val localityNamesAndCounts = localityCounts.toSeq.map { case (locality, count) =>
  val localityName = locality match {
    case TaskLocality.PROCESS_LOCAL => "Process local"
    case TaskLocality.NODE_LOCAL => "Node local"
    case TaskLocality.RACK_LOCAL => "Rack local" 
    case TaskLocality.ANY => "Any"
  }
  (localityName, count)
}
localityNamesAndCounts.sorted.mkString("; ")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, let me try. Thanks again Sean, I appreciate all your help, support and guidance !

@jbonofre
Copy link
Member Author

jbonofre commented Nov 5, 2015

PR rebased and updated according to Sean's suggestion

@@ -21,6 +21,7 @@ import java.net.URLEncoder
import java.util.Date
import javax.servlet.http.HttpServletRequest

import scala.collection.mutable.HashMap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Finally, I think this import is no longer needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True !!! Let me fix that.

@jbonofre
Copy link
Member Author

jbonofre commented Nov 5, 2015

PR rebased and updated with alphabetize import thanks to @kayousterhout suggestion (thanks ;))

@kayousterhout
Copy link
Contributor

Can you post the updated screenshot?

case TaskLocality.RACK_LOCAL => "Rack local"
case TaskLocality.ANY => "Any"
}
(localityName, count)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this prints as "(Process local, 3); (Rack local, 5)"? Am I understanding that correctly? If so, can you replace line 83 with s"$localityName: $count" to get a nicer looking format?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, let me do it.

@jbonofre
Copy link
Member Author

jbonofre commented Nov 6, 2015

Sure, let me prepare that.

@jbonofre
Copy link
Member Author

jbonofre commented Nov 6, 2015

screen

@jbonofre
Copy link
Member Author

jbonofre commented Nov 6, 2015

PR rebased and implement Kay's suggestion (about the locality level summary string format)

@srowen
Copy link
Member

srowen commented Nov 6, 2015

LGTM

@@ -177,6 +192,10 @@ private[ui] class StagePage(parent: StagesTab) extends WebUIPage("stage") {
<div class="additional-metrics collapsed">
<ul>
<li>
<strong>Locality Level Summary: </strong>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, actually on final thought, is this the right place to show this info? it's dumped into a list of metrics, but all the others are controlled by arrows and checkboxes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about a new table "Locality Levels Summary" displayed by a checkbox in additional metric ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it didn't register to me that this was adding to the list of additional metrics enabled by checkboxes. If it goes here, I think it would have to be treated the same way. If a table displays well, that's reasonable to me. It might be harder to make render though; could be overkill at this stage. @kayousterhout has better sensibilities here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catcha. Let me try something around this for you.

@andrewor14
Copy link
Contributor

Hm, I find it a little weird that we put it under additional metrics. It's not really a metric and it's different from the rest of the things under that drop down. Should we just put it under Total time across all tasks? @kayousterhout @srowen

@kayousterhout
Copy link
Contributor

@andrewor14 I think that was how @jbonofre originally did it, but the complaint with that approach was that it clutters the UI with information that may not be useful to many people. I'm torn here though because it is pretty small; I'm fine with just putting it under the total time if @srowen is OK with that.

@jbonofre
Copy link
Member Author

Correct Kay, it's the way that I did at the beginning.
I'm proposing either:

  • display under the total time as I did at the beginning
  • add a "Locality Levels Summary" expanded div (like "Additional Metric"), collapsed by default (I'm preparing an update on the PR about this)

Thoughts ?
Thanks !

@jbonofre
Copy link
Member Author

I'm updating the PR with the collapsed table. I will attach a screenshot.

@jbonofre
Copy link
Member Author

PR rebased and updated with a collapsed div. Please, see the following screenshots:

Default stage view:
screen

When user expands the "locality level summary" div:
screen2

@kayousterhout
Copy link
Contributor

I'd prefer your original approach to this, because in either case, it takes up one line of text in the default view, but with this, it takes up one line of text and you have to click on that line to see the metrics.

@andrewor14 @srowen what about putting the summary ("Locality summary: process local: 3 tasks; dode local: 6 tasks; ..." under the "Summary metrics" heading and above the table (and always displayed)?

@jbonofre might be a good idea to hold off on changing the code until there's consensus on the right approach

@jbonofre
Copy link
Member Author

@kayousterhout ok, let's wait feedback from the others. I will revert my commit, back on a single string always displayed.

Thanks !

@srowen
Copy link
Member

srowen commented Nov 11, 2015

Thanks, the picture helps me think about where this should go.

This seems most related to the "Tasks" section which also reports the locality of each task. Is it too random to stick this in as a line of text before or after that table?

Summary Metrics: could go here too. This is only for completed tasks though. The metrics are for active tasks too right?

@jbonofre
Copy link
Member Author

You are right: it's more related to tasks. And yes, it's valid for both completed and active tasks.

@kayousterhout
Copy link
Contributor

Good point re: the summary metrics being only for completed tasks. I think it's a little weird to put the summary metric under the "tasks" table though; what about the original proposal of putting it near the top, under "total time"?

@jbonofre
Copy link
Member Author

+1 with @kayousterhout

@srowen
Copy link
Member

srowen commented Nov 11, 2015

Ha OK, full circle there. At least we've considered every possible position on the page. I'm OK with that. Maybe later there's a big redesign that re-sorts all the info here but for now it's no bad place to stick this info, logically or aesthetically.

@andrewor14
Copy link
Contributor

I agree, we should just put it under total time. The KV pairs at the top of the page are for summary info aggregated across tasks and that's exactly what this is. It looks clunky as a table and doesn't seem to be of equal importance to things like the event timeline.

@jbonofre
Copy link
Member Author

Let me revert the unecessary commits.

@jbonofre
Copy link
Member Author

PR rebased, locality levels summary on stage just under the time (as first proposal).

@srowen
Copy link
Member

srowen commented Nov 12, 2015

OK that seems pretty good.

@andrewor14
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Nov 12, 2015

Test build #45752 has finished for PR 9487 at commit bd7de49.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

LGTM merging into master 1.6, thanks!

asfgit pushed a commit that referenced this pull request Nov 12, 2015
Author: Jean-Baptiste Onofré <jbonofre@apache.org>

Closes #9487 from jbonofre/SPARK-2533-2.

(cherry picked from commit 74c3004)
Signed-off-by: Andrew Or <andrew@databricks.com>
@asfgit asfgit closed this in 74c3004 Nov 12, 2015
@jbonofre
Copy link
Member Author

Thanks ! Let me prepare new PR ;)

dskrvk pushed a commit to dskrvk/spark that referenced this pull request Nov 13, 2015
Author: Jean-Baptiste Onofré <jbonofre@apache.org>

Closes apache#9487 from jbonofre/SPARK-2533-2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants