[SPARK-20648][core] Port JobsTab and StageTab to the new UI backend. #19698

vanzin · 2017-11-08T19:03:33Z

This change is a little larger because there's a whole lot of logic
behind these pages, all really tied to internal types and listeners,
and some of that logic had to be implemented in the new listener and
the needed data exposed through the API types.

- Added missing StageData and ExecutorStageSummary fields which are
  used by the UI. Some json golden files needed to be updated to account
  for new fields.
- Save RDD graph data in the store. This tries to re-use existing types as
  much as possible, so that the code doesn't need to be re-written. So it's
  probably not very optimal.
- Some old classes (e.g. JobProgressListener) still remain, since they're used
  in other parts of the code; they're not used by the UI anymore, though, and
  will be cleaned up in a separate change.
- Save information about active pools in the store. This data is not really used
  in the SHS, but it's not a lot of data so it's still recorded when replaying
  applications.
- Because the new store sorts things slightly differently from the previous
  code, some json golden files had some elements within them shuffled around.
- The retention unit test in UISeleniumSuite was disabled because the code
  to throw away old stages / tasks hasn't been added yet.
- The job description field in the API tries to follow the old behavior, which
  makes it be empty most of the time, even though there's information to fill it
  in. For stages, a new field was added to hold the description (which is basically
  the job description), so that the UI can be rendered in the old way.
- A new stage status ("SKIPPED") was added to account for the fact that the API
  couldn't represent that state before. Without this, the stage would show up as
  "PENDING" in the UI, which is now based on API types.
- The API used to expose "executorRunTime" as the value of the task's duration,
  which wasn't really correct (also because that value was easily available
  from the metrics object); this change fixes that by storing the correct duration,
  which also means a few expectation files needed to be updated to account for
  the new durations and sorting differences due to the changed values.
- Added changes to implement SPARK-20713 and SPARK-21922 in the new code.
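
To illustrate the duration item above, here is a minimal sketch of why a task's duration and its "executorRunTime" metric are different values. The `TaskTimes` type and its field names are hypothetical, and the sketch assumes "duration" means the wall-clock time between task launch and task finish; it is not the code from this PR.

```scala
// Hypothetical type for illustration only; not one of the API types in this PR.
// Assumption: duration is the wall-clock time between launch and finish,
// while executorRunTime is a separate metric reported by the executor.
final case class TaskTimes(launchTime: Long, finishTime: Long, executorRunTime: Long) {
  // Wall-clock duration in milliseconds, derived from timestamps.
  def duration: Long = finishTime - launchTime
}

object TaskTimesExample {
  // A task that was launched at t=1000 and finished at t=1500,
  // but whose executor reported only 420 ms of run time.
  val sample: TaskTimes = TaskTimes(launchTime = 1000L, finishTime = 1500L, executorRunTime = 420L)
}
```

Under that assumption, reporting `executorRunTime` as the duration would under-report the sample task by 80 ms, which is the kind of difference that would change sort order in expectation files.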

Tested with existing unit tests (and by using the UI a lot).

vanzin · 2017-11-08T19:06:41Z

For context:

Project link: https://issues.apache.org/jira/browse/SPARK-18085
Upcoming PRs that build on this code: https://github.com/vanzin/spark/pulls
PR with more comments: SHS-NG M4.4: Port JobsTab and StageTab to the new backend. vanzin/spark#47

This PR is missing some code from the original PR that cleans up data kept in the UI that is no longer needed. That's because SPARK-20647 is not yet committed. When both that and this PR are in, I'll do the cleanup, probably as part of SPARK-20650.

This change is kinda large, but it gets a bit smaller if you ignore whitespace changes:
https://github.com/apache/spark/pull/19698/files?w=1

Unfortunately you can't comment in that view.

SparkQA · 2017-11-08T22:30:08Z

Test build #83606 has finished for PR 19698 at commit a22c458.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-11-10T02:22:15Z

Test build #83659 has finished for PR 19698 at commit 1d7242b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

squito · 2017-11-13T15:09:12Z

core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala

@@ -23,19 +23,21 @@ import javax.servlet.http.HttpServletRequest

 import scala.collection.JavaConverters._
 import scala.collection.mutable.{HashMap, ListBuffer}
+import scala.util.Try


unused (and more imports here can be cleaned up)

squito · 2017-11-13T15:38:02Z

core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala

-              {listener.schedulingMode.map(_.toString).getOrElse("Unknown")}
-            </li>
+    val completedJobs = _completedJobs.toSeq.reverse
+    val failedJobs = _failedJobs.toSeq.reverse


Actually, is the reverse necessary at all? If you trace through, it seems the result only goes to JobsDataSource, where it's sorted anyway.

squito · 2017-11-13T18:19:36Z

core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala

-      content ++= makeTimeline(activeStages ++ completedStages ++ failedStages,
-          parent.parent.store.executorList(false), appStartTime)
+    content ++= makeTimeline(activeStages ++ completedStages ++ failedStages,
+        store.executorList(false), appStartTime)


nit: indentation

squito · 2017-11-13T18:22:57Z

core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala

-        listener.stageIdToInfo.getOrElse(stageId,
-          new StageInfo(stageId, 0, "Unknown", 0, Seq.empty, Seq.empty, "Unknown"))
+    val jobId = parameterId.toInt
+    val jobDataOption = Try(store.job(jobId)).toOption


In most places, you do an explicit try/catch, I assume because you only want to convert NoSuchElementException to None. Does that concern not apply here? Also, since you do that so much, maybe it's worth a helper, so that you could use it like this?

I added a shared method; I'm not too happy with it, but this code needs to be resilient to data disappearing (either because events don't arrive or because data is cleaned up to save memory), so, there's not really a good way around it...
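
A shared method of the kind discussed in this thread could look roughly like the sketch below. The object and method names here are hypothetical and are not taken from the PR. The point is that, unlike `Try(...).toOption`, only NoSuchElementException becomes None, while any other exception still propagates.

```scala
// Hypothetical helper for illustration; names are assumptions, not the PR's code.
object StoreLookup {
  /** Runs `fn`, mapping only NoSuchElementException to None. */
  def asOption[T](fn: => T): Option[T] = {
    try {
      Some(fn)
    } catch {
      case _: NoSuchElementException => None
    }
  }
}

// Stand-in for a store whose lookups throw when the element is missing.
object StoreLookupDemo {
  private val jobs = Map(1 -> "job-1")

  // Map.apply throws NoSuchElementException for an unknown key.
  def job(id: Int): String = jobs(id)
}
```

With this sketch, `StoreLookup.asOption(StoreLookupDemo.job(1))` yields `Some("job-1")` and `StoreLookup.asOption(StoreLookupDemo.job(2))` yields `None`, so callers can handle data that has disappeared without swallowing unrelated failures.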

squito · 2017-11-13T18:35:50Z

core/src/main/scala/org/apache/spark/ui/jobs/PoolPage.scala

-        new StageTableBase(request, activeStages, "", "activeStage", parent.basePath, "stages/pool",
-          parent.progressListener, parent.isFairScheduler, parent.killEnabled,
-          isFailedStage = false)
+    // For now, pool information is only accessible in live UIs


weird that the PoolPage is even hooked up when there isn't a live UI
(but I think you have the right change here, I wouldn't change that behavior as part of this)

squito · 2017-11-13T18:56:31Z

core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala

    currentTime: Long,
    pageSize: Int,
    sortColumn: String,
    desc: Boolean,
    store: AppStatusStore) extends PagedDataSource[TaskTableRowData](pageSize) {
  import StagePage._

-  // Convert TaskUIData to TaskTableRowData which contains the final contents to show in the table
+  // Keep an internal cache of executor log maps so that long task lists render faster.
+  private val executors = new HashMap[String, Map[String, String]]()


nit: rename to something like executorIdToLogs

squito · 2017-11-13T18:58:57Z

core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala

+
+  private def executorLogs(id: String): Map[String, String] = {
+    executors.getOrElseUpdate(id,
+      store.executorSummary(id).map(_.executorLogs).getOrElse(Map.empty))


I think you need to protect executors from a race if two UI threads both call this.

Pretty sure there's a separate instance of this class per request.

ah right, good point
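
The per-request cache discussed above can be sketched as follows. `ExecutorLogCache`, its `lookup` parameter, and the `lookups` counter are hypothetical stand-ins, not the PR's classes. The sketch relies on the assumption stated in the reply: each instance is created for, and used by, a single request, so an unsynchronized mutable map is acceptable.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the per-request data source; not the PR's class.
// Assumption: one instance per request, so no synchronization is attempted.
class ExecutorLogCache(lookup: String => Option[Map[String, String]]) {
  private val executorIdToLogs = new mutable.HashMap[String, Map[String, String]]()

  // Counts calls to the backing lookup; present only to make the caching visible.
  var lookups = 0

  def executorLogs(id: String): Map[String, String] = {
    // The second argument is evaluated only when `id` is not cached yet.
    executorIdToLogs.getOrElseUpdate(id, {
      lookups += 1
      lookup(id).getOrElse(Map.empty[String, String])
    })
  }
}
```

If such an object were ever shared between threads, the plain `HashMap` would need to be replaced with a concurrent map or guarded by a lock.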

squito

OK I have made it through this. Other than one potential race, my other comments are just cosmetic

squito · 2017-11-13T19:18:58Z

core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala

+    val configName = "spark.scheduler.mode"
+    val config = sc match {
+      case Some(_sc) =>
+        _sc.conf.getOption(configName)


how come this needs to check the sc.conf, but StagesTab doesn't? Also doesn't seem like the old code would check this either.

squito · 2017-11-13T20:28:31Z

core/src/main/scala/org/apache/spark/status/AppStatusStore.scala

-        None
-    }
+  def executorSummary(executorId: String): v1.ExecutorSummary = {
+      store.read(classOf[ExecutorSummaryWrapper], executorId).info


nit: indentation

squito · 2017-11-13T20:33:30Z

lgtm

SparkQA · 2017-11-13T23:11:24Z

Test build #83808 has finished for PR 19698 at commit 1fef09d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-11-13T23:52:18Z

Test build #83812 has finished for PR 19698 at commit 0454ed1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

squito · 2017-11-14T16:36:34Z

I merged this to master, since it's holding up a bunch more of the history server stuff, but I'd encourage other reviewers to take a look still @cloud-fan @ajbozarth @jerryshao

### What changes were proposed in this pull request?
SPARK-15591(#13708) introduced the `MissingStageTableRowData`, but it is no longer used after SPARK-20648(#19698), so this PR removes it.

### Why are the changes needed?
Clean up unused code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43748 from LuciferYang/SPARK-45875.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

Merge branch 'master' into SPARK-20648

1d7242b

squito reviewed Nov 13, 2017

Marcelo Vanzin added 3 commits November 13, 2017 11:13

Merge branch 'master' into SPARK-20648

dcccd23

Cleanup imports.

9617db8

Remove unnecessary reverse.

5f848c6

squito reviewed Nov 13, 2017

Marcelo Vanzin added 2 commits November 13, 2017 11:47

Centralize handling of NoSuchElementException -> Option.

1734dc0

Other feedback.

1fef09d

squito reviewed Nov 13, 2017

Indentation.

0454ed1

asfgit closed this in 4741c07 Nov 14, 2017

vanzin deleted the SPARK-20648 branch November 14, 2017 17:45

LuciferYang mentioned this pull request Nov 10, 2023

[SPARK-45875][CORE] Remove MissingStageTableRowData from core module #43748

Closed
