Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20648][core] Port JobsTab and StageTab to the new UI backend. #19698

Closed
wants to merge 8 commits into from

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented Nov 8, 2017

This change is a little larger because there's a whole lot of logic
behind these pages, all really tied to internal types and listeners,
and some of that logic had to be implemented in the new listener and
the needed data exposed through the API types.

  • Added missing StageData and ExecutorStageSummary fields which are
    used by the UI. Some json golden files needed to be updated to account
    for new fields.

  • Save RDD graph data in the store. This tries to re-use existing types as
    much as possible, so that the code doesn't need to be re-written. So it's
    probably not very optimal.

  • Some old classes (e.g. JobProgressListener) still remain, since they're used
    in other parts of the code; they're not used by the UI anymore, though, and
    will be cleaned up in a separate change.

  • Save information about active pools in the store. This data is not really used
    in the SHS, but it's not a lot of data so it's still recorded when replaying
    applications.

  • Because the new store sorts things slightly differently from the previous
    code, some json golden files had some elements within them shuffled around.

  • The retention unit test in UISeleniumSuite was disabled because the code
    to throw away old stages / tasks hasn't been added yet.

  • The job description field in the API tries to follow the old behavior, which
    makes it be empty most of the time, even though there's information to fill it
    in. For stages, a new field was added to hold the description (which is basically
    the job description), so that the UI can be rendered in the old way.

  • A new stage status ("SKIPPED") was added to account for the fact that the API
    couldn't represent that state before. Without this, the stage would show up as
    "PENDING" in the UI, which is now based on API types.

  • The API used to expose "executorRunTime" as the value of the task's duration,
    which wasn't really correct (also because that value was easily available
    from the metrics object); this change fixes that by storing the correct duration,
    which also means a few expectation files needed to be updated to account for
    the new durations and sorting differences due to the changed values.

  • Added changes to implement SPARK-20713 and SPARK-21922 in the new code.

Tested with existing unit tests (and by using the UI a lot).

This change is a little larger because there's a whole lot of logic
behind these pages, all really tied to internal types and listeners,
and some of that logic had to be implemented in the new listener and
the needed data exposed through the API types.

- Added missing StageData and ExecutorStageSummary fields which are
  used by the UI. Some json golden files needed to be updated to account
  for new fields.

- Save RDD graph data in the store. This tries to re-use existing types as
  much as possible, so that the code doesn't need to be re-written. So it's
  probably not very optimal.

- Some old classes (e.g. JobProgressListener) still remain, since they're used
  in other parts of the code; they're not used by the UI anymore, though, and
  will be cleaned up in a separate change.

- Save information about active pools in the store. This data is not really used
  in the SHS, but it's not a lot of data so it's still recorded when replaying
  applications.

- Because the new store sorts things slightly differently from the previous
  code, some json golden files had some elements within them shuffled around.

- The retention unit test in UISeleniumSuite was disabled because the code
  to throw away old stages / tasks hasn't been added yet.

- The job description field in the API tries to follow the old behavior, which
  makes it be empty most of the time, even though there's information to fill it
  in. For stages, a new field was added to hold the description (which is basically
  the job description), so that the UI can be rendered in the old way.

- A new stage status ("SKIPPED") was added to account for the fact that the API
  couldn't represent that state before. Without this, the stage would show up as
  "PENDING" in the UI, which is now based on API types.

- The API used to expose "executorRunTime" as the value of the task's duration,
  which wasn't really correct (also because that value was easily available
  from the metrics object); this change fixes that by storing the correct duration,
  which also means a few expectation files needed to be updated to account for
  the new durations and sorting differences due to the changed values.

- Added changes to implement SPARK-20713 and SPARK-21922 in the new code.
@vanzin
Copy link
Contributor Author

vanzin commented Nov 8, 2017

For context:

This PR is missing some code from the original PR that cleans up some now not needed data kept in the UI. That's because SPARK-20647 is not yet committed. When both that and this PR are in, I'll do the cleanup, probably as part of SPARK-20650.

This change is kinda large, but it gets a bit smaller if you ignore whitespace changes:
https://github.com/apache/spark/pull/19698/files?w=1

Unfortunately you can't comment in that view.

@SparkQA
Copy link

SparkQA commented Nov 8, 2017

Test build #83606 has finished for PR 19698 at commit a22c458.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 10, 2017

Test build #83659 has finished for PR 19698 at commit 1d7242b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -23,19 +23,21 @@ import javax.servlet.http.HttpServletRequest

import scala.collection.JavaConverters._
import scala.collection.mutable.{HashMap, ListBuffer}
import scala.util.Try
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused (and more imports here can be cleaned up)

{listener.schedulingMode.map(_.toString).getOrElse("Unknown")}
</li>
val completedJobs = _completedJobs.toSeq.reverse
val failedJobs = _failedJobs.toSeq.reverse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually is the reverse necessary at all? seems if you trace through, only goes to JobsDataSource, where its sorted anyway

content ++= makeTimeline(activeStages ++ completedStages ++ failedStages,
parent.parent.store.executorList(false), appStartTime)
content ++= makeTimeline(activeStages ++ completedStages ++ failedStages,
store.executorList(false), appStartTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation

listener.stageIdToInfo.getOrElse(stageId,
new StageInfo(stageId, 0, "Unknown", 0, Seq.empty, Seq.empty, "Unknown"))
val jobId = parameterId.toInt
val jobDataOption = Try(store.job(jobId)).toOption
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most places, you do an explicit try/catch, I assume because you only want to convert NoSuchElementException to None. Does that concern not apply here? also since you do that so much, maybe its worth a helper, so that you could use it like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a shared method; I'm not too happy with it, but this code needs to be resilient to data disappearing (either because events don't arrive or because data is cleaned up to save memory), so, there's not really a good way around it...

new StageTableBase(request, activeStages, "", "activeStage", parent.basePath, "stages/pool",
parent.progressListener, parent.isFairScheduler, parent.killEnabled,
isFailedStage = false)
// For now, pool information is only accessible in live UIs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird that the PoolPage is even hooked up when there isn't a live UI
(but I think you have the right change here, I wouldn't change that behavior as part of this)

currentTime: Long,
pageSize: Int,
sortColumn: String,
desc: Boolean,
store: AppStatusStore) extends PagedDataSource[TaskTableRowData](pageSize) {
import StagePage._

// Convert TaskUIData to TaskTableRowData which contains the final contents to show in the table
// Keep an internal cache of executor log maps so that long task lists render faster.
private val executors = new HashMap[String, Map[String, String]]()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to something like executorIdToLogs


private def executorLogs(id: String): Map[String, String] = {
executors.getOrElseUpdate(id,
store.executorSummary(id).map(_.executorLogs).getOrElse(Map.empty))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to protect executors from a race if two UI threads both call this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure there's a separate instance of this class per request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, good point

Copy link
Contributor

@squito squito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I have made it through this. Other than one potential race, my other comments are just cosmetic

val configName = "spark.scheduler.mode"
val config = sc match {
case Some(_sc) =>
_sc.conf.getOption(configName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come this needs to check the sc.conf, but StagesTab doesn't? Also doesn't seem like the old code would check this either.

None
}
def executorSummary(executorId: String): v1.ExecutorSummary = {
store.read(classOf[ExecutorSummaryWrapper], executorId).info
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation

@squito
Copy link
Contributor

squito commented Nov 13, 2017

lgtm

@SparkQA
Copy link

SparkQA commented Nov 13, 2017

Test build #83808 has finished for PR 19698 at commit 1fef09d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 13, 2017

Test build #83812 has finished for PR 19698 at commit 0454ed1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Copy link
Contributor

squito commented Nov 14, 2017

I merged this to master, since its holding up a bunch more of the history server stuff, but I'd encourage other reviews to take a look still @cloud-fan @ajbozarth @jerryshao

@asfgit asfgit closed this in 4741c07 Nov 14, 2017
@vanzin vanzin deleted the SPARK-20648 branch November 14, 2017 17:45
dongjoon-hyun pushed a commit that referenced this pull request Nov 12, 2023
### What changes were proposed in this pull request?
SPARK-15591(#13708) introduced the `MissingStageTableRowData`, but it is no longer used after SPARK-20648(#19698), so this PR removes it.

### Why are the changes needed?
Clean up unused code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43748 from LuciferYang/SPARK-45875.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants