-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20648][core] Port JobsTab and StageTab to the new UI backend. #19698
Conversation
This change is a little larger because there's a whole lot of logic behind these pages, all really tied to internal types and listeners, and some of that logic had to be implemented in the new listener and the needed data exposed through the API types. - Added missing StageData and ExecutorStageSummary fields which are used by the UI. Some json golden files needed to be updated to account for new fields. - Save RDD graph data in the store. This tries to re-use existing types as much as possible, so that the code doesn't need to be re-written. So it's probably not very optimal. - Some old classes (e.g. JobProgressListener) still remain, since they're used in other parts of the code; they're not used by the UI anymore, though, and will be cleaned up in a separate change. - Save information about active pools in the store. This data is not really used in the SHS, but it's not a lot of data so it's still recorded when replaying applications. - Because the new store sorts things slightly differently from the previous code, some json golden files had some elements within them shuffled around. - The retention unit test in UISeleniumSuite was disabled because the code to throw away old stages / tasks hasn't been added yet. - The job description field in the API tries to follow the old behavior, which makes it be empty most of the time, even though there's information to fill it in. For stages, a new field was added to hold the description (which is basically the job description), so that the UI can be rendered in the old way. - A new stage status ("SKIPPED") was added to account for the fact that the API couldn't represent that state before. Without this, the stage would show up as "PENDING" in the UI, which is now based on API types. - The API used to expose "executorRunTime" as the value of the task's duration, which wasn't really correct (also because that value was easily available from the metrics object); this change fixes that by storing the correct duration, which also means a few expectation files needed to be updated to account for the new durations and sorting differences due to the changed values. - Added changes to implement SPARK-20713 and SPARK-21922 in the new code.
For context:
This PR is missing some code from the original PR that cleans up some now not needed data kept in the UI. That's because SPARK-20647 is not yet committed. When both that and this PR are in, I'll do the cleanup, probably as part of SPARK-20650. This change is kinda large, but it gets a bit smaller if you ignore whitespace changes: Unfortunately you can't comment in that view. |
Test build #83606 has finished for PR 19698 at commit
|
Test build #83659 has finished for PR 19698 at commit
|
@@ -23,19 +23,21 @@ import javax.servlet.http.HttpServletRequest | |||
|
|||
import scala.collection.JavaConverters._ | |||
import scala.collection.mutable.{HashMap, ListBuffer} | |||
import scala.util.Try |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused (and more imports here can be cleaned up)
{listener.schedulingMode.map(_.toString).getOrElse("Unknown")} | ||
</li> | ||
val completedJobs = _completedJobs.toSeq.reverse | ||
val failedJobs = _failedJobs.toSeq.reverse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually is the reverse
necessary at all? seems if you trace through, only goes to JobsDataSource
, where its sorted anyway
content ++= makeTimeline(activeStages ++ completedStages ++ failedStages, | ||
parent.parent.store.executorList(false), appStartTime) | ||
content ++= makeTimeline(activeStages ++ completedStages ++ failedStages, | ||
store.executorList(false), appStartTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation
listener.stageIdToInfo.getOrElse(stageId, | ||
new StageInfo(stageId, 0, "Unknown", 0, Seq.empty, Seq.empty, "Unknown")) | ||
val jobId = parameterId.toInt | ||
val jobDataOption = Try(store.job(jobId)).toOption |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most places, you do an explicit try/catch, I assume because you only want to convert NoSuchElementException
to None
. Does that concern not apply here? also since you do that so much, maybe its worth a helper, so that you could use it like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a shared method; I'm not too happy with it, but this code needs to be resilient to data disappearing (either because events don't arrive or because data is cleaned up to save memory), so, there's not really a good way around it...
new StageTableBase(request, activeStages, "", "activeStage", parent.basePath, "stages/pool", | ||
parent.progressListener, parent.isFairScheduler, parent.killEnabled, | ||
isFailedStage = false) | ||
// For now, pool information is only accessible in live UIs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird that the PoolPage is even hooked up when there isn't a live UI
(but I think you have the right change here, I wouldn't change that behavior as part of this)
currentTime: Long, | ||
pageSize: Int, | ||
sortColumn: String, | ||
desc: Boolean, | ||
store: AppStatusStore) extends PagedDataSource[TaskTableRowData](pageSize) { | ||
import StagePage._ | ||
|
||
// Convert TaskUIData to TaskTableRowData which contains the final contents to show in the table | ||
// Keep an internal cache of executor log maps so that long task lists render faster. | ||
private val executors = new HashMap[String, Map[String, String]]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename to something like executorIdToLogs
|
||
private def executorLogs(id: String): Map[String, String] = { | ||
executors.getOrElseUpdate(id, | ||
store.executorSummary(id).map(_.executorLogs).getOrElse(Map.empty)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to protect executors
from a race if two UI threads both call this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure there's a separate instance of this class per request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah right, good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I have made it through this. Other than one potential race, my other comments are just cosmetic
val configName = "spark.scheduler.mode" | ||
val config = sc match { | ||
case Some(_sc) => | ||
_sc.conf.getOption(configName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how come this needs to check the sc.conf, but StagesTab doesn't? Also doesn't seem like the old code would check this either.
None | ||
} | ||
def executorSummary(executorId: String): v1.ExecutorSummary = { | ||
store.read(classOf[ExecutorSummaryWrapper], executorId).info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation
lgtm |
Test build #83808 has finished for PR 19698 at commit
|
Test build #83812 has finished for PR 19698 at commit
|
I merged this to master, since its holding up a bunch more of the history server stuff, but I'd encourage other reviews to take a look still @cloud-fan @ajbozarth @jerryshao |
### What changes were proposed in this pull request? SPARK-15591(#13708) introduced the `MissingStageTableRowData`, but it is no longer used after SPARK-20648(#19698), so this PR removes it. ### Why are the changes needed? Clean up unused code. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43748 from LuciferYang/SPARK-45875. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
This change is a little larger because there's a whole lot of logic
behind these pages, all really tied to internal types and listeners,
and some of that logic had to be implemented in the new listener and
the needed data exposed through the API types.
Added missing StageData and ExecutorStageSummary fields which are
used by the UI. Some json golden files needed to be updated to account
for new fields.
Save RDD graph data in the store. This tries to re-use existing types as
much as possible, so that the code doesn't need to be re-written. So it's
probably not very optimal.
Some old classes (e.g. JobProgressListener) still remain, since they're used
in other parts of the code; they're not used by the UI anymore, though, and
will be cleaned up in a separate change.
Save information about active pools in the store. This data is not really used
in the SHS, but it's not a lot of data so it's still recorded when replaying
applications.
Because the new store sorts things slightly differently from the previous
code, some json golden files had some elements within them shuffled around.
The retention unit test in UISeleniumSuite was disabled because the code
to throw away old stages / tasks hasn't been added yet.
The job description field in the API tries to follow the old behavior, which
makes it be empty most of the time, even though there's information to fill it
in. For stages, a new field was added to hold the description (which is basically
the job description), so that the UI can be rendered in the old way.
A new stage status ("SKIPPED") was added to account for the fact that the API
couldn't represent that state before. Without this, the stage would show up as
"PENDING" in the UI, which is now based on API types.
The API used to expose "executorRunTime" as the value of the task's duration,
which wasn't really correct (also because that value was easily available
from the metrics object); this change fixes that by storing the correct duration,
which also means a few expectation files needed to be updated to account for
the new durations and sorting differences due to the changed values.
Added changes to implement SPARK-20713 and SPARK-21922 in the new code.
Tested with existing unit tests (and by using the UI a lot).