
[SPARK-20355] Add per-application Spark version on the history server header page #17658

Closed

wants to merge 14 commits

Conversation

redsanket

@redsanket commented Apr 17, 2017

What changes were proposed in this pull request?

The Spark version for a specific application is currently not displayed on the history page. It would be nice to show the Spark version on the UI when we click on a specific application.
There is a way to do this, since SparkListenerLogStart records the application's Spark version; it should be trivial to listen for this event and surface the version on the UI.
For example (attached screenshots: screen shot 2017-04-06 at 3 23 41 pm, screen shot 2017-04-17 at 9 59 33 am):

{"Event":"SparkListenerLogStart","Spark Version":"2.0.0"}
Modified the SparkUI for the history server to listen for the SparkListenerLogStart event, extract the version, and display it.
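
As a rough illustration of the mechanism (a sketch, not code from this patch): during replay, an event-log line like the one above is parsed back into a SparkListenerLogStart event, whose sparkVersion field is what the UI can display.

import org.json4s.jackson.JsonMethods.parse
import org.apache.spark.util.JsonProtocol

// Parse an event-log line back into a listener event; for the line above
// this yields SparkListenerLogStart("2.0.0").
val line = """{"Event":"SparkListenerLogStart","Spark Version":"2.0.0"}"""
val event = JsonProtocol.sparkEventFromJson(parse(line))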

How was this patch tested?

Manual testing of the UI page. The UI screenshot changes are attached above.


@tgravescs
Contributor

Jenkins, test this please

@SparkQA commented Apr 17, 2017

Test build #75856 has finished for PR 17658 at commit 1f50b27.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@ajbozarth left a comment

I think this idea is nice, but it also seems confusing (though leaving it the way it is now is just as confusing). I'm a bit worried about the listener changes as noted in my comments, but you can probably clear my worries with some explanations of your decisions.

SparkUI.createHistoryUI(conf, replayBus, appSecManager,
  appListener.appSparkVersion.getOrElse(""), appInfo.name,
  HistoryServer.getAttemptURI(appId, attempt.attemptId),
  attempt.startTime)
Member

I don't think adding this line wrap is necessary

@@ -71,7 +71,6 @@ private[spark] trait SparkListenerBus
listener.onNodeUnblacklisted(nodeUnblacklisted)
case blockUpdated: SparkListenerBlockUpdated =>
listener.onBlockUpdated(blockUpdated)
case logStart: SparkListenerLogStart => // ignore event log metadata
Member

I haven't looked at this closely, but will allowing this through cause any issues? Why was it ignored before?

Author

I do not see it getting consumed apart from registering some metadata, so I guess it should be fine, as this event already logs the version.

Contributor

Will this change the behavior of SparkListenerBus? Previously we ignored this event, but with this change we will post it through listener.onOtherEvent and users will receive it. From the code, though, this event is not accessible outside of Spark.

private[spark] case class SparkListenerLogStart(sparkVersion: String) extends SparkListenerEvent

Contributor

I think that if this is gonna be posted on the bus it might be better to make it public and @DeveloperApi like other events.

Adding new events is not an issue, but posting an event that people can't handle is a little odd.
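
A sketch of that suggestion, as the declaration in SparkListener.scala would look with the visibility dropped and the usual annotation added:

import org.apache.spark.annotation.DeveloperApi

@DeveloperApi
case class SparkListenerLogStart(sparkVersion: String) extends SparkListenerEvent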

Author

I am not too sure if it will change any behavior; that is precisely why we post it to onOtherEvent, in case someone wants to listen for the event and utilize it, like in this scenario.

@@ -160,7 +160,6 @@ case class SparkListenerApplicationEnd(time: Long) extends SparkListenerEvent

/**
* An internal class that describes the metadata of an event log.
* This event is not meant to be posted to listeners downstream.
Member

Is there a reason this note was here? What was the reasoning behind it?

Author

This was only for metadata info, so when it was written it was simply not meant to be consumed; but now we can reuse it for this case.

@tgravescs
Contributor

The idea here was mine. I agree that it could be confusing, but it's also confusing as is, and it's hard to find the version that was run. I was figuring this way would be consistent with the live UI. If you have other suggestions I am definitely open. I thought about just removing the version from there and putting it elsewhere, but then it's less consistent.

As far as your questions on the event: I don't really know why it was made private. My assumption is that it was for metadata only and could have potentially changed. However, this is the only place we currently store the version, and I don't really see any harm in sending this event out, but it would be good to check with others.

@andrewor14 @vanzin Do you have any issues with changing to send SparkListenerLogStart event out?

The other option is to add a new event or put the version in with another event but then it won't work with any of the older versions.

// Do not call ui.bind() to avoid creating a new server for each application
}

val fileStatus = fs.getFileStatus(new Path(logDir, attempt.logPath))

val appListener = replay(fileStatus, isApplicationCompleted(fileStatus), replayBus)
Contributor

@jerryshao commented Apr 19, 2017

Is it correct to remove these two lines? As far as I know they are necessary, because in the previous replay call we only parse the SparkListenerApplicationStart and SparkListenerApplicationEnd events.

Sorry about that, I just saw you moved these two lines up.

Contributor

@vanzin left a comment

I think this is fine. But it would be great to expose this in the API too (e.g. in ApplicationInfo).

override def onOtherEvent(event:SparkListenerEvent): Unit = event match {
  case SparkListenerLogStart(sparkVersion) =>
    appSparkVersion = Some(sparkVersion)
}
Contributor

You need a "catch all" here:

scala> class MyListener extends org.apache.spark.scheduler.SparkListener {
     |   override def onOtherEvent(event: org.apache.spark.scheduler.SparkListenerEvent): Unit = event match {
     |     case _: org.apache.spark.scheduler.SparkListenerTaskStart => ()
     |   }
     | }
defined class MyListener
scala> sc.addSparkListener(new MyListener())
scala> spark.range(1000).write.saveAsTable("test")
17/04/19 10:39:01 ERROR LiveListenerBus: Listener MyListener threw an exception
scala.MatchError: SparkListenerSQLExecutionStart(0,saveAsTable at <console>:27,org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
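
A minimal sketch of the fix, with a hypothetical listener name; a trailing catch-all case makes the match total (note SparkListenerLogStart is still private[spark] at this point, so this only compiles inside Spark's own packages):

import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent, SparkListenerLogStart}

class VersionCapturingListener extends SparkListener {
  @volatile var appSparkVersion: Option[String] = None

  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case SparkListenerLogStart(sparkVersion) =>
      appSparkVersion = Some(sparkVersion)
    case _ => // ignore all other events instead of throwing MatchError
  }
}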


@redsanket
Author

@vanzin sure, will address the concerns. Thanks for the review.

@tgravescs
Contributor

+1. @vanzin any further comments?

@redsanket
Author

Jenkins, test this please

Contributor

@vanzin left a comment

Are you going to add this information to the REST API? I don't think it's there yet, and I'd really like to keep UI and API close when possible. It should be pretty easy here.

My preference would be a new field in RuntimeInfo, or ApplicationInfo if that's too awkward.
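
A trimmed sketch of what that could look like in the v1 API types (the real ApplicationAttemptInfo has more fields than shown; where the field ultimately lives is exactly what's being discussed):

import java.util.Date

class ApplicationAttemptInfo private[spark] (
    val attemptId: Option[String],
    val startTime: Date,
    val endTime: Date,
    val sparkUser: String,
    val completed: Boolean = false,
    // new: the Spark version recorded in this attempt's event log
    val appSparkVersion: String = "")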

@@ -57,4 +58,10 @@ private[spark] class ApplicationEventListener extends SparkListener {
adminAclsGroups = allProperties.get("spark.admin.acls.groups")
}
}

override def onOtherEvent(event:SparkListenerEvent):Unit = event match {
Contributor

nit: space after :

Author

The Scalastyle check failed for this in the above run, at the following occurrences:
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/scheduler/ApplicationEventListener.scala:62:33: No space after token :
[error] (core/compile:scalastyle) errors exist
[error] Total time: 11 s, completed Apr 17, 2017 8:39:08 AM
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/lint-scala ; received return code 1
Attempting to post to Github...

Author

oh ok, I thought I had the space before the style check complained about it, but I might have interpreted it wrongly

@tgravescs
Contributor

Jenkins, test this please

@@ -139,6 +140,8 @@ private[spark] abstract class SparkUITab(parent: SparkUI, prefix: String)

def appName: String = parent.appName

def appSparkVersion: String = parent.appSparkVersion

Contributor

nit, remove extra line if you are making changes.

@redsanket
Author

@vanzin Can I add this to SparkConf.scala (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkConf.scala#L58), just to have the application info there? It would expose the info on the API without modifying the intended interface; as I see it, modifying the ApplicationInfo API will have consequences wherever it is used and might be a bigger change. Also, adding it to RuntimeInfo would be awkward, like you said; the conf would be a nice place where it can live without making major changes. Let me know your opinion and I can make the changes.

@vanzin
Contributor

vanzin commented Apr 25, 2017

Adding fields to API types is ok. It doesn't break the contract.

Adding this in the configuration is weird, because it's not a config.

@redsanket
Author

ok, will add it to either RuntimeInfo or ApplicationInfo. I thought it might break the contract underneath, but if it doesn't then I should add it there. Thanks for the input, will do.

@redsanket
Author

Jenkins, test this please

@tgravescs
Contributor

Jenkins, test this please

@tgravescs
Contributor

test this please

@SparkQA commented Apr 26, 2017

Test build #76190 has finished for PR 17658 at commit 5f6e80b.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Apr 26, 2017

The MiMa problem is actually not a problem; the constructor is private, so there's no problem in changing it. Just add the exception in MimaExcludes.scala.
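
For reference, such an entry is a one-liner in project/MimaExcludes.scala; a hedged sketch, where the exact problem type and member are whatever the MiMa failure reports:

import com.typesafe.tools.mima.core._

// Exclude the report about the changed private constructor (sketch):
ProblemFilters.exclude[DirectMissingMethodProblem](
  "org.apache.spark.status.api.v1.ApplicationAttemptInfo.this")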

@@ -92,7 +92,8 @@ private[spark] object ApplicationsListResource {
        sparkUser = internalAttemptInfo.sparkUser,
        completed = internalAttemptInfo.completed
      )
    }
  },
  appSparkVersion = org.apache.spark.SPARK_VERSION
Contributor

This is not right. This might be different from the current version, just like the UI case. You have to somehow plumb it from the event log information to here.

@tgravescs
Contributor

Jenkins, okay to test

@tgravescs
Contributor

test this please

@SparkQA commented May 2, 2017

Test build #76389 has finished for PR 17658 at commit e7da27d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented May 3, 2017

retest this please

@redsanket
Author

The issue before was that I was replaying the events before adding the listeners in getSparkUI.
That caused a bunch of tests to fail; the other file changes were to do with appSparkVersion not being set to "" in the expectation JSON files.

@@ -60,6 +60,8 @@ private[spark] class SparkUI private (

var appId: String = _

var appSparkVersion = ""
Contributor

We are no longer setting this for the live UI, so it either needs to default here to org.apache.spark.SPARK_VERSION or we need to call setAppSparkVersion in SparkContext.

Author

Yeah will fix that thanks
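
A minimal sketch of the two options above (option 2's SparkContext call site is an assumption, not code from this patch):

// Option 1: default the SparkUI field to the running build's version,
// so the live UI needs no extra wiring:
var appSparkVersion = org.apache.spark.SPARK_VERSION

// Option 2: keep the "" default and have SparkContext set it explicitly
// when it creates the live UI (hypothetical call site):
// _ui.foreach(_.setAppSparkVersion(org.apache.spark.SPARK_VERSION))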

@tgravescs
Contributor

there are also a few formatting things that look like they were just line wraps and extra new lines.

@SparkQA commented May 3, 2017

Test build #76425 has finished for PR 17658 at commit f1fa890.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@redsanket
Author

SparkContext was not able to read the SparkListenerLogStart event, as it is not part of SparkContext's own events, and the subsequent ReplayListenerSuite compares the event logs against the original events emitted via SparkContext.

@tgravescs
Contributor

retest this please

@vanzin
Contributor

vanzin commented May 4, 2017

no idea why tests are not triggering automatically for this PR. maybe: ok to test

@SparkQA commented May 4, 2017

Test build #76464 has finished for PR 17658 at commit bafad1a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@redsanket
Author

I think I should set up my IDE; it would be nice to have something like a checkstyle.xml instead of configuring these things. Sorry about that.

@tgravescs
Contributor

@srowen @vanzin do either of you know where the jenkins stuff is configured? wondering why this isn't working for me.

@vanzin
Contributor

vanzin commented May 4, 2017

I have no idea, sorry.

@@ -283,10 +283,15 @@ private[spark] object EventLoggingListener extends Logging {
*
* @param logStream Raw output stream to the event log file.
*/
-  def initEventLog(logStream: OutputStream): Unit = {
+  def initEventLog(logStream: OutputStream, testing: Boolean,
Contributor

nit: style

See style for multi-line arguments in ApplicationAttemptInfo, for example. But see below.

logStream.write(metadataJson.getBytes(StandardCharsets.UTF_8))
if (testing && loggedEvents != null) {
  loggedEvents += eventJson
Contributor

Instead of this, how about returning the SparkListenerLogStart event?

Author

@redsanket commented May 5, 2017

I thought loggedEvents only takes JSON values. Also, the loggedEvents are generated here as part of SparkContext, and probably through other sources. The ReplayListenerSuite, however, compares the original events with the replayed events (here the replayed events are written to the event log, but loggedEvents will not have the SparkListenerLogStart event, as it is not part of SparkContext, if I understand correctly).

Contributor

The downside to returning the event is that this function can't do more in the future, for instance if it wants to add 2 events. I guess it's private, so it doesn't matter too much, and it's just for testing stuff.

Contributor

My main concern is that it makes calling this method from other places awkward. (i.e. the semantics of the new arguments are not clear.)

In any case, there's a single other call to this method, in a test suite, so probably ok.
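
For comparison, a hedged sketch of the alternative floated above: have initEventLog return the SparkListenerLogStart it writes, so a test suite can account for it without extra parameters (assumes the Spark 2.x JsonProtocol.logStartToJson helper):

import java.io.OutputStream
import java.nio.charset.StandardCharsets

import org.json4s.jackson.JsonMethods.compact

import org.apache.spark.SPARK_VERSION
import org.apache.spark.scheduler.SparkListenerLogStart
import org.apache.spark.util.JsonProtocol

def initEventLog(logStream: OutputStream): SparkListenerLogStart = {
  val metadata = SparkListenerLogStart(SPARK_VERSION)
  // Write the metadata line exactly as before...
  val metadataJson = compact(JsonProtocol.logStartToJson(metadata)) + "\n"
  logStream.write(metadataJson.getBytes(StandardCharsets.UTF_8))
  // ...but also hand the event back to the caller (e.g. a test suite).
  metadata
}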

@@ -93,6 +95,10 @@ private[spark] class SparkUI private (
appId = id
}

def setAppSparkVersion(version: String): Unit = {
Contributor

appSparkVersion is a var, so this is not technically needed.

@@ -22,6 +23,7 @@
"duration" : 101795,
"sparkUser" : "jose",
"completed" : true,
"appSparkVersion" : "",
Contributor

This doesn't look correct. If I look at the log file for this app (core/src/test/resources/spark-events/app-20161115172038-0000), there's a log start event:

{"Event":"SparkListenerLogStart","Spark Version":"2.1.0-SNAPSHOT"}

So this value should be "2.1.0-SNAPSHOT", no?

Author

I am not sure if the tests hit this code path (https://github.com/apache/spark/pull/17658/files#diff-a7befb99e7bd7e3ab5c46c2568aa5b3eR474), so they take the default value

Author

@redsanket commented May 5, 2017

probably I could change the default value, ok will do it

Contributor

It's not really about the default value; these tests replay the log files, which contain the Spark version, so I would expect the data retrieved through the API to contain the version that was recorded in the event log.

Another way of saying that: there's probably a bug somewhere in your code that is preventing the data from the event log from being exposed correctly through the REST API.

Author

I see events being passed to ApplicationEventListener in the debug logs, but interestingly doPostEvent seems to not be posting events to the listener, which is a bit odd. I'm not sure I need to change the contract for the tests to pick this up, though.

17/05/05 16:39:42.235 qtp1618269752-1127 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.jobs.JobProgressListener@53e58df ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:42.235 qtp1618269752-1127 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.env.EnvironmentListener@53371e87 ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:42.235 qtp1618269752-1127 INFO ReplayListenerBus: listener --- event org.apache.spark.storage.StorageStatusListener@394665e7 ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:42.235 qtp1618269752-1127 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.exec.ExecutorsListener@2d92f38e ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:42.235 qtp1618269752-1127 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.storage.StorageListener@5a9dbe43 ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:42.235 qtp1618269752-1127 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.scope.RDDOperationGraphListener@31b550b1 ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:42.236 qtp1618269752-1127 INFO ReplayListenerBus: listener --- event org.apache.spark.scheduler.ApplicationEventListener@3b56723d ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:43.300 qtp1618269752-1126 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.jobs.JobProgressListener@3647a865 ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:43.300 qtp1618269752-1126 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.env.EnvironmentListener@4357e8cc ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:43.300 qtp1618269752-1126 INFO ReplayListenerBus: listener --- event org.apache.spark.storage.StorageStatusListener@5062f4c0 ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:43.300 qtp1618269752-1126 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.exec.ExecutorsListener@273c097f ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:43.300 qtp1618269752-1126 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.storage.StorageListener@2c33997d ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:43.300 qtp1618269752-1126 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.scope.RDDOperationGraphListener@505e15bf ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:43.300 qtp1618269752-1126 INFO ReplayListenerBus: listener --- event org.apache.spark.scheduler.ApplicationEventListener@175d3f80 ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:44.305 qtp1618269752-1131 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.jobs.JobProgressListener@71dacceb ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:44.305 qtp1618269752-1131 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.env.EnvironmentListener@6f41235c ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:44.305 qtp1618269752-1131 INFO ReplayListenerBus: listener --- event org.apache.spark.storage.StorageStatusListener@3cdcb6d0 ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:44.305 qtp1618269752-1131 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.exec.ExecutorsListener@4abcd5c6 ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:44.305 qtp1618269752-1131 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.storage.StorageListener@6d26f5ee ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:44.305 qtp1618269752-1131 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.scope.RDDOperationGraphListener@b1b60d5 ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:44.305 qtp1618269752-1131 INFO ReplayListenerBus: listener --- event org.apache.spark.scheduler.ApplicationEventListener@52283d50 ---SparkListenerLogStart(2.3.0-SNAPSHOT)
17/05/05 16:39:44.676 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.jobs.JobProgressListener@f1563e7 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.676 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.env.EnvironmentListener@9122315 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.676 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.storage.StorageStatusListener@eb98c21 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.676 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.exec.ExecutorsListener@1da79f75 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.676 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.storage.StorageListener@3989cd7d ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.676 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.scope.RDDOperationGraphListener@219b4bd2 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.676 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.scheduler.ApplicationEventListener@63811344 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.722 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.jobs.JobProgressListener@52177d75 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.722 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.env.EnvironmentListener@72409b15 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.722 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.storage.StorageStatusListener@5ee11505 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.722 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.exec.ExecutorsListener@24bad915 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.722 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.storage.StorageListener@4d2fe422 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.723 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.ui.scope.RDDOperationGraphListener@3bc156a1 ---SparkListenerLogStart(1.4.0-SNAPSHOT)
17/05/05 16:39:44.723 qtp1478318899-1195 INFO ReplayListenerBus: listener --- event org.apache.spark.scheduler.ApplicationEventListener@6d5e35e1 ---SparkListenerLogStart(1.4.0-SNAPSHOT)

Contributor

I think I know what this is. FsHistoryProvider has this code to speed up the log parsing for listings:

  protected def mergeApplicationListing(fileStatus: FileStatus): Unit = {
    val newAttempts = try {
      val eventsFilter: ReplayEventsFilter = { eventString =>
        eventString.startsWith(APPL_START_EVENT_PREFIX) ||
          eventString.startsWith(APPL_END_EVENT_PREFIX)
      }

It only allows those two events through currently; you probably need to add the log start event to that list too.
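
A sketch of the extended filter; LOG_START_EVENT_PREFIX is an assumed companion constant matching the JSON prefix of SparkListenerLogStart lines, analogous to the two existing prefixes:

// e.g. private val LOG_START_EVENT_PREFIX = "{\"Event\":\"SparkListenerLogStart\""

val eventsFilter: ReplayEventsFilter = { eventString =>
  eventString.startsWith(APPL_START_EVENT_PREFIX) ||
    eventString.startsWith(APPL_END_EVENT_PREFIX) ||
    eventString.startsWith(LOG_START_EVENT_PREFIX)
}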

Author

ok, doPost is posting the events:

17/05/05 17:17:17.265 qtp927704210-1183 INFO ReplayListenerBus: eventInDoPost SparkListenerLogStart(1.4.0-SNAPSHOT)

So it should be good. It looks like ApplicationEventListener is not able to read the events; it used to before, so something has changed. Will dig deeper.

Author

aah ok, the UI works because it is replaying the full log, but the REST endpoint would break as the filter is not letting the event through. Thanks @vanzin

@SparkQA commented May 5, 2017

Test build #76466 has finished for PR 17658 at commit 52f414c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented May 5, 2017

Test build #76469 has finished for PR 17658 at commit dad87a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented May 5, 2017

LGTM pending tests.

@SparkQA commented May 6, 2017

Test build #76505 has finished for PR 17658 at commit 93e7a20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@redsanket
Author

@tgravescs @vanzin ready for merge?

@tgravescs
Contributor

looks good, I'll merge. thanks @redsanket

@asfgit closed this in 181261a on May 9, 2017
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
… headerpage


Author: Sanket <schintap@untilservice-lm>

Closes apache#17658 from redsanket/SPARK-20355.