-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-4705] Handle multiple app attempts event logs, history server. #5432
Conversation
…ll do the rest of the changes after this
…ll do the rest of the changes after this
…the master branch. 2) Added the attempt id inside the SparkListenerApplicationStart, to make the info available independent of directory structure. 3) Changes in History Server to render the UI as per the snaphot II
This change explicitly models app attempts in the history server. An app is now a collection of app attempts, instead of the logic to match apps to app attempts being implicit in the rendering code. This makes the rendering code a lot simpler, since it doesn't need to do any fancy processing of the app list to figure out what to show.
For backwards compatibility.
This PR is an updated version of #4845. I rebased the code on top of current master, added the suggestions I made on the original PR, fixed a bunch of style nits and other issues, and added a couple of tests. Here's a screenshot: |
Test build #29905 has finished for PR 5432 at commit
|
BTW the zebra-striping in the UI looks a little broken right now, I'll take a look at that. |
Test build #29907 timed out for PR 5432 at commit |
Hmm, didn't find a test failure in the output. |
Jenkins, retest this please. |
Test build #29917 has finished for PR 5432 at commit
|
Test build #29949 has finished for PR 5432 at commit
|
Funny. All YARN tests (not just in this PR) are failing with this:
Wonder what changed in the environment since they were working before? (And why is github's user name search so useless it cannot autocomplete Shane's user name?) |
so i just grepped through the code and found stuff like this: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: YarnSparkHadoopUtil.expandEnvironment(Environment.JAVA_HOME) + "/bin/java", "-server" i've never explicitly set JAVA_HOME in jenkins' slave user space before, but that's obviously why it's failing. that's pretty bad code imo. solutions:
i prefer the second solution. |
I think Anyway, I'm trying something out in #5441. |
cool. on our systems, at least, the system java we use is /usr/bin/java, which points (through /etc/alternatives), to /usr/java/latest (which itself is a link to /usr/java/jdk1.7.0_71/). |
oh, i just had a thought: i installed a couple of different versions of java through jenkins, and right now the tests are set in the config to use 'Default', which is system level java. i bet this is why JAVA_HOME isn't being set and why the tests are failing. @JoshRosen is going to set JAVA_HOME for us to get the builds green and then we can look a little deeper in to the problem. |
Is it always safe to rely on On a side note: http://stackoverflow.com/questions/17023782/are-java-system-properties-always-non-null |
Note that the YARN code is not resolving |
Ah yes, scratch that. |
Jenkins, retest this please. |
// Do not call ui.bind() to avoid creating a new server for each application | ||
} | ||
applications.get(appId).flatMap { appInfo => | ||
val attempts = appInfo.attempts.filter(_.attemptId == attemptId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the doc for getAppUI
says to use an empty string for apps with a single attempt -- but that isn't exactly what is reflected here. Some yarn apps will be successful on the first attempt, but with this implementation, you still need to pass in the actual attempt id. One way or the other, the doc & this should be resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface doc is slightly misleading, but all event logs from YARN will have an attempt ID after this change, even for a single attempt.
Test build #31146 has finished for PR 5432 at commit
|
Conflicts: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
lgtm I'll wait a day for any other feedback. |
@vanzin thanks for the fix. I'll have a quick look at this tonight. |
Test build #31166 has finished for PR 5432 at commit
|
@andrewor14 did you have any comments on this? otherwise I am ready to merge |
Sorry I will look at this right now. |
} | ||
applications.get(appId).flatMap { appInfo => | ||
val attempts = appInfo.attempts.filter(_.attemptId == attemptId) | ||
attempts.headOption.map { attempt => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: this can just be find
I think
Great to see this fixed @vanzin. My comments are mostly minor. Unfortunately I don't have the time to do a closer review. How much more work do you imagine fixing this additionally for standalone mode would be? If it's not that much we should also fix that for 1.4 in separate patch. |
I have no idea, I'm mostly unfamiliar with standalone cluster mode. Feel free to file a separate bug for it. |
Test build #31464 has finished for PR 5432 at commit
|
retest this please |
Latest changes LGTM based on my quick review. @squito feel free to merge it. |
Test build #31480 has finished for PR 5432 at commit
|
This change modifies the event logging listener to write the logs for different application attempts to different files. The attempt ID is set by the scheduler backend, so as long as the backend returns that ID to SparkContext, things should work. Currently, the YARN backend does that. The history server was also modified to model multiple attempts per application. Each attempt has its own UI and a separate row in the listing table, so that users can look at all the attempts separately. The UI "adapts" itself to avoid showing attempt-specific info when all the applications being shown have a single attempt. Author: Marcelo Vanzin <vanzin@cloudera.com> Author: twinkle sachdeva <twinkle@kite.ggn.in.guavus.com> Author: twinkle.sachdeva <twinkle.sachdeva@guavus.com> Author: twinkle sachdeva <twinkle.sachdeva@guavus.com> Closes apache#5432 from vanzin/SPARK-4705 and squashes the following commits: 7e289fa [Marcelo Vanzin] Review feedback. f66dcc5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 bc885b7 [Marcelo Vanzin] Review feedback. 76a3651 [Marcelo Vanzin] Fix log cleaner, add test. 7c381ec [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 1aa309d [Marcelo Vanzin] Improve sorting of app attempts. 2ad77e7 [Marcelo Vanzin] Missed a reference to the old property name. 9d59d92 [Marcelo Vanzin] Scalastyle... d5a9c37 [Marcelo Vanzin] Update JsonProtocol test, make property name consistent. ba34b69 [Marcelo Vanzin] Use Option[String] for attempt id. f1cb9b3 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 c14ec19 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 9092d39 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 86de638 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 07446c6 [Marcelo Vanzin] Disable striping for app id / name when multiple attempts exist. 9092af5 [Marcelo Vanzin] Fix HistoryServer test. 3a14503 [Marcelo Vanzin] Argh scalastyle. 657ec18 [Marcelo Vanzin] Fix yarn history URL, app links. c3e0a82 [Marcelo Vanzin] Move app name to app info, more UI fixes. ce5ee5d [Marcelo Vanzin] Misc UI, test, style fixes. cbe8bba [Marcelo Vanzin] Attempt ID in listener event should be an option. 88b1de8 [Marcelo Vanzin] Add a test for apps with multiple attempts. 3245aa2 [Marcelo Vanzin] Make app attempts part of the history server model. 5fd5c6f [Marcelo Vanzin] Fix my broken rebase. 318525a [twinkle.sachdeva] SPARK-4705: 1) moved from directory structure to single file, as per the master branch. 2) Added the attempt id inside the SparkListenerApplicationStart, to make the info available independent of directory structure. 3) Changes in History Server to render the UI as per the snaphot II 6b2e521 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this 4c1fc26 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this 0eb7722 [twinkle sachdeva] SPARK-4705: Doing cherry-pick of fix into master
This change modifies the event logging listener to write the logs for different application attempts to different files. The attempt ID is set by the scheduler backend, so as long as the backend returns that ID to SparkContext, things should work. Currently, the YARN backend does that. The history server was also modified to model multiple attempts per application. Each attempt has its own UI and a separate row in the listing table, so that users can look at all the attempts separately. The UI "adapts" itself to avoid showing attempt-specific info when all the applications being shown have a single attempt. Author: Marcelo Vanzin <vanzin@cloudera.com> Author: twinkle sachdeva <twinkle@kite.ggn.in.guavus.com> Author: twinkle.sachdeva <twinkle.sachdeva@guavus.com> Author: twinkle sachdeva <twinkle.sachdeva@guavus.com> Closes apache#5432 from vanzin/SPARK-4705 and squashes the following commits: 7e289fa [Marcelo Vanzin] Review feedback. f66dcc5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 bc885b7 [Marcelo Vanzin] Review feedback. 76a3651 [Marcelo Vanzin] Fix log cleaner, add test. 7c381ec [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 1aa309d [Marcelo Vanzin] Improve sorting of app attempts. 2ad77e7 [Marcelo Vanzin] Missed a reference to the old property name. 9d59d92 [Marcelo Vanzin] Scalastyle... d5a9c37 [Marcelo Vanzin] Update JsonProtocol test, make property name consistent. ba34b69 [Marcelo Vanzin] Use Option[String] for attempt id. f1cb9b3 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 c14ec19 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 9092d39 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 86de638 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 07446c6 [Marcelo Vanzin] Disable striping for app id / name when multiple attempts exist. 9092af5 [Marcelo Vanzin] Fix HistoryServer test. 3a14503 [Marcelo Vanzin] Argh scalastyle. 657ec18 [Marcelo Vanzin] Fix yarn history URL, app links. c3e0a82 [Marcelo Vanzin] Move app name to app info, more UI fixes. ce5ee5d [Marcelo Vanzin] Misc UI, test, style fixes. cbe8bba [Marcelo Vanzin] Attempt ID in listener event should be an option. 88b1de8 [Marcelo Vanzin] Add a test for apps with multiple attempts. 3245aa2 [Marcelo Vanzin] Make app attempts part of the history server model. 5fd5c6f [Marcelo Vanzin] Fix my broken rebase. 318525a [twinkle.sachdeva] SPARK-4705: 1) moved from directory structure to single file, as per the master branch. 2) Added the attempt id inside the SparkListenerApplicationStart, to make the info available independent of directory structure. 3) Changes in History Server to render the UI as per the snaphot II 6b2e521 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this 4c1fc26 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this 0eb7722 [twinkle sachdeva] SPARK-4705: Doing cherry-pick of fix into master
This change modifies the event logging listener to write the logs for different application attempts to different files. The attempt ID is set by the scheduler backend, so as long as the backend returns that ID to SparkContext, things should work. Currently, the YARN backend does that. The history server was also modified to model multiple attempts per application. Each attempt has its own UI and a separate row in the listing table, so that users can look at all the attempts separately. The UI "adapts" itself to avoid showing attempt-specific info when all the applications being shown have a single attempt. Author: Marcelo Vanzin <vanzin@cloudera.com> Author: twinkle sachdeva <twinkle@kite.ggn.in.guavus.com> Author: twinkle.sachdeva <twinkle.sachdeva@guavus.com> Author: twinkle sachdeva <twinkle.sachdeva@guavus.com> Closes apache#5432 from vanzin/SPARK-4705 and squashes the following commits: 7e289fa [Marcelo Vanzin] Review feedback. f66dcc5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 bc885b7 [Marcelo Vanzin] Review feedback. 76a3651 [Marcelo Vanzin] Fix log cleaner, add test. 7c381ec [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 1aa309d [Marcelo Vanzin] Improve sorting of app attempts. 2ad77e7 [Marcelo Vanzin] Missed a reference to the old property name. 9d59d92 [Marcelo Vanzin] Scalastyle... d5a9c37 [Marcelo Vanzin] Update JsonProtocol test, make property name consistent. ba34b69 [Marcelo Vanzin] Use Option[String] for attempt id. f1cb9b3 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 c14ec19 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 9092d39 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 86de638 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705 07446c6 [Marcelo Vanzin] Disable striping for app id / name when multiple attempts exist. 9092af5 [Marcelo Vanzin] Fix HistoryServer test. 3a14503 [Marcelo Vanzin] Argh scalastyle. 657ec18 [Marcelo Vanzin] Fix yarn history URL, app links. c3e0a82 [Marcelo Vanzin] Move app name to app info, more UI fixes. ce5ee5d [Marcelo Vanzin] Misc UI, test, style fixes. cbe8bba [Marcelo Vanzin] Attempt ID in listener event should be an option. 88b1de8 [Marcelo Vanzin] Add a test for apps with multiple attempts. 3245aa2 [Marcelo Vanzin] Make app attempts part of the history server model. 5fd5c6f [Marcelo Vanzin] Fix my broken rebase. 318525a [twinkle.sachdeva] SPARK-4705: 1) moved from directory structure to single file, as per the master branch. 2) Added the attempt id inside the SparkListenerApplicationStart, to make the info available independent of directory structure. 3) Changes in History Server to render the UI as per the snaphot II 6b2e521 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this 4c1fc26 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this 0eb7722 [twinkle sachdeva] SPARK-4705: Doing cherry-pick of fix into master
This change modifies the event logging listener to write the logs for different application
attempts to different files. The attempt ID is set by the scheduler backend, so as long
as the backend returns that ID to SparkContext, things should work. Currently, the
YARN backend does that.
The history server was also modified to model multiple attempts per application. Each
attempt has its own UI and a separate row in the listing table, so that users can look at
all the attempts separately. The UI "adapts" itself to avoid showing attempt-specific info
when all the applications being shown have a single attempt.