
[SPARK-21655][YARN] Support Kill CLI for Yarn mode #18897

Closed
wants to merge 4 commits

Conversation

yoonlee95

What changes were proposed in this pull request?

Similar to how standalone and Mesos modes can safely shut down a Spark application, there should be a way to safely shut down Spark in YARN mode. This ensures a clean shutdown and unregistration from YARN.

This PR adds YARN support for the --kill SUBMISSION_ID CLI option.
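For illustration, the kill would presumably be issued through spark-submit's existing --kill option, e.g. spark-submit --master yarn --kill <application ID>; the exact invocation and the form the submission ID takes on YARN are defined by this PR, so treat that command line as an assumption rather than documented syntax.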

How was this patch tested?

Tested by running a word count job and killing it via the kill CLI.

  • Test conditions
    • application running in client mode and in cluster mode
    • different ACL settings for the client submitting the kill CLI (example configs below):
      - no view or modify ACL
      - only view ACL
      - only modify ACL
      - both view and modify ACLs
    • spark.authenticate true/false
    • using a keytab
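For reference, the view/modify ACL and authentication conditions above map to long-standing Spark settings (nothing this PR adds). A minimal sketch of the assumed combinations, using the public SparkConf API, where "someUser" is a placeholder:

  import org.apache.spark.SparkConf

  // Example configuration only; toggled across the test runs described above.
  val conf = new SparkConf()
    .set("spark.acls.enable", "true")        // turn ACL checking on
    .set("spark.ui.view.acls", "someUser")   // grants the view ACL
    .set("spark.modify.acls", "someUser")    // grants the modify ACL (needed to kill)
    .set("spark.authenticate", "true")       // exercised as both true and false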

@jiangxb1987
Contributor

@jerryshao Could you review this PR? Thanks!

@tgravescs
Contributor

ok to test

@SparkQA

SparkQA commented Aug 10, 2017

Test build #80491 has finished for PR 18897 at commit af9d7e3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Aug 10, 2017

"have the capability to safely shut down the spark application"

That's not true; that works just like "yarn kill". This is the code that does it in DriverRunner.scala, which is where the kill is finally processed:

  /** Terminate this driver (or prevent it from ever starting if not yet started) */
  private[worker] def kill(): Unit = {
    logInfo("Killing driver process!")
    killed = true
    synchronized {
      process.foreach { p =>
        val exitCode = Utils.terminateProcess(p, DRIVER_TERMINATE_TIMEOUT_MS)
        if (exitCode.isEmpty) {
          logWarning("Failed to terminate driver process: " + p +
              ". This process will likely be orphaned.")
        }
      }
    }
  }

This change also seems to be much more about creating a communication channel between arbitrary clients and the Spark AM. I'd like to see the bug and the PR explain that in much more detail, since the "kill" implementation doesn't really seem to be the meat of this change.
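A rough sketch of what such a client-to-AM channel could look like on top of Spark's internal RPC layer is below. The names AMCommandEndpoint and KillApplication are illustrative assumptions, not code from this PR, and RpcEndpoint is a private[spark] API, so real code would have to live inside the Spark project:

  import org.apache.spark.rpc.{RpcCallContext, RpcEndpoint, RpcEnv}

  // Hypothetical message an external client sends to ask the AM to shut down cleanly.
  case object KillApplication

  // Hypothetical endpoint the AM would register; `shutdown` is assumed to unregister
  // from YARN and stop the user application before the process exits.
  class AMCommandEndpoint(override val rpcEnv: RpcEnv, shutdown: () => Unit)
    extends RpcEndpoint {

    override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
      case KillApplication =>
        shutdown()
        context.reply(true) // acknowledge the kill to the caller
    }
  }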

@tgravescs
Contributor

I think the kill is there to shut things down more cleanly on the YARN side. If you yarn kill an application, it doesn't set the history appropriately, etc. (see the sketch below). It's also just nice to have one place to go to do these things, similar to the mapred, pig, and oozie style commands.

I think we can definitely split this up into two PRs, one for the RPC layer changes (which require some lower-level changes due to authentication with tokens) and one for the kill command itself. Does that sound good @vanzin?
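To make the clean-shutdown point above concrete: when the kill is handled inside the AM, the AM can unregister from the ResourceManager with an explicit final status before exiting, so the RM and the history show the application as intentionally killed rather than just having its containers torn down. A minimal sketch using Hadoop's AMRMClient API (the helper and its wiring are assumptions, not this PR's code; in Spark the AM would also stop the SparkContext and flush event logs first):

  import org.apache.hadoop.yarn.api.records.FinalApplicationStatus
  import org.apache.hadoop.yarn.client.api.AMRMClient

  // Hypothetical helper illustrating only the unregister step of a clean kill.
  def unregisterAsKilled(amClient: AMRMClient[_]): Unit = {
    amClient.unregisterApplicationMaster(
      FinalApplicationStatus.KILLED,                 // final status recorded by the RM
      "Application killed via spark-submit --kill",  // diagnostics shown in the RM UI
      "")                                            // tracking URL (none)
  }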

@vanzin
Contributor

vanzin commented Aug 10, 2017

Yes, I'd really like a more thorough explanation of the RPC changes. Once that's sorted out, adding something like a "kill" command should be trivial.

@SparkQA

SparkQA commented Aug 10, 2017

Test build #80494 has finished for PR 18897 at commit 26a92dc.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 10, 2017

Test build #80496 has finished for PR 18897 at commit f538483.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor

I think this RPC layer could also serve for SPARK-19143.

@tgravescs
Contributor

RPC layer split off into https://issues.apache.org/jira/browse/SPARK-21737

@vanzin
Contributor

vanzin commented Sep 15, 2017

Can we close this PR until #18978 is figured out?
