
[SPARK-2165] spark on yarn: add support for setting maxAppAttempts in the ApplicationSubmissionContext #1279

Closed · wants to merge 11 commits

Conversation

@knusbaum commented Jul 1, 2014

No description provided.

@AmplabJenkins

Can one of the admins verify this patch?

/**
* Set the max number of submission retries the Spark client will attempt
* before giving up
*/
Contributor

We haven't been adding specific routines to set the configs. The user can just set it using the existing SparkConf.set routines, so I think we should remove this.

@@ -81,6 +81,10 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
appContext.setQueue(args.amQueue)
appContext.setAMContainerSpec(amContainer)
appContext.setApplicationType("SPARK")
sparkConf.getOption("spark.maxappattempts").map(_.toInt) match {
Contributor

Let's make the config name YARN-specific: spark.yarn.maxappattempts.

Also, can you update the YARN documentation to include the new config and its description? Look in docs/running-on-yarn.md.
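The change under review can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: `AppContextStub` is a hypothetical stand-in for YARN's `ApplicationSubmissionContext`, and `SparkConf.getOption` is modeled as a plain `Map` lookup, using the YARN-specific config name suggested above.

```scala
// Hypothetical stub standing in for YARN's ApplicationSubmissionContext,
// so the snippet is self-contained.
class AppContextStub {
  var maxAppAttempts: Option[Int] = None // None => the cluster default applies
  def setMaxAppAttempts(n: Int): Unit = { maxAppAttempts = Some(n) }
}

object MaxAppAttemptsSketch {
  // sparkConf.getOption(...) is modeled as a plain Map lookup here.
  def configure(conf: Map[String, String], appContext: AppContextStub): Unit =
    conf.get("spark.yarn.maxappattempts").map(_.toInt) match {
      case Some(v) => appContext.setMaxAppAttempts(v)
      case None    => () // leave unset; the cluster default will be used
    }
}
```

A user would then opt in with something like `conf.set("spark.yarn.maxappattempts", "3")` (via the existing SparkConf.set path, as suggested earlier), and leaving the key unset falls through to whatever the cluster administrator configured.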

@tgravescs
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 18, 2014

QA tests have started for PR 1279 at commit f848797.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 18, 2014

QA tests have finished for PR 1279 at commit f848797.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan
    • implicit class LogicalPlanHacks(s: SchemaRDD)
    • implicit class PhysicalPlanHacks(originalPlan: SparkPlan)
    • class FakeParquetSerDe extends SerDe

@tgravescs
Contributor

Something must be wrong with the QA box, as this patch doesn't add any classes. The test failure is also unrelated to this patch.

@@ -81,6 +81,10 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
appContext.setQueue(args.amQueue)
appContext.setAMContainerSpec(amContainer)
appContext.setApplicationType("SPARK")
sparkConf.getOption("spark.yarn.maxappattempts").map(_.toInt) match {
case Some(v) => appContext.setMaxAppAttempts(v)
case None => logDebug("Not setting spark.yarn.maxappattempts. Cluster default will be used.")
Contributor

We should change the debug statement to say something like "spark.yarn.maxappattempts is not set, using cluster default..."

@tgravescs
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 19, 2014

QA tests have started for PR 1279 at commit f848797.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 19, 2014

QA tests have finished for PR 1279 at commit f848797.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

So one thing I just realized we need to update is in the ApplicationMaster: we use maxAppAttempts to determine if it's the last AM retry, and currently it just grabs the cluster maximum. We need to update that to handle this config.
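The point above can be sketched like this: the AM-side check should prefer the Spark setting over the cluster-wide YARN maximum when deciding whether the current attempt is the final one. The config name, the `Map`-based conf, and the helper names below are illustrative assumptions, not the PR's actual code.

```scala
// Hedged sketch of the AM-side fix: use the Spark config if set,
// otherwise fall back to the cluster maximum that the AM reads today
// (yarn.resourcemanager.am.max-attempts on the cluster side).
object LastAttemptSketch {
  def effectiveMaxAttempts(conf: Map[String, String], clusterMax: Int): Int =
    conf.get("spark.yarn.maxappattempts").map(_.toInt).getOrElse(clusterMax)

  // The AM can then decide whether the current attempt is the last retry.
  def isLastAttempt(attemptId: Int, conf: Map[String, String], clusterMax: Int): Boolean =
    attemptId >= effectiveMaxAttempts(conf, clusterMax)
}
```

With this shape, an application that sets the config lower than the cluster maximum stops retrying (and cleans up) at the right attempt, instead of believing more retries remain.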

@SparkQA

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@@ -81,6 +81,10 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
appContext.setQueue(args.amQueue)
appContext.setAMContainerSpec(amContainer)
appContext.setApplicationType("SPARK")
sparkConf.getOption("spark.yarn.maxappattempts").map(_.toInt) match {
Contributor

I would prefer this to be called spark.yarn.applicationSubmissionMaxAttempts or something, to be more specific.

Contributor

It's not the number of application submission attempts, though; this is how many times the resource manager will retry the application for you. To me, application submission is done by the client, not a retry from the RM. What isn't clear about the current one, so we can come up with something clearer? We could do something like ApplicationMasterMaxAttempts, am-maxattempts, or rmappattempts.

Contributor

I see, our client only submits it once, but it's up to the RM how many attempts it makes to set up state for the application or launch the AM container etc. (The docs at https://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.html#setMaxAppAttempts(int) are pretty misleading...)

Then maybe it makes sense to call this spark.yarn.application.rmMaxAttempts or something? Or spark.yarn.amMaxAttempts. Actually, even just spark.yarn.applicationMaxAttempts sounds alright now, after considering the more verbose alternatives.

Contributor

Ah, and I see where you got "submission" from. I'm fine with any of those too. We'll see if Kyle responds; otherwise I might take this over. We should just make sure to document it well so users understand.

@andrewor14
Contributor

This has merge conflicts, @knusbaum can you rebase on master?

@tgravescs
Contributor

Also note Kyle was an intern at Yahoo but has gone back to school; not sure if he will have time to continue this. We can wait to see if he responds.

@pwendell
Contributor

Okay let's close this issue for now and he can reopen it if he has time.

@asfgit closed this in f73b56f Nov 10, 2014
@WangTaoTheTonic
Contributor

Looks like Kyle @knusbaum is busy, I will take this over if everyone is ok with it.
#3878

sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
wangyum pushed a commit that referenced this pull request May 26, 2023

8 participants