[SPARK-2165] spark on yarn: add support for setting maxAppAttempts in the ApplicationSubmissionContext #1279
Conversation
Can one of the admins verify this patch?
/**
 * Set the max number of submission retries the Spark client will attempt
 * before giving up
 */
We haven't been adding specific routines to set the configs. The user can just set it using the existing SparkConf.set routines, so I think we should remove this.
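To illustrate the point above: no dedicated setter is needed because the generic key/value path already covers this. The sketch below uses a plain Map to stand in for SparkConf so it runs without Spark on the classpath; in real code this would be `new SparkConf().set("spark.maxappattempts", "3")`.

```scala
// Sketch only: a Map stands in for SparkConf's key/value store.
object GenericConfSketch {
  def setAndRead(key: String, value: String): Option[Int] = {
    val conf = Map(key -> value)    // stand-in for SparkConf.set(key, value)
    conf.get(key).map(_.toInt)      // how Client would read it back
  }
}
```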
@@ -81,6 +81,10 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
    appContext.setQueue(args.amQueue)
    appContext.setAMContainerSpec(amContainer)
    appContext.setApplicationType("SPARK")
    sparkConf.getOption("spark.maxappattempts").map(_.toInt) match {
Let's make the config name YARN-specific: spark.yarn.maxappattempts.
Also, can you update the YARN documentation to include the new config and its description? Look in docs/running-on-yarn.md.
Conflicts: docs/running-on-yarn.md
Jenkins, test this please.
QA tests have started for PR 1279 at commit
QA tests have finished for PR 1279 at commit
Something must be wrong with the QA box, as this patch doesn't add any classes. The test failure is also unrelated to this patch.
@@ -81,6 +81,10 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
    appContext.setQueue(args.amQueue)
    appContext.setAMContainerSpec(amContainer)
    appContext.setApplicationType("SPARK")
    sparkConf.getOption("spark.yarn.maxappattempts").map(_.toInt) match {
      case Some(v) => appContext.setMaxAppAttempts(v)
      case None => logDebug("Not setting spark.yarn.maxappattempts. Cluster default will be used.")
We should change the debug statement to say something like "spark.yarn.maxappattempts is not set, using cluster default..."
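A self-contained sketch of the lookup pattern from the diff, with the debug message reworded as suggested. A plain Map stands in for SparkConf and println for logDebug, so the snippet runs without Spark on the classpath.

```scala
// Sketch only: Map stands in for SparkConf, println for logDebug.
object MaxAppAttemptsSketch {
  def resolve(conf: Map[String, String]): Option[Int] =
    conf.get("spark.yarn.maxappattempts").map(_.toInt) match {
      case found @ Some(_) => found
      case None =>
        println("spark.yarn.maxappattempts is not set, using cluster default")
        None
    }
}
```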
Jenkins, test this please.
QA tests have started for PR 1279 at commit
QA tests have finished for PR 1279 at commit
So one thing I just realized we need to update: in the ApplicationMaster we are using maxAppAttempts to determine if it's the last AM retry. Currently it's just grabbing the cluster maximum. We need to update that to handle this config.
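A hedged sketch of the ApplicationMaster-side fix described above: prefer the user-set spark.yarn.maxappattempts over the cluster-wide maximum when deciding whether the current attempt is the last AM retry. The names here are illustrative, not Spark's actual internals.

```scala
// Illustrative sketch, not Spark's real ApplicationMaster code.
object LastRetrySketch {
  // Effective limit: the user's config value if present, else the cluster default.
  def effectiveMaxAttempts(userSetting: Option[Int], clusterMax: Int): Int =
    userSetting.getOrElse(clusterMax)

  // True once the current attempt id has reached the effective limit.
  def isLastAttempt(attemptId: Int, userSetting: Option[Int], clusterMax: Int): Boolean =
    attemptId >= effectiveMaxAttempts(userSetting, clusterMax)
}
```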
Can one of the admins verify this patch?
@@ -81,6 +81,10 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
    appContext.setQueue(args.amQueue)
    appContext.setAMContainerSpec(amContainer)
    appContext.setApplicationType("SPARK")
    sparkConf.getOption("spark.yarn.maxappattempts").map(_.toInt) match {
I would prefer this to be called spark.yarn.applicationSubmissionMaxAttempts or something more specific.
It's not the number of application submission attempts, though; this is how many times the ResourceManager will retry the application for you. To me, application submission is done by the client, not a retry from the RM. What isn't clear about the current name, so we can come up with something clearer? We could do something like ApplicationMasterMaxAttempts, am-maxattempts, or rmappattempts.
I see: our client only submits it once, but it's up to the RM how many attempts it makes to set up state for the application, launch the AM container, etc. (The docs at https://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.html#setMaxAppAttempts(int) are pretty misleading...)
Then maybe it makes sense to call this spark.yarn.application.rmMaxAttempts or something? Or spark.yarn.amMaxAttempts. Or actually, even just spark.yarn.applicationMaxAttempts sounds alright now, after considering the more verbose alternatives.
Ah, and I see where you got the "submission" from. I'm fine with any of those also. We'll see if Kyle responds; otherwise I might take this over. We should just make sure to document it well so the user understands.
This has merge conflicts. @knusbaum, can you rebase on master?
Also note Kyle was an intern at Yahoo but has gone back to school; not sure if he will have time to continue this. We can wait to see if he responds.
Okay, let's close this issue for now and he can reopen it if he has time.