[SPARK-20946][SQL] simplify the config setting logic in SparkSession.getOrCreate #18172
Conversation
@@ -43,7 +43,6 @@ private[ml] object TreeTests extends SparkFunSuite {
      categoricalFeatures: Map[Int, Int],
      numClasses: Int): DataFrame = {
    val spark = SparkSession.builder()
-     .master("local[2]")
Here we build the SparkSession with an existing SparkContext, so setting the master is useless.
It would be great if we could check whether similar changes can be made elsewhere.
    val sc = SparkContext.getOrCreate(sparkConf)
    // maybe this is an existing SparkContext, update its SparkConf which may be used
    // by SparkSession
    options.foreach { case (k, v) => sc.conf.set(k, v) }
Previously we set the given options on the newly created SparkConf, then created the SparkContext, then set the options on sc.conf again, in case SparkContext.getOrCreate returned an existing one. We can instead create the SparkConf, use it to create the SparkContext, and then set the options on sc.conf, so that we only need to set the conf once.
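The simplified flow can be sketched with hypothetical stand-ins for the Spark classes (MiniConf, MiniContext, and createWithOptions are illustrative names, not Spark APIs): whether getOrCreate returns a fresh context or an existing one, the options land on its conf, and they are set exactly once.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SparkConf: a simple mutable key-value store.
class MiniConf {
  private val settings = mutable.HashMap[String, String]()
  def set(k: String, v: String): MiniConf = { settings(k) = v; this }
  def get(k: String): Option[String] = settings.get(k)
}

// Hypothetical stand-in for SparkContext with getOrCreate semantics:
// the first call creates the context, later calls return the existing one.
class MiniContext(val conf: MiniConf)

object MiniContext {
  private var active: Option[MiniContext] = None
  def getOrCreate(conf: MiniConf): MiniContext = synchronized {
    active.getOrElse {
      val ctx = new MiniContext(conf)
      active = Some(ctx)
      ctx
    }
  }
}

// The simplified flow described above: build a conf, get or create the
// context, then set the options on ctx.conf once. Even when getOrCreate
// returned an existing context, ctx.conf now holds the options.
def createWithOptions(options: Map[String, String]): MiniContext = {
  val conf = new MiniConf
  val ctx = MiniContext.getOrCreate(conf)
  options.foreach { case (k, v) => ctx.conf.set(k, v) }
  ctx
}
```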
@@ -935,7 +929,6 @@ object SparkSession {
        }

        session = new SparkSession(sparkContext, None, None, extensions)
-       options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
When building SessionState, we merge the spark conf into the sql conf, which means we don't need to set the options on session.sessionState.conf here if the spark conf already contains them.
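The merge behavior this comment relies on can be sketched with hypothetical stand-ins (SqlConfLike and buildSessionState are illustrative names, not the real SessionState machinery): every entry of the context's conf is copied into the sql conf at construction time, so options already in the spark conf don't need to be set a second time.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the SQL conf held by SessionState.
class SqlConfLike {
  private val settings = mutable.HashMap[String, String]()
  def setConfString(k: String, v: String): Unit = settings(k) = v
  def getConfString(k: String): Option[String] = settings.get(k)
}

// Hypothetical sketch of SessionState construction: the spark conf is
// merged into the sql conf up front, which is why the PR can drop the
// extra options.foreach over session.sessionState.conf.
def buildSessionState(sparkConf: Map[String, String]): SqlConfLike = {
  val sqlConf = new SqlConfLike
  sparkConf.foreach { case (k, v) => sqlConf.setConfString(k, v) }
  sqlConf
}
```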
@@ -899,21 +899,15 @@ object SparkSession {

        // No active nor global default session. Create a new one.
        val sparkContext = userSuppliedContext.getOrElse {
          val sparkConf = new SparkConf()
          SparkContext.getOrCreate(sparkConf)
In order to remove https://github.com/apache/spark/pull/18172/files#diff-d91c284798f1c98bf03a31855e26d71cL938, here I change the behavior to also set the options on sparkContext.conf even if the sparkContext is supplied by users. This change is safe, as SparkSession.Builder.sparkContext is private, and I checked all the callers to confirm that it's OK to do so.
An empty SparkConf lacking necessary settings such as spark.master causes a failure.
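The failure mode can be illustrated with a hypothetical validation step (ConfLike and validateConf are illustrative names; the real SparkContext performs an equivalent check on startup): a freshly constructed conf carries no spark.master, so creating a context from it directly would fail.

```scala
// Hypothetical immutable stand-in for SparkConf.
final case class ConfLike(settings: Map[String, String] = Map.empty) {
  def set(k: String, v: String): ConfLike = copy(settings + (k -> v))
  def contains(k: String): Boolean = settings.contains(k)
}

// Hypothetical sketch of the startup check: a conf without a master
// is rejected, mirroring SparkContext's "A master URL must be set in
// your configuration" error.
def validateConf(conf: ConfLike): Either[String, ConfLike] =
  if (conf.contains("spark.master")) Right(conf)
  else Left("A master URL must be set in your configuration")
```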
Test build #77628 has finished for PR 18172 at commit
@@ -753,6 +753,8 @@ object SparkSession {

    private[this] val options = new scala.collection.mutable.HashMap[String, String]

+   private[this] var master: Option[String] = None
master and appName are only useful when creating a new SparkContext, so here I separate them from options and only use them when creating a new SparkContext.
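This separation can be sketched as a toy builder (MiniBuilder and its method names are hypothetical, not the real SparkSession.Builder API): master and appName are held apart from the generic options map and only folded into the conf at the point where a brand-new context would be created, since an existing context would ignore them anyway.

```scala
import scala.collection.mutable

// Hypothetical sketch of a builder that keeps master/appName separate
// from the generic options, mirroring the PR's change.
class MiniBuilder {
  private val options = mutable.HashMap[String, String]()
  private var master: Option[String] = None
  private var appName: Option[String] = None

  def setMaster(m: String): MiniBuilder = { master = Some(m); this }
  def setAppName(n: String): MiniBuilder = { appName = Some(n); this }
  def config(k: String, v: String): MiniBuilder = { options(k) = v; this }

  // Only when a new context must be created do master/appName get
  // merged into the conf alongside the options.
  def buildConf(): Map[String, String] = {
    val base = mutable.HashMap[String, String]()
    master.foreach(m => base += "spark.master" -> m)
    appName.foreach(n => base += "spark.app.name" -> n)
    base ++= options
    base.toMap
  }
}
```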
          SparkContext.getOrCreate(sparkConf)
        }

        options.foreach { case (k, v) => sparkContext.conf.set(k, v) }
Test build #77643 has finished for PR 18172 at commit
Test build #77648 has finished for PR 18172 at commit
Maybe we can add a comment to SparkSession.Builder.sparkContext noting that we will modify its conf?
Yea, good idea.
Test build #77652 has finished for PR 18172 at commit
CC @liancheng
LGTM
LGTM except for one comment regarding the behavior change.
Test build #77665 has finished for PR 18172 at commit
…getOrCreate

## What changes were proposed in this pull request?

The current conf setting logic is a little complex and has duplication; this PR simplifies it.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #18172 from cloud-fan/session.

(cherry picked from commit e11d90b)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thanks for the review, merging to master/2.2!
Reverting this because it breaks repl tests.
What changes were proposed in this pull request?
The current conf setting logic is a little complex and has duplication, this PR simplifies it.
How was this patch tested?
Existing tests.