[SPARK-20430][SQL] Initialise RangeExec parameters in a driver side #17717

maropu · 2017-04-21T13:27:09Z

What changes were proposed in this pull request?

This PR initializes RangeExec parameters on the driver side.
In the current master, the query below throws a NullPointerException:

sql("SET spark.sql.codegen.wholeStage=false")
sql("SELECT * FROM range(1)").show

17/04/20 17:11:05 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
        at org.apache.spark.sql.execution.SparkPlan.sparkContext(SparkPlan.scala:54)
        at org.apache.spark.sql.execution.RangeExec.numSlices(basicPhysicalOperators.scala:343)
        at org.apache.spark.sql.execution.RangeExec$$anonfun$20.apply(basicPhysicalOperators.scala:506)
        at org.apache.spark.sql.execution.RangeExec$$anonfun$20.apply(basicPhysicalOperators.scala:505)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:320)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
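The trace shows the closure passed to mapPartitionsWithIndex (RangeExec$$anonfun$20) calling RangeExec.numSlices on the executor, which then goes through SparkPlan.sparkContext and fails. The sketch below reproduces that mechanism without Spark. It assumes, as the NPE at SparkPlan.sparkContext suggests, that the plan's context reference is transient and therefore null once the plan is deserialized on an executor. It also assumes the fix amounts to evaluating the parameter once on the driver (a `val`) instead of on every call (a `def`). `FakeContext`, `DefPlan`, `ValPlan` and `DriverSideInitDemo` are hypothetical names used only for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for SparkContext; deliberately NOT Serializable.
final class FakeContext(val defaultParallelism: Int)

// Mimics the failing pattern: `numSlices` is a `def`, so it reads the
// @transient context on every call, including after deserialization.
final class DefPlan(@transient val ctx: FakeContext) extends java.io.Serializable {
  def numSlices: Int = ctx.defaultParallelism
}

// Mimics driver-side initialization (assumed shape of the fix): `numSlices`
// is a `val`, computed once at construction and serialized as a plain Int.
final class ValPlan(@transient val ctx: FakeContext) extends java.io.Serializable {
  val numSlices: Int = ctx.defaultParallelism
}

object DriverSideInitDemo {
  // Java serialization round trip, standing in for shipping a task to an executor.
  def roundTrip[T <: java.io.Serializable](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Evaluates the by-name argument "on the executor" and reports an NPE instead of throwing.
  def slicesOnExecutor(read: => Int): Either[String, Int] =
    try Right(read)
    catch { case _: NullPointerException => Left("NullPointerException") }

  def main(args: Array[String]): Unit = {
    val ctx = new FakeContext(defaultParallelism = 8)
    val shippedDef = roundTrip(new DefPlan(ctx))
    val shippedVal = roundTrip(new ValPlan(ctx))
    println(slicesOnExecutor(shippedDef.numSlices)) // Left(NullPointerException)
    println(slicesOnExecutor(shippedVal.numSlices)) // Right(8)
  }
}
```

The transient field is null after the round trip in both classes. Only the `def` version dereferences it afterwards, which is why computing the value before serialization avoids the NPE.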

How was this patch tested?

Added a test in DataFrameRangeSuite.

SparkQA · 2017-04-21T15:38:10Z

Test build #76032 has finished for PR 17717 at commit da833f2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2017-04-22T00:10:50Z

cc: @gatorsmile

rxin · 2017-04-22T05:32:44Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

@@ -1732,4 +1732,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
      .filter($"x1".isNotNull || !$"y".isin("a!"))
      .count
  }
+
+  test("SPARK-20430 Initialize Range parameters in a deriver side") {


Also move this into DataFrameRangeSuite?

maropu: Yeah, will do.

SparkQA · 2017-04-22T05:52:31Z

Test build #76056 has started for PR 17717 at commit 9b5bdc7.

rxin · 2017-04-22T05:57:51Z

LGTM pending Jenkins.

gatorsmile

LGTM

gatorsmile · 2017-04-22T07:11:37Z

retest this please

SparkQA · 2017-04-22T09:21:42Z

Test build #76058 has finished for PR 17717 at commit 9b5bdc7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Cherry-picked commit message (body identical to the PR description above):
Author: Takeshi Yamamuro <yamamuro@apache.org>
Closes #17717 from maropu/SPARK-20430.
(cherry picked from commit b3c572a)
Signed-off-by: Xiao Li <gatorsmile@gmail.com>

gatorsmile · 2017-04-22T16:44:51Z

Thanks! Merging to master and 2.2


Initialize Range parameters in a driver side

da833f2

rxin reviewed Apr 22, 2017


Apply comments

9b5bdc7

gatorsmile approved these changes Apr 22, 2017


asfgit closed this in b3c572a Apr 22, 2017
