[SPARK-30780][SQL] Empty LocalTableScan should use RDD without partitions #27530
Conversation
Test build #118179 has finished for PR 27530 at commit
```diff
-private lazy val rdd = sqlContext.sparkContext.parallelize(unsafeRows, numParallelism)
+@transient private lazy val rdd: RDD[InternalRow] = {
+  if (rows.isEmpty) {
```
Maybe `unsafeRows.isEmpty`? Otherwise I have to look at the difference between `unsafeRows` and `rows`.
This way we avoid materializing the `unsafeRows` lazy val.
```diff
+      sqlContext.sparkContext.emptyRDD
+    } else {
+      val numSlices = math.min(unsafeRows.length, sqlContext.sparkContext.defaultParallelism)
+      sqlContext.sparkContext.parallelize(unsafeRows, numSlices)
```
Just in case, does it make sense to put this code (handling empty rows) inside of `parallelize`?
`parallelize` needs to respect the `numSlices` parameter, even if the data is empty.
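The distinction discussed above can be illustrated without Spark. This is a minimal, hypothetical sketch of the partition-count rule this PR introduces: an empty table maps to `emptyRDD` (zero partitions), while a non-empty one is parallelized with `numSlices = min(rowCount, defaultParallelism)`. The object name and the hard-coded `defaultParallelism` value are stand-ins for illustration, not Spark API.

```scala
// Hypothetical, Spark-free sketch of the partition-count logic from this PR.
object PartitionCountSketch {
  // Stand-in for sqlContext.sparkContext.defaultParallelism.
  val defaultParallelism = 8

  // Empty input -> emptyRDD (zero partitions); otherwise parallelize with
  // numSlices = min(row count, default parallelism).
  def numPartitions(rowCount: Int): Int =
    if (rowCount == 0) 0
    else math.min(rowCount, defaultParallelism)

  def main(args: Array[String]): Unit = {
    println(numPartitions(0))   // empty table: no partitions at all
    println(numPartitions(3))   // fewer rows than cores: one row per slice
    println(numPartitions(100)) // capped at defaultParallelism
  }
}
```

Note that without the `rows.isEmpty` branch, `parallelize(Seq.empty, numSlices)` would still produce `numSlices` (empty) partitions, because `parallelize` must honor the requested slice count even for empty data.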
Test build #118260 has finished for PR 27530 at commit
Merged to master and branch-3.0 to be consistent with #27400.
[SPARK-30780][SQL] Empty LocalTableScan should use RDD without partitions

### What changes were proposed in this pull request?
This is a small follow-up for #27400. This PR makes an empty `LocalTableScanExec` return an `RDD` without partitions.

### Why are the changes needed?
It is a bit unexpected that the RDD contains partitions if there is no work to do. It can also save a bit of work when this is used in a more complex plan.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added a test to `SparkPlanSuite`.

Closes #27530 from hvanhovell/SPARK-30780.

Authored-by: herman <herman@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit b25359c)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>