[SPARK-14596][SQL] Remove not used SqlNewHadoopRDD and some more unused imports #12354
Conversation
Test build #55707 has finished for PR 12354 at commit
cc @cloud-fan
There is a
Thanks! I just renamed.
Test build #55708 has finished for PR 12354 at commit
Test build #55709 has finished for PR 12354 at commit
@@ -220,8 +220,8 @@ class HadoopRDD[K, V](
       // Sets the thread local variable for the file's name
       split.inputSplit.value match {
-        case fs: FileSplit => SqlNewHadoopRDDState.setInputFileName(fs.getPath.toString)
-        case _ => SqlNewHadoopRDDState.unsetInputFileName()
+        case fs: FileSplit => FileScanRDDState.setInputFileName(fs.getPath.toString)
@liancheng @yhuai Do you still use HadoopRDD to read data source relations? If not, I think we don't need to update the file name here anymore.
I think it is used by HiveTableScan
@cloud-fan I see. This is used in https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InputFileName.scala during code generation.
Should I remove the file name update?
I see, how about renaming it to InputFileNameHolder?
Sure. Thanks.
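For context, below is a minimal, self-contained sketch of the pattern discussed in this thread: a thread-local holder (using the InputFileNameHolder name proposed above) that the Hadoop scan RDD fills in per split, and that the InputFileName expression linked above can read back. This is illustrative only, not the code merged by this PR; InputFileNameDemo and recordInputFileName are hypothetical names.

```scala
import org.apache.hadoop.mapred.{FileSplit, InputSplit}
import org.apache.spark.unsafe.types.UTF8String

// Illustrative thread-local holder, following the InputFileNameHolder name
// proposed in this thread (a sketch, not the actual Spark source).
object InputFileNameHolder {
  private val inputFileName: ThreadLocal[UTF8String] = new ThreadLocal[UTF8String] {
    override protected def initialValue(): UTF8String = UTF8String.fromString("")
  }

  // Read the file name recorded for the current task's thread ("" if none).
  def getInputFileName(): UTF8String = inputFileName.get()

  // Record the file currently being read by this thread.
  def setInputFileName(file: String): Unit = inputFileName.set(UTF8String.fromString(file))

  // Clear the value when the split is not file-based or the task is done.
  def unsetInputFileName(): Unit = inputFileName.remove()
}

// Hypothetical helper mirroring the per-split match in the HadoopRDD diff above.
object InputFileNameDemo {
  def recordInputFileName(split: InputSplit): Unit = split match {
    case fs: FileSplit => InputFileNameHolder.setInputFileName(fs.getPath.toString)
    case _             => InputFileNameHolder.unsetInputFileName()
  }
}
```

With this split of responsibilities, the reading side reduces to a single InputFileNameHolder.getInputFileName() call on the current thread, which is essentially what the code-generated InputFileName expression relies on.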
@@ -20,8 +20,7 @@ package org.apache.spark.rdd
 import org.apache.spark.unsafe.types.UTF8String

 /**
- * State for SqlNewHadoopRDD objects. This is split this way because of the package splits.
- * TODO: Move/Combine this with org.apache.spark.sql.datasources.SqlNewHadoopRDD
+ * State for FileScanRDD objects. This is split this way because of the package splits.
  */
 private[spark] object SqlNewHadoopRDDState {
SqlNewHadoopRDDState is definitely not a good name here, how about InputFileNameHolder?
Yes. Thanks!
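As a usage-level illustration (not part of this PR), the value kept by this holder is what ends up being returned by the input_file_name() SQL function, which is backed by the InputFileName expression discussed above. A small hedged sketch; the application name and input path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.input_file_name

object InputFileNameUsage {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("input-file-name-example") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // Hypothetical path; point it at any existing text files.
    val df = spark.read.text("/tmp/example/*.txt")

    // Each row reports the file it came from, i.e. the value the scan RDD
    // stored in the thread-local holder while reading that file's split.
    df.select(input_file_name()).show(truncate = false)

    spark.stop()
  }
}
```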
@cloud-fan The commits I just submitted include the changes for MiMa tests and some comments.
Test build #55772 has finished for PR 12354 at commit
Test build #55769 has finished for PR 12354 at commit
retest this please
Test build #55784 has finished for PR 12354 at commit
thanks! merging to master!
What changes were proposed in this pull request?

The old HadoopFsRelation API includes buildInternalScan(), which uses SqlNewHadoopRDD in ParquetRelation. Because the old API is now removed, SqlNewHadoopRDD is no longer used. So, this PR removes SqlNewHadoopRDD and several unused imports. This was discussed in #12326.

How was this patch tested?

Several related existing unit tests and sbt scalastyle.