[SPARK-2075][Core] Make the compiler generate same bytes code for Hadoop 1.+ and Hadoop 2.+ #3740
Changes from all commits
fa40db0
ca03559
734bac9
e4ad8b5
39d9df2
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1174,15 +1174,32 @@ abstract class RDD[T: ClassTag]( | |
* Save this RDD as a text file, using string representations of elements. | ||
*/ | ||
def saveAsTextFile(path: String) { | ||
this.map(x => (NullWritable.get(), new Text(x.toString))) | ||
// https://issues.apache.org/jira/browse/SPARK-2075 | ||
// | ||
// NullWritable is a `Comparable` in Hadoop 1.+, so the compiler cannot find an implicit | ||
// Ordering for it and will use the default `null`. However, it's a `Comparable[NullWritable]` | ||
// in Hadoop 2.+, so the compiler will call the implicit `Ordering.ordered` method to create an | ||
// Ordering for `NullWritable`. That's why the compiler will generate different anonymous | ||
// classes for `saveAsTextFile` in Hadoop 1.+ and Hadoop 2.+. | ||
// | ||
// Therefore, here we provide an explicit Ordering `null` to make sure the compiler generates the | ||
// same bytecode for `saveAsTextFile`. | ||
val nullWritableClassTag = implicitly[ClassTag[NullWritable]] | ||
val textClassTag = implicitly[ClassTag[Text]] | ||
val r = this.map(x => (NullWritable.get(), new Text(x.toString))) | ||
RDD.rddToPairRDDFunctions(r)(nullWritableClassTag, textClassTag, null) | ||
.saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path) | ||
Is the problem here that, when compiling against Hadoop 2, the compiler chooses to specify the Ordering on the implicit rddToPairRDDFunctions, while against Hadoop 1 it instead uses the default method? I wonder if a more explicit solution, like the introduction of a conversion to PairRDDFunctions which takes an Ordering, is warranted for these cases, e.g.:
this.map(x => (NullWritable.get(), new Text(x.toString)))
.toPairRDD(nullWritableOrdering)
.saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path)
This would make it less magical why the definition of an implicit Ordering changes the bytecode.
Right. An explicit solution is better for such a tricky issue.
||
} | ||
|
||
/** | ||
* Save this RDD as a compressed text file, using string representations of elements. | ||
*/ | ||
def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]) { | ||
this.map(x => (NullWritable.get(), new Text(x.toString))) | ||
// https://issues.apache.org/jira/browse/SPARK-2075 | ||
val nullWritableClassTag = implicitly[ClassTag[NullWritable]] | ||
val textClassTag = implicitly[ClassTag[Text]] | ||
val r = this.map(x => (NullWritable.get(), new Text(x.toString))) | ||
I just noticed we can reuse the text array here to reduce GC. Anyway, that's not part of this PR. Would you be willing to submit a new PR for that?
OK. I'll send another PR for that after this one is merged.
||
RDD.rddToPairRDDFunctions(r)(nullWritableClassTag, textClassTag, null) | ||
.saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path, codec) | ||
} | ||
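The implicit-resolution difference described in the SPARK-2075 comment can be reproduced without Hadoop. The following is a minimal sketch, not Spark's code: `KeyWithoutOrdering`, `KeyWithOrdering`, and `describe` are hypothetical names. `KeyWithoutOrdering` stands in for a key type for which the compiler cannot derive an Ordering (the Hadoop 1.+ case), `KeyWithOrdering` stands in for a `Comparable[Self]` key (the Hadoop 2.+ case), and `describe` mimics a method whose implicit Ordering parameter defaults to `null`.

```scala
// Hypothetical stand-in for the Hadoop 1.+ case: no implicit Ordering is derivable.
final class KeyWithoutOrdering

// Hypothetical stand-in for the Hadoop 2.+ case: the type is Comparable to itself,
// so the compiler can derive an Ordering through `Ordering.ordered`.
final class KeyWithOrdering extends Comparable[KeyWithOrdering] {
  override def compareTo(other: KeyWithOrdering): Int = 0
}

object OrderingDemo {
  // Assumed shape only: an implicit Ordering parameter with a default of null.
  def describe[K](key: K)(implicit ord: Ordering[K] = null): String =
    if (ord eq null) "no ordering" else "ordering found"

  def main(args: Array[String]): Unit = {
    // No implicit Ordering exists, so the default null is used.
    println(describe(new KeyWithoutOrdering))
    // The compiler inserts a call that builds an Ordering for the Comparable type.
    println(describe(new KeyWithOrdering))
    // Passing null explicitly makes the call site identical for both key types.
    println(describe(new KeyWithOrdering)(null))
  }
}
```

Passing the argument explicitly removes the compiler's choice from the call site, which is the effect the diff relies on when it hands `null` to `RDD.rddToPairRDDFunctions`.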
|
||
|
The text array can be reused here too.
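The reuse suggested in the review comments was deferred to a follow-up PR, so its actual form is not shown here. As a rough sketch of the general idea only, the code below allocates one mutable holder per iterator and overwrites it for each element. `MutableText` is a hypothetical stand-in for Hadoop's `Text`, and `toReusedText` is an illustrative helper, not a Spark API. The approach assumes the consumer finishes with each element before requesting the next one.

```scala
// Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable holder with set().
final class MutableText {
  private var value: String = ""
  def set(s: String): Unit = { value = s }
  override def toString: String = value
}

object ReuseDemo {
  // Returns the same MutableText instance for every element, overwriting its
  // contents each time, instead of allocating a new object per element.
  def toReusedText[T](iter: Iterator[T]): Iterator[MutableText] = {
    val text = new MutableText
    iter.map { x =>
      text.set(x.toString)
      text
    }
  }

  def main(args: Array[String]): Unit = {
    // Each element is converted to a String before the iterator advances,
    // so the shared holder is never read after it has been overwritten.
    val written = toReusedText(Iterator(1, 2, 3)).map(_.toString).toList
    println(written)
  }
}
```

Collecting the `MutableText` references themselves into a list would leave every entry pointing at the last value, which is why this pattern only suits consumers that process elements one at a time.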