[SPARK-2075][Core] Make the compiler generate same bytes code for Hadoop 1.+ and Hadoop 2.+ #3740
@@ -19,6 +19,8 @@ package org.apache.spark.input

 import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

+import org.apache.spark.deploy.SparkHadoopUtil
+
 import scala.collection.JavaConversions._

 import com.google.common.io.ByteStreams

Review comment (on the added import): import is out of order here, but I can fix it when I commit this.

@@ -145,7 +147,8 @@ class PortableDataStream(

   private val confBytes = {
     val baos = new ByteArrayOutputStream()
-    context.getConfiguration.write(new DataOutputStream(baos))
+    SparkHadoopUtil.get.getConfigurationFromJobContext(context).
+      write(new DataOutputStream(baos))
     baos.toByteArray
   }
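For background on the `getConfigurationFromJobContext` indirection: `JobContext.getConfiguration` is a method on a class in Hadoop 1.+ but on an interface in Hadoop 2.+, so a direct call compiles to `invokevirtual` against one profile and `invokeinterface` against the other, yielding different bytecode at every call site. A minimal sketch of the kind of wrapper that hides this behind reflection (the method name matches the diff; the body shown is an assumption about its implementation):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.JobContext

object SparkHadoopUtilSketch {
  // Resolve getConfiguration at runtime so the caller's bytecode contains
  // only a reflective call, which is identical under both Hadoop profiles.
  def getConfigurationFromJobContext(context: JobContext): Configuration = {
    val method = context.getClass.getMethod("getConfiguration")
    method.invoke(context).asInstanceOf[Configuration]
  }
}
```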
@@ -1174,15 +1174,34 @@ abstract class RDD[T: ClassTag](

   /**
    * Save this RDD as a text file, using string representations of elements.
    */
   def saveAsTextFile(path: String) {
-    this.map(x => (NullWritable.get(), new Text(x.toString)))
+    // https://issues.apache.org/jira/browse/SPARK-2075
+    // NullWritable is a Comparable rather than Comparable[NullWritable] in Hadoop 1.+,
+    // so the compiler cannot find an implicit Ordering for it and will use the default `null`.
+    // It will generate different anonymous classes for `saveAsTextFile` in Hadoop 1.+ and
+    // Hadoop 2.+. Therefore, here we provide an Ordering for NullWritable so that the
+    // compiler will generate the same bytecode.
+    val nullWritableOrdering = new Ordering[NullWritable] {
+      override def compare(x: NullWritable, y: NullWritable): Int = 0
+    }
+    val nullWritableClassTag = implicitly[ClassTag[NullWritable]]
+    val textClassTag = implicitly[ClassTag[Text]]
+    val r = this.map(x => (NullWritable.get(), new Text(x.toString)))
+    RDD.rddToPairRDDFunctions(r)(nullWritableClassTag, textClassTag, nullWritableOrdering)
       .saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path)
   }

Review comment (on `val r = this.map(...)`): here can be reused too.

Review comment (on the `rddToPairRDDFunctions` call): we can just pass null in for nullWritableOrdering, can't we?

Reply: Yes, we can, since …
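To make the mechanism in the code comment concrete, here is a self-contained sketch of how an implicit `Ordering` parameter with a `null` default resolves differently for the two `NullWritable` signatures. `Typed` and `Raw` are hypothetical stand-ins (not Hadoop classes), and `resolve` only mirrors the shape of `rddToPairRDDFunctions`' implicit parameter:

```scala
object OrderingResolutionSketch {
  // Hadoop 2.+ shape: NullWritable implements Comparable<NullWritable>.
  class Typed extends Comparable[Typed] { def compareTo(o: Typed): Int = 0 }
  // Hadoop 1.+ shape: the raw Comparable carries no useful type argument.
  class Raw extends Comparable[Any] { def compareTo(o: Any): Int = 0 }

  // Implicit parameter with a default, like rddToPairRDDFunctions' `ord`.
  def resolve[K](implicit ord: Ordering[K] = null): String =
    if (ord == null) "no implicit found, default null used (Hadoop 1.+ case)"
    else "Ordering supplied at the call site (Hadoop 2.+ case)"

  def main(args: Array[String]): Unit = {
    println(resolve[Typed]) // compiler synthesizes the implicit via Ordering.ordered
    println(resolve[Raw])   // no Ordering[Raw] exists, so the default wins
  }
}
```

Both call sites compile, but they emit different bytecode, which is exactly what shifted the anonymous-class numbering between the Hadoop profiles. Passing the argument explicitly, whether a real `Ordering` or a literal `null`, removes the compiler's discretion.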
Review comment (on the `saveAsHadoopFile` call): Is the problem here that while compiling against Hadoop 2 the compiler chooses to supply the Ordering to the implicit rddToPairRDDFunctions, while against Hadoop 1 it instead uses the default (`null`)? I wonder if a more explicit solution, like the introduction of a conversion to PairRDDFunctions which takes an Ordering, is warranted for these cases, e.g.:

    this.map(x => (NullWritable.get(), new Text(x.toString)))
      .toPairRDD(nullWritableOrdering)
      .saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path)

This would be less magical about why the definition of an implicit Ordering changes the bytecode.

Reply: Right. An explicit solution is better for such a tricky issue.
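For illustration, the conversion the reviewer sketches might be defined as below. `toPairRDD` is hypothetical, it never became part of Spark's API in this form, and the `PairRDDFunctions` constructor call assumes its Spark 1.x signature `(rdd)(kt, vt, ord)`:

```scala
import scala.reflect.ClassTag

import org.apache.spark.rdd.{PairRDDFunctions, RDD}

object ToPairRDDSketch {
  // Hypothetical explicit bridge: the Ordering travels as a plain argument,
  // so implicit search (and its Hadoop-dependent outcome) never runs.
  implicit class ToPairRDD[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]) {
    def toPairRDD(ord: Ordering[K]): PairRDDFunctions[K, V] =
      new PairRDDFunctions(rdd)(implicitly[ClassTag[K]], implicitly[ClassTag[V]], ord)
  }
}
```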
   /**
    * Save this RDD as a compressed text file, using string representations of elements.
    */
   def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]) {
-    this.map(x => (NullWritable.get(), new Text(x.toString)))
+    // https://issues.apache.org/jira/browse/SPARK-2075
+    val nullWritableOrdering = new Ordering[NullWritable] {
+      override def compare(x: NullWritable, y: NullWritable): Int = 0
+    }
+    val nullWritableClassTag = implicitly[ClassTag[NullWritable]]
+    val textClassTag = implicitly[ClassTag[Text]]
+    val r = this.map(x => (NullWritable.get(), new Text(x.toString)))
+    RDD.rddToPairRDDFunctions(r)(nullWritableClassTag, textClassTag, nullWritableOrdering)
       .saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path, codec)
   }

Review comment (on `val r = this.map(...)`): just noticed we can reuse the text array here to reduce GC. Anyway, that's not part of this PR - would you be willing to submit a new PR for that?

Reply: OK. I'll send another PR for that after this one is merged.

Review comment: no need to indent here - I can fix it when I commit.
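As for the reuse idea in the thread above, a rough sketch (not the follow-up PR's actual code) would allocate one `Text` per partition and overwrite it for each record; `this` stands for the receiving `RDD[T]`:

```scala
// Reuse a single Text per partition instead of allocating one per record.
// Safe because saveAsHadoopFile serializes each pair before the iterator
// produces the next one.
val r = this.mapPartitions { iter =>
  val text = new Text()
  iter.map { x =>
    text.set(x.toString)       // overwrite the shared buffer
    (NullWritable.get(), text) // NullWritable.get() is already a singleton
  }
}
```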