SPARK-3926 [CORE] Result of JavaRDD.collectAsMap() is not Serializable #2805
QA tests have started for PR 2805 at commit
QA tests have finished for PR 2805 at commit
Test PASSed.
I think it's also used in SparkSQL's Java API
QA tests have started for PR 2805 at commit
QA tests have finished for PR 2805 at commit
Test FAILed.
QA tests have started for PR 2805 at commit
QA tests have finished for PR 2805 at commit
Test FAILed.
QA tests have started for PR 2805 at commit
QA tests have finished for PR 2805 at commit
Test PASSed.
I can confirm that this seems to have fixed the serialization issue; here's my test-case:

import org.apache.spark.api.java._
val pairs = sc.parallelize(1 to 10).map(x => (x, x))
val map = new JavaPairRDD(pairs).collectAsMap()
def ser(a: AnyRef) =
  (new java.io.ObjectOutputStream(new java.io.ByteArrayOutputStream())).writeObject(a)
ser(map)

It looks like there's one more case in

There still might be some other corner-cases with serializability of results that we haven't tested yet. The result of

I mentioned this over on JIRA, but for GitHub readers: I've opened an issue to fix this upstream in Scala: https://issues.scala-lang.org/browse/SI-8911

I'll merge this now with my fixup. Thanks!
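For readers without a Spark shell at hand, the check in the test-case above can be approximated with plain Java serialization. This is only a sketch: `SerializationCheck`, `serialize`, `deserialize`, `roundTrip`, and `isSerializable` are hypothetical names introduced here, and a `java.util.HashMap` stands in for the map returned by `collectAsMap()`. Unlike the one-way `ser` helper, it also reads the object back and compares it with the original.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.util.Try

// Hypothetical helper object; none of these names come from Spark.
object SerializationCheck {

  // Writes `a` with Java serialization and returns the raw bytes.
  // Throws java.io.NotSerializableException if `a` (or something it
  // references) does not implement java.io.Serializable.
  def serialize(a: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(a) finally out.close()
    bytes.toByteArray
  }

  // Reads an object back from bytes produced by `serialize`.
  def deserialize[T](data: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Serializes and immediately deserializes, returning the copy.
  def roundTrip[T <: AnyRef](a: T): T = deserialize[T](serialize(a))

  // True if `a` can be written with Java serialization.
  def isSerializable(a: AnyRef): Boolean = Try(serialize(a)).isSuccess

  def main(args: Array[String]): Unit = {
    // Stand-in for a collected map: java.util.HashMap is Serializable.
    val m = new java.util.HashMap[Integer, Integer]()
    (1 to 10).foreach(i => m.put(Int.box(i), Int.box(i)))

    val copy = roundTrip(m)
    assert(copy == m)                       // same contents after the round trip
    assert(!isSerializable(new Object()))   // a bare Object is not Serializable
    println(s"round trip preserved ${copy.size()} entries")
  }
}
```

Reading the bytes back catches problems that a write-only check would miss, such as a class that serializes but cannot be reconstructed.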
@@ -587,4 +587,11 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
    rdd.foreachAsync(x => f.call(x))
  }

  private[java] def mapAsSerializableJavaMap[A, B](underlying: collection.Map[A, B]) =
I think that these methods should live in JavaUtils rather than in this trait so that they can be used by the Streaming and SQL Java APIs, too.
Also, private[java] methods here will become public from Java's POV. We can't use Scala package-private access modifiers when implementing Java API classes.
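To make the review discussion concrete, here is a minimal sketch of what a shared, serializable map wrapper could look like. It is not the code from this PR: the diff above shows only the signature of `mapAsSerializableJavaMap`, so the body below is an assumption, and it uses a read-only `java.util.AbstractMap` view as a swapped-in technique. `JavaUtilsSketch` and `SerializableMapView` are hypothetical names, and the sketch assumes the underlying Scala map is itself serializable (as Scala's standard immutable maps are).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.{util => ju}

// Hypothetical stand-in for a shared utility object; not Spark's actual JavaUtils.
object JavaUtilsSketch {

  // Read-only java.util.Map view over a Scala map that also implements
  // java.io.Serializable. Assumes `underlying` is itself serializable.
  class SerializableMapView[A, B](underlying: scala.collection.Map[A, B])
    extends ju.AbstractMap[A, B] with java.io.Serializable {

    // AbstractMap derives get, size, containsKey, equals, etc. from entrySet.
    override def entrySet(): ju.Set[ju.Map.Entry[A, B]] = {
      val entries = new ju.LinkedHashSet[ju.Map.Entry[A, B]]()
      underlying.foreach { case (k, v) =>
        entries.add(new ju.AbstractMap.SimpleImmutableEntry[A, B](k, v))
      }
      ju.Collections.unmodifiableSet(entries)
    }
  }

  // Same name and parameter list as the method in the diff; the body is an assumption.
  def mapAsSerializableJavaMap[A, B](underlying: scala.collection.Map[A, B]): ju.Map[A, B] =
    new SerializableMapView(underlying)

  def main(args: Array[String]): Unit = {
    val javaMap = mapAsSerializableJavaMap(Map("a" -> 1, "b" -> 2))

    // Write the wrapper out and read it back with Java serialization.
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(javaMap) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    val copy = try in.readObject().asInstanceOf[ju.Map[String, Int]] finally in.close()

    assert(copy == javaMap)
    println(s"round-tripped ${copy.size()} entries")
  }
}
```

Placing the helper in a standalone object rather than in a trait mirrors the reviewer's suggestion that other Java APIs should be able to reuse it. The sketch leaves the method with default (public) visibility, which sidesteps the point that Scala package-private modifiers are not enforced for Java callers.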
Make JavaPairRDD.collectAsMap result Serializable since Java Maps generally are

Author: Sean Owen <sowen@cloudera.com>

Closes #2805 from srowen/SPARK-3926 and squashes the following commits:

ecb78ee [Sean Owen] Fix conflict between java.io.Serializable and use of Scala's Serializable
f4717f9 [Sean Owen] Oops, fix compile problem
ae1b36f [Sean Owen] Expand to cover Maps returned from other Java API methods as well
51c26c2 [Sean Owen] Make JavaPairRDD.collectAsMap result Serializable since Java Maps generally are
I made those minor fixups and committed this as f406a83. I also cherry-picked it into
Yes, all SGTM.