[SPARK-11016] Move RoaringBitmap to explicit Kryo serializer #1

Closed
wants to merge 1 commit into from


drcrallen

This patch may make rolling updates impossible.

As a side note, https://issues.apache.org/jira/browse/SPARK-11016 mentions that the "proper" way to handle this is with a general Kryo serializer for Externalizable. Until that's ready, this is a workaround.
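
To illustrate what a general serializer for Externalizable types boils down to, here is a minimal sketch using only the JDK. `TinyBitSet` and `ExternalizableCodec` are hypothetical names made up for this example; they are not part of Spark, Kryo, or RoaringBitmap. The assumption is that a Kryo-based version would make the same `writeExternal`/`readExternal` calls, just against Kryo's output and input instead of JDK object streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

/** Hypothetical stand-in for an Externalizable type such as a bitmap. */
final class TinyBitSet implements Externalizable {
    long[] words = new long[0];

    /** Externalizable types need a public no-arg constructor. */
    public TinyBitSet() {}

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(words.length);
        for (long w : words) {
            out.writeLong(w);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int n = in.readInt();
        words = new long[n];
        for (int i = 0; i < n; i++) {
            words[i] = in.readLong();
        }
    }
}

/** Hypothetical helper: delegates entirely to the type's own external form. */
final class ExternalizableCodec {
    private ExternalizableCodec() {}

    static byte[] write(Externalizable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // No class metadata is written here, only what the type emits.
            value.writeExternal(out);
        }
        return bytes.toByteArray();
    }

    static <T extends Externalizable> T read(byte[] data, Class<T> type)
            throws IOException, ReflectiveOperationException {
        // Create an empty instance, then let it populate itself.
        T value = type.getDeclaredConstructor().newInstance();
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(data))) {
            value.readExternal(in);
        }
        return value;
    }
}
```

The point of the sketch is that the serializer never needs to know the fields of the type; it only needs a no-arg constructor and the two Externalizable methods.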

@drcrallen drcrallen changed the title Move RoaringBitmap to explicit Kryo serializer [SPARK-11016] Move RoaringBitmap to explicit Kryo serializer Oct 9, 2015
@drcrallen
Author

Unit tests for Spark in general do not pass for me, so I don't have a good way of testing this.

@drcrallen
Author

- check spark-class location correctly *** FAILED ***
  "[cd spark-1*;  .]/bin/spark-class org..." did not equal "[ /mesos-home]/bin/spark-class org..." (MesosSchedulerBackendSuite.scala:102)

xdralex pushed a commit that referenced this pull request Oct 13, 2015
This PR is based on apache#4229, thanks prabeesh.

Closes apache#4229

Author: Prabeesh K <prabsmails@gmail.com>
Author: zsxwing <zsxwing@gmail.com>
Author: prabs <prabsmails@gmail.com>
Author: Prabeesh K <prabeesh.k@namshi.com>

Closes apache#7833 from zsxwing/pr4229 and squashes the following commits:

9570bec [zsxwing] Fix the variable name and check null in finally
4a9c79e [zsxwing] Fix pom.xml indentation
abf5f18 [zsxwing] Merge branch 'master' into pr4229
935615c [zsxwing] Fix the flaky MQTT tests
47278c5 [zsxwing] Include the project class files
478f844 [zsxwing] Add unpack
5f8a1d4 [zsxwing] Make the maven build generate the test jar for Python MQTT tests
734db99 [zsxwing] Merge branch 'master' into pr4229
126608a [Prabeesh K] address the comments
b90b709 [Prabeesh K] Merge pull request #1 from zsxwing/pr4229
d07f454 [zsxwing] Register StreamingListerner before starting StreamingContext; Revert unncessary changes; fix the python unit test
a6747cb [Prabeesh K] wait for starting the receiver before publishing data
87fc677 [Prabeesh K] address the comments:
97244ec [zsxwing] Make sbt build the assembly test jar for streaming mqtt
80474d1 [Prabeesh K] fix
1f0cfe9 [Prabeesh K] python style fix
e1ee016 [Prabeesh K] scala style fix
a5a8f9f [Prabeesh K] added Python test
9767d82 [Prabeesh K] implemented Python-friendly class
a11968b [Prabeesh K] fixed python style
795ec27 [Prabeesh K] address comments
ee387ae [Prabeesh K] Fix assembly jar location of mqtt-assembly
3f4df12 [Prabeesh K] updated version
b34c3c1 [prabs] adress comments
3aa7fff [prabs] Added Python streaming mqtt word count example
b7d42ff [prabs] Mqtt streaming support in Python

(cherry picked from commit 853809e)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
@drcrallen
Author

@xdralex how was this PR referenced?

@drcrallen
Author

FYI, this PR is not needed if apache#9243 is accepted instead

@drcrallen
Author

Verify

ghost pushed a commit to dbtsai/spark that referenced this pull request Nov 18, 2015
Fix the serialization of RoaringBitmap with Kyro serializer

This PR came from metamx#1, thanks to drcrallen

Author: Davies Liu <davies@databricks.com>
Author: Charles Allen <charles@allen-net.com>

Closes apache#9748 from davies/SPARK-11016.
asfgit pushed a commit to apache/spark that referenced this pull request Nov 18, 2015
Fix the serialization of RoaringBitmap with Kyro serializer

This PR came from metamx#1, thanks to drcrallen

Author: Davies Liu <davies@databricks.com>
Author: Charles Allen <charles@allen-net.com>

Closes #9748 from davies/SPARK-11016.

(cherry picked from commit bf25f9b)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
@scwf

scwf commented Dec 3, 2015

It seems this patch does not work on branch-1.5. I applied this patch to branch-1.5 and got this error:

com.esotericsoftware.kryo.KryoException: Buffer underflow.
Serialization trace:
org$apache$spark$scheduler$HighlyCompressedMapStatus$$emptyBlocks (org.apache.spark.scheduler.HighlyCompressedMapStatus)
        at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
        at com.esotericsoftware.kryo.io.Input.skip(Input.java:131)
        at com.esotericsoftware.kryo.io.Input.skip(Input.java:264)
        at org.apache.spark.serializer.KryoInputDataInputBridge.skipBytes(KryoSerializer.scala:401)
        at org.roaringbitmap.RoaringArray.deserialize(RoaringArray.java:328)
        at org.roaringbitmap.RoaringBitmap.deserialize(RoaringBitmap.java:547)
        at org.apache.spark.serializer.KryoSerializer$$anon$1.read(KryoSerializer.scala:385)
        at org.apache.spark.serializer.KryoSerializer$$anon$1.read(KryoSerializer.scala:379)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:311)
        at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:97)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:60)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

@drcrallen
Author

@scwf #2 pulled it into 1.5.1 in this repo and I've been running it in test for a while without seeing this. Do you have a repeatable case where the error occurs?

@scwf

scwf commented Dec 10, 2015

Hi @drcrallen, I have fixed this in apache@3934562

@drcrallen
Author

@scwf Thanks!

kiszk pushed a commit to kiszk/spark-gpu that referenced this pull request Dec 26, 2015
Fix the serialization of RoaringBitmap with Kyro serializer

This PR came from metamx/spark#1, thanks to drcrallen

Author: Davies Liu <davies@databricks.com>
Author: Charles Allen <charles@allen-net.com>

Closes #9748 from davies/SPARK-11016.
@drcrallen
Author

This is already in Spark, so I'm closing this PR.

@drcrallen drcrallen closed this Apr 26, 2016
drcrallen pushed a commit that referenced this pull request Aug 16, 2016
## What changes were proposed in this pull request?
This patch introduces SQLQueryTestSuite, a basic framework for end-to-end SQL test cases defined in spark/sql/core/src/test/resources/sql-tests. This is a more standard way to test SQL queries end-to-end in different open source database systems, because it is more manageable to work with files.

This is inspired by HiveCompatibilitySuite, but simplified for general Spark SQL tests. Once this is merged, I can work towards porting SQLQuerySuite over, and eventually also move the existing HiveCompatibilitySuite to use this framework.

Unlike HiveCompatibilitySuite, SQLQueryTestSuite compares both the output schema and the output data (in string form).
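
As a rough illustration of comparing schema and data in string form, the sketch below renders a result as one string and checks it against an expected string. `GoldenCompare`, its methods, and the `-- schema` / `-- output` markers are assumptions made up for this example; they are not the actual file format or code of SQLQueryTestSuite.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical helper showing a string-based golden comparison. */
final class GoldenCompare {
    private GoldenCompare() {}

    /** Renders schema and rows as a single string (columns tab-separated). */
    static String render(String schema, List<List<String>> rows) {
        String data = rows.stream()
                .map(row -> String.join("\t", row))
                .collect(Collectors.joining("\n"));
        return "-- schema\n" + schema + "\n-- output\n" + data;
    }

    /** Returns an error message if the strings differ, or empty on a match. */
    static Optional<String> diff(int queryIndex, String expected, String actual) {
        if (expected.equals(actual)) {
            return Optional.empty();
        }
        return Optional.of("Expected \"" + expected + "\", but got \"" + actual
                + "\" Result should match for query #" + queryIndex);
    }
}
```

Because the schema is part of the rendered string, a change in output types would fail the comparison even when the row values are unchanged.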

When there is a mismatch, the error message looks like the following:

```
[info] - blacklist.sql !!! IGNORED !!!
[info] - number-format.sql *** FAILED *** (2 seconds, 405 milliseconds)
[info]   Expected "...147483648	-214748364[8]", but got "...147483648	-214748364[9]" Result should match for query #1 (SQLQueryTestSuite.scala:171)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495)
[info]   at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
[info]   at org.scalatest.Assertions$class.assertResult(Assertions.scala:1171)
```

## How was this patch tested?
This is a test infrastructure change.

Author: petermaxlee <petermaxlee@gmail.com>

Closes apache#14472 from petermaxlee/SPARK-16866.
drcrallen pushed a commit that referenced this pull request Aug 16, 2016

(cherry picked from commit b9f8a11)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>