Skip to content

Commit

Permalink
[SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13
Browse files Browse the repository at this point in the history
## What changes were proposed in this pull request?
Upgraded to a newer version of Pyrolite which supports serialization of a BinaryType StructField for PySpark.SQL

## How was this patch tested?
Added a unit test which fails with a raised ValueError when using the previous version of Pyrolite 4.9 and Python3

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #15386 from BryanCutler/pyrolite-upgrade-SPARK-17808.

(cherry picked from commit 658c714)
Signed-off-by: Sean Owen <sowen@cloudera.com>
  • Loading branch information
BryanCutler authored and srowen committed Oct 12, 2016
1 parent f12b74c commit 4dcbde4
Show file tree
Hide file tree
Showing 7 changed files with 14 additions and 6 deletions.
2 changes: 1 addition & 1 deletion core/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -316,7 +316,7 @@
<dependency>
<groupId>net.razorvine</groupId>
<artifactId>pyrolite</artifactId>
<version>4.9</version>
<version>4.13</version>
<exclusions>
<exclusion>
<groupId>net.razorvine</groupId>
Expand Down
2 changes: 1 addition & 1 deletion dev/deps/spark-deps-hadoop-2.2
Original file line number Diff line number Diff line change
Expand Up @@ -141,7 +141,7 @@ pmml-model-1.2.15.jar
pmml-schema-1.2.15.jar
protobuf-java-2.5.0.jar
py4j-0.10.3.jar
pyrolite-4.9.jar
pyrolite-4.13.jar
scala-compiler-2.11.8.jar
scala-library-2.11.8.jar
scala-parser-combinators_2.11-1.0.4.jar
Expand Down
2 changes: 1 addition & 1 deletion dev/deps/spark-deps-hadoop-2.3
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,7 @@ pmml-model-1.2.15.jar
pmml-schema-1.2.15.jar
protobuf-java-2.5.0.jar
py4j-0.10.3.jar
pyrolite-4.9.jar
pyrolite-4.13.jar
scala-compiler-2.11.8.jar
scala-library-2.11.8.jar
scala-parser-combinators_2.11-1.0.4.jar
Expand Down
2 changes: 1 addition & 1 deletion dev/deps/spark-deps-hadoop-2.4
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,7 @@ pmml-model-1.2.15.jar
pmml-schema-1.2.15.jar
protobuf-java-2.5.0.jar
py4j-0.10.3.jar
pyrolite-4.9.jar
pyrolite-4.13.jar
scala-compiler-2.11.8.jar
scala-library-2.11.8.jar
scala-parser-combinators_2.11-1.0.4.jar
Expand Down
2 changes: 1 addition & 1 deletion dev/deps/spark-deps-hadoop-2.6
Original file line number Diff line number Diff line change
Expand Up @@ -156,7 +156,7 @@ pmml-model-1.2.15.jar
pmml-schema-1.2.15.jar
protobuf-java-2.5.0.jar
py4j-0.10.3.jar
pyrolite-4.9.jar
pyrolite-4.13.jar
scala-compiler-2.11.8.jar
scala-library-2.11.8.jar
scala-parser-combinators_2.11-1.0.4.jar
Expand Down
2 changes: 1 addition & 1 deletion dev/deps/spark-deps-hadoop-2.7
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@ pmml-model-1.2.15.jar
pmml-schema-1.2.15.jar
protobuf-java-2.5.0.jar
py4j-0.10.3.jar
pyrolite-4.9.jar
pyrolite-4.13.jar
scala-compiler-2.11.8.jar
scala-library-2.11.8.jar
scala-parser-combinators_2.11-1.0.4.jar
Expand Down
8 changes: 8 additions & 0 deletions python/pyspark/sql/tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -1703,6 +1703,14 @@ def test_read_text_file_list(self):
count = df.count()
self.assertEquals(count, 4)

def test_BinaryType_serialization(self):
# Pyrolite version <= 4.9 could not serialize BinaryType with Python3 SPARK-17808
schema = StructType([StructField('mybytes', BinaryType())])
data = [[bytearray(b'here is my data')],
[bytearray(b'and here is some more')]]
df = self.spark.createDataFrame(data, schema=schema)
df.collect()


class HiveSparkSubmitTests(SparkSubmitTests):

Expand Down

0 comments on commit 4dcbde4

Please sign in to comment.