
[SPARK-25258][SPARK-23131][SPARK-25176][BUILD] Upgrade Kryo to 4.0.2 #22179

Closed · wants to merge 4 commits
8 changes: 4 additions & 4 deletions dev/deps/spark-deps-hadoop-2.6
@@ -27,8 +27,8 @@ breeze_2.11-0.13.2.jar
calcite-avatica-1.2.0-incubating.jar
calcite-core-1.2.0-incubating.jar
calcite-linq4j-1.2.0-incubating.jar
-chill-java-0.8.4.jar
-chill_2.11-0.8.4.jar
+chill-java-0.9.3.jar
+chill_2.11-0.9.3.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
@@ -130,7 +130,7 @@ jsr305-1.3.9.jar
jta-1.1.jar
jtransforms-2.4.0.jar
jul-to-slf4j-1.7.16.jar
-kryo-shaded-3.0.3.jar
+kryo-shaded-4.0.2.jar
kubernetes-client-3.0.0.jar
kubernetes-model-2.0.0.jar
leveldbjni-all-1.8.jar
@@ -149,7 +149,7 @@ metrics-jvm-3.1.5.jar
minlog-1.3.0.jar
netty-3.9.9.Final.jar
netty-all-4.1.17.Final.jar
-objenesis-2.1.jar
+objenesis-2.5.1.jar
okhttp-3.8.1.jar
okio-1.13.0.jar
opencsv-2.3.jar
8 changes: 4 additions & 4 deletions dev/deps/spark-deps-hadoop-2.7
@@ -27,8 +27,8 @@ breeze_2.11-0.13.2.jar
calcite-avatica-1.2.0-incubating.jar
calcite-core-1.2.0-incubating.jar
calcite-linq4j-1.2.0-incubating.jar
-chill-java-0.8.4.jar
-chill_2.11-0.8.4.jar
+chill-java-0.9.3.jar
+chill_2.11-0.9.3.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
@@ -132,7 +132,7 @@ jsr305-1.3.9.jar
jta-1.1.jar
jtransforms-2.4.0.jar
jul-to-slf4j-1.7.16.jar
-kryo-shaded-3.0.3.jar
+kryo-shaded-4.0.2.jar
kubernetes-client-3.0.0.jar
kubernetes-model-2.0.0.jar
leveldbjni-all-1.8.jar
@@ -151,7 +151,7 @@ metrics-jvm-3.1.5.jar
minlog-1.3.0.jar
netty-3.9.9.Final.jar
netty-all-4.1.17.Final.jar
-objenesis-2.1.jar
+objenesis-2.5.1.jar
okhttp-3.8.1.jar
okio-1.13.0.jar
opencsv-2.3.jar
8 changes: 4 additions & 4 deletions dev/deps/spark-deps-hadoop-3.1
@@ -25,8 +25,8 @@ breeze_2.11-0.13.2.jar
calcite-avatica-1.2.0-incubating.jar
calcite-core-1.2.0-incubating.jar
calcite-linq4j-1.2.0-incubating.jar
-chill-java-0.8.4.jar
-chill_2.11-0.8.4.jar
+chill-java-0.9.3.jar
+chill_2.11-0.9.3.jar
commons-beanutils-1.9.3.jar
commons-cli-1.2.jar
commons-codec-1.10.jar
@@ -146,7 +146,7 @@ kerby-config-1.0.1.jar
kerby-pkix-1.0.1.jar
kerby-util-1.0.1.jar
kerby-xdr-1.0.1.jar
-kryo-shaded-3.0.3.jar
+kryo-shaded-4.0.2.jar
kubernetes-client-3.0.0.jar
kubernetes-model-2.0.0.jar
leveldbjni-all-1.8.jar
@@ -167,7 +167,7 @@ mssql-jdbc-6.2.1.jre7.jar
netty-3.9.9.Final.jar
netty-all-4.1.17.Final.jar
nimbus-jose-jwt-4.41.1.jar
-objenesis-2.1.jar
+objenesis-2.5.1.jar
okhttp-2.7.5.jar
okhttp-3.8.1.jar
okio-1.13.0.jar
2 changes: 1 addition & 1 deletion docs/tuning.md
@@ -35,7 +35,7 @@ in your operations) and performance. It provides two serialization libraries:
Java serialization is flexible but often quite slow, and leads to large
serialized formats for many classes.
* [Kryo serialization](https://github.com/EsotericSoftware/kryo): Spark can also use
-the Kryo library (version 2) to serialize objects more quickly. Kryo is significantly
+the Kryo library (version 4) to serialize objects more quickly. Kryo is significantly
faster and more compact than Java serialization (often as much as 10x), but does not support all
`Serializable` types and requires you to *register* the classes you'll use in the program in advance
for best performance.
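
The registration step the tuning guide describes can be sketched as follows. This is a minimal illustration, not code from this PR: it assumes a Spark dependency on the classpath, and `MyCustomClass` is a hypothetical stand-in for a class your job actually serializes.

```java
import org.apache.spark.SparkConf;

// Switch Spark to the Kryo serializer and pre-register application classes,
// which lets Kryo write compact class IDs instead of full class names.
SparkConf conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .registerKryoClasses(new Class<?>[]{ MyCustomClass.class });
```

Unregistered classes still work, but each serialized object then carries its full class name, which costs space and time.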
6 changes: 5 additions & 1 deletion pom.xml
@@ -136,7 +136,7 @@
<hive.parquet.version>1.6.0</hive.parquet.version>
<jetty.version>9.3.24.v20180605</jetty.version>
<javaxservlet.version>3.1.0</javaxservlet.version>
-<chill.version>0.8.4</chill.version>
+<chill.version>0.9.3</chill.version>
<ivy.version>2.4.0</ivy.version>
<oro.version>2.0.8</oro.version>
<codahale.metrics.version>3.1.5</codahale.metrics.version>
@@ -1770,6 +1770,10 @@
<groupId>org.apache.hive</groupId>
<artifactId>hive-storage-api</artifactId>
</exclusion>
<exclusion>
<groupId>com.esotericsoftware</groupId>
<artifactId>kryo-shaded</artifactId>
Contributor:
why this change?

Member:
CC @wangyum yes, good point. ORC also uses kryo-shaded, at version 3.0.3. In theory that could cause a break, but the tests pass; that's a positive sign, though not bulletproof. @dongjoon-hyun do you have any insight into how ORC uses Kryo? Is it a code path that wouldn't matter to Spark?

Member (@dongjoon-hyun, Aug 30, 2018):
Thank you for pinging me, @srowen. ORC uses Kryo only for writing/reading one ORC configuration, orc.kryo.sarg. The following is Spark's indirect code path. I guess Kryo provides forward compatibility at least, but I'll take a look at this PR today.

Member (@dongjoon-hyun):
@srowen In short, current Spark always uses the same Kryo version for reading and writing SearchArgument, and it's used only at runtime.

  1. The old OrcFileFormat always uses org.spark-project.hive:hive-exec:1.2.1.spark2, which uses the Kryo shaded inside hive-exec:

    • com.esotericsoftware.kryo:kryo:2.21
  2. The new OrcFileFormat uses org.apache.orc, which uses the Kryo provided by Spark:

    • com.esotericsoftware:kryo-shaded:3.0.3 (Spark, ORC, and Hive all use this version for now)
  3. The new OrcFileFormat (in this PR) uses org.apache.orc, which uses the Kryo provided by Spark:

    • com.esotericsoftware:kryo-shaded:4.0.2

So, (1) is unchanged by this PR, and neither (2) nor (3) uses a mixed version of Kryo. It should be fine, because Apache Spark doesn't allow mixed Spark versions between master and executors. BTW, during this investigation I found a performance issue in createFilter; I'll file a new JIRA for that.

Member (@srowen):
I guess my question is whether ORC compiled against kryo-shaded 3.x necessarily works with kryo-shaded 4.x, because there were API and behavior changes. Our current tests don't seem to surface any such problems, but who knows whether there's something they don't cover.

I agree that our distribution won't include two versions; the issue I'm wondering about is whether ORC will work with the later version it gets bundled with.

Member (@dongjoon-hyun):
Yes, I checked that, @srowen. org.apache.orc uses only the Kryo constructor, writeObject, and readObject from the kryo-shaded library. There is no change to any of them.

WRITE

(new Kryo()).writeObject(out, sarg);

READ

... = (new Kryo()).readObject(new Input(sargBytes), SearchArgumentImpl.class);
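
To make the compatibility question concrete, the write/read pair above can be sketched as one end-to-end round trip. This is a minimal sketch, not code from ORC or Spark: it assumes kryo-shaded 4.0.2 on the classpath and uses a String as a stand-in for SearchArgumentImpl.

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayOutputStream;

public class KryoRoundTrip {
    public static void main(String[] args) {
        // WRITE: same shape as ORC's (new Kryo()).writeObject(out, sarg)
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Output out = new Output(bytes);
        (new Kryo()).writeObject(out, "pushed-down predicate");  // stand-in for sarg
        out.close();

        // READ: same shape as ORC's readObject(new Input(sargBytes), SearchArgumentImpl.class)
        Input in = new Input(bytes.toByteArray());
        String back = (new Kryo()).readObject(in, String.class);
        System.out.println(back);
    }
}
```

Because a fresh `new Kryo()` instance is used on both sides, writer and reader always agree on the Kryo version and registration state, which is why a single bundled kryo-shaded version avoids the mixed-version concern.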

Member (@srowen, Sep 1, 2018):
OK, if that's the extent of the usage, then I believe ORC is OK with Kryo 4; that much is not a problem. Edit: Hmm, on second thought, if the serialization format changes from 3 to 4, does the config that it writes become unreadable? Does this config matter to Spark?

Member (@dongjoon-hyun):
I was also worried about that part, but it's used only in run-time SearchArgument serialization. It is never used with persisted ORC files.

</exclusion>
</exclusions>
</dependency>
<dependency>
Expand Down