[SPARK-21422][BUILD] Depend on Apache ORC 1.4.0 #18640
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -132,6 +132,8 @@ | |
<hive.version.short>1.2.1</hive.version.short> | ||
<derby.version>10.12.1.1</derby.version> | ||
<parquet.version>1.8.2</parquet.version> | ||
<orc.version>1.4.0</orc.version> | ||
<orc.classifier>nohive</orc.classifier> | ||
<hive.parquet.version>1.6.0</hive.parquet.version> | ||
<jetty.version>9.3.20.v20170531</jetty.version> | ||
<javaxservlet.version>3.1.0</javaxservlet.version> | ||
|
@@ -207,6 +209,7 @@ | |
<flume.deps.scope>compile</flume.deps.scope> | ||
<hadoop.deps.scope>compile</hadoop.deps.scope> | ||
<hive.deps.scope>compile</hive.deps.scope> | ||
<orc.deps.scope>compile</orc.deps.scope> | ||
<parquet.deps.scope>compile</parquet.deps.scope> | ||
<parquet.test.deps.scope>test</parquet.test.deps.scope> | ||
|
||
|
@@ -1677,6 +1680,44 @@ | |
</exclusion> | ||
</exclusions> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.orc</groupId> | ||
<artifactId>orc-core</artifactId> | ||
<version>${orc.version}</version> | ||
<classifier>${orc.classifier}</classifier> | ||
<scope>${orc.deps.scope}</scope> | ||
<exclusions> | ||
<exclusion> | ||
<groupId>org.apache.hadoop</groupId> | ||
<artifactId>hadoop-common</artifactId> | ||
</exclusion> | ||
<exclusion> | ||
<groupId>org.apache.hive</groupId> | ||
<artifactId>hive-storage-api</artifactId> | ||
</exclusion> | ||
</exclusions> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.orc</groupId> | ||
<artifactId>orc-mapreduce</artifactId> | ||
<version>${orc.version}</version> | ||
<classifier>${orc.classifier}</classifier> | ||
The

Thank you for review, @viirya.

Thanks. I think they come from https://issues.apache.org/jira/browse/ORC-174.

Right. The wording is a little bit different, but technically those jars come from that JIRA patch.
||
<scope>${orc.deps.scope}</scope> | ||
<exclusions> | ||
<exclusion> | ||
<groupId>org.apache.hadoop</groupId> | ||
<artifactId>hadoop-common</artifactId> | ||
</exclusion> | ||
<exclusion> | ||
<groupId>org.apache.orc</groupId> | ||
<artifactId>orc-core</artifactId> | ||
</exclusion> | ||
<exclusion> | ||
<groupId>org.apache.hive</groupId> | ||
<artifactId>hive-storage-api</artifactId> | ||
</exclusion> | ||
</exclusions> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.parquet</groupId> | ||
<artifactId>parquet-column</artifactId> | ||
|
@@ -2710,6 +2751,9 @@ | |
<profile> | ||
<id>hive-provided</id> | ||
</profile> | ||
<profile> | ||
<id>orc-provided</id> | ||
</profile> | ||
<profile> | ||
<id>parquet-provided</id> | ||
</profile> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -86,6 +86,16 @@ | |
<scope>test</scope> | ||
</dependency> | ||
|
||
<dependency> | ||
<groupId>org.apache.orc</groupId> | ||
<artifactId>orc-core</artifactId> | ||
<classifier>${orc.classifier}</classifier> | ||
sorry a dumb question, what does

what exactly is the storage api? confused about this too ...

In Maven,

Thank you, @rxin!

ok good to learn the

Yes. sbt understands

@rxin Storage-API is a separately released artifact from the Hive project. Basically, Storage-API is the in-memory format for Hive's vectorization. You could draw the analogy that Storage-API is for Hive what Arrow is for Drill. It allows formats to read and write directly in the format that is needed by the execution engine. With the nohive classifier, ORC shades the storage-api jar into the ORC namespace so that it is compatible with any version of Hive.

Thank you, @omalley!
||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.orc</groupId> | ||
<artifactId>orc-mapreduce</artifactId> | ||
<classifier>${orc.classifier}</classifier> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.parquet</groupId> | ||
<artifactId>parquet-column</artifactId> | ||
|
so the orc core module still contains hive related stuff?
and to confirm, this exclusion is safe only if we don't use hive storage api of orc in sql/core, right?
Thank you so much for the review, @cloud-fan.

- `orc-core-1.4.0.jar` has a `hive-storage-api` dependency. (Maven Repo)
- `orc-core-1.4.0-nohive.jar` is a shaded jar file that includes `hive-storage-api` under the `org.apache.orc` namespace.
- `orc-core-1.4.0-nohive.jar` is designed for users and apps who don't want to depend on (or consider) `hive`. `nohive` is a classifier for this purpose.

This PR uses `orc-core-1.4.0-nohive` only. To avoid Maven confusion, this exclusion makes that explicit by removing the `hive-storage-api` dependency from the `orc-core` artifact.
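
For readers less familiar with Maven classifiers, here is a minimal sketch of what the `orc-core` dependency from this diff looks like once the properties are substituted. The values come from the `orc.version`, `orc.classifier`, and `orc.deps.scope` properties in the diff; the `hadoop-common` exclusion is left out only to keep the sketch short. Maven appends the classifier to the artifact file name, so this resolves to `orc-core-1.4.0-nohive.jar` rather than `orc-core-1.4.0.jar`.

```xml
<!-- Sketch only: the orc-core dependency with properties expanded by hand. -->
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <version>1.4.0</version>
  <!-- Selects orc-core-1.4.0-nohive.jar, the shaded variant. -->
  <classifier>nohive</classifier>
  <scope>compile</scope>
  <exclusions>
    <!-- The nohive jar already carries these classes under org.apache.orc,
         so the separate hive-storage-api jar is not needed. -->
    <exclusion>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-storage-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

As far as I understand, a classified artifact shares its POM with the main artifact, which would explain why `hive-storage-api` can still show up as a transitive dependency and why the explicit exclusion is useful here.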