
SPARK-3039: Allow spark to be built using avro-mapred for hadoop2 #1945

Closed
wants to merge 1 commit into from

Conversation

@bbossy (Contributor) commented Aug 14, 2014

SPARK-3039: Adds the maven property "avro.mapred.classifier" to build spark-assembly with avro-mapred with support for the new Hadoop API. Sets this property to hadoop2 for Hadoop 2 profiles.

I am not very familiar with maven, nor do I know whether this potentially breaks something in the hive part of spark. There might be a more elegant way of doing this.
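For context, the mechanism works roughly like this (a sketch, not the exact pom.xml from the patch): avro-mapred is published both as a plain artifact (old mapred API) and with a hadoop2 classifier (new mapreduce API), so declaring the dependency with a property lets a profile switch between them:

```
<properties>
  <!-- Empty by default: resolves the plain (Hadoop 1 API) avro-mapred artifact -->
  <avro.mapred.classifier></avro.mapred.classifier>
</properties>

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
  <!-- Set to "hadoop2" by the Hadoop 2 profiles to pull the new-API build -->
  <classifier>${avro.mapred.classifier}</classifier>
</dependency>
```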

@AmplabJenkins commented

Can one of the admins verify this patch?

@srowen (Member) commented Aug 14, 2014

I've looked at this part of the build a lot and can say LGTM

@bbossy (Contributor, Author) commented Aug 14, 2014

Should I also add the avro.mapred.classifier property to the yarn profile? Maybe even yarn-alpha and mapr?

Since, as it stands, to build according to the README one would have to run: sbt/sbt -Dhadoop.version=2.2.0 -Pyarn -Davro.mapred.classifier=hadoop2 assembly

@srowen (Member) commented Aug 14, 2014

You have to specify a Hadoop profile already, and you added the classifier to all of them, so that's fine. Building with YARN is orthogonal, so I don't think it belongs elsewhere.

@bbossy (Contributor, Author) commented Aug 14, 2014

The problem I see is that if you build according to the README:

# Apache Hadoop 2.2.X and newer
$ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly

avro.mapred.classifier will not be set to hadoop2.

Either the README should be changed to account for this, or the property should be added to the yarn and yarn-alpha profiles (not mapr, I think).

Or is there a way to fix this with maven?
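One way to handle it in Maven itself (a sketch of the mechanism, not the exact pom.xml from this patch) is to have each Hadoop 2 profile override the property, so a user only needs to select a profile:

```
<profile>
  <id>hadoop-2.2</id>
  <properties>
    <!-- Selecting the profile flips avro-mapred to its hadoop2 classifier -->
    <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
  </properties>
</profile>
```

This keeps the command line down to profile selection, at the cost of duplicating the property in every Hadoop 2 profile.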

@srowen (Member) commented Aug 14, 2014

Yeah, that's out of date, I believe. For example, -Phadoop-2.3 has to be specified along with -Dhadoop.version=2.3.0. And I think mvn is the primary build now. I imagine you could correct this in this PR. I wonder if the README shouldn't just point to the web site rather than duplicate this info? The web docs are up to date.

@bbossy (Contributor, Author) commented Aug 14, 2014

Yeah, you're right about yarn being orthogonal to the Hadoop version.

Apart from the maven/sbt question, there is another issue: the Cloudera CDH 4.2.0 with MapReduce v2 case from the README is not covered by a Hadoop profile right now. I would need to change it to
sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Davro.mapred.classifier=hadoop2 -Pyarn assembly or the mvn equivalent.
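For the record, the mvn equivalent would presumably look something like the following (the exact goal and -DskipTests flag are assumptions based on the usual Maven build conventions, not taken from this thread):

```
# sbt, as above
sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Davro.mapred.classifier=hadoop2 -Pyarn assembly

# presumed mvn equivalent
mvn -Dhadoop.version=2.0.0-cdh4.2.0 -Davro.mapred.classifier=hadoop2 -Pyarn -DskipTests package
```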

@srowen (Member) commented Aug 15, 2014

I think it works with the invocation you describe. Honestly, it's not a big priority for this version, but it's nice to get it right. Want to open a JIRA to track updating/deleting the info from README.md? I think it needs to be fixed one way or the other.

@bbossy (Contributor, Author) commented Aug 15, 2014

Created the issue: https://issues.apache.org/jira/browse/SPARK-3069 (Build instructions in README are outdated)

@srowen: Thank you for your input!

@SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@pwendell (Contributor) commented

Yeah - LGTM pending tests.

@SparkQA commented Sep 12, 2014

QA tests have started for PR 1945 at commit c32ce59.

  • This patch merges cleanly.

@SparkQA commented Sep 12, 2014

QA tests have finished for PR 1945 at commit c32ce59.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • throw new IllegalStateException("The main method in the given main class must be static")

@asfgit closed this in c243b21 on Sep 15, 2014
asfgit pushed a commit that referenced this pull request Sep 15, 2014
SPARK-3039: Adds the maven property "avro.mapred.classifier" to build spark-assembly with avro-mapred with support for the new Hadoop API. Sets this property to hadoop2 for Hadoop 2 profiles.

I am not very familiar with maven, nor do I know whether this potentially breaks something in the hive part of spark. There might be a more elegant way of doing this.

Author: Bertrand Bossy <bertrandbossy@gmail.com>

Closes #1945 from bbossy/SPARK-3039 and squashes the following commits:

c32ce59 [Bertrand Bossy] SPARK-3039: Allow spark to be built using avro-mapred for hadoop2

(cherry picked from commit c243b21)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@andrewor14 (Contributor) commented

Hey @pwendell @srowen @bbossy this is actually causing issues for SBT applications that use the spark-hive_2.10 module. More details can be found here: https://issues.apache.org/jira/browse/SPARK-4359. For now, I have reverted this in branch-1.1 to prepare for the Spark 1.1.1 release. It may need to be reverted in other branches as well. Just a heads up.

szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
* Switch to ENABLE_COMET and ENABLE_COMET_SCAN_ONLY

Previous environment variable did not include the `ENABLE_` prefix.

* Use explicit string config name

This avoids a ClassNotFoundException
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants