
SPARK-3039: Allow spark to be built using avro-mapred for hadoop2 #1945

Closed
wants to merge 1 commit into from

Conversation

@bbossy (Contributor) commented Aug 14, 2014

SPARK-3039: Adds the maven property "avro.mapred.classifier" to build spark-assembly with avro-mapred with support for the new Hadoop API. Sets this property to hadoop2 for Hadoop 2 profiles.

I am not very familiar with maven, nor do I know whether this potentially breaks something in the hive part of spark. There might be a more elegant way of doing this.
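For context, the mechanism works roughly like this (a sketch, not the exact pom.xml from the patch): avro-mapred is published both as a plain artifact (old mapred API) and with a hadoop2 classifier (new mapreduce API), so declaring the dependency with a property lets a profile switch between them:

```
<properties>
  <!-- Empty by default: resolves the plain (Hadoop 1 API) avro-mapred artifact -->
  <avro.mapred.classifier></avro.mapred.classifier>
</properties>

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
  <!-- Set to "hadoop2" by the Hadoop 2 profiles to pull the new-API build -->
  <classifier>${avro.mapred.classifier}</classifier>
</dependency>
```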

@AmplabJenkins commented

Can one of the admins verify this patch?

@srowen (Member) commented Aug 14, 2014

I've looked at this part of the build a lot and can say LGTM

@bbossy (Contributor, Author) commented Aug 14, 2014

Should I also add the avro.mapred.classifier property to the yarn profile? Maybe even yarn-alpha and mapr?

Since, as it stands, to build according to the README one would have to run: sbt/sbt -Dhadoop.version=2.2.0 -Pyarn -Davro.mapred.classifier=hadoop2 assembly

@srowen (Member) commented Aug 14, 2014

You have to specify a Hadoop profile already, and you added the classifier to all of them, so that's fine. Building with YARN is orthogonal, so I don't think it belongs elsewhere.

@bbossy (Contributor, Author) commented Aug 14, 2014

The problem I see is that if you build according to the README:

# Apache Hadoop 2.2.X and newer
$ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly

avro.mapred.classifier will not be set to hadoop2.

Either the README should be changed to account for this, or the property should be added to the yarn and yarn-alpha profiles (not mapr, I think).

Or is there a way to fix this with maven?
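One way to handle it in Maven itself (a sketch of the mechanism, not the exact pom.xml from this patch) is to have each Hadoop 2 profile override the property, so a user only needs to select a profile:

```
<profile>
  <id>hadoop-2.2</id>
  <properties>
    <!-- Selecting the profile flips avro-mapred to its hadoop2 classifier -->
    <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
  </properties>
</profile>
```

This keeps the command line down to profile selection, at the cost of duplicating the property in every Hadoop 2 profile.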

@srowen (Member) commented Aug 14, 2014

Yeah, that's out of date, I believe. For example, -Phadoop-2.3 has to be specified along with -Dhadoop.version=2.3.0. And I think mvn is the primary build now. I imagine you could correct this in this PR. I wonder if the README shouldn't just point to the web site rather than duplicate this info? The web docs are up to date.

@bbossy (Contributor, Author) commented Aug 14, 2014

Yeah, you're right about yarn being orthogonal to the Hadoop version.

Apart from the maven/sbt question, there is another issue: the Cloudera CDH 4.2.0 with MapReduce v2 case from the README is not covered by a Hadoop profile right now. I would need to change it to
sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Davro.mapred.classifier=hadoop2 -Pyarn assembly or the mvn equivalent.
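For the record, the mvn equivalent would presumably look something like the following (the exact goal and -DskipTests flag are assumptions based on the usual Maven build conventions, not taken from this thread):

```
# sbt, as above
sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Davro.mapred.classifier=hadoop2 -Pyarn assembly

# presumed mvn equivalent
mvn -Dhadoop.version=2.0.0-cdh4.2.0 -Davro.mapred.classifier=hadoop2 -Pyarn -DskipTests package
```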

@srowen (Member) commented Aug 15, 2014

I think it works with the invocation you describe. Honestly, it's not a big priority for this version, but it's nice to get it right. Want to open a JIRA to track updating/deleting the info from README.md? I think it needs to be fixed one way or the other.

@bbossy (Contributor, Author) commented Aug 15, 2014

Created the issue: https://issues.apache.org/jira/browse/SPARK-3069 (Build instructions in README are outdated)

@srowen: Thank you for your input!

@SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@pwendell (Contributor) commented

Yeah - LGTM pending tests.

@SparkQA commented Sep 12, 2014

QA tests have started for PR 1945 at commit c32ce59.

  • This patch merges cleanly.

@SparkQA commented Sep 12, 2014

QA tests have finished for PR 1945 at commit c32ce59.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • throw new IllegalStateException("The main method in the given main class must be static")

@asfgit closed this in c243b21 on Sep 15, 2014
asfgit pushed a commit that referenced this pull request Sep 15, 2014
SPARK-3039: Adds the maven property "avro.mapred.classifier" to build spark-assembly with avro-mapred with support for the new Hadoop API. Sets this property to hadoop2 for Hadoop 2 profiles.

I am not very familiar with maven, nor do I know whether this potentially breaks something in the hive part of spark. There might be a more elegant way of doing this.

Author: Bertrand Bossy <bertrandbossy@gmail.com>

Closes #1945 from bbossy/SPARK-3039 and squashes the following commits:

c32ce59 [Bertrand Bossy] SPARK-3039: Allow spark to be built using avro-mapred for hadoop2

(cherry picked from commit c243b21)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@andrewor14 (Contributor) commented

Hey @pwendell @srowen @bbossy this is actually causing issues for SBT applications that use the spark-hive_2.10 module. More details can be found here: https://issues.apache.org/jira/browse/SPARK-4359. For now, I have reverted this in branch-1.1 to prepare for the Spark 1.1.1 release. It may need to be reverted in other branches as well. Just a heads up.

szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
* Switch to ENABLE_COMET and ENABLE_COMET_SCAN_ONLY

Previous environment variable did not include the `ENABLE_` prefix.

* Use explicit string config name

This avoids a ClassNotFoundException
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants