[SPARK-2848] Shade Guava in uber-jars. #1813
Conversation
I used a tool I wrote to check the references in the final jars (check it out: https://gist.github.com/vanzin/bd9057fadf4a296220b7). Verified that only references to Optional point at the old "com.google.common" package. Currently running unit tests (don't think they'd really be affected) and will double-check the sbt-generated jar to make sure asm dependencies are not changed from before.
@@ -0,0 +1,243 @@
/*
 * Copyright (C) 2011 The Guava Authors
PS I checked and there is no additional change we need to make to LICENSE or NOTICE as a result of including this. Guava is already AL2 and has no NOTICE that pertains to this file.
Shouldn't we still list it in our LICENSE? We've listed all source files that are not copyright Apache, which would include this one.
Not according to my reading of http://www.apache.org/dev/licensing-howto.html, in the part covering bundled AL2 dependencies, but this stuff is always less than intuitive to me.

I suppose the reasoning is that Spark's AL2 license already exactly describes the terms of the licensing for that file. That's not quite true for BSD or MIT licenses.

The copyright owner doesn't matter; there are hundreds of them. What matters is how they, whoever they are, license their IP to be used.
BTW if we're comfortable with not locking this file to the 14.0.1 version (or remembering to do this when we change guava versions), I can remove this fork (and clean up some code elsewhere in this patch too).
I see, I would still include it because the copyright is from another organization. I don't see why adding this would hurt.
@vanzin You're referring to just not including this source file and instead getting it in binary form from the Guava artifact right? b/c yes if it stays in source form it certainly must keep its copyright statement.
@srowen yes that's the alternative.
QA tests have started for PR 1813. This patch merges cleanly. |
QA results for PR 1813: |
After a side-bar -- I still think (EDIT: OK, is this accurate?):

- A "compile" scope Guava is the simple thing to do, and allows the shading solution to proceed. It however causes Guava to leak into apps that depend on Spark -- but only if they don't do the normal thing and treat Spark as "provided".
- A "provided" scope Guava is more unusual, and requires some extra work in the POM, but also enables the shading solution. Guava doesn't leak at all, but people not depending on Spark as "provided" must remember to depend on Guava for Spark.
- A "compile" scope Guava that is also "optional" works with the shading solution (I think). Guava doesn't leak, but people not using Spark as "provided" still have to remember to include Guava for Spark.

The last two seem the same, so I suppose I can't really argue for "compile+optional" and against "provided", since the semantics are the same and neither is precisely what we desire (which would be something like "runtime by default for transitive dependers"). Proceed unless anyone else has an opinion to the contrary, IMHO.
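For concreteness, the three options map to Maven dependency declarations roughly like the following sketch (the 14.0.1 version is just the Guava version discussed in this PR; the exact declarations in Spark's poms may differ):

<!-- Option 1: "compile" scope (the default): Guava is visible to, and
     inherited by, everything that depends on Spark. -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>14.0.1</version>
</dependency>

<!-- Option 2: "provided" scope: available while building Spark itself,
     but not transitive, so downstream builds must declare Guava themselves. -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>14.0.1</version>
  <scope>provided</scope>
</dependency>

<!-- Option 3: "compile" + optional: compiles against Guava, but the
     dependency is not propagated transitively, much like "provided". -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>14.0.1</version>
  <optional>true</optional>
</dependency>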
@pwendell I think you may have suggested in jira at some point to create a separate deliverable for the shaded guava jar, and have the spark build depend on that. Is that what you had in mind? If that's the preferred approach, it would require code changes (change imports) and I'd need pointers for how to create this separate deliverable (I don't think it would live inside the spark build? how is it done for hive?). |
hey @vanzin - so yeah, in the past we've done this by creating separate dependencies and publishing them to maven central. This is a bit more work on our side, but it plays nicer for downstream users because the correct functioning of Spark does not depend on our distribution artifacts. For instance, users who write applications that embed Spark don't have to try to recreate or mimic our distribution artifacts. For this reason I have a slight bias for the existing approach, just because we've used it for a long time without any issues.

The way we do this type of shading has been ad-hoc: we've typically cloned the relevant projects, re-written class names, and then published them to maven central under the org.spark-project namespace. On the dev list I recently explained to @avati that it would be nice to include scripts in the Spark repo that do this. He's actually started working on this and you can see an example for protobuf - I think the stuff he's done so far looks great: https://github.com/avati/spark-shaded/blob/master/0001-protobuf.sh

I downloaded guava the other day and played around; I think it will be pretty straightforward to do this for guava as well. I actually know several other projects that have started to use our shaded artifacts (we tend to shade dependencies that are a common pain point) - so I think there is some value in continuing to do this.
@pwendell creating the shaded deliverable for Guava should be trivial, but where do I create a pull request for it? I can work on the pom file to enable that and publish it to my github in the meantime. |
Here's a repo with a pom file to shade guava: |
@pwendell I looked at that script but it looks overly complicated, at least for guava. A simple pom that invokes the shade plugin is much, much simpler. Would it be ok to have the script just run the pom? |
Does the shade plug-in work if we want to actually publish a jar to maven central? If that's the case, including a POM seems like a solid option. We'd want to at least have some simple script, though, that downloads guava and then inserts that pom, rather than maintain it in a separate repo.
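As a sketch of the "simple pom that invokes the shade plugin" idea, a standalone project that republishes a relocated Guava might look roughly like this; the org.spark-project coordinates are only an assumption based on the namespace mentioned above, and the real pom would likely differ:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.spark-project</groupId>
  <artifactId>guava-shaded</artifactId>
  <version>14.0.1</version>

  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>14.0.1</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals><goal>shade</goal></goals>
            <configuration>
              <relocations>
                <relocation>
                  <!-- Rewrite Guava's packages into a Spark-private namespace. -->
                  <pattern>com.google.common</pattern>
                  <shadedPattern>org.spark-project.guava</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>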
The shade plug-in works in many cases but not in all. From what I understand, if the dependency is direct, the shade plugin works fine. In the case of protobuf it was an indirect dependency - akka-2.3 specifically needs protobuf 2.5 and hadoop1 specifically needs protobuf 2.4, so a custom rebuilt package of either one was inevitable. I haven't looked deeply into the guava situation, just provided some context.
 * @param includes Regexes for classes to include inside the matching package (empty = all).
 * @param excludes Regexes for classes to exclude from the matching package (empty = none).
 */
class Relocator(prefix: String, shaded: String, includes: Seq[Regex], excludes: Seq[Regex]) {
This might be useful as a standalone sbt plug-in - I'm guessing a lot of other projects would use this. Did you think at all about doing that? (Of course that's totally up to you, just had the thought.)
Hey, so after thinking about it a bit more, maybe it's worth experimenting with this simpler approach of in-lining the shaded dependency in our jars using build plug-ins. The main reason I think it's worth trying is that, in the worst case, if this has some random unintended consequence we can just revert to our existing method of publishing fully shaded dependencies to maven central.

One thought for @vanzin and @srowen - what if instead of having our own copy of Optional, we just continue to advertise a full compile dependency on guava and we get Optional from that? Then basically we have a "public" use of Guava, which is for this Optional class, and a "private" use of Guava, which we shade away. I think that will work as long as Optional does not change in guava... but that is the exact scenario in which the current approach works as well.
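In shade-plugin terms, one way to express that split between a "public" Optional and "private" everything else would be a relocation that excludes the one exposed class, roughly as below (a sketch of the idea only, not necessarily the configuration this patch ends up using):

<relocation>
  <pattern>com.google.common</pattern>
  <shadedPattern>org.spark-project.guava</shadedPattern>
  <excludes>
    <!-- Keep the publicly exposed class in its original package. -->
    <exclude>com.google.common.base.Optional</exclude>
  </excludes>
</relocation>

The risk noted above remains: if Optional itself changes across Guava versions, the unrelocated class could conflict with whatever Guava the application brings.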
@pwendell This is complicated, so let me summarize what I think I understand and then give an opinion.

The plan at hand concerned changes to the assembly JARs, pretty much exclusively. Spark continues to depend on Guava in order to compile, and so consumers of Spark also see this dependency in their builds. (You can see the discussion with @vanzin about whether to alter how downstream consumers of the Maven build see the Guava dependency.) So I think the proposal was already to continue showing a dependency on Guava, but to alter the assembly to include only relocated Guava classes.

Inlining the shaded JARs is, as far as the outside world is concerned, all but removing the use of Guava. That certainly takes it out of the picture. The little snag is Optional. It will now inevitably collide with any other app that builds Guava in too, and downstream consumers can't manage away the conflict normally. It may either be a harmless warning or cause a real problem if Optional changes later. In general I don't like pushing the shading into the compile-time world.

I think your second paragraph is basically the same thing proposed already, with one difference. Yes, the build still continues to say it depends on Guava; the concrete assembly is put together differently. I think you can use Guava's Optional rather than embedding a source copy of it and compiling that. Right @vanzin? With some shading cleverness?
Lots of comments, let me try to write a coherent response to all of them. :-)

Shaded jars and maven central

Shouldn't be a problem. You're publishing your project's artifact to maven central, which just happens to include the shaded classes. This is similar to how assembly jars are published (http://search.maven.org/#browse|-1363484021). (EDIT: yes, there's only a pom for the assembly, but if the pom's packaging type is "jar", you get the jar there. Well, you get the point.)

The Optional class

It should be possible to use Guava's original Optional, without having to fork it. The only reason I forked it is to allow changing the version of Guava without affecting the Spark API. But that can be done on-demand if needed. I'd still package it in the spark-core jar, otherwise the guava dependency would need to have "compile" scope (see below).

Compile vs. Provided

My main argument for using "provided" is to avoid leaking guava into the user's compilation classpath. Users depend on spark-core (e.g.), and if spark-core has a compile dependency on guava, guava will be available in the user's compilation classpath (regardless of whether spark-core is set up as compile or provided). If that dependency is provided (and thus not transitive), then it never shows up; if the user needs guava, he needs to explicitly depend on it.

This does have effects on packaging, though: for people using an existing spark assembly to run their apps, nothing changes. But if the user is creating his own uber-jar with spark and everything else in it, and running it in some custom environment, he'll need to explicitly package guava (even if his app doesn't use it). My belief is that the former case is the overwhelming majority, and the latter case means the user has to care about too many things already (e.g. choosing compatible versions of everything between his app and spark and hadoop et al) for this little thing to be a real concern.

Shading at compile time

@srowen let me know if I misunderstood, but do you mean creating a spark-core jar with the shaded guava classes in it? I actually started with that alternative, but it seems messy. Either you have all downstream projects reference the shaded classes (something that would be needed if going with the separate shaded jar approach), or you get into a weird place where spark-core is referencing the shaded classes but all downstream projects are not. The assembly would fix everything (by shading again), but someone not using the assembly could get really weird errors. Also, having duplicate class names in the classpath really confuses some IDEs that automatically add imports. More manual configuration for people to add...

sbt shading plugin

I think the code I wrote can be added to the sbt-assembly plugin (with some modifications). It probably is the best place for that, anyway, since I created it as a merge strategy for that plugin. But that's for later.
@vanzin Yes, there's no problem with publishing artifacts containing copies of other artifacts as far as Maven Central is concerned. It's just icky for downstream consumers. I think the non-assembly artifacts should be the 'pristine' raw ingredients, and assemblies can be crazy remixes.

I can see the argument for provided. It has the pros and cons you mention. It's a moot point for people that use Spark as provided anyway.

I had understood that the goal was to alter the Spark assembly in some way, such that with some certain footwork in the Hive on Spark build, the Hive on Spark assembly would work with the Spark assembly as deployed on people's clusters. That's mostly about getting Guava out of the way. I don't think this entails making anyone depend on a shaded anything in Maven. I think that's avoidable and undesirable. So I think we are all basically agreeing on that?

So to be clear, Spark continues to express a dependency on Guava in its POM. It must. The shading affects deployed assemblies. The only delta with what @pwendell suggests seems to be shading in Guava's Optional rather than keeping a source copy. But that seems fine.
Just package the class from the guava dependency instead. Note the dependency-fu needed in core/pom.xml and assembly/pom.xml to not expose the guava dependency in the user's compilation classpath.
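One plausible way to "package the class from the guava dependency" with the shade plugin is an artifact filter that copies only the Optional class files into spark-core. This is a hedged sketch of that idea; the actual dependency-fu in core/pom.xml may differ:

<artifactSet>
  <includes>
    <include>com.google.guava:guava</include>
  </includes>
</artifactSet>
<filters>
  <filter>
    <!-- From the guava jar, keep only Optional (and its inner classes). -->
    <artifact>com.google.guava:guava</artifact>
    <includes>
      <include>com/google/common/base/Optional*</include>
    </includes>
  </filter>
</filters>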
Conflicts: project/SparkBuild.scala
Updated to remove forked code. I'm not too proud of the code to package things in the sbt build, but it's how I got things to work without having to expose Guava in the "compile" scope. |
Jenkins, retest this please. |
QA tests have started for PR 1813. This patch merges cleanly. |
QA results for PR 1813: |
lazy val settings = assemblySettings ++ Seq(
  test in assembly := {},
  jarName in assembly <<= (version, moduleName) map { (v, mName) => mName + "-" + v + "-hadoop" +
    Option(System.getProperty("hadoop.version")).getOrElse("1.0.4") + ".jar" },
  mergeStrategy in assembly := {
    case PathList("org", "datanucleus", xs @ _*) => MergeStrategy.discard
    case PathList("org", "objectweb", "asm", xs @ _*) => MergeStrategy.discard
Is this related to SPARK-2848?
Ah, no, this is leftover from a previous change in this patch. I'll remove.
Hey Marcelo, I took a look through this - I also discussed it in some detail with @srowen this past weekend. If we want to fix this problem using shading in our build, this seems like as good a way as any to do it. It does substantially increase the complexity of the build, which is not great, so maybe longer term we could consider moving towards just publishing a shaded version of guava upstream.

Regarding this specific implementation, I noticed there is a lot of effort to mirror this logic in the sbt build. I wonder if maybe we should just leave this out of the sbt build entirely to reduce the complexity, and in the sbt build simply have guava be a compile-time dependency like it was before. We've said in the past that our official reference build for packagers is maven, so I think it's fine if some of the more esoteric build configurations do not exist in sbt.
Hi Patrick, regarding having the compile-scoped dependency in sbt only: the only way I can think of doing it is to create a profile that is always enabled from the sbt build, since the dependencies always come from the poms. I agree that the sbt build is a little hacky with this patch, but at least it generates the same artifacts as the maven one, so it would avoid "this works with maven but not with sbt" situations, even though they might be rare.
Hi @pwendell, I reverted the sbt changes as we talked about offline; required a tiny change in the pom so that everything works as intended in both builds. |
QA tests have started for PR 1813 at commit
QA tests have finished for PR 1813 at commit
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <scope>compile</scope>
hey just wondering - since spark core lists this as a compile dependency, and everything else depends on spark core, what makes this necessary?
sorry - I see - the issue is that elsewhere this is not in compile scope. However, why not just have it be compile scope by default in the root pom? Is the issue that we need to make sure it does not exist at compile scope in our published root pom?
See previous discussion I had with Sean in this PR.
There's really no good scope in maven for what we want to do. "compile" means that guava will show up in the user's compilation classpath, and since we're shading guava in the assembly, it may lead to runtime errors since guava won't be there. "provided" means that guava is not exposed to the user at all; so if the user wants to use guava, he'll need to add an explicit dependency and package it himself.
I like "provided" better because it means things fail for the user at build time, not runtime, if the user is using guava but hasn't declared the dependency. And that's why I had all those hacks in the sbt build that you asked me to remove.
I just meant that, since we are using the shade plug-in, will it just remove guava entirely from our published poms anyway... then I realized that it won't remove guava from the root pom most likely, since we only shade inside of spark-core.
Just a minor question but LGTM - we should merge this one into master only. |
Jenkins, test this please. |
QA tests have started for PR 1813 at commit
QA tests have finished for PR 1813 at commit
Okay - let's give this a try. Merging it into master. |
For further discussion, please check the JIRA entry.

This change moves Guava classes to a different package so that they don't conflict with the user-provided Guava (or the Hadoop-provided one). Since one class (Optional) was exposed through Spark's public API, that class was forked from Guava at the current dependency version (14.0.1) so that it can be kept going forward (until the API is cleaned).

Note this change has a few implications:
- *all* classes in the final jars will reference the relocated classes. If Hadoop classes are included (i.e. "-Phadoop-provided" is not activated), those will also reference the Guava 14 classes (instead of the Guava 11 classes from the Hadoop classpath).
- if the Guava version in Spark is ever changed, the new Guava will still reference the forked Optional class; this may or may not be a problem, but in the long term it's better to think about removing Optional from the public API.

For the end user, there are two visible implications:
- Guava is not provided as a transitive dependency anymore (since it's "provided" in Spark).
- At runtime, unless they provide their own, they'll either have no Guava or Hadoop's version of Guava (11), depending on how they set up their classpath.

Note that this patch does not change the sbt deliverables; those will still contain guava in its original package, and provide guava as a compile-time dependency. This assumes that maven is the canonical build, and sbt-built artifacts are not (officially) published.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #1813 from vanzin/SPARK-2848 and squashes the following commits:

9bdffb0 [Marcelo Vanzin] Undo sbt build changes.
819b445 [Marcelo Vanzin] Review feedback.
05e0a3d [Marcelo Vanzin] Merge branch 'master' into SPARK-2848
fef4370 [Marcelo Vanzin] Unfork Optional.java.
d3ea8e1 [Marcelo Vanzin] Exclude asm classes from final jar.
637189b [Marcelo Vanzin] Add hacky filter to prefer Spark's copy of Optional.
2fec990 [Marcelo Vanzin] Shade Guava in the sbt build.
616998e [Marcelo Vanzin] Shade Guava in the maven build, fork Guava's Optional.java.
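For a downstream application, the practical consequence described above is that Guava no longer arrives transitively from Spark; an app that uses Guava directly would need something like the following in its own pom (artifact coordinates and versions here are illustrative only):

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.0</version>
    <scope>provided</scope>
  </dependency>
  <!-- Guava is no longer a transitive dependency of Spark, so declare it explicitly. -->
  <dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>14.0.1</version>
  </dependency>
</dependencies>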
@pwendell any idea why this PR wasn't automatically closed after the merge? |
@vanzin Would you mind having a look at SPARK-3217? I think this PR breaks the Spark Maven build. Jenkins didn't complain because it uses SBT and thus doesn't shade Guava.
+1 |
Do we have an ETA to get this pull request merged into master? The guava shading issue is causing a problem for client libs that have a dependency on swift-service when Spark is compiled with hadoop-2.4.
@mfawzymkh As you can see this was merged into Spark a long time ago. What are you waiting on? |
Sorry, I confused this with some other issue. Thanks.
|