Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17714][Core][test-maven][test-hadoop2.6]Avoid using ExecutorClassLoader to load Netty generated classes #16859

Closed
wants to merge 2 commits into from

Conversation

zsxwing
Copy link
Member

@zsxwing zsxwing commented Feb 8, 2017

What changes were proposed in this pull request?

Netty's MessageToMessageEncoder uses Javassist to generate a matcher class and the implementation calls Class.forName to check if this class is already generated. If MessageEncoder or MessageDecoder is created in ExecutorClassLoader.findClass, it will cause ClassCircularityError. This is because loading this Netty generated class will call ExecutorClassLoader.findClass to search this class, and ExecutorClassLoader will try to use RPC to load it and cause to load the non-exist matcher class again. JVM will report ClassCircularityError to prevent such infinite recursion.

Why it only happens in Maven builds

It's because Maven and SBT have different class loader tree. The Maven build will set a URLClassLoader as the current context class loader to run the tests and expose this issue. The class loader tree is as following:

bootstrap class loader ------ ... ----- REPL class loader ---- ExecutorClassLoader
|
|
URLClasssLoader

The SBT build uses the bootstrap class loader directly and ReplSuite.test("propagation of local properties") is the first test in ReplSuite, which happens to load io/netty/util/internal/__matchers__/org/apache/spark/network/protocol/MessageMatcher into the bootstrap class loader (Note: in maven build, it's loaded into URLClasssLoader so it cannot be found in ExecutorClassLoader). This issue can be reproduced in SBT as well. Here are the produce steps:

  • Enable hadoop.caller.context.enabled.
  • Replace Class.forName with Utils.classForName in object CallerContext.
  • Ignore ReplSuite.test("propagation of local properties").
  • Run ReplSuite using SBT.

This PR just creates a singleton MessageEncoder and MessageDecoder and makes sure they are created before switching to ExecutorClassLoader. TransportContext will be created when creating RpcEnv and that happens before creating ExecutorClassLoader.

How was this patch tested?

Jenkins

@@ -2599,14 +2599,10 @@ private[spark] object Utils extends Logging {

private[util] object CallerContext extends Logging {
val callerContextSupported: Boolean = {
SparkHadoopUtil.get.conf.getBoolean("hadoop.caller.context.enabled", false) && {
SparkHadoopUtil.get.conf.getBoolean("hadoop.caller.context.enabled", true) && {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change default value to false later. Just to test if this PR fixes the issue on Jenkins.

@@ -48,7 +47,7 @@
* on the channel for at least `requestTimeoutMs`. Note that this is duplex traffic; we will not
* timeout if the client is continuously sending but getting no responses, for simplicity.
*/
public class TransportChannelHandler extends SimpleChannelInboundHandler<Message> {
public class TransportChannelHandler extends ChannelInboundHandlerAdapter {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SimpleChannelInboundHandler also uses Javassist to generate a matcher class. Since SimpleChannelInboundHandler provides little value for us, I just changed to extend ChannelInboundHandlerAdapter directly.

@SparkQA
Copy link

SparkQA commented Feb 8, 2017

Test build #72598 has finished for PR 16859 at commit 1c88474.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class TransportChannelHandler extends ChannelInboundHandlerAdapter

@dongjoon-hyun
Copy link
Member

Hi, @zsxwing .
Currently, SparkPullRequestBuilder is broken and HOTFIX is #16858 .

@zsxwing zsxwing changed the title [SPARK-17714][Core][maven]Avoid using ExecutorClassLoader to load Netty generated classes [SPARK-17714][Core][maven][test-hadoop2.6]Avoid using ExecutorClassLoader to load Netty generated classes Feb 8, 2017
@zsxwing
Copy link
Member Author

zsxwing commented Feb 8, 2017

retest this please

@SparkQA
Copy link

SparkQA commented Feb 8, 2017

Test build #72599 has finished for PR 16859 at commit 1c88474.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class TransportChannelHandler extends ChannelInboundHandlerAdapter

@zsxwing
Copy link
Member Author

zsxwing commented Feb 8, 2017

cc @vanzin

Copy link
Contributor

@vanzin vanzin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok; you have a typo in the description (calls -> class).

public static final MessageDecoder INSTANCE = new MessageDecoder();

private MessageDecoder() {
super();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not really necessary (also in the other class)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@vanzin
Copy link
Contributor

vanzin commented Feb 9, 2017

btw I think you want [test-maven] to run the builder on maven. (last one ran on sbt.)

@zsxwing zsxwing changed the title [SPARK-17714][Core][maven][test-hadoop2.6]Avoid using ExecutorClassLoader to load Netty generated classes [SPARK-17714][Core][test-maven][test-hadoop2.6]Avoid using ExecutorClassLoader to load Netty generated classes Feb 9, 2017
@SparkQA
Copy link

SparkQA commented Feb 9, 2017

Test build #3570 has finished for PR 16859 at commit 1c88474.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class TransportChannelHandler extends ChannelInboundHandlerAdapter

@dongjoon-hyun
Copy link
Member

Retest this please

@SparkQA
Copy link

SparkQA commented Feb 9, 2017

Test build #72655 has finished for PR 16859 at commit 1c88474.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class TransportChannelHandler extends ChannelInboundHandlerAdapter

@zsxwing
Copy link
Member Author

zsxwing commented Feb 9, 2017

Test build #3570 has finished for PR 16859 at commit 1c88474.

This is the known OOM issue.

Test build #72655 has finished for PR 16859 at commit 1c88474.

The only failure test is the kafka flaky test.

@zsxwing
Copy link
Member Author

zsxwing commented Feb 9, 2017

I'm going to change the default value back since ReplSuite passed in Test build #72655 has finished for PR 16859 at commit 1c88474..

@SparkQA
Copy link

SparkQA commented Feb 9, 2017

Test build #72660 has finished for PR 16859 at commit 20b6fd4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 10, 2017

Test build #72665 has finished for PR 16859 at commit 7fcb317.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Feb 13, 2017

Thanks! Merging to master and 2.1.

asfgit pushed a commit that referenced this pull request Feb 13, 2017
…lassLoader to load Netty generated classes

## What changes were proposed in this pull request?

Netty's `MessageToMessageEncoder` uses [Javassist](https://github.com/netty/netty/blob/91a0bdc17a8298437d6de08a8958d753799bd4a6/common/src/main/java/io/netty/util/internal/JavassistTypeParameterMatcherGenerator.java#L62) to generate a matcher class and the implementation calls `Class.forName` to check if this class is already generated. If `MessageEncoder` or `MessageDecoder` is created in `ExecutorClassLoader.findClass`, it will cause `ClassCircularityError`. This is because loading this Netty generated class will call `ExecutorClassLoader.findClass` to search this class, and `ExecutorClassLoader` will try to use RPC to load it and cause to load the non-exist matcher class again. JVM will report `ClassCircularityError` to prevent such infinite recursion.

##### Why it only happens in Maven builds

It's because Maven and SBT have different class loader tree. The Maven build will set a URLClassLoader as the current context class loader to run the tests and expose this issue. The class loader tree is as following:

```
bootstrap class loader ------ ... ----- REPL class loader ---- ExecutorClassLoader
|
|
URLClasssLoader
```

The SBT build uses the bootstrap class loader directly and `ReplSuite.test("propagation of local properties")` is the first test in ReplSuite, which happens to load `io/netty/util/internal/__matchers__/org/apache/spark/network/protocol/MessageMatcher` into the bootstrap class loader (Note: in maven build, it's loaded into URLClasssLoader so it cannot be found in ExecutorClassLoader). This issue can be reproduced in SBT as well. Here are the produce steps:
- Enable `hadoop.caller.context.enabled`.
- Replace `Class.forName` with `Utils.classForName` in `object CallerContext`.
- Ignore `ReplSuite.test("propagation of local properties")`.
- Run `ReplSuite` using SBT.

This PR just creates a singleton MessageEncoder and MessageDecoder and makes sure they are created before switching to ExecutorClassLoader. TransportContext will be created when creating RpcEnv and that happens before creating ExecutorClassLoader.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16859 from zsxwing/SPARK-17714.

(cherry picked from commit 905fdf0)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
@zsxwing
Copy link
Member Author

zsxwing commented Feb 13, 2017

@vanzin I just merged this one. If you have other comments, I will address them in a separate PR. Thanks!

@asfgit asfgit closed this in 905fdf0 Feb 13, 2017
@zsxwing zsxwing deleted the SPARK-17714 branch February 13, 2017 20:06
@vanzin
Copy link
Contributor

vanzin commented Feb 13, 2017

Nope, don't have anything else.

cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…lassLoader to load Netty generated classes

## What changes were proposed in this pull request?

Netty's `MessageToMessageEncoder` uses [Javassist](https://github.com/netty/netty/blob/91a0bdc17a8298437d6de08a8958d753799bd4a6/common/src/main/java/io/netty/util/internal/JavassistTypeParameterMatcherGenerator.java#L62) to generate a matcher class and the implementation calls `Class.forName` to check if this class is already generated. If `MessageEncoder` or `MessageDecoder` is created in `ExecutorClassLoader.findClass`, it will cause `ClassCircularityError`. This is because loading this Netty generated class will call `ExecutorClassLoader.findClass` to search this class, and `ExecutorClassLoader` will try to use RPC to load it and cause to load the non-exist matcher class again. JVM will report `ClassCircularityError` to prevent such infinite recursion.

##### Why it only happens in Maven builds

It's because Maven and SBT have different class loader tree. The Maven build will set a URLClassLoader as the current context class loader to run the tests and expose this issue. The class loader tree is as following:

```
bootstrap class loader ------ ... ----- REPL class loader ---- ExecutorClassLoader
|
|
URLClasssLoader
```

The SBT build uses the bootstrap class loader directly and `ReplSuite.test("propagation of local properties")` is the first test in ReplSuite, which happens to load `io/netty/util/internal/__matchers__/org/apache/spark/network/protocol/MessageMatcher` into the bootstrap class loader (Note: in maven build, it's loaded into URLClasssLoader so it cannot be found in ExecutorClassLoader). This issue can be reproduced in SBT as well. Here are the produce steps:
- Enable `hadoop.caller.context.enabled`.
- Replace `Class.forName` with `Utils.classForName` in `object CallerContext`.
- Ignore `ReplSuite.test("propagation of local properties")`.
- Run `ReplSuite` using SBT.

This PR just creates a singleton MessageEncoder and MessageDecoder and makes sure they are created before switching to ExecutorClassLoader. TransportContext will be created when creating RpcEnv and that happens before creating ExecutorClassLoader.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes apache#16859 from zsxwing/SPARK-17714.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants