-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-18646][REPL] Set parent classloader as null for ExecutorClassLoader #17074
Conversation
This change is to address a classloader problem under sbt environment more details can be found in Jira [https://issues.apache.org/jira/browse/SPARK-19675]
Thanks for working on this. It's been awhile since I looked at the classloader bits so I'd need to refresh my memory a bit, but regardless this PR probably needs better tests - specifically including something which wasn't working before and does work with the change. |
@holdenk the change is to stop passing 'parent' to the parent class(ClassLoader), however this introduced a link between ExecutorClassLoader to SystemClassLoader. Most of the time, SystemClassLoader is parent loader itself. Then it causes issues with userClassPathFirst (SPARK-18646) and if it is not parent loader, it causes issues like InvalidClassException(SPARK-19675) I added a UT for this |
ok to test |
Test build #73796 has finished for PR 17074 at commit
|
Test build #73799 has finished for PR 17074 at commit
|
Test build #73801 has finished for PR 17074 at commit
|
Test build #73805 has started for PR 17074 at commit |
@sknapp |
retest this please |
Test build #78258 has finished for PR 17074 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good overall, only have some minor comments.
@@ -44,7 +44,7 @@ class ExecutorClassLoader( | |||
env: SparkEnv, | |||
classUri: String, | |||
parent: ClassLoader, | |||
userClassPathFirst: Boolean) extends ClassLoader with Logging { | |||
userClassPathFirst: Boolean) extends ClassLoader(null) with Logging { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add comment over this to explain why we are setting a null
value to parent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added to the description of this class
assert(scalaDir.mkdir(), s"Failed to create 'scala' directory in $tempDir1") | ||
|
||
// move the generated class into scala dir | ||
val filename = "Option.class" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use createCompiledClass()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's because createCompiledClass doesn't handle package name, it needs to put the generated class under 'scala' package and also it needs to move the generated class to 'scala' folder.
Since createCompiledClass is widely used, I didn't want to touch it just for this very unique situation.
Also cc @cloud-fan |
In the jira, you mentioned |
Test build #78368 has finished for PR 17074 at commit
|
SPARK-18646 is not the one I created, SPARK-19675 is the one I filed. But it kept getting closed again and again, so I linked my PR to SPARK-18646 which is still open. But I think I know what it means. The author of that jira suggested that the fix would be to pass parent class loader to the constructor. If we call ExecutorClassLoader.getParent via debugger or something, it returns SystemClassLoader which is wrong and the person says that it should be the 'parent' class loader. Original change (fa0524f), which introduced this issue, was not to pass parent to the constructor so that this class loader takes precedence over parent class loader, however it looks to me that the change forgot to set parent as null then if system class loader has something different class from the current or parent class loader, problems like this can happen. I'm not sure if I explained well, .. let me know if you need more info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach looks good to merge, and it would be even better if a concise example of how this is causing a failure in production environment could be provided.
@@ -38,13 +38,15 @@ import org.apache.spark.util.{ParentClassLoader, Utils} | |||
* Allows the user to specify if user class path should be first. | |||
* This class loader delegates getting/finding resources to parent loader, | |||
* which makes sense until REPL never provide resource dynamically. | |||
* This class does not set parent classloader since this class loader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about rephrase to:
[[ClassLoader]] will preferentially load class from parent. Only when parent is null or the load failed, that it will call the overridden `findClass` function. To avoid the potential issue caused by loading class using inappropriate class loader, we should set the parent of ClassLoader to null, so that we can fully control which class loader is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add a empty line before this statement.
ping @taroplus |
…oader ## What changes were proposed in this pull request? `ClassLoader` will preferentially load class from `parent`. Only when `parent` is null or the load failed, that it will call the overridden `findClass` function. To avoid the potential issue caused by loading class using inappropriate class loader, we should set the `parent` of `ClassLoader` to null, so that we can fully control which class loader is used. This is take over of #17074, the primary author of this PR is taroplus . Should close #17074 after this PR get merged. ## How was this patch tested? Add test case in `ExecutorClassLoaderSuite`. Author: Kohki Nishio <taroplus@me.com> Author: Xingbo Jiang <xingbo.jiang@databricks.com> Closes #18614 from jiangxb1987/executor_classloader. (cherry picked from commit e08d06b) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…oader ## What changes were proposed in this pull request? `ClassLoader` will preferentially load class from `parent`. Only when `parent` is null or the load failed, that it will call the overridden `findClass` function. To avoid the potential issue caused by loading class using inappropriate class loader, we should set the `parent` of `ClassLoader` to null, so that we can fully control which class loader is used. This is take over of apache#17074, the primary author of this PR is taroplus . Should close apache#17074 after this PR get merged. ## How was this patch tested? Add test case in `ExecutorClassLoaderSuite`. Author: Kohki Nishio <taroplus@me.com> Author: Xingbo Jiang <xingbo.jiang@databricks.com> Closes apache#18614 from jiangxb1987/executor_classloader. (cherry picked from commit e08d06b) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…oader ## What changes were proposed in this pull request? `ClassLoader` will preferentially load class from `parent`. Only when `parent` is null or the load failed, that it will call the overridden `findClass` function. To avoid the potential issue caused by loading class using inappropriate class loader, we should set the `parent` of `ClassLoader` to null, so that we can fully control which class loader is used. This is take over of apache#17074, the primary author of this PR is taroplus . Should close apache#17074 after this PR get merged. ## How was this patch tested? Add test case in `ExecutorClassLoaderSuite`. Author: Kohki Nishio <taroplus@me.com> Author: Xingbo Jiang <xingbo.jiang@databricks.com> Closes apache#18614 from jiangxb1987/executor_classloader.
What changes were proposed in this pull request?
By default, a class loader will have java's system class loader as its parent, however this ExecutorClassLoader should always resolve classes from given 'paren't class loader.
Problem happens when an application has a class loader structure like below
System-Classloader [ClassA, Runtime]
|
Application-Classloader [ClassA, Spark, Jackson etc etc] : parent
|
Executor Class Loader
The application serializes ClassA using Application class loader however Spark deserializes the class using System Class loader, this leads to various problems.
The change here to pass null to the parent class so that a link between ExecutorClassLoader and SystemClassLoader no longer exists.
For most of situations, the parent class loader is the system class loader so this wouldn't have any impact.
How was this patch tested?
UTs