BaseRecalibratorSpark fails on a cluster due to system classloader issue #5979

Merged
tomwhite merged 2 commits into master from tw_resource_classloader_bug on Jun 5, 2019

Conversation

tomwhite (Contributor) commented Jun 4, 2019

The problem is that Spark executors can't rely on the system classloader to load resources. This change falls back to the current classloader if the resource can't be loaded from the system classloader. I've tested it successfully on a cluster.
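A minimal sketch of the fallback described above, assuming it lives in a helper near Resource.getResourceContentsAsFile (the method name getResourceStream and the surrounding structure are illustrative, not the merged code):

    import java.io.InputStream;
    import org.broadinstitute.hellbender.exceptions.GATKException;

    // ... inside org.broadinstitute.hellbender.utils.io.Resource ...

    // Sketch only: resolve a bundled resource, trying the system classloader first and
    // falling back to the classloader that loaded Resource. The fallback matters on Spark
    // executors, where the application jar is not visible to the system classloader.
    private static InputStream getResourceStream(final String resourcePath) {
        InputStream resourceAsStream = ClassLoader.getSystemResourceAsStream(resourcePath);
        if (resourceAsStream == null) {
            // Fall back to the current classloader.
            resourceAsStream = Resource.class.getClassLoader().getResourceAsStream(resourcePath);
        }
        if (resourceAsStream == null) {
            throw new GATKException("Null value when trying to read system resource.  Cannot find: " + resourcePath);
        }
        return resourceAsStream;
    }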

tomwhite (Contributor, Author) commented Jun 4, 2019

@LeeTL1220 would you mind taking a look at this, please? (It came about from the changes in #5941.)

tomwhite (Contributor, Author) commented Jun 4, 2019

Stack trace from failed job:

Caused by: org.broadinstitute.hellbender.exceptions.GATKException: Null value when trying to read system resource.  Cannot find: org/broadinstitute/hellbender/tools/copynumber/utils/annotatedinterval/annotated_region_default.config
	at org.broadinstitute.hellbender.utils.io.Resource.getResourceContentsAsFile(Resource.java:90)
	at org.broadinstitute.hellbender.utils.codecs.AnnotatedIntervalCodec.<init>(AnnotatedIntervalCodec.java:55)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.lang.Class.newInstance(Class.java:442)
	at org.broadinstitute.hellbender.engine.FeatureManager.getCandidateCodecsForFile(FeatureManager.java:511)
	at org.broadinstitute.hellbender.engine.FeatureManager.getCodecForFile(FeatureManager.java:464)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.getCodecForFeatureInput(FeatureDataSource.java:324)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:304)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:256)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:230)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:214)
	at org.broadinstitute.hellbender.utils.spark.JoinReadsWithVariants.openFeatureSource(JoinReadsWithVariants.java:63)
	at org.broadinstitute.hellbender.utils.spark.JoinReadsWithVariants.lambda$null$0(JoinReadsWithVariants.java:44)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at org.broadinstitute.hellbender.utils.spark.JoinReadsWithVariants.lambda$join$60e5b476$1(JoinReadsWithVariants.java:44)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

codecov bot commented Jun 4, 2019

Codecov Report

Merging #5979 into master will not change coverage.
The diff coverage is 50%.

@@             Coverage Diff             @@
##              master     #5979   +/-   ##
===========================================
  Coverage     86.929%   86.929%           
  Complexity     32721     32721           
===========================================
  Files           2013      2013           
  Lines         151306    151306           
  Branches       16610     16610           
===========================================
  Hits          131529    131529           
  Misses         13720     13720           
  Partials        6057      6057
Impacted Files                                          Coverage Δ          Complexity Δ
...g/broadinstitute/hellbender/utils/io/Resource.java   62.963% <50%> (ø)   7 <0> (ø) ⬇️

LeeTL1220 (Contributor) left a comment

Simple comments, please address.

No need to have me re-review unless you want it, so I have approved the review.

    if (systemResourceAsStream == null) {
        throw new GATKException("Null value when trying to read system resource. Cannot find: " + resourcePath);
    if (resourceAsStream == null) {
        resourceAsStream = Resource.class.getClassLoader().getResourceAsStream(resourcePath);
Review comment (Contributor):

Can't we just use this statement above and eliminate the if statement?

I.e. replace InputStream resourceAsStream = ClassLoader.getSystemResourceAsStream(resourcePath); with
InputStream resourceAsStream = Resource.class.getClassLoader().getResourceAsStream(resourcePath);

I would think that the system ClassLoader would be equivalent to Resource.class.getClassLoader().

Apologies if I am forgetting something....
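For reference, a small illustration of why the two lookups can differ on a Spark executor (this is not code from the PR; resourcePath is assumed to be in scope, and the classloader layout described is the usual Spark executor setup):

    // On a plain `java -cp gatk.jar ...` run, the system classloader usually holds the
    // application classes, so both lookups behave the same.
    // On a Spark executor, the application jar is typically added to a child classloader,
    // so only the lookup through Resource's own classloader can see bundled resources.
    ClassLoader systemLoader  = ClassLoader.getSystemClassLoader();
    ClassLoader currentLoader = Resource.class.getClassLoader();

    InputStream viaSystem  = ClassLoader.getSystemResourceAsStream(resourcePath);  // may be null on executors
    InputStream viaCurrent = currentLoader.getResourceAsStream(resourcePath);      // finds the bundled .config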

Reply (Contributor Author):

I think you're right. A test seems to be failing now, but I'm not sure it's related. I'll investigate.


    if (systemResourceAsStream == null) {
        throw new GATKException("Null value when trying to read system resource. Cannot find: " + resourcePath);
    if (resourceAsStream == null) {
Review comment (Contributor):

Can you add a one-line comment explaining why we need this? "For Spark, etc."

Reply (Contributor Author):

Done

tomwhite (Contributor, Author) commented Jun 4, 2019

Thank you for taking a look @LeeTL1220.

tomwhite merged commit d0d4ca7 into master on Jun 5, 2019
tomwhite deleted the tw_resource_classloader_bug branch on June 5, 2019 at 07:50