-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ADAM-1011] Refactor to add GenomicRDDs for all Avro types #1051
Conversation
Test PASSed. |
replaceRdd(tFn(rdd)) | ||
} | ||
|
||
protected def replaceRdd(newRdd: RDD[T]): U | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain (again, sorry) what transform and replaceRdd are for?
In the context of #1040, I was thinking the pattern would be
val foos: FooRDD = sc.loadFoo("...")
val rdd: RDD[Foo] = foos.rdd
rdd.map(//... general Spark RDD methods
foos.adamSpecificMethods() // ADAM-specific methods from FooRDD
foos.saveAsBar()
I.e. that the caller wouldn't mix RDD and FooRDD method calls. If I understand correctly, transform
allows the caller to
foos.adamSpecificMethod().transform(_.map(/*... general Spark RDD methods */))
Is that a common enough pattern in scala? Will this be obvious to clients of our APIs? Will this work when adapted for Java APIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This riffs off the spark.ml pipelines work: http://spark.apache.org/docs/latest/ml-guide.html#transformers
In:
val foos: FooRDD = sc.loadFoo("...")
val rdd: RDD[Foo] = foos.rdd
rdd.map(//... general Spark RDD methods
foos.adamSpecificMethods() // ADAM-specific methods from FooRDD
foos.saveAsBar()
You'd need to create a new FooRDD
with the product of rdd.map(...)
.
Ping @jpdna @akmorrow13 @ryan-williams for review |
+1 |
* Add NucleotideContigFragmentRDD to replace RDD[NucleotideContigFragment] and updated various ADAMContext functions. * Added test suite for Fasta2ADAM to validate round trip loading.
* Added Feature and Gene RDDs and updated loading functions. * Modified Features2ADAMSuite to use sc.loadFeatures to test end-to-end conversion.
* Added FragmentRDD to replace RDD[Fragment]; refactored load/save methods * Moved toFragments method from implicit on RDD[AlignmentRecord] to AlignmentRecordRDD * Added `-single` option to Fragments2Reads. * Added test suites for Fragments2Reads and Reads2Fragments.
5af2659
to
a4a94ff
Compare
Test PASSed. |
* Made region join implementations package private to `org.bdgenomics.adam.rdd`. * Added support for `broadcastRegionJoin`, `shuffleRegionJoin`, and `filterByOverlappingRegion` to `GenomicRDD`. * Added private `GenericGenomicRDD` class. * Cleaned up `CalculateDepth`, which used the old `broadcastRegionJoin` implementation.
Test PASSed. |
Test FAILed. Build result: FAILUREGitHub pull request #1051 of commit 5fc859b automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1051/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 89c7196aa36e4344beceac2be6d127015b394ede # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1051/merge^{commit} # timeout=10Checking out Revision 89c7196aa36e4344beceac2be6d127015b394ede (origin/pr/1051/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 89c7196aa36e4344beceac2be6d127015b394edeFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
5fc859b
to
6ba6686
Compare
Poof! AlignmentRecordRDDFunctions is gone! @heuermh your hunch about this making the Java API much easier to implement seems to be accurate. |
Test PASSed. |
Just |
Test PASSed. |
Almost done! Have all the |
I would say that this is good for final review. The work for #1040 is going to be a single commit and will not be very intrusive. It will touch a lot of files, but not in any sort of important way, and it mostly touches test suites. |
Test PASSed. |
1218d3f
to
5cd09f5
Compare
Test PASSed. |
Nice work! This is a big one, will take a day or two to review. @tdanford @jpdna @akmorrow13 @ryan-williams @laserson @massie Calling in for backup! |
Yeah, with a change this big, I definitely want to get it right. Perhaps we can target having it reviewed by Wednesday in time for the next BDG standup? |
+1 on this based on my review so far. |
5aae6a8
to
9eee9ca
Compare
@heuermh all review comments should be addressed. Thanks for making a last pass! |
* Removed `AlignmentRecordRDDFunctions` and moved all functions into `AlignmentRecordRDD`. Moved test suite along with the main class move. * Removed `countTag` functions and `PrintTags` CLI. * Removed `JavaAlignmentRecordRDD`. The Java API just uses the `AlignmentRecordRDD` now. To do this, I moved the Java friendly save methods and JavaRDD to `GenomicRDD`.
* Removed `NucleotideContigFragmentRDDFunctions` and moved all functions to `NucleotideContigFragmentRDD`. Moved test suite to accompany refactor. * Added Java helper functions, and added functions for loading sequence data from the `JavaADAMContext`. * Refactored JavaADAMContext test code to make it easier to test Java API. * Fixed `ADAMContext.loadSequences` so that it correctly identifies `.fa.gz` and `.fasta.gz` file extensions. * Fixed `ADAMFunSuite.copyResource` so that it correctly copies resource files with compound file extensions.
* Adding documentation to FragmentRDD. * Added Java-friendly save method to FragmentRDD. * Added method for loading Fragments from the `JavaADAMContext`.
* Removed `FeatureRDDFunctions` class and moved all methods into `FeatureRDD`. Renamed test suite to match, and added Java helper functions. * Moved all `adam-*/src/test/resources/features/*` files into their respective `resources` directory. This is necessary due to an issue when using `copyResources` with subdirectories. * Removed `ignore` on test in `org.bdgenomics.adam.cli.Features2ADAMSuite`.
* Removed `VariationRDDFunctions` and moved remaining methods to `VariantContextRDD`. Moved test suite code as well. * Removed abstract class `ADAMSequenceDictionaryRDDAggregator`. * Added Java helper methods to various variation RDDs and accompanying loader methods in `JavaADAMContext`. * Miscellaneous documentation cleanup in `org.bdgenomics.adam.rdd.variation`. Resolves bigdatagenomics#1011.
Test PASSed. |
Test PASSed. |
Test FAILed. Build result: FAILURE[...truncated 24 lines...]Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.5.2,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.3.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.3.1,centosADAM-prb ? 2.6.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.4.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.3.1,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
Jenkins, retest this please. |
Did the pull request rename retrigger J enkins? sorry if so . . . |
Seems so! Alas. |
Can't run too many builds, though, right? |
Test PASSed. |
LGTM. Shall I merge now or did you want to squash some commits first? |
@heuermh I think I'd like to keep the history as is here. I'm cool with this one being merged via the good ol' green merge button. |
|
Thank you, @fnothaft! |
w00t! Thank you, as well @heuermh! |
A bit of work towards #1011. I'll be dropping the rest of #1011 onto here. Not ready for merge yet, but ready for first review.