Clean up org.bdgenomics.adam.rich package. #1263

Merged 3 commits on Nov 16, 2016

fnothaft
Member

Towards #1083.

  • Added scaladoc to all classes/methods that were missing docs.
  • Deprecated ReferenceSequenceContext.
  • Deprecated implicit conversions to/from RichAlignmentRecord and eliminated dead code.
  • Made RichCigar package private to org.bdgenomics.adam. It is only used in org.bdgenomics.adam.algorithms.consensus. I considered moving it there, but decided to keep all "enriched" classes in org.bdgenomics.adam.rich. I did eliminate the RichCigar singleton and made RichCigar a case class, moving the only non-apply method over to org.bdgenomics.adam.algorithms.consensus.ConsensusGenerator as a private method.

b4f0eeb deletes org.bdgenomics.adam.rich.RichGenotype, which is unused. I wasn't sure if we wanted to do this, as RichGenotype is kinda useful. Thoughts?
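
The RichCigar change described above can be pictured roughly as follows. This is only a sketch under stated assumptions: nested objects stand in for the real packages, and the element representation, `alignedLength`, and `alignedLengthOf` are hypothetical names rather than ADAM's actual API.

```scala
// Nested objects stand in for packages so the sketch is self-contained.
object adam {
  object rich {
    // Visible anywhere inside `adam`, hidden from code outside it.
    // The (length, operator) element encoding is an assumption for illustration.
    private[adam] case class RichCigar(elements: List[(Int, Char)])
  }

  object consensus {
    import rich.RichCigar

    // The helper lives next to its only caller and stays private to it.
    private def alignedLength(cigar: RichCigar): Int =
      cigar.elements.collect { case (len, 'M') => len }.sum

    // Public entry point exposes only plain types, so RichCigar never leaks.
    def alignedLengthOf(elements: List[(Int, Char)]): Int =
      alignedLength(RichCigar(elements))
  }
}
```

Keeping the enriched class in one place while narrowing its visibility means the type can still be reorganized later without breaking callers outside the qualifying scope.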

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1593/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/*:refs/remotes/origin/* # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/*:refs/remotes/origin/pr/* # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1263/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 55addc94198bce4b105c33c15976d7529cb63dc3 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1263/merge^{commit} # timeout=10
Checking out Revision 55addc94198bce4b105c33c15976d7529cb63dc3 (origin/pr/1263/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 55addc94198bce4b105c33c15976d7529cb63dc3
First time build. Skipping changelog.
Triggering ADAM-prb » 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb » 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft fnothaft mentioned this pull request Nov 12, 2016
6 tasks
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1594/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/*:refs/remotes/origin/* # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/*:refs/remotes/origin/pr/* # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1263/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains d724aca562c48dd2edb6f8ee38adb65ce39c02b8 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1263/merge^{commit} # timeout=10
Checking out Revision d724aca562c48dd2edb6f8ee38adb65ce39c02b8 (origin/pr/1263/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f d724aca562c48dd2edb6f8ee38adb65ce39c02b8
First time build. Skipping changelog.
Triggering ADAM-prb » 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb » 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1596/
Test PASSed.

@@ -80,7 +80,7 @@ class VariantContextConverterSuite extends ADAMFunSuite {

assert(adamVC.genotypes.size === 0)

-val variant = adamVC.variant
+val variant = adamVC.variant.variant
Member

Do I have it right that these nested accesses weren't necessary before because of the implicit conversion?

Member Author

Correct.
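
To make the exchange above concrete, here is a minimal sketch of why the extra `.variant` becomes necessary once the implicit conversion is out of scope. The `Variant`, `RichVariant`, and `VariantContext` types below are simplified, hypothetical stand-ins, not ADAM's real classes, and `richToVariant` is an invented name.

```scala
import scala.language.implicitConversions

object ImplicitConversionSketch {
  // Simplified stand-ins for the record type and its enriched wrapper.
  case class Variant(referenceAllele: String, alternateAllele: String)
  case class RichVariant(variant: Variant)
  case class VariantContext(variant: RichVariant)

  // The kind of implicit view that is being deprecated (hypothetical name).
  object LegacyImplicits {
    implicit def richToVariant(rv: RichVariant): Variant = rv.variant
  }

  // With the view in scope, the compiler unwraps the RichVariant silently.
  def withImplicit(vc: VariantContext): Variant = {
    import LegacyImplicits._
    vc.variant // rewritten by the compiler to richToVariant(vc.variant)
  }

  // Without it, the caller has to unwrap explicitly: hence `.variant.variant`.
  def explicit(vc: VariantContext): Variant = vc.variant.variant
}
```

Both functions return the same value; the explicit form just makes the unwrapping visible at the call site.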

*/

-case class ReferenceSequenceContext(pos: Option[ReferencePosition], referenceBase: Option[Char], cigarElement: CigarElement, cigarElementOffset: Int)
+@deprecated("don't use ReferenceSequenceContext in new development",
Member

What is the time limit on this deprecation? Is there a reason not to remove it now?

Member Author

We've got a ticket #577 for this. The long and short of it is that there's a lot of nasty code that came in from DecadentRead when BQSR was added, and it is a fairly involved refactor.

Member

@heuermh heuermh Nov 14, 2016

Thanks, I'm good since there is an issue to track

@@ -92,7 +92,7 @@ object VariantContext {
* @return VariantContext corresponding to the data above.
*/
def apply(kv: (ReferencePosition, Variant, Iterable[Genotype], Option[DatabaseVariantAnnotation])): VariantContext = {
-new VariantContext(kv._1, kv._2, kv._3, kv._4)
+new VariantContext(kv._1, RichVariant(kv._2), kv._3, kv._4)
Member

Do we need to always wrap in RichVariant or should it be left to a later map after load if the caller needs it?

Member Author

I think we could drop the rich variant, but I'm not entirely sure.
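
The two options being weighed here, wrapping at construction versus enriching only where a caller needs it, might look like the sketch below. All types and the `isSnp` helper are hypothetical simplifications for illustration, not ADAM's actual `VariantContext` or `RichVariant`.

```scala
object EnrichmentTimingSketch {
  // Hypothetical, simplified stand-ins.
  case class Variant(referenceAllele: String, alternateAllele: String)
  case class RichVariant(variant: Variant) {
    def isSnp: Boolean =
      variant.referenceAllele.length == 1 && variant.alternateAllele.length == 1
  }

  // Option A: wrap eagerly, so every context carries a RichVariant.
  case class EagerContext(variant: RichVariant)
  def eager(v: Variant): EagerContext = EagerContext(RichVariant(v))

  // Option B: store the plain record and enrich only where it is needed.
  case class PlainContext(variant: Variant)
  def snpsOnly(contexts: Seq[PlainContext]): Seq[PlainContext] =
    contexts.filter(c => RichVariant(c.variant).isSnp)
}
```

Option B keeps the stored type plain, at the cost of constructing a short-lived wrapper wherever the enriched methods are used.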


@deprecated("Use explicit coversion wherever possible in new development.",
since = "0.21.0")
Member

As above, what is the time limit on these deprecations? Is there a reason not to remove them now?

Member Author

See above/#577.
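
As a rough illustration of deprecating rather than removing, the sketch below marks an implicit conversion with `@deprecated` while offering an explicit alternative. The record types, `recordToRichRecord`, and `readLength` are hypothetical; only the annotation text and `since` value mirror the quoted diff.

```scala
import scala.language.implicitConversions

object DeprecatedConversionSketch {
  // Hypothetical, simplified stand-ins for the real record types.
  case class AlignmentRecord(sequence: String)
  case class RichAlignmentRecord(record: AlignmentRecord) {
    def readLength: Int = record.sequence.length
  }

  object Conversions {
    // Existing callers keep compiling, but the compiler should warn them.
    @deprecated("Use explicit conversion wherever possible in new development.",
      since = "0.21.0")
    implicit def recordToRichRecord(record: AlignmentRecord): RichAlignmentRecord =
      RichAlignmentRecord(record)
  }

  // Legacy style: relies on the deprecated implicit view.
  def legacyLength(rec: AlignmentRecord): Int = {
    import Conversions._
    rec.readLength
  }

  // Preferred style: the wrapping is spelled out.
  def explicitLength(rec: AlignmentRecord): Int =
    RichAlignmentRecord(rec).readLength
}
```

This keeps downstream code building during the transition, which fits the reasoning that the full removal is a larger refactor tracked separately.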


def ploidy: Int = genotype.getAlleles.size

def getType: GenotypeType = {
Member

It is so difficult to explain what the different types mean that I don't find GenotypeType very useful.

Hail defines even more genotype flags:

genotype field    definition
isHomRef          true if this call is 0/0
isHet             true if this call is heterozygous
isHetRef          true if this call is 0/k with k>0
isHetNonRef       true if this call is j/k with j>0
isHomVar          true if this call is j/j with j>0
isCalledNonRef    true if either isHet or isHomVar is true
isCalled          true if the genotype is not ./.
isNotCalled       true if the genotype is ./.

and still doesn't distinguish partial no-calls.

Member Author

So are you saying we should get rid of GenotypeType? I would generally agree.

Member

As is, yeah. If these flags (or even extended ones as in Hail) turn out to be useful, then perhaps they could be added to the schema as flags instead of an enum.

Member Author

SGTM
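
For reference, the flags-instead-of-enum idea could be sketched as below for diploid calls. The `Call` type and its allele encoding are assumptions made for illustration, and `isPartialNoCall` is a hypothetical extra flag covering the partial no-call case mentioned above; none of this is the ADAM schema or Hail's implementation.

```scala
object GenotypeFlagsSketch {
  // A diploid call as two allele indices: None is a no-call ('.'),
  // Some(0) is the reference allele, Some(k) with k > 0 is an alternate allele.
  final case class Call(first: Option[Int], second: Option[Int]) {
    // Both alleles, sorted, when the call is fully specified.
    private val alleles: Option[(Int, Int)] =
      for (x <- first; y <- second) yield (math.min(x, y), math.max(x, y))

    def isHomRef: Boolean = alleles.exists { case (j, k) => j == 0 && k == 0 }
    def isHet: Boolean = alleles.exists { case (j, k) => j != k }
    def isHetRef: Boolean = alleles.exists { case (j, k) => j == 0 && k > 0 }
    def isHetNonRef: Boolean = alleles.exists { case (j, k) => j > 0 && j != k }
    def isHomVar: Boolean = alleles.exists { case (j, k) => j == k && j > 0 }
    def isCalledNonRef: Boolean = isHet || isHomVar
    def isNotCalled: Boolean = first.isEmpty && second.isEmpty
    def isCalled: Boolean = !isNotCalled

    // Hypothetical extension: exactly one allele is missing, e.g. 0/.
    def isPartialNoCall: Boolean = first.isDefined != second.isDefined
  }
}
```

Independent boolean flags let a single call satisfy several predicates at once (for example `isHet` and `isHetRef`), which a single enum value cannot express.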

fnothaft added a commit to fnothaft/adam that referenced this pull request Nov 15, 2016
Along with bigdatagenomics#1263 and bigdatagenomics#1264, this resolves bigdatagenomics#1083.

* Removing unused org.bdgenomics.adam.models.ReadBucket class.
* Move org.bdgenomics.adam.models.ReferencePositionPair and
  org.bdgenomics.adam.models.SingleReadBucket in to org.bdgenomics.adam.rdd.read
  and make package private.
* Clean up duplicated methods and methods that were incorrectly in companion
  singleton for SequenceDictionary and ReadGroupDictionary.
* Removed all SamReader references.
* Make writable file headers private to ADAM.
* Eliminated manual VCF parsing code in SnpTable.
* Cleaned up scaladoc for all classes and singleton objects.
* Moved `NonoverlappingRegions` test code out of `InnerBroadcastRegionJoinSuite`.
@fnothaft
Member Author

fnothaft commented Nov 15, 2016

Rebased. @heuermh can you review and see if there are any changes needed? As far as I can tell, we are good to go.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1609/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/*:refs/remotes/origin/* # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/*:refs/remotes/origin/pr/* # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1263/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains b68d8c0 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1263/merge^{commit} # timeout=10
Checking out Revision b68d8c0 (origin/pr/1263/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f b68d8c01403f41aa5ffe2a99c0ea89446166e21d
First time build. Skipping changelog.
Triggering ADAM-prb » 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb » 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

fnothaft added a commit to fnothaft/adam that referenced this pull request Nov 16, 2016
Along with bigdatagenomics#1263 and bigdatagenomics#1264, this resolves bigdatagenomics#1083.

* Removing unused org.bdgenomics.adam.models.ReadBucket class.
* Move org.bdgenomics.adam.models.ReferencePositionPair and
  org.bdgenomics.adam.models.SingleReadBucket in to org.bdgenomics.adam.rdd.read
  and make package private.
* Clean up duplicated methods and methods that were incorrectly in companion
  singleton for SequenceDictionary and ReadGroupDictionary.
* Removed all SamReader references.
* Make writable file headers private to ADAM.
* Eliminated manual VCF parsing code in SnpTable.
* Cleaned up scaladoc for all classes and singleton objects.
* Moved `NonoverlappingRegions` test code out of `InnerBroadcastRegionJoinSuite`.
* Added scaladoc to all classes/methods that were missing docs.
* Deprecated `ReferenceSequenceContext`
* Deprecated implicit conversions to/from `RichAlignmentRecord` and eliminated
  dead code.
* Made `RichCigar` package private to `org.bdgenomics.adam`. It is only used
  in `org.bdgenomics.algorithms.consensus`. I considered moving it over, but
  decided to keep all "enriched" classes in `org.bdgenomics.adam.rich`. I did
  eliminate the `RichCigar` singleton and make `RichCigar` a case class, while
  moving the only non-`apply` method over to
  `org.bdgenomics.adam.algorithms.consensus.ConsensusGenerator` as a private
  method.
@fnothaft
Member Author

Rebased and fixed conflicts and compile issue.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1613/
Test PASSed.

@heuermh heuermh merged commit f5cd15e into bigdatagenomics:master Nov 16, 2016
@heuermh
Member

heuermh commented Nov 16, 2016

Thank you, @fnothaft!

heuermh pushed a commit that referenced this pull request Nov 16, 2016
Along with #1263 and #1264, this resolves #1083.

* Removing unused org.bdgenomics.adam.models.ReadBucket class.
* Move org.bdgenomics.adam.models.ReferencePositionPair and
  org.bdgenomics.adam.models.SingleReadBucket in to org.bdgenomics.adam.rdd.read
  and make package private.
* Clean up duplicated methods and methods that were incorrectly in companion
  singleton for SequenceDictionary and ReadGroupDictionary.
* Removed all SamReader references.
* Make writable file headers private to ADAM.
* Eliminated manual VCF parsing code in SnpTable.
* Cleaned up scaladoc for all classes and singleton objects.
* Moved `NonoverlappingRegions` test code out of `InnerBroadcastRegionJoinSuite`.