Added a check to ensure previous caching information doesn't affect referenceConfidenceCalls #5911
Conversation
…e Indel cache has been cleared from underlying reads
Codecov Report
@@              Coverage Diff               @@
##             master      #5911      +/-   ##
==============================================
- Coverage     86.84%    86.823%    -0.018%
- Complexity    32326      32348        +22
==============================================
  Files          1991       1993         +2
  Lines        149342     149470       +128
  Branches      16482      16505        +23
==============================================
+ Hits         129689     129774        +85
- Misses        13646      13679        +33
- Partials       6007       6017        +10
Back to @jamesemery with a question
@@ -189,6 +189,10 @@ public ReferenceConfidenceModel(final SampleList samples,
         final int ploidy = ploidyModel.samplePloidy(0); // the first sample = the only sample in reference-confidence mode.

+        final SimpleInterval refSpan = activeRegion.getSpan();
+        // Ensuring that if the underlying reads have any cached indel informativeness data it gets purged before calling the next region.
+        if (USE_CACHED_READ_INDEL_INFORMATIVENESS_VALUES) {
+            readLikelihoods.sampleReads(0).forEach(r -> r.clearTransientAttribute(INDEL_INFORMATIVE_BASES_CACHE_ATTRIBUTE_NAME));
Is sampleReads(0) correct here? Do we have to worry about the multi-sample case in this context?
This behavior is acceptable by virtue of the fact that above there is the following line: Utils.validateArg(readLikelihoods.numberOfSamples() == 1, () -> "readLikelihoods must contain exactly one sample but it contained " + readLikelihoods.numberOfSamples());
Furthermore, when we create pileups next in AssemblyBasedCallerUtils.getPileupsOverReference(), this is the exact mechanism by which we access the reads.
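For context, a condensed sketch of the two pieces being referred to, using the identifiers quoted above and in the diff; this is an illustrative excerpt, not a standalone compilable unit:

    // Up-front guard in the reference-confidence path: exactly one sample is ever present.
    Utils.validateArg(readLikelihoods.numberOfSamples() == 1,
            () -> "readLikelihoods must contain exactly one sample but it contained " + readLikelihoods.numberOfSamples());

    // Because only one sample is guaranteed, iterating sampleReads(0) visits every read
    // when purging the cached indel-informativeness attribute.
    readLikelihoods.sampleReads(0).forEach(
            r -> r.clearTransientAttribute(INDEL_INFORMATIVE_BASES_CACHE_ATTRIBUTE_NAME));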
Note to self: move the purging of the cache to the end of execution and test that leftover cache values are never present after invoking referenceConfidenceModel.
@droazen I have pushed the cache removal step down to a more testable point in the code and added the assertion to the existing testing infrastructure. Can you take a quick look at this branch so it can go in at some point?
Back to @jamesemery with another round of comments.
@@ -217,6 +217,11 @@ public ReferenceConfidenceModel(final SampleList samples,
             }
         }

+        // Ensuring that we remove any indel informativeness data we may have attached to the underlying reads for caching purposes
Add an additional comment explaining briefly why it is necessary to clear the cached data.
@@ -508,6 +508,10 @@ public void testRefConfidenceBasic(final int nReads, final int extension) {
         final IndependentSampleGenotypesModel genotypingModel = new IndependentSampleGenotypesModel();
         final List<Integer> expectedDPs = Collections.nCopies(data.getActiveRegion().getSpan().size(), nReads);
         final List<VariantContext> contexts = model.calculateRefConfidence(data.getRefHap(), haplotypes, data.getPaddedRefLoc(), data.getActiveRegion(), likelihoods, ploidyModel, calls, false, Collections.emptyList());
+        // Asserting that none of the reads after calculateRefConfidence have indel informativeness caching values attached.
+        for (GATKRead read : data.getActiveRegion().getReads()) {
+            Assert.assertTrue(read.getTransientAttribute(ReferenceConfidenceModel.INDEL_INFORMATIVE_BASES_CACHE_ATTRIBUTE_NAME) == null);
Use Assert.assertNull() here.
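As a sketch of that suggestion (not the line as actually committed), the assertion in the hunk above would become:

    // assertNull reports the unexpected non-null value in the failure message, rather than a bare assertTrue failure.
    Assert.assertNull(read.getTransientAttribute(ReferenceConfidenceModel.INDEL_INFORMATIVE_BASES_CACHE_ATTRIBUTE_NAME));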
Also, should checking for the clearing of this attribute be its own test case, instead of piggybacking on other test cases?
I'm worried about making sure the cache is cleared in every case, so I would prefer to assert that no case leaves the reads cached; that is why I attached this check to all of our tests.
@@ -529,6 +533,10 @@ public void testRefConfidencePartialReads() {
         final List<Integer> expectedDPs = new ArrayList<>(Collections.nCopies(data.getActiveRegion().getSpan().size(), 0));
         for ( int i = start; i < readLen + start; i++ ) expectedDPs.set(i, 1);
         final List<VariantContext> contexts = model.calculateRefConfidence(data.getRefHap(), haplotypes, data.getPaddedRefLoc(), data.getActiveRegion(), likelihoods, ploidyModel, calls);
+        // Asserting that none of the reads after calculateRefConfidence have indel informativeness caching values attached.
+        for (GATKRead read : data.getActiveRegion().getReads()) {
+            Assert.assertTrue(read.getTransientAttribute(ReferenceConfidenceModel.INDEL_INFORMATIVE_BASES_CACHE_ATTRIBUTE_NAME) == null);
Assert.assertNull()
@@ -565,6 +573,10 @@ public void testRefConfidenceWithCalls() {

         final List<Integer> expectedDPs = Collections.nCopies(data.getActiveRegion().getSpan().size(), nReads);
         final List<VariantContext> contexts = model.calculateRefConfidence(data.getRefHap(), haplotypes, data.getPaddedRefLoc(), data.getActiveRegion(), likelihoods, ploidyModel, calls);
+        // Asserting that none of the reads after calculateRefConfidence have indel informativeness caching values attached.
+        for (GATKRead read : data.getActiveRegion().getReads()) {
+            Assert.assertTrue(read.getTransientAttribute(ReferenceConfidenceModel.INDEL_INFORMATIVE_BASES_CACHE_ATTRIBUTE_NAME) == null);
Assert.assertNull()
@droazen Responded to your comments. Are there lingering objections to getting this branch in?
👍 Looks good now -- merging!
I don't doubt that there could be issues caused by reads with previously filled caches. Ultimately this shouldn't have too significant an impact except in very pathological circumstances: highly repetitive regions, or reads that extend far enough into the next region and happen to have had good-looking indel sites even though the cigar doesn't actually contain any indels for that read. This change should eliminate those circumstances entirely, since we can now be sure the cache is clear before every call.
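To make the failure mode concrete, here is a small self-contained toy example (not GATK code; ToyRead and its attribute map are hypothetical stand-ins for a read's transient attributes) showing how a per-read cache filled for one region can be silently reused for the next region unless it is purged:

import java.util.HashMap;
import java.util.Map;

// Toy illustration only: ToyRead stands in for a read that carries transient, in-memory attributes.
public class StaleCacheDemo {
    static class ToyRead {
        private final Map<String, Object> transientAttributes = new HashMap<>();
        Object getTransientAttribute(String key) { return transientAttributes.get(key); }
        void setTransientAttribute(String key, Object value) { transientAttributes.put(key, value); }
        void clearTransientAttribute(String key) { transientAttributes.remove(key); }
    }

    static final String CACHE_KEY = "INDEL_INFORMATIVENESS";

    // Stand-in for an expensive per-region computation whose result is cached on the read.
    static String computeInformativeness(final ToyRead read, final String region) {
        final String cached = (String) read.getTransientAttribute(CACHE_KEY);
        if (cached != null) {
            return cached; // wrong if the cached value was computed against a different region
        }
        final String result = "informativeness for " + region;
        read.setTransientAttribute(CACHE_KEY, result);
        return result;
    }

    public static void main(String[] args) {
        final ToyRead read = new ToyRead();
        System.out.println(computeInformativeness(read, "regionA")); // informativeness for regionA
        // The read still carries regionA's cached value, so the regionB call silently reuses it:
        System.out.println(computeInformativeness(read, "regionB")); // informativeness for regionA (stale!)
        // Purging the transient attribute between regions forces a fresh, correct computation:
        read.clearTransientAttribute(CACHE_KEY);
        System.out.println(computeInformativeness(read, "regionB")); // informativeness for regionB
    }
}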
Fixes #5908