Implement allele counting methods #2393
No tests yet. Is the interface right, @benjeffery?
Ah, yes, they are useful for #2384.
They are also useful for some information-theoretic imputation quality metrics that I want to implement soon.
Codecov Report
@@ Coverage Diff @@
## main #2393 +/- ##
==========================================
- Coverage 93.34% 93.34% -0.01%
==========================================
Files 28 28
Lines 26988 26982 -6
Branches 1246 1245 -1
==========================================
- Hits 25193 25187 -6
Misses 1761 1761
Partials 34 34
Flags with carried forward coverage won't be shown.
Interface looks good to me, although I'm not sure about raising an error when there are zero samples. What about returning an empty dict in that case? That's what … It would also be nice to reuse …
Yeah, I wasn't sure about that. Another possibility would be to return NaN?
Would it. That seems a mistake to me. I think it should return all the alleles with a count of zero?
I could do that. It seemed a bit more work, but probably about the same.
Oh yes, I think I coded it wrongly. I suspect it should return a value for everything in …
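A minimal sketch of the behaviour being discussed, in which the counts report every allele, including those with a count of zero. It assumes genotypes are integer indexes into an alleles sequence and that missing data is encoded as -1 (tskit's MISSING_DATA value); the helper name allele_counts is hypothetical, and this is not the code that was merged.

```python
import collections

import numpy as np

MISSING = -1  # assumption: missing genotypes encoded as -1, like tskit.MISSING_DATA


def allele_counts(genotypes, alleles):
    """Hypothetical helper: count the samples carrying each allele.

    Every allele gets an entry (zero if unobserved); missing data is
    counted under the key None, and only if any is present.
    """
    genotypes = np.asarray(genotypes, dtype=int)
    # Seed every allele with zero so unobserved alleles still appear.
    counts = collections.Counter({allele: 0 for allele in alleles})
    for index in genotypes[genotypes != MISSING]:
        counts[alleles[index]] += 1
    num_missing = int(np.count_nonzero(genotypes == MISSING))
    if num_missing > 0:
        counts[None] = num_missing
    return counts
```

For example, allele_counts([0, 0, -1, 1], ("A", "T", "G")) gives A: 2, T: 1, G: 0 and None: 1. Seeding with zeros also covers the zero-samples case: an empty genotype array yields every allele with a count of zero rather than an error.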
Branch updated from 6d67bde to ecae7cf.
I coded this to raise a warning if frequencies are calculated on a variant where all samples are missing or there are no samples, but to output NaN (which is mathematically the correct thing, I think). Is this right, and should I create a separate …
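One way to get the behaviour described here: frequencies computed from a counts mapping, emitting a warning and returning NaN for every allele when there is nothing to divide by. Using the number of non-missing samples as the denominator is an assumption for this sketch (the thread later flags a denominator mistake without saying what it was), and allele_frequencies is a hypothetical name.

```python
import math
import warnings


def allele_frequencies(counts):
    """Hypothetical helper: frequency of each allele among non-missing samples.

    `counts` maps allele -> count, with missing data under the key None.
    If no non-missing samples exist, every frequency is NaN (0 / 0 is
    undefined) and a warning is emitted instead of raising an error.
    """
    # Assumption: missing samples are excluded from the denominator.
    observed = {allele: n for allele, n in counts.items() if allele is not None}
    total = sum(observed.values())
    if total == 0:
        warnings.warn("No non-missing samples; allele frequencies are NaN")
        return {allele: math.nan for allele in observed}
    return {allele: n / total for allele, n in observed.items()}
```

With counts of A: 3, T: 1 and two missing samples, this returns A: 0.75 and T: 0.25.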
Branch updated from 177a999 to b5c301c.
This looks very nice. Note: this overlaps with #504.
There's a mistake in the denominator for frequencies, I think!
Hmm, an exciting segfault here. WTF? Edit: failing due to #2400.
@hyanwong Did you want to get this in for 0.5.1?
Maybe release 0.5.1 first, as we might want to think about how it interacts with @petrelharp's suggestions in #504, and nothing is waiting on this PR. If @szhan wants to use it for #2384, he can simply wait until we decide on the right API.
LGTM; one implementation comment. We should also consider options in the standard library for representing the frequency mapping.
Branch updated from 4d9aec2 to 1dcad55.
I had a hunt around and couldn't see anything terribly obvious.
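For reference, one standard-library candidate is collections.Counter, which already suits counts and also accepts float values. A short sketch (not what the PR settled on) of using it for a frequency mapping, with made-up counts:

```python
import collections

counts = collections.Counter({"A": 3, "T": 1, "G": 0})
total = sum(counts.values())

# Counter accepts any numeric values, so it can hold frequencies too.
freqs = collections.Counter({allele: n / total for allele, n in counts.items()})

print(freqs.most_common(1))  # [('A', 0.75)]
print(freqs["C"])  # 0: absent keys read as zero instead of raising KeyError
```

One caveat for frequencies: Counter's + and - operators drop entries whose result is zero or negative, so an allele with zero frequency would disappear after arithmetic, whereas a plain dict keeps it.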
LGTM. Needs a few explicit tests, as it's quite hard to see what's really being tested when checking against the ts_fixture.
LGTM
Can we merge this now? It would be useful for @szhan, I think (and @benjeffery is on holiday).
Merging. I'm assuming the changes Ben requested have been made? (It's not obvious what they were here; I came in late to the conversation.)
bincounts = np.bincount(self.genotypes)
for i, allele in enumerate(self.alleles):
    counts[allele] = bincounts[i] if i < len(bincounts) else 0
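The i < len(bincounts) guard is needed because np.bincount returns an array of length max(genotypes) + 1, which is shorter than the allele list whenever the highest-indexed alleles are unobserved. A small illustration with made-up genotypes, using plain variables as stand-ins for self.genotypes and self.alleles:

```python
import numpy as np

genotypes = np.array([0, 0, 1, 0])  # three alleles exist, but allele 2 is unobserved
alleles = ("A", "T", "G")

bincounts = np.bincount(genotypes)
# Output length is max(genotypes) + 1, i.e. shorter than the allele list here.
print(len(bincounts))  # 2

counts = {}
for i, allele in enumerate(alleles):
    # The guard supplies a zero for alleles beyond the end of bincounts.
    counts[allele] = int(bincounts[i]) if i < len(bincounts) else 0
print(counts)  # {'A': 3, 'T': 1, 'G': 0}
```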
Late here, but I think you could just do this?
bincounts = np.bincount(self.genotypes, minlength=self.num_alleles)
for i, allele in enumerate(self.alleles):
    counts[allele] = bincounts[i]
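A sketch of the minlength suggestion with stand-in arrays. One caveat the snippet doesn't show: np.bincount raises ValueError on negative input, so if missing genotypes are encoded as -1 (as tskit does) they need filtering out first. Whether the PR did this before the call isn't visible in this thread; len(alleles) stands in for self.num_alleles.

```python
import numpy as np

MISSING = -1  # assumption: missing genotypes are encoded as -1
genotypes = np.array([0, 0, 1, MISSING])
alleles = ("A", "T", "G")

# np.bincount rejects negative values, so drop missing genotypes first.
observed = genotypes[genotypes != MISSING]

# minlength pads the result with zeros, so every allele gets a bin.
bincounts = np.bincount(observed, minlength=len(alleles))
counts = {allele: int(bincounts[i]) for i, allele in enumerate(alleles)}
print(counts)  # {'A': 2, 'T': 1, 'G': 0}

try:
    np.bincount(genotypes)  # still contains -1
except ValueError as err:
    print("bincount failed:", err)
```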
This is excellent!
Description
Add variant.num_missing, variant.counts(), and variant.frequencies(). Ping @szhan: is this useful for #2384?

Fixes #2390
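Of the three additions, num_missing is the simplest. A hedged sketch of what it computes, assuming missing genotypes are encoded as -1 as in tskit; num_missing here is a standalone hypothetical function, not the Variant property itself:

```python
import numpy as np

MISSING = -1  # assumption: tskit-style encoding of missing genotypes


def num_missing(genotypes):
    """Hypothetical stand-in for variant.num_missing: samples with no allele."""
    return int(np.count_nonzero(np.asarray(genotypes) == MISSING))


print(num_missing([0, -1, 1, -1]))  # 2
```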
PR Checklist: