-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow sex dedup determination #114
base: develop
Are you sure you want to change the base?
allow sex dedup determination #114
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #114 +/- ##
===========================================
+ Coverage 92.98% 94.36% +1.37%
===========================================
Files 8 8
Lines 1468 1419 -49
Branches 252 252
===========================================
- Hits 1365 1339 -26
+ Misses 65 41 -24
- Partials 38 39 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @afaulconbridge . What are your thoughts about adding a new kwarg
instead, perhaps determine_sex_chrom="X"
, which would set a member variable (e.g., _determine_sex_chrom
), that would be used for this and calls to determine_sex
via the sex
property?
Hmm, interesting idea. I like it because it only has to be set in one place, and can be used in many as a default. It would still have to be overridable on specific invocations, but that shouldn't be a problem. I'll have a try with that approach and see how it works in practice. |
@apriha I've implemented the selection of sex detection method as an instance variable, so it can be shared between methods easily. Theres are some further improvements to be made (e.g. a separate threshold to detect low heterozygous X chromosomes as male, number of non-null Y snps as a proportion of the total snps) but they are separate issues for the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @afaulconbridge - I like how the sex determination method and thresholds are accessible via the SNPs
object.
Eventually it'd be nice to have a determination method (e.g., all
) that checks all methods. A dict
could store the sex determined for each method (perhaps accessible by a new property for later inspection).
s._sex_method = "Y" | ||
self.assertEqual(s.sex, "") # no Y chromosome | ||
s._sex_method = "XY" | ||
self.assertEqual(s.sex, "") # can't detect sex, no Y present & X is not male |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the four occurrences of this comment be # can't detect sex, no Y present & X is not female
?
@@ -95,6 +98,12 @@ def __init__( | |||
processes to launch if multiprocessing | |||
rsids : tuple, optional | |||
rsids to extract if loading a VCF file | |||
sex_method : string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sex_method : string | |
sex_method : {"X", "Y", "XY", "YX"} |
@@ -95,6 +98,12 @@ def __init__( | |||
processes to launch if multiprocessing | |||
rsids : tuple, optional | |||
rsids to extract if loading a VCF file | |||
sex_method : string | |||
sex detection method code: "X", "Y", "XY" and "YX supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd briefly describe the behavior of each method here, e.g., to help understand when "XY" should be used vs. "X".
percentage Y SNPs that are not null; above this threshold, Male is determined | ||
chrom : {"X", "Y"} | ||
use X or Y chromosome SNPs to determine sex | ||
def determine_sex(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is only called from the sex
property and it takes no parameters, this could be a private method.
When deduplicating sex chromosomes is a useful thing to do. However, in some conditions the default methods of sex detection can give incorrect results e.g. when homozygous reference alleles are omitted from the source such as in whole-exome or whole-genome. Therefore it is useful to be able to specify the sex detection method used when deciding how to deduplicate the sex chromosomes.
This PR both supports the existing use of a boolean for
deduplicate_XY_chrom
as well as using a String that can be passed todetermine_sex()
as thechrom
parameter.( I'll probably have another PR soon for improvements to
determine_sex()
itself. )