This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Added methods for sex determination tool #40

Merged
merged 15 commits into from
Sep 16, 2019
10 changes: 10 additions & 0 deletions content/03.methods.md
@@ -87,3 +87,13 @@ We annotated putative driver fusions and prioritized fusions lists with kinases,
We also added ChimerDB [@doi:10.1093/nar/gkw1083] annotations to both the driver and prioritized fusion lists.

### Clinical Data Harmonization

#### Sex Determination

We wrote a sex determination tool for sequenced DNA samples to assess concordance with reported gender and to identify contaminated samples.
We implemented the tool in the Common Workflow Language (CWL) on Cavatica; it infers sex from read coverage on chromosomes X and Y.
We used the idxstats utility from SAMtools [@pmid:19505943] to obtain, for each chromosome, the reference sequence length and the number of mapped reads.
From these statistics, we calculated normalized read counts for chromosomes X and Y, and then the fraction of the total normalized X and Y read counts that fell on chromosome Y.
By reviewing the fractions for samples that had reported genders, we determined that a fraction below 0.2 corresponded to a germline sex prediction of Female.
If this fraction was between 0.4 and 0.6, the sample's germline sex prediction was Male.
Samples whose fraction fell outside both ranges were marked Unknown.
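
The classification rule described above can be sketched in Python. This is a minimal illustration and not the CWL tool itself: it assumes that "normalized" means mapped reads divided by chromosome length, that the X and Y chromosomes are named `chrX` and `chrY` in the alignment file, and that the 0.4 and 0.6 boundaries are inclusive. None of these details are stated in the text, and the function names are hypothetical.

```python
FEMALE_MAX = 0.2          # fraction below this is called Female (from the text)
MALE_RANGE = (0.4, 0.6)   # fraction in this range is called Male (from the text)


def parse_idxstats(text):
    """Parse `samtools idxstats` output into {name: (length, mapped_reads)}.

    Each line is tab-separated: reference name, sequence length,
    number of mapped reads, number of unmapped reads.
    """
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, _unmapped = line.split("\t")
        stats[name] = (int(length), int(mapped))
    return stats


def y_fraction(stats, x_name="chrX", y_name="chrY"):
    """Return the share of normalized X+Y read counts that falls on Y.

    Assumption: normalization divides mapped reads by chromosome length.
    Returns None when neither chromosome has any mapped reads.
    """
    x_len, x_mapped = stats[x_name]
    y_len, y_mapped = stats[y_name]
    x_norm = x_mapped / x_len
    y_norm = y_mapped / y_len
    total = x_norm + y_norm
    if total == 0:
        return None
    return y_norm / total


def predict_sex(fraction):
    """Apply the thresholds from the text; boundary inclusivity is assumed."""
    if fraction is None:
        return "Unknown"
    if fraction < FEMALE_MAX:
        return "Female"
    if MALE_RANGE[0] <= fraction <= MALE_RANGE[1]:
        return "Male"
    return "Unknown"
```

With the idxstats output of a sample captured as a string, `predict_sex(y_fraction(parse_idxstats(text)))` returns one of the three labels. Fractions between 0.2 and 0.4, or above 0.6, fall through to Unknown.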