Commit
- add new callers and regions
- improve existing callers
- add GRCh37/hg19 support
1 parent 2fe2cdc, commit 63cc3ff
Showing 99 changed files with 28,623 additions and 60,484 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# F8 | ||
|
||
In F8, our goal is to detect the intron 22 inversion involved in [Hemophilia A](https://www.ncbi.nlm.nih.gov/books/NBK1404/). Paraphase does this by phasing haplotypes for the homology region where the inversion breakpoints occur (a region that encodes F8A1, F8A2 and F8A3), and then checking the sequences flanking each haplotype for signals that suggest an inversion. Paraphase also calls another structural variant (SV) in F8, the deletion of exons 1-22, whose breakpoint falls in the same homology region, although this variant is relatively easy to call with a standard depth-based CNV caller. | ||
|
||
## Fields in the `json` file | ||
|
||
- `sv_called`: reports a deletion between int22h-1 and int22h-2 (which suggests a deletion of exons 1-22), or an inversion between int22h-1 and int22h-3 (which suggests the intron 22 inversion) | ||
|
||
Note that the inversion and the deletion are also reported in the VCF as SVs. |
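
As a minimal sketch of how the field above might be read, the snippet below pulls one region's fields out of a JSON report. The function name `region_fields`, the top-level key `"f8"`, and the `null` value in the example are assumptions made for illustration; they are not taken from real Paraphase output, and the actual shape of `sv_called` may differ.

```python
import json


def region_fields(report_text, region_key):
    """Return the fields reported for one region, or an empty dict if absent."""
    report = json.loads(report_text)
    fields = report.get(region_key)
    return fields if isinstance(fields, dict) else {}


# Hand-written stand-in for a JSON report; the "f8" key and the null value
# are assumptions for illustration, not real Paraphase output.
example_report = '{"f8": {"sv_called": null}}'

fields = region_fields(example_report, "f8")
sv_called = fields.get("sv_called")
print("SV reported" if sv_called else "no SV reported")
```

Returning an empty dict for a missing region keeps the caller's `.get("sv_called")` lookup safe when a region was not analyzed.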
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# HBA1/HBA2 | ||
|
||
For this [region](https://www.ncbi.nlm.nih.gov/books/NBK1435/), Paraphase calls the total copy number of HBA1 and HBA2. Variants are reported in the VCF against the HBA2 reference sequence. | ||
|
||
## Fields in the `json` file | ||
|
||
- `genotype`: reports the genotype of this family. Possible alleles include `aa`, `aaa` (duplication), `-a` (deletion) or `--` (double deletion). | ||
- `alleles_final`: when possible, different copies of HBA are phased into alleles with read-based phasing. | ||
|
||
## Visualizing haplotypes | ||
|
||
To visualize phased haplotypes, load the output bam file in IGV, group reads by the `HP` tag and color alignments by the `YC` tag. Green and purple represent the two alleles, i.e. all haplotypes in green are on one allele and all haplotypes in purple are on the other allele. | ||
|
||
Reads in gray are either unassigned or consistent with more than one possible haplotype. When two haplotypes are identical over a region, there can be more than one haplotype consistent with a read, and the read is randomly assigned to a haplotype and colored in gray. | ||
|
||
![HBA example](figures/HBA.png) | ||
|
||
- The top panel shows a sample with two copies of HBA1 and two copies of HBA2, one on each allele. | ||
- The bottom panel shows a sample with a `-a` allele, where there is a deletion, leaving only one copy of HBA (`hba_del_hap1`). |
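
The allele notation described for `genotype` can be turned into copy counts by counting the `a` characters in each allele. The sketch below assumes the two alleles are joined by `/` (for example `-a/aa`); that separator and the helper name `hba_copies` are assumptions, not something this documentation specifies.

```python
def hba_copies(genotype, sep="/"):
    """Count HBA copies per allele, and in total, from a genotype string.

    Each allele is written with 'a' for a present copy and '-' for a
    deleted one, so the copy count of an allele is its number of 'a's.
    The separator between alleles is an assumption for this sketch.
    """
    per_allele = []
    for allele in genotype.split(sep):
        # Reject anything that is not made up of 'a' and '-' characters.
        if not allele or set(allele) - {"a", "-"}:
            raise ValueError(f"unexpected allele: {allele!r}")
        per_allele.append(allele.count("a"))
    return per_allele, sum(per_allele)


per_allele, total = hba_copies("-a/aa")
print(per_allele, total)
```

Counting characters rather than matching a fixed list of alleles means the helper also handles allele strings that are not listed above, as long as they follow the same notation.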
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# IKBKG | ||
|
||
In this region, Paraphase calls small variants in [IKBKG](https://www.ncbi.nlm.nih.gov/books/NBK1472/). In addition, there is a known 11.7kb deletion that can occur in either IKBKG or the pseudogene. Paraphase calls this deletion and assigns it to either IKBKG or the pseudogene. | ||
|
||
## Fields in the `json` file | ||
|
||
- `deletion_haplotypes`: haplotypes carrying the 11.7kb deletion | ||
|
||
Note that this deletion is also reported in the VCF as a structural variant (SV). | ||
|
||
## Visualizing haplotypes | ||
|
||
To visualize phased haplotypes, load the output bam file in IGV, group reads by the `HP` tag and color alignments by `YC` tag. Reads are realigned to IKBKG. Green represents phased copies on one allele if there is duplication of the 11.7kb region. | ||
|
||
Reads in gray are either unassigned or consistent with more than one possible haplotype. When two haplotypes are identical over a region, there can be more than one haplotype consistent with a read, and the read is randomly assigned to a haplotype and colored in gray. | ||
|
||
![IKBKG examples](figures/IKBKG.png) | ||
|
||
- In this set of examples, the top panel shows a female sample without structural variants, i.e. two copies of IKBKG and two copies of IKBKGP1. | ||
- The middle panel shows a female sample with a copy of IKBKGP1 that carries the 11.7kb deletion. | ||
- The bottom panel shows a female sample where there is a duplication (duplicated three times) of the 11.7kb region on a copy of IKBKGP1 (in green). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# NCF1 | ||
|
||
NCF1 is differentiated from its pseudogenes NCF1B and NCF1C by the presence of GT at the beginning of Exon 2 ([c.75_76del or p.Tyr26fs](https://www.ncbi.nlm.nih.gov/clinvar/variation/2249/)). | ||
|
||
## Fields in the `json` file | ||
|
||
- `total_cn`: total copy number of the family | ||
- `gene_cn`: copy number of the gene of interest, i.e. NCF1 | ||
- `two_copy_haplotypes`: haplotypes that are present in two copies based on depth. This happens in a small number of cases, when two copies are identical in sequence and read depth indicates that the haplotype is present twice rather than once. | ||
|
||
## Visualizing haplotypes | ||
|
||
To visualize phased haplotypes, load the output bam file in IGV, group reads by the `HP` tag and color alignments by `YC` tag. Reads are realigned to the main gene, NCF1. | ||
|
||
Reads in blue are confidently consistent with a single haplotype. Reads in gray are either unassigned or consistent with more than one possible haplotype. When two haplotypes are identical over a region, there can be more than one haplotype consistent with a read, and the read is randomly assigned to a haplotype and colored in gray. | ||
|
||
![NCF1 example](figures/NCF1.png) |
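
Since `total_cn` covers the whole family and `gene_cn` covers NCF1 alone, their difference gives the combined copy number of the pseudogenes NCF1B and NCF1C. The sketch below assumes both fields hold integers, or `None` when no call was made; the helper name `pseudogene_cn` and the example values are hypothetical.

```python
def pseudogene_cn(fields):
    """Return the combined NCF1B + NCF1C copy number, or None if not callable.

    `fields` is the dict of NCF1 fields from the JSON report. Treating a
    missing or None value as "no call" is an assumption of this sketch.
    """
    total = fields.get("total_cn")
    gene = fields.get("gene_cn")
    if total is None or gene is None:
        return None
    return total - gene


# Hand-written example values, not taken from a real sample.
print(pseudogene_cn({"total_cn": 6, "gene_cn": 2}))
```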
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# NEB | ||
|
||
Paraphase resolves the triplicate (TRI) repeat region in NEB, where copy number variants are common. | ||
|
||
## Fields in the `json` file | ||
|
||
- `total_cn`: total copy number of the triplicate repeat | ||
- `two_copy_haplotypes`: haplotypes that are present in two copies based on depth. This happens in a small number of cases, when two copies are identical in sequence and read depth indicates that the haplotype is present twice rather than once. | ||
- `alleles_final`: when possible, different copies of TRI are phased into alleles with read-based phasing. | ||
|
||
## Visualizing haplotypes | ||
|
||
To visualize phased haplotypes, load the output bam file in IGV, group reads by the `HP` tag and color alignments by `YC` tag. Reads are realigned to the first copy of TRI in the reference genome. | ||
|
||
Green and purple represent the two alleles, i.e. all haplotypes in green are on one allele and all haplotypes in purple are on the other allele. Reads in gray are either unassigned or consistent with more than one possible haplotype. When two haplotypes are identical over a region, there can be more than one haplotype consistent with a read, and the read is randomly assigned to a haplotype and colored in gray. | ||
|
||
![NEB example](figures/NEB.png) | ||
|
||
This example has three copies of TRI on one allele and another three copies of TRI on the other allele. |
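
To illustrate how `two_copy_haplotypes` relates to a copy count, the sketch below counts each phased haplotype once and adds one more for every haplotype flagged as present in two copies. The list of haplotype names is a hypothetical input (this documentation does not name the field that holds it), and the names used in the example are invented.

```python
def copy_number_from_haplotypes(haplotype_names, two_copy_haplotypes):
    """Sum copies over unique haplotypes, counting two-copy haplotypes twice."""
    two_copy = set(two_copy_haplotypes)
    return sum(2 if name in two_copy else 1 for name in haplotype_names)


# Invented haplotype names: five distinct haplotypes, one of them doubled.
names = ["hap1", "hap2", "hap3", "hap4", "hap5"]
print(copy_number_from_haplotypes(names, ["hap3"]))
```

One would expect this count to agree with `total_cn` when a call is made, though that correspondence is an assumption here rather than a documented guarantee.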