This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Fusion Filtering ("other" reading frame ) edits (#553)
* edits as per issue #552

* add IGH@/IGL@

* Update analyses/fusion_filtering/run_fusion_merged.sh

add TODO

Co-Authored-By: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
kgaonkar6 and jaclyn-taroni authored Feb 21, 2020
1 parent 329bbe3 commit 31fe795
Showing 9 changed files with 22,738 additions and 20,216 deletions.
25 changes: 12 additions & 13 deletions analyses/fusion_filtering/04-project-specific-filtering.Rmd
@@ -12,11 +12,11 @@ params:
 input: string
 dataStranded:
 label: "Input filtered fusion dataframe"
-value: scratch/standardFusionStrandedExp_QC_expression_GTExComparison_annotated.RDS
+value: scratch/standardFusionStrandedExp_QC_expression_filtered_annotated.RDS
 input: file
 dataPolya:
 label: "Input filtered fusion dataframe"
-value: scratch/standardFusionPolyaExp_QC_expression_GTExComparison_annotated.RDS
+value: scratch/standardFusionPolyaExp_QC_expression_filtered_annotated.RDS
 input: file
 numCaller:
 label: "Least Number of callers to have called fusion"
@@ -111,7 +111,6 @@ table(fusion_calls$Caller)
 # aggregate caller
 fusion_caller.summary <- fusion_calls %>%
-dplyr::filter(Fusion_Type != "other") %>%
 dplyr::select(Sample,FusionName,Caller,Fusion_Type) %>%
 group_by(FusionName, Sample ,Fusion_Type) %>%
 unique() %>%
@@ -122,17 +121,20 @@ fusion_caller.summary <- fusion_calls %>%
 fusion_calls <- fusion_calls %>%
 # remove local rearrangement/adjacent genes
 dplyr::filter(!grepl("LOCAL_REARRANGEMENT|LOCAL_INVERSION",annots))
 #to add aggregated caller from fusion_caller.summary
 fusion_calls<-fusion_calls %>%
-dplyr::filter(Fusion_Type != "other") %>% dplyr::select(-Caller,-annots) %>%
+dplyr::select(-Caller,-annots) %>%
 left_join(fusion_caller.summary,by=(c("Sample","FusionName","Fusion_Type"))) %>%
 dplyr::select(-JunctionReadCount,-SpanningFragCount,-Confidence,-LeftBreakpoint,-RightBreakpoint) %>% unique()
 #merge with histology file
 fusion_calls<-merge(fusion_calls,clinical,by.x="Sample",by.y="Kids_First_Biospecimen_ID")
+# filter for putative driver genes and mutifused genes per sample
+putative_driver_annotated_fusions <- fusion_calls %>%
+dplyr::filter(!is.na(Gene1A_anno) | !is.na(Gene1B_anno) | !is.na(Gene2A_anno) | !is.na(Gene2B_anno)) %>%
+unique()
@@ -141,7 +143,9 @@ fusion_calls<-merge(fusion_calls,clinical,by.x="Sample",by.y="Kids_First_Biospec


 ```{r}
-# Gene fusion should be in-frame
+# Gene fusion should be in-frame/frameshift
+fusion_calls<-fusion_calls %>%
+dplyr::filter(Fusion_Type != "other")
 # AND
 #
 # 1. Called by at least n callers
@@ -237,15 +241,10 @@ QCGeneFiltered_recFusionUniq<-QCGeneFiltered_recFusion %>%


 ```{r}
-# filter for putative driver genes and mutifused genes per sample
-putative_driver_annotated_fusions <- fusion_calls %>%
-dplyr::filter(!is.na(Gene1A_anno) | !is.na(Gene1B_anno) | !is.na(Gene2A_anno) | !is.na(Gene2B_anno)) %>%
-unique()
 # merge putative annotated oncogenic and scavenged back non-oncogenic annotated, recurrent fusions
 putative_driver_fusions<-rbind(QCGeneFiltered_recFusionUniq,putative_driver_annotated_fusions) %>%
-unique() %>% select (-broad_histology) %>%
+unique() %>% dplyr::select (-broad_histology) %>%
 as.data.frame()
 write.table(putative_driver_fusions,file.path(outputfolder,"pbta-fusion-putative-oncogenic.tsv"),sep="\t",quote=FALSE,row.names = FALSE)
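Read together, the hunks in this file appear to move the `Fusion_Type != "other"` filter so that it runs after `putative_driver_annotated_fusions` is built rather than before. The sketch below illustrates the effect of that ordering on a toy data frame. It is not the notebook's code: the rows are invented, only two of the four `*_anno` columns are modelled, the helper `has_annotation` is hypothetical, and base R subsetting is used in place of `dplyr::filter` so the example runs without extra packages.

```r
# Toy fusion calls; every value here is made up for illustration.
fusion_calls <- data.frame(
  Sample      = c("S1", "S1", "S2"),
  FusionName  = c("A--B", "IGH@--C", "D--E"),
  Fusion_Type = c("in-frame", "other", "other"),
  Gene1A_anno = c("Oncogene", "Oncogene", NA),
  Gene1B_anno = c("Kinase", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when at least one partner gene is annotated.
has_annotation <- function(df) {
  !is.na(df$Gene1A_anno) | !is.na(df$Gene1B_anno)
}

# Old ordering: "other" reading frames are dropped first,
# so an annotated fusion with an "other" frame never reaches the annotation filter.
old_calls     <- fusion_calls[fusion_calls$Fusion_Type != "other", ]
old_annotated <- old_calls[has_annotation(old_calls), ]

# New ordering: the annotation filter sees every reading frame ...
new_annotated <- fusion_calls[has_annotation(fusion_calls), ]

# ... and "other" is removed only for the caller-count / recurrence branch.
recurrence_input <- fusion_calls[fusion_calls$Fusion_Type != "other", ]

print(old_annotated$FusionName)
print(new_annotated$FusionName)
```

In this toy case the annotated `IGH@--C` call survives only under the new ordering, while the unannotated `D--E` call is excluded either way.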
2 changes: 1 addition & 1 deletion analyses/fusion_filtering/README.md
@@ -20,7 +20,7 @@ We also gather counts for recurrent fusions and fused genes found in more than 3
 * pbta-gene-expression-rsem-fpkm.stranded.rds : aggregated stranded fpm data

 #### Inputs used as reference
-* genelistreference.txt : known kinases, oncogenes, tumor suppressors, curated transcription factors [@doi:10.1016/j.cell.2018.01.029], COSMIC Cancer Gene Census list[https://cancer.sanger.ac.uk/census] . MYBL1 [@doi:10.1073/pnas.1300252110], SNCAIP [@doi:10.1038/nature11327], FOXR2 [@doi:10.1016/j.cell.2016.01.015], TTYH1 [@doi:10.1038/ng.2849], and TERT [@doi:10.1038/ng.3438; @doi:10.1002/gcc.22110; @doi:10.1016/j.canlet.2014.11.057; @doi:10.1007/s11910-017-0722-5] were added to the oncogene list and BCOR [@doi:10.1016/j.cell.2016.01.015] and QKI [@doi:10.1038/ng.3500] were added to the tumor suppressor gene list based on pediatric cancer literature review.
+* genelistreference.txt : known kinases, oncogenes, tumor suppressors, curated transcription factors [@doi:10.1016/j.cell.2018.01.029], COSMIC Cancer Gene Census list[https://cancer.sanger.ac.uk/census] . MYBL1 [@doi:10.1073/pnas.1300252110], SNCAIP [@doi:10.1038/nature11327], FOXR2 [@doi:10.1016/j.cell.2016.01.015], TTYH1 [@doi:10.1038/ng.2849], and TERT [@doi:10.1038/ng.3438; @doi:10.1002/gcc.22110; @doi:10.1016/j.canlet.2014.11.057; @doi:10.1007/s11910-017-0722-5] were added to the oncogene list and BCOR [@doi:10.1016/j.cell.2016.01.015] and QKI [@doi:10.1038/ng.3500] were added to the tumor suppressor gene list based on pediatric cancer literature review. IGH-@,IGH@ , IGL-@ and IGL@ were also added to reference list as oncogenic genes because StarFusion output contains these gene symbols instead of IGL/IGH as per public databases.
 * fusionreference.txt : known TCGA fusions
 * Brain_FPKM_hg38_matrix.txt.zip : GTex brain samples FPKM data
 The code to generate genelistreference.txt and fusionreference.txt is available here: https://gist.github.com/kgaonkar6/02b3fbcfeeddfa282a1cdf4803704794#file-format_reference_gene_list-r
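The README change above explains that the caller emits `IGH@`-style symbols, so a reference list holding only the database-style symbols would not match them. A minimal sketch of that mismatch follows, assuming annotation is an exact string match on gene symbol. The vectors `ref_before` and `ref_after`, the helper `annotate`, and the fusion names are all hypothetical, and whether the real list contains plain `IGH`/`IGL` is not stated in this commit.

```r
# Hypothetical reference symbols before and after the four rows are appended.
ref_before <- c("IGH", "IGL", "GENEZ")
ref_after  <- c(ref_before, "IGH@", "IGH-@", "IGL@", "IGL-@")

# Made-up fusion names in the "Gene1--Gene2" form used by FusionName.
fusion_names <- c("IGH@--GENEX", "IGL-@--GENEY")

# Take the 5' partner: split on the literal "--" separator and keep the first piece.
gene1a <- vapply(strsplit(fusion_names, "--", fixed = TRUE), `[`, character(1), 1)

# Hypothetical annotator: exact symbol match, NA when the symbol is absent.
annotate <- function(genes, ref) {
  ifelse(genes %in% ref, "Oncogene", NA_character_)
}

before <- annotate(gene1a, ref_before)
after  <- annotate(gene1a, ref_after)

print(data.frame(gene1a, before, after))
```

With the shorter list both partners stay unannotated; with the extra rows both match. This is consistent with the stated reason for adding the `@` symbols rather than renaming the caller output.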
4 changes: 4 additions & 0 deletions analyses/fusion_filtering/references/genelistreference.txt
@@ -6752,3 +6752,7 @@
 "6751" "GAK" "PfamKinase" "Kinase"
 "6752" "PIK3C2B" "PfamKinase" "Kinase"
 "6753" "HUNK" "PfamKinase" "Kinase"
+"6754" "IGH@" "addedToallOnco_Feb2017.tsv" "Oncogene"
+"6755" "IGH-@" "addedToallOnco_Feb2017.tsv" "Oncogene"
+"6756" "IGL@" "addedToallOnco_Feb2017.tsv" "Oncogene"
+"6757" "IGL-@" "addedToallOnco_Feb2017.tsv" "Oncogene"