Reciprocal and kinase #821
Are the changes in |
This looks good! I had a few questions before I approve.
# check for fusions have reciprocal fusions in the same Sample
# works only for GeneY -- GeneX ; GeneX -- GeneY matches
recirpocal_fusion <- function(FusionName,Sample,standardFusioncalls ){
Gene1A <- strsplit(FusionName,"--")[[1]][1]
Can you remind me why we're not looking at the Gene2A and Gene2B here?
For intergenic fusions, which have the form Gene1A/Gene2A--Gene1B or Gene1A--Gene1B/Gene2B (and similar), it gets a little complicated, because we would then need to check that the distance between Gene1A/Gene2A is the same in the reciprocal. So I've just stuck to fusions between genes.
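To make the gene-to-gene restriction concrete, here is a minimal sketch of a reciprocal check in base R. It is not the function from this PR: `reciprocal_exists_for` is a hypothetical helper, the `GeneX--GeneY` row and `SAMPLE_2` are made-up toy data, and skipping any name that contains `/` is an assumption about how intergenic fusions could be excluded.

```r
# Hypothetical helper (not the PR's function): TRUE when the same Sample
# also has the fusion with the two gene names swapped.
reciprocal_exists_for <- function(FusionName, Sample, fusion_calls) {
  genes <- strsplit(FusionName, "--", fixed = TRUE)[[1]]
  # Only handle simple GeneX--GeneY names; intergenic names such as
  # Gene1A/Gene2A--Gene1B are skipped (assumption for this sketch).
  if (length(genes) != 2 || any(grepl("/", genes, fixed = TRUE))) {
    return(FALSE)
  }
  # A gene fused to itself would otherwise match its own row.
  if (genes[1] == genes[2]) {
    return(FALSE)
  }
  reciprocal_name <- paste(genes[2], genes[1], sep = "--")
  any(fusion_calls$FusionName == reciprocal_name &
        fusion_calls$Sample == Sample)
}

# Toy calls: the first two rows mirror the example in the PR description,
# the third row is made up.
fusion_calls <- data.frame(
  FusionName = c("ANTXR1--BRAF", "BRAF--ANTXR1", "GeneX--GeneY"),
  Sample = c("BS_044XZ8ST", "BS_044XZ8ST", "SAMPLE_2"),
  stringsAsFactors = FALSE
)

fusion_calls$reciprocal_exists <- mapply(
  reciprocal_exists_for,
  fusion_calls$FusionName,
  fusion_calls$Sample,
  MoreArgs = list(fusion_calls = fusion_calls),
  USE.NAMES = FALSE
)
```

Matching on both `FusionName` and `Sample` keeps a reciprocal called in a different sample from counting.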
Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
This is expected because running sample() can select different samples when we have multiple samples per Kids_First_Participant_ID.
I would have expected sorting + setting a seed to have prevented that, but there may be some subtlety I'm missing. Either way, beyond the scope of this PR.
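For reference, "sorting + setting a seed" could look like the sketch below. This is not the module's code: `select_one_per_participant`, the seed value, and the toy IDs are all hypothetical, and `Kids_First_Biospecimen_ID` is assumed as the sample column. Even with this pattern, the draw still depends on the R version's random number settings and on the locale used for sorting, so it only guarantees repeatability within one environment.

```r
# Hypothetical helper: pick one biospecimen per participant, independent
# of the row order of the input.
select_one_per_participant <- function(metadata, seed = 2020) {
  # Sort first so sample() always sees the candidates in the same order.
  metadata <- metadata[order(metadata$Kids_First_Participant_ID,
                             metadata$Kids_First_Biospecimen_ID), ]
  set.seed(seed)
  groups <- split(metadata$Kids_First_Biospecimen_ID,
                  metadata$Kids_First_Participant_ID)
  # Index with sample.int() so a single candidate is returned unchanged.
  vapply(groups, function(ids) ids[sample.int(length(ids), 1)], character(1))
}

# Toy metadata; all IDs are made up.
metadata <- data.frame(
  Kids_First_Participant_ID = c("PT_2", "PT_1", "PT_1", "PT_2", "PT_3"),
  Kids_First_Biospecimen_ID = c("BS_D", "BS_B", "BS_A", "BS_C", "BS_E"),
  stringsAsFactors = FALSE
)

selected <- select_one_per_participant(metadata)
# Same rows in reverse order should give the same selection.
selected_reversed <- select_one_per_participant(
  metadata[rev(seq_len(nrow(metadata))), ]
)
```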
👍
Thanks for the review @jaclyn-taroni!
Purpose/implementation Section
What scientific question is your analysis addressing?
Add kinase domain retention status for fused genes since this information will be needed to filter BRAF and other kinase gene fusions that we use for LGAT subtyping. We also want to check if the fusion is a reciprocal that is if the fusion callers called GeneX--GeneY and GeneY--GeneX.
What was your approach?
First, I added the LeftBreakpoint and RightBreakpoint columns, since we need this information to annotate domain retention. (Earlier, we had removed these columns so that one unique fusion row per Sample could be retained.)
Then, we will use the fusion_driver function from annoFuse to add kinase domain status per Gene1A (5' gene) and Gene1B (3' gene) in the columns DomainRetainedGene1A and DomainRetainedGene1B.
For each kinase gene the domain retention annotation will be as follows
Within the function, the base pfam domain annotation function annotates the retention status of domains per breakpoint, with domain ID & location information from :
For reciprocal status, I've added a function that records this information as logical values in a separate column, reciprocal_exists. For example, in Sample BS_044XZ8ST we have the reciprocal fusions ANTXR1--BRAF and BRAF--ANTXR1, so these fusions will have reciprocal_exists == TRUE.
What GitHub issue does your pull request address?
#812
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
I've tried to organize the chunks in 04-project-specific-filtering.Rmd so that there are minimal code changes. Please let me know if it is easy enough to follow.
Is there anything that you want to discuss further?
Since I've now added the LeftBreakpoint and RightBreakpoint columns to pbta-fusion-putative-oncogenic.tsv, there can be multiple rows per FusionName and Sample if the fusion has multiple breakpoints. This doesn't affect *recurrent-fusion-byhistology.tsv, *recurrent-fused-genes-byhistology.tsv, *recurrent-fusion-bysample.tsv and *recurrent-fused-genes-bysample.tsv, but it might affect other modules that don't take unique rows per FusionName and Sample.
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Results
What types of results are included (e.g., table, figure)?
table
What is your summary of the results?
Kinase domain retention information is added per kinase gene fusion
Reproducibility Checklist
Documentation Checklist
README and it is up to date.
analyses/README.md and the entry is up to date.