Update focal CN file prep to use exons again and cover the consensus SEG case #479
Add ploidy!
Also rerun + use $XYFLAG for the consensus step
Rendered version of the notebook here: https://jaclyn-taroni.github.io/openpbta-notebook-concept/02-add-ploidy-consensus.nb.html
Looks great. The only comments I have are regarding a couple code comments and README documentation.
# of the repository as follows:
#
# Rscript 'analyses/oncoprint-landscape/01-prepare-cn-file.R'
Unless this script would work without specifying anything, I suspect this usage example is incomplete?
The mapping is limited to _coding sequences_.
Mapping to cytobands is performed with the [`UCSC hg38 cytoband file`](http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/cytoBand.txt.gz).
_Note: The decision to implement the `UCSC file` was made based on a comparison done between the cytoband calls in the `org.Hs.eg.db` package and the calls in the `UCSC file`. We found that they disagreed in ~11,800 calls out of ~800,000 and the `UCSC file` contains more cytoband calls._
* `02-add-ploidy-consensus.Rmd` - This is very similar to the CNVkit file prep.
I am unsure what "similar to CNVkit file prep" is referring to. Can you put a link to what this is referring to?
I added the `01` notebook file name here to make it more clear.
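For a sense of scale, the cytoband comparison quoted in the README note above works out to roughly 1.5% disagreement (~11,800 of ~800,000 calls). Below is a minimal Python sketch of how such a comparison could be tallied. It is not the module's R code, and the function name, gene names, and toy mappings are all hypothetical.

```python
def disagreement_rate(calls_a, calls_b):
    """Fraction of shared keys whose cytoband call differs between two mappings."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0.0
    differing = sum(1 for key in shared if calls_a[key] != calls_b[key])
    return differing / len(shared)


# Hypothetical toy mappings (gene -> cytoband), standing in for the two sources.
org_db_calls = {"GENE_A": "1p36.33", "GENE_B": "7q11.23", "GENE_C": "17p13.1"}
ucsc_calls = {
    "GENE_A": "1p36.33",
    "GENE_B": "7q11.22",  # differs from the other source
    "GENE_C": "17p13.1",
    "GENE_D": "Xq28",     # only present in one source, so not compared
}

toy_rate = disagreement_rate(org_db_calls, ucsc_calls)

# The approximate counts quoted in the README note.
reported_rate = 11_800 / 800_000
print(f"toy: {toy_rate:.3f}, reported: {reported_rate:.4f}")
```

Only keys present in both mappings are compared here; calls found in a single source (the README note says the UCSC file has more) are counted separately in this sketch's framing, which is an assumption about how the original comparison was done.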
* `02-rna-expression-validation.R` - This script examines RNA-seq expression levels (RSEM FPKM) of genes that are called as deletions.
* `rna-expression-validation.R` - This script examines RNA-seq expression levels (RSEM FPKM) of genes that are called as deletions.
It is not currently run via the shell script.
What's the plan for this script then? Is it being dropped or revamped or is it supposed to be called somewhere else?
Plan is to rework it; see #387.
Co-Authored-By: Candace Savonen <cansav09@gmail.com>
Purpose/implementation Section
We used exons to annotate the CNV data files prior to #452, which switched to using coding regions. That looked too restrictive and inconsistent with what is typically done (some references and quotes supplied by @cbethell below).
Monlong et al. NAR. 2018.:
Bailey et al. Cytogenet Genome Res. 2009.:
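To illustrate why restricting the mapping to coding sequences can be too restrictive, here is a minimal Python sketch (not the module's R code). It assumes all intervals sit on one chromosome with inclusive coordinates; the gene names, coordinates, and helper names are hypothetical. A segment that touches only a non-coding (e.g. UTR) exon is assigned to the gene under exon-based mapping but dropped under CDS-only mapping.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int
    end: int  # inclusive


def overlaps(a, b):
    """True if two inclusive intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def genes_hit(segment, features):
    """Sorted gene symbols with at least one feature overlapping the segment."""
    return sorted({gene for gene, iv in features if overlaps(segment, iv)})


# Hypothetical annotation: GENE_X has a non-coding first exon (100-200)
# and a second exon (300-400) whose coding part starts at 320.
exons = [
    ("GENE_X", Interval(100, 200)),
    ("GENE_X", Interval(300, 400)),
    ("GENE_Y", Interval(1000, 1100)),
]
cds = [
    ("GENE_X", Interval(320, 400)),
    ("GENE_Y", Interval(1000, 1100)),
]

# Hypothetical copy number segment that covers only the non-coding exon.
segment = Interval(150, 250)

by_exon = genes_hit(segment, exons)
by_cds = genes_hit(segment, cds)
print(by_exon, by_cds)
```

In this toy case the exon-based mapping reports `GENE_X` for the segment, while the CDS-only mapping reports nothing.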
What was your approach?
Here we are:
`analyses/focal-cn-file-preparation/02-add-ploidy-consensus.Rmd`: There is a lot of overlap with how we prepare the CNVkit file, so this can probably be refactored in a subsequent pull request. This basically means this code was reviewed before in #253 (Update CNV segment to gene mapping: support both formats, use GTF, etc.) and #259 (Add X and Y chromosomes to CNV segment to gene symbol mapping).
`analyses/focal-cn-file-preparation/03-prepare-cn-file.R`: file prior to #452 (Update/Revamp `focal-cn-file-preparation` module), with some minor changes (making the option name for SEG files more general): https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/2af3f350a247d66f68fe9305224d937ee0e90a8c/analyses/focal-cn-file-preparation/01-prepare-cn-file.R (This script has been renumbered as part of this pull request.)
`master` at the moment. We're also skipping the expression step currently.
What GitHub issue does your pull request address?
Closes #473; related to #186
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
This looks a bit worse than it is, as most of the code has gone through review before (sorry!)
Reproducibility Checklist
Documentation Checklist
`README` and it is up to date.
`analyses/README.md` and the entry is up to date.