This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Update focal CN file prep to use exons again and cover the consensus SEG case #479

Merged: 27 commits, merged Jan 28, 2020


jaclyn-taroni
Member

@jaclyn-taroni commented Jan 28, 2020

Purpose/implementation Section

We used exons to annotate CNV data files prior to #452, which switched to using coding regions instead. That appeared to be too restrictive and inconsistent with what is typically done (some references and quotes supplied by @cbethell below).
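To make "annotate with exons" concrete: a gene is reported for a copy number segment if at least one of its exons overlaps that segment. Below is a minimal sketch of that overlap test, assuming segments arrive as `(chrom, start, end, copy_number)` rows (as in a SEG file) and exons as `(chrom, start, end, gene)` rows, with half-open coordinates. The function name `genes_hit_by_segments` is hypothetical, and the project's R code is not reproduced here. A real pipeline would more likely use an interval-overlap library, for example Bioconductor's `GenomicRanges::findOverlaps` in R.

```python
from collections import defaultdict


def genes_hit_by_segments(segments, exons):
    """
    Hypothetical helper.
    segments: iterable of (chrom, start, end, copy_number), e.g. SEG-file rows
    exons:    iterable of (chrom, start, end, gene)
    Returns {segment_index: set of genes with at least one overlapping exon}.
    Intervals are treated as half-open [start, end) in this sketch.
    """
    # Group exons by chromosome so each segment only scans its own chromosome.
    exons_by_chrom = defaultdict(list)
    for chrom, start, end, gene in exons:
        exons_by_chrom[chrom].append((start, end, gene))

    hits = {}
    for i, (chrom, seg_start, seg_end, _copy_number) in enumerate(segments):
        hits[i] = {
            gene
            for ex_start, ex_end, gene in exons_by_chrom.get(chrom, [])
            if ex_start < seg_end and seg_start < ex_end  # half-open overlap test
        }
    return hits
```

If you pass coding-sequence intervals instead of exons, you drop any gene whose only overlapping exon is untranslated. That is the kind of loss that made the coding-region approach look too restrictive.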

Monlong et al., NAR, 2018:

Exons of protein-coding genes and promoter regions (10 kb upstream of the transcription start site) were extracted from the Gencode annotation v19. We counted how many genes overlapped a CNV in the population when considering exons only, exons and promoter region, or gene body and promoter region
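The three gene definitions in this quote differ only in which intervals you test against a CNV. Here is a minimal sketch, assuming half-open coordinates and a strand-aware promoter of 10 kb immediately upstream of the transcription start site (TSS). The dictionary layout for `gene` and all function names are hypothetical, and this is not the authors' code.

```python
PROMOTER_BP = 10_000  # "10 kb upstream of the transcription start site"
MODES = ("exons", "exons+promoter", "body+promoter")


def promoter_interval(start, end, strand, flank=PROMOTER_BP):
    """Half-open interval of `flank` bp immediately upstream of the TSS."""
    if strand == "+":
        return (max(0, start - flank), start)  # TSS at the gene start
    if strand == "-":
        return (end, end + flank)  # TSS at the gene end on the minus strand
    raise ValueError(f"unknown strand: {strand!r}")


def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def gene_overlaps_cnv(gene, cnv, mode):
    """Does `cnv` (start, end) touch `gene` under one of the three definitions?"""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    regions = []
    if mode.startswith("exons"):
        regions.extend(gene["exons"])
    else:
        regions.append((gene["start"], gene["end"]))  # whole gene body
    if mode.endswith("+promoter"):
        regions.append(promoter_interval(gene["start"], gene["end"], gene["strand"]))
    return any(overlaps(r, cnv) for r in regions)
```

A CNV confined to an intron is counted only under "body+promoter". A CNV confined to the promoter is missed only under "exons".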

Bailey et al., Cytogenet Genome Res, 2009:

Our detection strategy was developed as a simple threshold-per-exon test with chaining of exons to predict the extent of the variation.
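One simplified reading of "threshold-per-exon test with chaining of exons" works in two steps. First, flag each exon whose log2 copy ratio passes a threshold. Second, merge consecutive flagged exons with the same direction into a single predicted event. The sketch below assumes per-exon log2 ratios in genomic order on one chromosome and an arbitrary threshold of 0.5. The input layout, the threshold, and the name `chain_exon_calls` are all assumptions, and this is not the published implementation.

```python
def chain_exon_calls(exon_ratios, threshold=0.5):
    """
    exon_ratios: list of (exon_id, log2_ratio) in genomic order (one chromosome).
    Returns a list of (direction, first_exon, last_exon, n_exons) calls.
    """
    calls = []
    run_dir, run = None, []  # direction and exon ids of the current chain
    for exon_id, ratio in exon_ratios:
        # Per-exon threshold test.
        if ratio >= threshold:
            direction = "gain"
        elif ratio <= -threshold:
            direction = "loss"
        else:
            direction = None
        if direction is not None and direction == run_dir:
            run.append(exon_id)  # extend the current chain
            continue
        if run:  # close the previous chain as one predicted event
            calls.append((run_dir, run[0], run[-1], len(run)))
        run_dir = direction
        run = [exon_id] if direction else []
    if run:
        calls.append((run_dir, run[0], run[-1], len(run)))
    return calls
```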

What was your approach?

Here we are:

What GitHub issue does your pull request address?

Closes #473; related to #186

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

This looks a bit worse than it is, since most of the code has already been through review (sorry!).

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jaclyn-taroni marked this pull request as ready for review on January 28, 2020 18:55
@jaclyn-taroni
Member Author

Collaborator

@cansavvy left a comment


Looks great. The only comments I have are regarding a couple code comments and README documentation.

# of the repository as follows:
#
# Rscript 'analyses/oncoprint-landscape/01-prepare-cn-file.R'

Collaborator


Unless this script works without any arguments, I suspect this usage example is incomplete?

The mapping is limited to _coding sequences_.
Mapping to cytobands is performed with the [`UCSC hg38 cytoband file`](http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/cytoBand.txt.gz).
_Note: The decision to use the `UCSC file` was based on a comparison between the cytoband calls in the `org.Hs.eg.db` package and those in the `UCSC file`. They disagreed on ~11,800 of ~800,000 calls, and the `UCSC file` contains more cytoband calls._
* `02-add-ploidy-consensus.Rmd` - This is very similar to the CNVkit file prep.
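Mapping a position to a cytoband with the UCSC file is a range lookup. Each row of `cytoBand.txt` holds `chrom`, `chromStart`, `chromEnd`, `name` and `gieStain`, and uses UCSC's 0-based, half-open coordinates. Below is a minimal sketch of that lookup using binary search. The function names are hypothetical, and it labels a single position (for example a gene start), whereas a gene spanning several bands would need extra handling.

```python
import bisect
from collections import defaultdict


def load_cytobands(lines):
    """Parse UCSC cytoBand rows: chrom, chromStart, chromEnd, name, gieStain."""
    bands = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, name, _stain = line.rstrip("\n").split("\t")
        bands[chrom].append((int(start), int(end), name))
    for rows in bands.values():
        rows.sort()  # sort by start so we can binary-search
    return dict(bands)


def cytoband_for(bands, chrom, pos):
    """Return a label such as '1p36.33' for a 0-based position, or None."""
    rows = bands.get(chrom, [])
    starts = [start for start, _end, _name in rows]
    i = bisect.bisect_right(starts, pos) - 1  # last band starting at or before pos
    if i >= 0 and rows[i][0] <= pos < rows[i][1]:
        short_chrom = chrom[3:] if chrom.startswith("chr") else chrom
        return short_chrom + rows[i][2]
    return None
```

You can read the downloaded file directly with `gzip.open("cytoBand.txt.gz", "rt")` and pass the handle to `load_cytobands`.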
Collaborator


I am unsure what "similar to the CNVkit file prep" refers to. Can you add a link to it?

Member Author


I added the 01 notebook file name here to make this clearer.
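The `02-add-ploidy-consensus.Rmd` step pairs consensus segments with each sample's ploidy. A common reason to do this is to call gains and losses relative to ploidy instead of a fixed value of 2. The sketch below shows one such rule. The cutoffs (zero copies is a deep deletion, more than twice the ploidy is an amplification) and the name `copy_number_status` are illustrative assumptions, not the notebook's actual logic.

```python
def copy_number_status(copy_number, ploidy):
    """
    Hypothetical classification of a segment's copy number relative to the
    sample's ploidy. The cutoffs below are illustrative assumptions.
    """
    if copy_number == 0:
        return "deep deletion"  # no copies left
    if copy_number < ploidy:
        return "loss"
    if copy_number > 2 * ploidy:
        return "amplification"  # assumed cutoff: more than twice the ploidy
    if copy_number > ploidy:
        return "gain"
    return "neutral"
```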


* `02-rna-expression-validation.R` - This script examines RNA-seq expression levels (RSEM FPKM) of genes that are called as deletions.
* `rna-expression-validation.R` - This script examines RNA-seq expression levels (RSEM FPKM) of genes that are called as deletions.
It is not currently run via the shell script.
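A basic version of "examine RNA-seq expression of genes called as deletions" compares each deleted gene's FPKM in the affected sample against that gene's typical FPKM in samples without a deletion call. A true deletion should tend to show lower expression. This is a minimal sketch under that assumption. The input layout (`{gene: {sample: FPKM}}` plus a set of `(gene, sample)` deletion calls) and the function name are hypothetical, and it does not reflect the planned rework.

```python
from statistics import median


def deletion_expression_check(fpkm, deletions):
    """
    fpkm:      {gene: {sample: FPKM}}
    deletions: set of (gene, sample) deletion calls
    Returns {(gene, sample): (fpkm_in_deleted_sample, median_fpkm_in_other_samples)}.
    """
    results = {}
    for gene, sample in sorted(deletions):
        values = fpkm.get(gene, {})
        if sample not in values:
            continue  # no RNA-seq value for this gene/sample pair
        # Reference: the same gene in samples without a deletion call.
        others = [v for s, v in values.items() if (gene, s) not in deletions]
        if not others:
            continue
        results[(gene, sample)] = (values[sample], median(others))
    return results
```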
Collaborator


What's the plan for this script, then? Is it being dropped, revamped, or called somewhere else?

Member Author


The plan is to rework it; see #387.

Development

Successfully merging this pull request may close these issues.

Annotated CNVkit data no longer shows SMARCB1 deletions in ATRT
2 participants