-
Notifications
You must be signed in to change notification settings - Fork 83
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments on what I have seen so far, but I am getting confused about where the NA data is introduced and such, which hopefully you can clarify.
@@ -92,6 +103,18 @@ option_list <- list( | |||
'OpenPBTA-analysis') that specifies the BED regions file that indicates the | |||
effectively surveyed regions of the genome for the WXS samples.", | |||
metavar = "character" | |||
), | |||
make_option( | |||
opt_str = "--gap", default = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this catch adjacent breaks, like a break at position 10 and another at 11? Because those should really be the same, but I can't tell if they are considered gap of 0 or 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From GenomicRanges docs: The gap between 2 ranges is the number of positions that separate them. The gap between 2 adjacent ranges is 0.
I interpret this to mean that maxgap = 0
, means they are right next to each other. Is this what we want?
# Exclude the small regions | ||
end - start > 200000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to exclude the small regions if we are going to calculate overlaps using them later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep them.
ggplot2::theme_classic() + | ||
ggplot2::xlab("Rank") + | ||
ggplot2::ylab("Chromosomal Breaks per Mb") + | ||
# Transform to log10 make non-log y-axis labels | ||
ggplot2::scale_y_continuous(trans = "log10", breaks = c(0, 1, 10, 30)) + | ||
ggplot2::scale_y_continuous(trans = "log10", breaks = c(0, 1, 10, 30)) + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why these ticks aren't showing up in the rendered image?
analyses/chromosomal-instability/01-plot-chromosomal-instability.Rmd
Outdated
Show resolved
Hide resolved
analyses/chromosomal-instability/01-plot-chromosomal-instability.Rmd
Outdated
Show resolved
Hide resolved
analyses/chromosomal-instability/01-plot-chromosomal-instability.Rmd
Outdated
Show resolved
Hide resolved
…into tinier steps
Per our in person discussion, @jashapiro , I will work on splitting this up into more manageable PRs and perhaps in a reorganized set of notebooks. Stay tuned. 📺 |
After doing reorganizing of these changes, I'm going to break up this into multiple PRs. Closing this original PR now. |
Purpose/implementation Section
What scientific question is your analysis addressing?
The items on #487 :
- For the denominator for calculating densities
- For marking NA on heatmap
What was your approach?
Excluded regions BED:
Regions are considered
NA
based on the percentage of the bin that is covered by ancnv_excluded_regions.bed
. The percent cutoff for each bin can be set byperc_cutoff
argument forbreak_density
function calculation.Flexibility parameters:
CNV and SV data are now combined by an intersection the flexibility of which is controlled by
--gap 5
parameter. This gap parameter is passed directly to GenomicRanges'mergeByOverlaps
function'smaxgap
parameter. E.g. for a gap parameter of 5, a CNV or SV breakpoint within 5 bp of each other for the same sample will be considered the same.Somewhat related but not really, I also added a
drop-sex
argument to the 00-setup script.What GitHub issue does your pull request address?
#487
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
--gap
,perc_overlap
and--drop-sex
clear and seem like the logic that surrounds them is sensible?Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes. Let me know what changes you suggest for the plots.
Results
What types of results are included (e.g., table, figure)?
The plots are updated to be like this:
data:image/s3,"s3://crabby-images/82b4e/82b4edf4221ac34c737e4b2b9747365b4fe4ead1" alt="cnv_breaks_heatmap"
What is your summary of the results?
Here's the rendered html: https://cansavvy.github.io/openpbta-notebook-concept/chromosomal-instability/01-plot-chromosomal-instability.nb.html
Reproducibility Checklist
These items were done previously.
Documentation Checklist
README
and it is up to date.analyses/README.md
and the entry is up to date.