Chr instability: PR 3 of 3: Histology plots #532

cansavvy · 2020-02-10T16:29:31Z

Purpose/implementation Section

This includes redone versions of the material that was included in #492.

This final PR adds the last of the three notebooks that together contain the material that was originally all in the previous 01 notebook.

What scientific question is your analysis addressing?

How does chromosomal instability relate to histology group?

What was your approach?

This takes the binned counts from the 01-localization notebook and makes histology plots for them.

What GitHub issue does your pull request address?

#487

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Most of this material has been reviewed previously, but it now is in its own notebook.
How do you feel about the readability of it? Are there other aesthetic changes that need to be made to the plots?

What is your summary of the results?

Here's the rendered html: https://cansavvy.github.io/openpbta-notebook-concept/chromosomal-instability/02b-plot-chr-instability-by-histology.nb.html

Reproducibility Checklist

These items were done previously.

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

These items were done previously.

This analysis module has a README and it is up to date. (Note that it includes documentation for the upcoming notebooks as well.)
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

cansavvy · 2020-02-12T20:45:58Z

Alrighty, @jashapiro, let me know if we think everything is addressed now.

analyses/chromosomal-instability/02b-plot-chr-instability-by-histology.Rmd

jashapiro

Aside from my last little comments, this looks good!

analyses/chromosomal-instability/util/chr-break-plot.R

jashapiro · 2020-02-13T09:26:20Z

analyses/chromosomal-instability/00-setup-breakpoint-data.R

@@ -356,7 +358,7 @@ breaks_density_list <- lapply(breaks_list, function(breaks_df) {
      samples, experimental_strategy, genome_size
    ) %>%
    # Count number of mutations for that sample
-    dplyr::summarize(breaks_count = dplyr::n()) %>%
+    dplyr::summarize(breaks_count = sum(!is.na(chrom))) %>%


I just realized, while thinking about #490 (comment), that this may turn some samples that should be NA into zeros. If the sample was not in the consensus seg file, it should be NA for CNV breaks and SV breaks. I think such a sample would end up with a zero here.
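
To make the concern concrete, here is a minimal sketch with made-up data. It assumes the breaks table has been left-joined onto the sample metadata, so a sample with no break rows survives as a single row whose chrom is NA. The sample IDs and both toy tables are hypothetical and are not taken from the module.

```r
library(dplyr)

# Toy metadata and breaks table; all IDs and values are made up
metadata <- data.frame(
  samples = c("S1", "S2", "S3"),
  stringsAsFactors = FALSE
)
breaks_df <- data.frame(
  samples = c("S1", "S1", "S2"),
  chrom = c("chr1", "chr2", "chr7"),
  stringsAsFactors = FALSE
)

# Assumed setup: the left join keeps S3, which has no breaks, as one row
# with chrom = NA
joined <- dplyr::left_join(metadata, breaks_df, by = "samples")

counts <- joined %>%
  dplyr::group_by(samples) %>%
  dplyr::summarize(
    n_rows = dplyr::n(),          # also counts the NA placeholder row
    n_breaks = sum(!is.na(chrom)) # counts only rows with a real break
  )

print(counts)
```

In this sketch S3 gets n_rows = 1 and n_breaks = 0. Neither column can say whether S3 was absent from the input files or was called with no breaks, which is the ambiguity raised in the comment.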

cansavvy · Feb 13, 2020

Those do end up with zeroes. If NA is preferred, I can make those changes. I’ll just have to convert them back to zeroes for the CDF plots, unless we think those samples should be dropped from the CDF plots?

jaclyn-taroni · Feb 13, 2020

Yes, drop them if they are missing. If you don't drop them, it tells your audience that there are n samples with 0 breakpoints when we don't have evidence either way.

cansavvy · Feb 13, 2020

So is it correct to say that all WGS samples have been run through both CNV detection pipelines, so if they don't show up in the consensus CNV file or the SV file, they should be NAs for break density?

cansavvy · Feb 13, 2020

If I'm understanding this correctly, then there will be no such thing as a 0, only NAs, and the minimum breaks_count would be 1.

jaclyn-taroni · Feb 13, 2020

@cansavvy take a look at the documentation for https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/copy_number_consensus_call and also see https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/copy_number_consensus_call/results/uncalled_samples.tsv. I would also check the IDs in the original files to be sure.

jashapiro · Feb 13, 2020

The docs linked above by @jaclyn-taroni should explain, but yes, there are both zeros and NAs. Some samples were deemed “uncallable” and should be NA. Others were called but had no CNVs and should be 0.
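
As a small illustration of why the unsurveyed samples should be dropped from the CDF plots rather than recoded, the sketch below compares the two options with base R's `stats::ecdf`. The counts are made up, and the notebook's actual plotting code may be built differently.

```r
# Made-up break counts: NA = sample was not callable, 0 = called with no breaks
breaks_count <- c(0, 3, NA, 7, NA, 1)

# Dropping NAs: the CDF is built only from the four samples with evidence
cdf_dropped <- stats::ecdf(breaks_count[!is.na(breaks_count)])

# Recoding NAs as 0: the option the thread advises against
cdf_recoded <- stats::ecdf(ifelse(is.na(breaks_count), 0, breaks_count))

cdf_dropped(0) # 0.25: one of four surveyed samples has zero breaks
cdf_recoded(0) # 0.5: the two unsurveyed samples now look like zero-break samples
```

Recoding doubles the apparent fraction of zero-break samples in this toy case, even though nothing is known about the two missing samples.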

jashapiro · 2020-02-14T21:10:39Z

analyses/chromosomal-instability/00-setup-breakpoint-data.R

+    dplyr::summarize(is_na = any(is.na(chrom)), 
+                     breaks_count = dplyr::n()) %>%
+    # Calculate breaks density, but put NA for breaks_count if the sample was not 
+    # in the SV or CNV data originally
+    dplyr::mutate(breaks_count = dplyr::case_when(
+      !is_na ~ as.numeric(breaks_count), 
+      is_na ~ as.numeric(NA)
+      ), 


Unfortunately, this isn't actually doing what we want it to do.
The trouble is that anything with a true zero count will get an NA in the chrom column, just like something that is actually missing.

I think the thing to do is to add a column to metadata called surveyed which would indicate if the sample is in unique(c(cnv_samples, sv_samples)). Then if surveyed is TRUE, set the is_na break_counts to 0, and to NA otherwise.

something like:

    dplyr::mutate(
      breaks_count = dplyr::case_when(
        !is_na ~ as.numeric(breaks_count),
        is_na & surveyed ~ as.numeric(0),
        TRUE ~ as.numeric(NA)
      ),
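
The suggestion can be tried end to end on a toy table. This is only a sketch under stated assumptions: the sample IDs, the contents of `cnv_samples` and `sv_samples`, and the `summary_df` table are all hypothetical, and `surveyed` is computed inline here rather than added to the metadata.

```r
library(dplyr)

# Hypothetical sample lists; in the suggestion these would be the IDs
# present in the CNV and SV input files
cnv_samples <- c("S1", "S2")
sv_samples <- c("S2", "S3")

# Toy per-sample summary: is_na marks samples whose only row had chrom = NA,
# so their dplyr::n() count of 1 is not a real break count
summary_df <- data.frame(
  samples = c("S1", "S2", "S3", "S4"),
  is_na = c(FALSE, FALSE, TRUE, TRUE),
  breaks_count = c(5L, 2L, 1L, 1L),
  stringsAsFactors = FALSE
)

result <- summary_df %>%
  dplyr::mutate(
    # TRUE if the sample appears in either input file
    surveyed = samples %in% unique(c(cnv_samples, sv_samples)),
    breaks_count = dplyr::case_when(
      !is_na ~ as.numeric(breaks_count), # real breaks: keep the count
      is_na & surveyed ~ 0,              # surveyed, no breaks: true zero
      TRUE ~ NA_real_                    # never surveyed: unknown
    )
  )

print(result)
```

With these inputs S3 (surveyed, no breaks) becomes 0 and S4 (in neither list) becomes NA, while S1 and S2 keep their counts.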

jashapiro · 2020-02-18T15:20:42Z

Okay, I think this is good to go.

AlexsLemonade#532 changed results.

@cansavvy

* Initial files added to ependymoma subtyping folder
* Added bash script and changed the paths for all files to run from OpenPBTA directory
* Added bash script and changed the paths for all files to run from OpenPBTA directory
* Add Ependymoma subtyping to CI
* Add to analyses/README.md
* Test removing rpy2 and using pyreadr exclusively
* Change to use pyreadr properly
* Revert pyreadr changes
* Add subset flag to CI
* Use R to generate subset file & shell to specify filenames
* Add results file
* Move Ependymoma subtyping up in CI
* Responding to pull request reviews
* Adding jupyter notebook
* Typo fixes
* Update gistic filename
* Changed implemented as suggested on feb 7 2020
* Small review changes remove unused imports simplify use of `stats`
* Update 00-subset-for-EPN.R with changes from @cansavvy code review
* Zscore column names changed
* Changed how merge is done between RNA and DNA tables
* Removed comment lines
* remove duplicate commented code.
* Added some columns as per comments from 02-12-2020
* update invocation of 02_ependymoma_generate_all_data.py
  Add line continuation characters to bash script
  Remove no longer used --breakpoints option
* Handle missing data, and some refactoring
  I made some substantial changes here in structure, but the results should be largely unchanged. I did some transposing when constructing data frames so we can use the same function (fill_df) to extract data more often, and moved the ID column specification out of the function so that RNA and DNA-derived data are handled the same way. The function then allows a set of samples to be specified, and if the request is for a a sample that does not fall in there, it is set as NA for that column in the output data, otherwise it is filled in with a default value.
* Delete unused full table zscore
* Rerun with updated data
  #532 changed results.

Co-authored-by: jashapiro <jashapiro@gmail.com>

cansavvy added 30 commits February 5, 2020 13:07

Reorganize to localization first

942c927

Missed some README edits

adef2d9

Forgot I can push the results files too

294567f

Merge branch 'master' into reorg-chr-insta-1

d1264b7

Make intersect_cnv_sv its own function with more specific ally declar…

31c81f8

…ed arguments

make it snazzy with purrr::transpose

970dba7

change results file names

599913f

Update words

d41e7e4

Fix bin_indices thing

beba0bc

Setting up the next notebook

7e90984

Move file names to their own section for @jashapiro relinter and rerun

6a7291d

Remove remnant knit file

d433b94

Merge branch 'reorg-chr-insta-1' into reog-chr-insta-2

df7e363

Update heatmaps and run it

68f2a71

Add to bash script

64b98f1

Relinter and refresh notebooks

d48b4f6

Add in histology nb

15e54ed

Set up and run histology notebook

d3f1b03

Delete union file and take out some unnecssary pastes still have no i…

f91b5a5

…dea why cnv heatmap wasn't being updated????

Refresh notebook

846716f

Implement @jashapiro 's fixes and finds!

f307785

Merge remote-tracking branch 'upstream/master' into reog-chr-insta-2

f937439

toTitleCase !

162f15b

Update file name in bash script

575a69d

Merge branch 'reog-chr-insta-2' into reorg-chr-inst-3

34412b2

update plots and refresh notebook

234dd89

Add the two doc suggestions from @jashapiro

36066e7

Merge branch 'reog-chr-insta-2' into reorg-chr-inst-3

815c061

refresh notebooks and update bash script

936958b

Refresh notebooks and rearrange circleCI

dd00106

cansavvy added 4 commits February 12, 2020 14:28

Update some docs

b28056f

Merge branch 'master' into reorg-chr-inst-3

3aa00cd

Fix column name thing that I changed elsewhere

cdea6d6

Merge remote-tracking branch 'origin/reorg-chr-inst-3' into reorg-chr…

6af1808

…-inst-3

jashapiro reviewed Feb 12, 2020

analyses/chromosomal-instability/02b-plot-chr-instability-by-histology.Rmd

jashapiro reviewed Feb 12, 2020

analyses/chromosomal-instability/02b-plot-chr-instability-by-histology.Rmd

jashapiro approved these changes Feb 12, 2020

analyses/chromosomal-instability/util/chr-break-plot.R (outdated)

cansavvy added 2 commits February 12, 2020 17:18

Address @jashapiro comments

4a46e3c

Fix outdated "median" comment.

a7ad558

jashapiro mentioned this pull request Feb 13, 2020

Ependymoma subtyping #490

Merged

jashapiro reviewed Feb 13, 2020


cansavvy added 3 commits February 13, 2020 11:03

Push NA vs 0 fix

c4c938e

Re-run everything after making the fixes

f35c8a8

Suppress some NA messages

2057202

jashapiro reviewed Feb 14, 2020


jashapiro and others added 9 commits February 14, 2020 16:18

Add uncalled file to options

39d0320

Make adjustment to unsurveyed samples per @jashapiro 's suggestion

1268fba

Merge branch 'master' into reorg-chr-inst-3

cdd42af

Update to handle surveyed samples

751d8ba

Rerun with latest changes

e992d8b

Merge remote and rerun

a8d9c50

Merge branch 'master' into reorg-chr-inst-3

795c612

Merge branch 'master' into reorg-chr-inst-3

9d2e90d

Merge branch 'master' into reorg-chr-inst-3

27c6440

jaclyn-taroni merged commit dcc9bf0 into AlexsLemonade:master Feb 18, 2020

jashapiro added a commit to tkoganti/OpenPBTA-analysis that referenced this pull request Feb 18, 2020

Rerun with updated data

5d05922

AlexsLemonade#532 changed results.

cansavvy deleted the reorg-chr-inst-3 branch February 28, 2020 14:59


cansavvy commented Feb 10, 2020

cansavvy commented Feb 12, 2020

jashapiro left a comment

jashapiro Feb 13, 2020

cansavvy Feb 13, 2020

jaclyn-taroni Feb 13, 2020

cansavvy Feb 13, 2020

cansavvy Feb 13, 2020 (edited)

jaclyn-taroni Feb 13, 2020

jashapiro Feb 13, 2020 (edited)

jashapiro Feb 14, 2020

jashapiro commented Feb 18, 2020
