This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Molecular Subtyping - ATRT Compare GISTIC results #344

Closed

cbethell wants to merge 17 commits into AlexsLemonade:master from cbethell:atrt-subtyping-gistic-comparison

Contributor

cbethell commented Dec 17, 2019

Purpose/implementation Section

To compare the GISTIC calls for SMARCB1 deletions in ATRT samples with the current calls.

What was your approach?

I merged the relevant metadata to categorize the GISTIC data and plotted the GISTIC calls with the current calls in ATRT samples.

What GitHub issue does your pull request address?

This PR addresses issue #244.

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Does this analysis appear to be correct?
Should there be additional plots/results?

Which areas should receive a particularly close look?

Is there anything that the analysis may be missing?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes, this analysis is ready for review.

Results

What types of results are included (e.g., table, figure)?

This analysis produces a plot within the html output of the R notebook in this PR.

What is your summary of the results?

The GISTIC calls consist mostly of gain values and do not agree well with the current calls, which makes me suspicious of the methods used in this analysis.

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.


          Compare GISTIC results

8484b86

- add nb to compare GISTIC results
- add nb to shell script
- add GISTIC input file to `data` directory in this module

jaclyn-taroni assigned cansavvy and unassigned cansavvy

jaclyn-taroni requested a review from cansavvy

December 17, 2019 17:39

cansavvy reviewed

Collaborator

cansavvy left a comment

This looks like a great start! I have a few suggestions and requests for some more comments. This may be in part because I haven't seen your first two notebooks; I'm jumping in right now and I'm not sure what these different files are. Secondly, would you be able to provide a link to the rendered html? I'm curious what your plot looks like and may have more comments based on seeing the rendered version.

analyses/molecular-subtyping-ATRT/03-ATRT-molecular-subtyping-gistic-comparison.Rmd Outdated

+              ## Directories and Files
+              ```{r}
+              # Detect the ".git" folder -- this will be in the project root directory.

Collaborator

cansavvy Dec 17, 2019

This shouldn't be necessary since you are using a notebook, right? I would just hard-code the file path as a relative path using `..`.

analyses/molecular-subtyping-ATRT/03-ATRT-molecular-subtyping-gistic-comparison.Rmd Outdated

+                              Kids_First_Biospecimen_ID,
+                              tumor_ploidy)
+              # Read in gistic broad value data

Collaborator

cansavvy Dec 17, 2019

I assume this will have to change whenever the GISTIC files are added to the official data? Maybe add a TODO so we don't forget.

analyses/molecular-subtyping-ATRT/03-ATRT-molecular-subtyping-gistic-comparison.Rmd Outdated

+                dplyr::filter(sample_id %in% final_df$sample_id)
+              # Make the GISTIC data numeric
+              transposed_gistic$V1 <- as.numeric(transposed_gistic$V1)

Collaborator

cansavvy Dec 17, 2019

Is this something you could do using mutate and add it to the steps above?
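A minimal sketch of what folding the numeric conversion into the filtering step might look like. The toy data frames and sample IDs are made up for illustration; only the names `transposed_gistic`, `final_df`, `sample_id`, and `V1` come from the diff above, and the rest of the notebook's pipeline is not shown here.

```r
# Pipe operator without attaching all of dplyr
`%>%` <- dplyr::`%>%`

# Hypothetical stand-ins for the objects in the notebook
transposed_gistic <- data.frame(
  sample_id = c("sample_1", "sample_2", "sample_3"),
  V1 = c("-1", "0", "1"), # GISTIC values read in as character
  stringsAsFactors = FALSE
)
final_df <- data.frame(
  sample_id = c("sample_1", "sample_3"),
  stringsAsFactors = FALSE
)

# Filter to the samples of interest and convert to numeric in one chain
atrt_gistic <- transposed_gistic %>%
  dplyr::filter(sample_id %in% final_df$sample_id) %>%
  dplyr::mutate(V1 = as.numeric(V1))
```

This avoids the separate `transposed_gistic$V1 <- as.numeric(...)` assignment after the pipeline.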

analyses/molecular-subtyping-ATRT/03-ATRT-molecular-subtyping-gistic-comparison.Rmd Outdated

+              # Read in gistic broad value data
+              gistic_focal_data <-
+                data.table::fread(

Collaborator

cansavvy Dec 17, 2019

Are these files really large? Is that why you are using `fread` for these CSV files? IDK if this will help, but if you want them to be a data.frame off the bat you can use `data.table = FALSE`. I'm not yet sure whether this will influence anything downstream, though.

Contributor Author

cbethell Dec 17, 2019

Yes, that is the reason why I am using fread for these files. I added the data.table = FALSE argument in the most recent commit but I still needed the as.data.frame function in one instance downstream.
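For reference, a small sketch of the `data.table = FALSE` behavior discussed above. The temporary file and its contents are invented for illustration and are not the real GISTIC input.

```r
# Write a tiny, made-up tab-separated file to stand in for a GISTIC output
tmp_file <- tempfile(fileext = ".tsv")
writeLines(
  c("gene_symbol\tsample_1\tsample_2",
    "SMARCB1\t-1\t0"),
  tmp_file
)

# With data.table = FALSE, fread returns a plain data.frame
gistic_df <- data.table::fread(file = tmp_file, data.table = FALSE)
class(gistic_df)
```

With this argument the result is a plain data.frame rather than a data.table, so an `as.data.frame()` call is only needed where a later step changes the class again.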

analyses/molecular-subtyping-ATRT/03-ATRT-molecular-subtyping-gistic-comparison.Rmd Outdated

+              date: 2019
+              ---
+              This notebook addresses the issue of molecular subtyping ATRT samples.

Collaborator

cansavvy Dec 17, 2019

Can you make this more specific to the purpose of this exact notebook? It looks like its purpose is to wrangle the data for this overall purpose?

analyses/molecular-subtyping-ATRT/03-ATRT-molecular-subtyping-gistic-comparison.Rmd

+                )
+              ```
+              # Filter GISTIC data for ATRT samples

Collaborator

cansavvy Dec 17, 2019

A lot is happening in this section. Can you add some more comments on the requirements of what you are trying to end up with? I think we might be able to make this a tad easier to follow or more streamlined, but it's hard to say at first glance without my having looked at the previous notebooks.

cbethell added 2 commits

December 17, 2019 15:01


          @cansavvy suggested changes

b4f9d72

- Add more descriptive comments
- Change file paths 
- Add a TODO for reading in GISTIC files 
- Make purpose of nb more specific
- Add `data.table = FALSE` argument


          Merge branch 'master' of https://github.com/cbethell/OpenPBTA-analysis …

de08044

…into atrt-subtyping-gistic-comparison

Contributor Author

cbethell commented Dec 17, 2019

> This looks like a great start! I have a few suggestions and requests for some more comments. This may be in part because I haven't seen your first two notebooks; I'm jumping in right now and I'm not sure what these different files are. Secondly, would you be able to provide a link to the rendered html? I'm curious what your plot looks like and may have more comments based on seeing the rendered version.

Thank you for the review @cansavvy! I tried to make the comments more descriptive but I am not sure how successful I was, so let me know if there is anything I can be more clear on.

Here is the rendered html output.

Collaborator

cansavvy commented Dec 17, 2019

@cbethell! Fantastic job incorporating my comments! Now that I have a better idea of what's going on and have seen the html, I have more questions.

I want to first confirm the scientific question of this notebook (since I haven't been following this analysis or its accompanying issues). You are trying to compare the Focal CN analysis conclusion with the GISTIC CN conclusion? Do you know what is expected here and how these methods might be different or the same?
Assuming that this comparison is the meat of your scientific question, I think the preferred presentation of this data would be a contingency table instead of your stacked barplot.
I'm not very familiar with the output of GISTIC or the focal CN analysis, but would it make sense to also compare the copy number that each method calls and do this with some kind of scatterplot? Or more generally, are there other ways we can look at this data beyond just a categorical "gain", "loss", or "neutral"? It may be a good idea to take multiple approaches to compare the output if this is possible, so that we can be a tad more informed about how these methods agree or disagree.

Contributor Author

cbethell commented Dec 17, 2019

@cansavvy to answer your above questions,

> I want to first confirm the scientific question of this notebook (since I haven't been following this analysis or its accompanying issues). You are trying to compare the Focal CN analysis conclusion with the GISTIC CN conclusion? Do you know what is expected here and how these methods might be different or the same?

I am not 100% sure what is expected here, as there is not much documentation on the methods used by GISTIC, but I would have expected them to agree more than they do in the stacked barplot.

> Assuming that this comparison is the meat of your scientific question, I think the preferred presentation of this data would be a contingency table instead of your stacked barplot.

That assumption would be correct for this PR. I can implement this change in the upcoming commit.

> I'm not very familiar with the output of GISTIC or the focal CN analysis, but would it make sense to also compare the copy number that each method calls and do this with some kind of scatterplot? Or more generally, are there other ways we can look at this data beyond just a categorical "gain", "loss", or "neutral"? It may be a good idea to take multiple approaches to compare the output if this is possible, so that we can be a tad more informed about how these methods agree or disagree.

Good idea. I can dive deeper into this comment and implement the scatterplot comparing the copy numbers and other approaches to compare the output of the two methods.

cbethell and others added 8 commits

December 17, 2019 15:59


          Merge branch 'master' into atrt-subtyping-gistic-comparison

6e58725


          Save and display contigency table of results

7cff309

- change plot to scatterplot of copy number values


          Merge branch 'master' into atrt-subtyping-gistic-comparison

f3c0ccf


          Merge branch 'master' into atrt-subtyping-gistic-comparison

9256aa3


          Merge branch 'master' of https://github.com/cbethell/OpenPBTA-analysis …

7d27e26

…into atrt-subtyping-gistic-comparison


          Merge branch 'atrt-subtyping-gistic-comparison' of https://github.com…

3e7eb29

…/cbethell/OpenPBTA-analysis into atrt-subtyping-gistic-comparison


          Make lintr format changes

33ad556

- Add titles to plots 
- Use the same CN data as `00-subset-files-for-ATRT.R` script


          Merge branch 'master' into atrt-subtyping-gistic-comparison

a9188d5

Contributor Author

cbethell commented Dec 19, 2019

Here is the updated rendered output.


          Merge branch 'master' into atrt-subtyping-gistic-comparison

80d5602

Collaborator

cansavvy commented Dec 19, 2019

Per our in-person discussion, @cbethell:

I think because of the small number of samples here, a contingency table is the best way to represent this even though it still is hard to interpret.
Additionally, if you can make sure to put the categories in the order of Loss, Neutral, Gain, that would help interpretability.
I think the CN scatterplot is good to see, so you should keep it, but I think the "status" scatterplot is not super helpful, so I would drop that one.
One last super minor style comment, can you apply a theme so the ugly gray background isn't there?
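As a rough illustration of the contingency table with categories ordered Loss, Neutral, Gain, here is a sketch using base R. The calls below are invented toy values, not results from this analysis, and the column names `focal_status` and `gistic_status` are hypothetical.

```r
# Fix the category order so the table reads loss -> neutral -> gain
status_levels <- c("loss", "neutral", "gain")

# Made-up calls for five samples from the two methods
calls <- data.frame(
  focal_status = c("loss", "loss", "neutral", "gain", "loss"),
  gistic_status = c("loss", "neutral", "neutral", "gain", "gain"),
  stringsAsFactors = FALSE
)

# Factors with explicit levels control the row and column order,
# and keep empty categories in the table
calls$focal_status <- factor(calls$focal_status, levels = status_levels)
calls$gistic_status <- factor(calls$gistic_status, levels = status_levels)

# Rows are the focal CN calls, columns are the GISTIC calls
contingency <- table(
  focal = calls$focal_status,
  gistic = calls$gistic_status
)
contingency
```

Samples on the diagonal are those where the two methods agree, which makes the small-sample comparison easier to read than a stacked barplot.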

cbethell and others added 5 commits

December 19, 2019 13:02


          Merge branch 'master' of https://github.com/cbethell/OpenPBTA-analysis …

538f9ac

…into atrt-subtyping-gistic-comparison


          Attempt at a contigency table

75b87ec

- remove second plot
- order calls 
- applied a classic theme to plot


          Add dplyr:: where needed

3cd5db5


          Merge branch 'master' into atrt-subtyping-gistic-comparison

4a86add


          Merge branch 'master' into atrt-subtyping-gistic-comparison

a52937a

Member

jaclyn-taroni commented Dec 30, 2019

My interpretation of #244 (comment)

> @jaclyn-taroni do you and @cbethell want to see if the results from gistic for CNVkit: s3://kf-openaccess-us-east-1-prd-pbta/data/2019-12-10-gistic-results-cnvkit.zip broad_values_by_arm.txt results make sense with the current SMARCB1 deletions found/be good enough for this analysis? If so, we can release these results in the next data release.

is that we should be using the broad_values_by_arm.txt file to look at (from the ATRT TYR section)

> broad SMARCB1 deletions (most have chr22q loss/monosomy 22)

Here you are using ControlFreeC files for the comparison, whereas GISTIC used CNVkit as input, so I'm not necessarily surprised if you don't see a lot of agreement. That is part of the rationale for #128. There is also the question of how the GISTIC gene symbol mapping step happens, which may contribute to any discrepancies.

I think we want to add a step to 00-subset-files-for-ATRT.R that subsets broad_values_by_arm.txt to ATRT samples and the relevant chromosome arms only. You may still need to use the ploidy information in the histologies file. (It also may make sense to then use the CNVkit file to generate atrt_subset/atrt_focal_cn.tsv.gz instead of the ControlFreeC file until we have consensus calls.) We want to include the chr 22 information in the final table generated in 01-ATRT-molecular-subtyping-data-prep.Rmd.
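A sketch of what that subsetting step could look like. This assumes the arm-level file has one row per chromosome arm, in a column named `Chromosome Arm`, and one column per sample; that layout, the toy values, and the sample IDs are assumptions for illustration and should be checked against the real file.

```r
# Toy stand-in for broad_values_by_arm.txt (layout is an assumption)
broad_values <- data.frame(
  `Chromosome Arm` = c("1p", "22q", "9p"),
  sample_A = c(0, -1, 0),
  sample_B = c(1, 0, 0),
  sample_C = c(0, -1, 1),
  check.names = FALSE
)

# Hypothetical ATRT sample IDs and the arm relevant to SMARCB1
atrt_ids <- c("sample_A", "sample_C")
arms_of_interest <- c("22q")

# Keep only the relevant arm(s) and the ATRT sample columns
atrt_broad <- broad_values[
  broad_values[["Chromosome Arm"]] %in% arms_of_interest,
  c("Chromosome Arm", atrt_ids),
  drop = FALSE
]
atrt_broad
```

In the real script the ATRT sample IDs would presumably come from the histologies file rather than being typed in by hand.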

I am going to close this pull request in the interest of focusing on getting the chr 22 information into the final ATRT subtyping table.

jaclyn-taroni closed this

cbethell deleted the atrt-subtyping-gistic-comparison branch

February 6, 2020 20:42
