Molecular Subtyping - ATRT Compare GISTIC results #344
- add nb to compare GISTIC results
- add nb to shell script
- add GISTIC input file to `data` directory in this module
This looks like a great start! I have a few suggestions and requests for some more comments. This may be in part because I haven't seen your first two notebooks; I'm jumping in right now and I'm not sure what these different files are. Also, would you be able to provide a link to the rendered html? I'm curious what your plot looks like and may have more comments based on seeing the rendered version.
    ## Directories and Files

    ```{r}
    # Detect the ".git" folder -- this will in the project root directory.
This shouldn't be necessary since you are using a notebook, right? I would just hard-code the file path using `..`.
    Kids_First_Biospecimen_ID,
    tumor_ploidy)

    # Read in gistic broad value data
I assume this will have to change whenever the GISTIC files are added to the official data? Maybe add a TODO so we don't forget.
analyses/molecular-subtyping-ATRT/03-ATRT-molecular-subtyping-gistic-comparison.Rmd
    dplyr::filter(sample_id %in% final_df$sample_id)

    # Make the GISTIC data numeric
    transposed_gistic$V1 <- as.numeric(transposed_gistic$V1)
Is this something you could do using `mutate` and add it to the steps above?
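For illustration, a minimal sketch of folding the conversion into the pipeline. It assumes `transposed_gistic` has a `sample_id` column and a character column `V1`, as in the lines quoted above; the toy data and `atrt_sample_ids` (a stand-in for `final_df$sample_id`) are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the transposed GISTIC data; the values are made up
transposed_gistic <- data.frame(
  sample_id = c("BS_A", "BS_B", "BS_C"),
  V1 = c("-1.2", "0.3", "0"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for final_df$sample_id
atrt_sample_ids <- c("BS_A", "BS_C")

# Filter and convert in one chain instead of a separate `$<-` assignment
transposed_gistic <- transposed_gistic %>%
  dplyr::filter(sample_id %in% atrt_sample_ids) %>%
  dplyr::mutate(V1 = as.numeric(V1))
```

Keeping the type conversion in the same chain means the column is numeric as soon as the filtered object exists, so no later step can pick up the character version by accident.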
    # Read in gistic broad value data
    gistic_focal_data <-
    data.table::fread(
Are these files really large? Is that why you are using `fread` for these CSV files? IDK if this will help, but if you want them to be a data.frame off the bat you can use `data.table = FALSE`. Not yet sure if this will influence anything downstream though.
Yes, that is the reason why I am using `fread` for these files. I added the `data.table = FALSE` argument in the most recent commit but I still needed the `as.data.frame` function in one instance downstream.
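As a small sketch of the difference discussed here: the inline CSV lines below are a hypothetical stand-in for a GISTIC output file (the sample columns and the `OTHER_GENE` row are made up), read once with the default and once with `data.table = FALSE`.

```r
library(data.table)

# Hypothetical CSV content standing in for a GISTIC output file
csv_lines <- c(
  "gene,BS_A,BS_B",
  "SMARCB1,-1,0",
  "OTHER_GENE,0,1"
)

# Default: fread returns a data.table (which also inherits from data.frame)
as_dt <- data.table::fread(text = csv_lines)

# With data.table = FALSE, fread returns a plain data.frame
as_df <- data.table::fread(text = csv_lines, data.table = FALSE)

class(as_dt)
class(as_df)
```

Reading straight into a data.frame keeps the fast parser while letting downstream steps use ordinary data.frame indexing, though individual steps may still need their own conversions, as noted above.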
    date: 2019
    ---

    This notebook addresses the issue of molecular subtyping ATRT samples.
Can you make this more specific to the purpose of this exact notebook? It looks like its purpose is to wrangle the data for this overall purpose?
    )
    ```

    # Filter GISTIC data for ATRT samples
A lot is happening in this section. Can you add some more comments on the requirements of what you are trying to end up with? I think we might be able to make this a tad easier to follow or more streamlined, but it's hard to say at first glance since I haven't looked at the previous notebooks.
- Add more descriptive comments
- Change file paths
- Add a TODO for reading in GISTIC files
- Make purpose of nb more specific
- Add `data.table = FALSE` argument
…into atrt-subtyping-gistic-comparison
Thank you for the review @cansavvy! I tried to make the comments more descriptive but I am not sure how successful I was, so let me know if there is anything I can be more clear on. Here is the rendered html output.
@cbethell! Fantastic job incorporating my comments! Now that I have a better idea of what's going on and have seen the html, I have more questions.
@cansavvy to answer your above questions,
I am not 100% sure what is expected here, as there is not much documentation on the methods used by GISTIC, but I would have expected them to agree more than they do in the stacked barplot.
That assumption would be correct for this PR. I can implement this change in the upcoming commit.
Good idea. I can dive deeper into this comment and implement the scatterplot comparing the copy numbers, along with other approaches to compare the output of the two methods.
- change plot to scatterplot of copy number values
…into atrt-subtyping-gistic-comparison
…/cbethell/OpenPBTA-analysis into atrt-subtyping-gistic-comparison
- Add titles to plots
- Use the same CN data as `00-subset-files-for-ATRT.R` script
Here is the updated rendered output.
Per our in-person discussion, @cbethell.
…into atrt-subtyping-gistic-comparison
- remove second plot
- order calls
- applied a classic theme to plot
My interpretation of #244 (comment)
Is that we should be using the
Here you are using ControlFreeC files for the comparison and GISTIC used CNVkit as input, so I'm not necessarily surprised if you don't see a lot of agreement. That is part of the rationale for #128. There is also the question of how the GISTIC gene symbol mapping step happens, which may contribute to any discrepancies. I think we want to subset
I am going to close this pull request in the interest of focusing on getting the chr 22 information into the final ATRT subtyping table.
Purpose/implementation Section
To compare the GISTIC calls for `SMARCB1` deletions in ATRT samples with the current calls.
What was your approach?
I merged the relevant metadata to categorize the GISTIC data and plotted the GISTIC calls with the current calls in ATRT samples.
What GitHub issue does your pull request address?
This PR addresses issue #244.
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes, this analysis is ready for review.
Results
What types of results are included (e.g., table, figure)?
This analysis produces a plot within the html output of the R notebook in this PR.
What is your summary of the results?
The GISTIC calls seem to consist of many `gain` values and do not significantly agree with the current calls, thus making me suspicious of the methods used in this analysis.
Reproducibility Checklist