-
Notifications
You must be signed in to change notification settings - Fork 83
Mapping high level histology color labels -- Part 1 #899
Mapping high level histology color labels -- Part 1 #899
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good start. I think my main comment is that we should try to be a bit more intentional about grouping things before resorting to "Other". There may well be reasonable groupings that take some of the small-count histologies and link them to large-count histologies that they are related to rather than just to other small-count groups. It may be helpful to have a table which shows the integrated_diagnosis
values for each broad_histology
to assist with such grouping.
My other suggestions are mostly about display. This is a notebook for internal evaluation, so having more easily visible data at each stage in tables is more valuable than some of the summaries or figures.
I also suggest treating "Other" as a special group, with a defined color that remains constant.
With regards to your question:
For ease of downstream use, I'm wondering if we should drop everything that's not Kids_First_Biospecimen_ID summary_group and hex_codes, because downstream we'll need to drop everything.
I think we should keep as much information as possible for now. It is potentially easier to evaluate groupings using the full table, and the cost of keeping it is essentially zero.
figures/mapping-histology-labels.Rmd
Outdated
summary_groups <- histology_table %>% | ||
dplyr::count(summary_group) | ||
|
||
nrow(summary_groups) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would here again print out the full table to make it easier to evaluate what has happened.
set.seed(2021) | ||
|
||
# Sample from the 18 colors | ||
subset_colors <- sample(color_palette, n_colors) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing when making the colors is that I feel like we should treat "Other" as special, and always assign it a standard color, something neutral & grey like #A0A0A0
(though I am not sure how that fits with colorblind-friendly schemes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could drop this "Other" group from the large summary figures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We consider treating "Benign" as the grey group @jashapiro and @cansavvy - because it's more like a control and we would be less interested (maybe) in those genomics.
@jharenza , we could use some more domain knowledge about these histologies at this point.
|
Without looking at the code, I wanted to point out that the |
I did have that previously but with the changes I made most recently with Josh's review we lost that. I'll add it back. |
|
||
There's handful of very small groups (many are n = 2). | ||
|
||
## Declare new equivalent groups |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jharenza this is where we set up the equivalent groups which are then recoded in the next section.
Hi @cansavvy -
This is not true - These are three separate entities and really should not be combined. The neuronal and mixed glial tumors comprise of many distinct tumors, so they had a wider array of
I think we can combine:
to I think you can combine I think that will leave us with 15. |
@jharenza , thanks for these comments. I'm going to implement them and then re-request you for a review when its ready. |
This brings us to 16 + Normal. Are there any other combos that seem reasonable or should we leave it at that? |
If we are talking about dropping that completely from the figures, than I think we'd probably keep this data in the table for now, but change that at the figure scripts? But I was more talking about groups we could combine into other groups. |
Ah, yeah I think that is as far as we can combine right now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @cansavvy! Looks good- made a few minor comments, but otherwise it looks good!
set.seed(2021) | ||
|
||
# Sample from the 18 colors | ||
subset_colors <- sample(color_palette, n_colors) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We consider treating "Benign" as the grey group @jashapiro and @cansavvy - because it's more like a control and we would be less interested (maybe) in those genomics.
This makes sense to me. Maybe make both |
Sure! |
…ansavvy/mapping-table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just a question about what to do with "Normal" colorwise, and some concern about capitalization consistency between files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One typo, but looks good!
Purpose/implementation Section
What scientific question is your analysis addressing?
This PR addresses Step 1 of 4 discussed in #897. This PR adds that main notebook described where we attempt to group biospecimens into "high level" histology groups that will be more practical for plotting purposes.
Note that subsequent PRs will deal with the updating of the README and retirement of
histology_color_palette.tsv
in favor for the mapping table,palettes/histology_label_color_table.tsv
, that is created in this notebook.What was your approach?
Copied from the notebook added in this PR itself:
What GitHub issue does your pull request address?
#897
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes. The ones included can be.
Results
What is your summary of the results?
We have a list of colors and 16 + Normal groups. I have not tried it on an official plot yet. (upcoming PR)
Reproducibility Checklist
Documentation Checklist
README
and it is up to date. -- No, this will be in a subsequent PR. The README will require quite a bit of updating.analyses/README.md
and the entry is up to date. -- it is not because its not located in the analyses folder.