Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Using mapping histology groups for plotting -- Part 2 implemention (PR 1 of 4) #904

Merged
merged 43 commits into from
Jan 20, 2021

Conversation

cansavvy
Copy link
Collaborator

@cansavvy cansavvy commented Jan 14, 2021

Purpose/implementation Section

What scientific question is your analysis addressing?

This PR addresses Step 3 and part Step 4 discussed in #897 and #898. This PR uses the palettes/histology_label_color_table.tsv table that is created by the notebook added in #899

What was your approach?

This PR is 1 of a few PRs where I am updating things in response to the table added in #898

Here's what's accomplished in this PR:

Note that the palettes/histology_color_palette.tsv file itself is not deleted yet, I will do this in the next PR so the other modules that use this file don't break in CI yet.

What GitHub issue does your pull request address?

Closes #897 and
begins to address #898

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

  • How do we feel about how the updated chromosomal-instability heatmap plots look? (see zip file below if you don't want to check out the branch)
  • Is the figures/README updated appropriately and with enough detail?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes. Take a look at the chromosomal-instability plots here to see what you think:

chromosomal-instability-heatmaps.zip

Results

What is your summary of the results?

Results are the same, the labels are changed.

Reproducibility Checklist

Documentation Checklist

  • This analysis module has a README and it is up to date. --the README in figures is now up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date. -- it is not because its not located in the analyses folder.
  • The analytical code is documented and contains comments.

@cansavvy cansavvy marked this pull request as ready for review January 19, 2021 12:13
@cansavvy cansavvy requested a review from jashapiro January 19, 2021 14:09
Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good to me, but I had a thought that will will likely change some of the readme content if implemented:

One thing that I didn't comment on in previous PRs but I am now thinking more heavily about here is that we should not only have the display histology order standardized as well as the color. Arranging alphabetically doesn't really make sense, as we will want Other and probably Benign to be at the end, and there may be other ordering we want to use (largest to smallest?)

I'm guessing this should happen when reading in the data, but we should probably make display_group a factor. My first thought was to just use `forcats::fct_relevel(display_group, "Other tumor", "Benign", after = Inf) to push those two categories to the end.

On further reflection, it may be worth adding a display_order column to the data table to make this easier to standardize across scripts: Then you could use forcats::fct_reorder(display_group, display_order) to make changes that would easily propagate, and can allow us to specify ordering of all factors just as easily. (Do we want to keep LGAT and HGAT next to each other, for example, or order by # of samples)

@cansavvy
Copy link
Collaborator Author

On further reflection, it may be worth adding a display_order column to the data table to make this easier to standardize across scripts: Then you could use forcats::fct_reorder(display_group, display_order) to make changes that would easily propagate, and can allow us to specify ordering of all factors just as easily.

I like this idea. Will implement.

(Do we want to keep LGAT and HGAT next to each other, for example, or order by # of samples)

I'd say small to large makes sense with the exception that Other and Benign should be at the end?

@jashapiro
Copy link
Member

I'd say small to large makes sense with the exception that Other and Benign should be at the end?

Large to small, but yeah, I think that is the place to start until somebody expresses a different preference.

@cansavvy
Copy link
Collaborator Author

I'm not sure how much this is or isn't an issue, but do different data types have different sample sizes so display_order would technically not always be large to small if we set it based on the full set of biospecimens?But my hunch is the consistency is still good so this is still a good plan?

@cansavvy
Copy link
Collaborator Author

@jashapiro, I added a display_order column to the original mapping notebook and then updated the instructions and chromosomal instability notebook accordingly. See what you think.

@cansavvy cansavvy requested a review from jashapiro January 19, 2021 17:58
Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a suggestion for streamlining, because I was confused by the max_level stuff for longer than I want to admit.

file.path(figures_dir, "palettes", "histology_label_color_table.tsv")
file.path(figures_dir, "palettes", "histology_label_color_table.tsv"),
# Read in each column as a specific type - skip extra columns
col_types = paste0(c("c", rep("-", 6), "f", "i", "c"), collapse = "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this? I guess it is helpful for skipping columns, but the factor will be automatically converted when you call fct_reorder, and the order should get read in correctly as numeric. You are implicitly specifying column order, which is something I try to avoid as it can be brittle.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that I'm not happy about this thing which is rigged with respect to column order. However, fct_reorder will not as the table here is read in. display_group does not get read in as a factor which is what is causing the problem.

I can instead get rid of this col_types line and add back a mutate where we force it to be a factor first and then also add back a select to get to the four columns we want. (this is what I had previously. It includes more steps but it is less brittle with respect to column order).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I've never had forcats complain about being given a character vector. Maybe I haven't used whatever old version we currently have.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched back to using a select and as.factor(). It may be more official "step" but it is more robust.

Copy link
Collaborator Author

@cansavvy cansavvy Jan 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I must be imagining things. I dropped the as.factor and it works fine. Never mind everything I said. will push that change.

@cansavvy
Copy link
Collaborator Author

Things have been streamlined. Thanks for those suggestions. @jashapiro

@cansavvy cansavvy requested a review from jashapiro January 19, 2021 19:53
Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few little cleanup suggestions.

@jaclyn-taroni jaclyn-taroni merged commit 4d96449 into AlexsLemonade:master Jan 20, 2021
@cansavvy cansavvy deleted the cansavvy/mapping-README branch January 25, 2021 16:25
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make mapping histology groups for plotting -- Part 1 creating the table
3 participants