-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explore expression of marker genes for tumor cell states in 2 samples #981
Explore expression of marker genes for tumor cell states in 2 samples #981
Conversation
…plore-cell-states-ewings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this is a really reasonable exploration of results! I very much agree that if you want to consider AUCell, that's a separate notebook; this is long enough.
One additional item I wanted to point out beyond the comments I left was that, I'm not sure we can rule out proliferative being useful based on 2 libraries alone, although I do agree based on the density plots that there seems to be a lot less going in within that gene set here. I did leave a comment about the relationship between proliferative and high, which might also help to get a sense of how much we want to use proliferative going forward if at all!
) | ||
|
||
# set seed | ||
set.seed(2024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the future is now 🎉 (obviously this is not a needed change though)
set.seed(2024) | |
set.seed(2025) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe by February, I'll remember it's a new year 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am very much still at the <types 2024...deletes the 4 and types 5> stage! 🥴
analyses/cell-type-ewings/exploratory_analysis/06-tumor-cell-state-assignment.Rmd
Outdated
Show resolved
Hide resolved
analyses/cell-type-ewings/exploratory_analysis/06-tumor-cell-state-assignment.Rmd
Outdated
Show resolved
Hide resolved
aspect.ratio = 1 | ||
) | ||
}) |> | ||
patchwork::wrap_plots(ncol = 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'd wrap these vs just print them one on each line, since they are a bit small in the HTML compared to other plots, and these are important ones to be able to see
I'm using the same parameters we use for the whole sample, leiden, modularity, resolution of 0.5, and nearest neighbors of 20. | ||
I think if we really wanted to be stringent we would evaluate clusters here and find the best parameters to use, but I think starting with these is good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there are "perfect" answers here since clustering is its own special can of worms. But, I do want to say that I wonder what the difference would be between using the same parameters on subsetted data, vs using the full dataset with a higher resolution. I agree this is a decent enough place to start though, since it's anyone's guess how to subset clusters anyways!
analyses/cell-type-ewings/exploratory_analysis/06-tumor-cell-state-assignment.Rmd
Outdated
Show resolved
Hide resolved
analyses/cell-type-ewings/exploratory_analysis/06-tumor-cell-state-assignment.Rmd
Outdated
Show resolved
Hide resolved
|
||
``` | ||
|
||
It looks like for both samples we have some clear division between EWS high and EWS low cells. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. what do we make of the fact that it barely looks like annotation
(which is the sub clusters - can you modify the heatmap code to get that labeled as "sub_cluster" and not as "annotation"?) is clustered on the 2nd heatmap? Of course, the hierachical clustering algorithm that the heatmap is using is decently different from the one used to cluster in the first place, but the total lack of relationship I find odd. I'm curious, what does the heatmap look like if we first arrange the data by subcluster, and then turn off x-axis clustering so that the subclusters indeed are grouped together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now I'm looking at the next version of this heatmap after you group into putative cell states, and I feel less concerned! The rainbow look, shall we say, was probably just brought on by some very similar sub-clusters.
analyses/cell-type-ewings/exploratory_analysis/06-tumor-cell-state-assignment.Rmd
Outdated
Show resolved
Hide resolved
Co-authored-by: Stephanie Spielman <stephanie.spielman@gmail.com>
This comment made me take another look at this and I do see some expression of the proliferative gene set, but it looks so much lower because the values are lower. This is probably a side effect of having marker gene sets with varying numbers of genes in them. Because of that, I actually think we want to use the mean here. It doesn't change the overall conclusions and the plots mostly look the same, but it does keep things on a similar scale. It also makes it easier to see all the gene sets in the heatmaps. In revisiting this, I think there is one group of cells that are "proliferative" in SCPCL000824 that also line up with EWS-high. I feel comfortable labeling these as such. The one thing that does change when using mean is that the differences between EWS-mid and EWS-high are not as striking to see in the heatmap, but I see it in the UMAP. Maybe that's okay and we could also just label cells as low, high, or proliferative? Just an FYI that I wrote a new function to calculate means rather than modifying the existing function to calculate sums because we use that function in lots of places and I was trying not to break any previous things that use it. I also updated the heatmap to not cluster the columns, but I wasn't able to change the name of the New rendered notebook: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes you've made here seem good to me (and totally fine about the annotation label, it's good enough!)!
The one thing that does change when using mean is that the differences between EWS-mid and EWS-high are not as striking to see in the heatmap, but I see it in the UMAP. Maybe that's okay and we could also just label cells as low, high, or proliferative?
I like the idea of labeling as high-proliferative as you have done (for that one cluster at least), but I think we should bear in mind the caveat that this does mean other high cells aren't also proliferative. That said, I was intrigued that a cluster came out for proliferative here at all, and in reminding myself of the marker genes we're using, I wondered if we just don't have enough proliferative markers for sufficient power. So I did another quick search and uncovered* the contender VRK1
from https://doi.org/10.1016/j.ccell.2014.10.004. Locally, I added this to the marker gene TSV and re-rendered the notebook. I think this may be a contender to include and achieve a bit more signal, what do you think?
06-tumor-cell-state-assignment-with-VRK1.html.zip
analyses/cell-type-ewings/scripts/utils/tumor-validation-helpers.R
Outdated
Show resolved
Hide resolved
|
||
df <- data.frame( | ||
barcodes = names(mean_exp), | ||
sum_exp = mean_exp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this name will change anyway, but for clarity in the meantime...
sum_exp = mean_exp | |
mean_exp = mean_exp |
@@ -20,6 +20,9 @@ This includes comparing annotations to those obtained from marker gene annotatio | |||
5. `05-cluster-exploration.Rmd`: This notebook looks at clustering across different parameters to choose the most optimal clusters for `SCPCL000822` and `SCPCL000824`. | |||
Additionally, we look at expression of marker genes from `references/visser-all-marker-genes.tsv` across all clusters and use that to refine tumor cell annotations obtained from running `aucell-singler-annotation.sh`. | |||
|
|||
6. `06-tumor-cell-state-assignments.Rmd`: This notebook looks at assigning tumor cell states in `SCPCL000822` and `SCPCL000824`. | |||
Tumor cells are grouped into `EWS-FLI` high, `EWS-FLI` low, and `EWS-FLI` middle based on expression of marker genes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and high proliferative?
## Conclusions | ||
|
||
- There is a clear division between cells that are `EWS-FLI` high and `EWS-FLI1` low. | ||
- Most tumor cells in both samples are `EWS-FLI` high, which is consistent with the literature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sub-bullet for proliferative
Based on this I would make the following assignments: | ||
|
||
`SCPCL000822`: | ||
`EWS-FLI1` low - 2 | ||
`EWS-FLI1` middle - 1 & 4 | ||
`EWS-FLI1` high - 3 | ||
|
||
`SCPCL000824` : | ||
`EWS-FLI1` low - 1 | ||
`EWS-FLI1` middle - 6 | ||
`EWS-FLI1` high - 3, 4, 5, 7 | ||
`EWS-FLI1` high proliferative - 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to add this comment before - this is each appearing on one line so let's actually make it bullets
Based on this I would make the following assignments: | |
`SCPCL000822`: | |
`EWS-FLI1` low - 2 | |
`EWS-FLI1` middle - 1 & 4 | |
`EWS-FLI1` high - 3 | |
`SCPCL000824` : | |
`EWS-FLI1` low - 1 | |
`EWS-FLI1` middle - 6 | |
`EWS-FLI1` high - 3, 4, 5, 7 | |
`EWS-FLI1` high proliferative - 2 | |
Based on this I would make the following assignments: | |
`SCPCL000822`: | |
* `EWS-FLI1` low - 2 | |
* `EWS-FLI1` middle - 1 & 4 | |
* `EWS-FLI1` high - 3 | |
`SCPCL000824` : | |
* `EWS-FLI1` low - 1 | |
* `EWS-FLI1` middle - 6 | |
* `EWS-FLI1` high - 3, 4, 5, 7 | |
* `EWS-FLI1` high proliferative - 2 |
I think we should figure out what exactly we mean as "proliferative". It's been shown that EWS-FLI1 high cells have a more proliferative phenotype and that EWS-FLI1 low cells have a more metastatic phenotype (see https://www.nature.com/articles/onc2016498). So if we use that logic, then all EWS-FLI1 high cells should be proliferative. And then if you add in VRK1, an EWS-FLI1 target which has been shown to contribute to tumor cell growth and proliferation in the paper you posted, then you see more of the EWS-FLI1 high cells overlapping with proliferative cells. I think if we want to use that logic, then we don't need to have the "proliferative" category at all and just categorize by EWS-FLI1 levels. However, those studies were all done in bulk and all cell lines. The "proliferative" class that I was trying to characterize is based on the "Ewing sarcoma proliferating cells" labeled in https://www.biorxiv.org/content/10.1101/2024.01.18.576251v2.full. These were specifically identified by looking at the two markers that we have on the list and are a subset of Ewing tumor cells in patient tumor cells using single-cell. That doesn't mean it should be the gold standard, but I think if we want to isolate a specific "proliferating" population, then we can't just use EWS-FLI1 targets, since those should be high regardless. I did revisit Figure 2 from Goodspeed et al,. and I wonder if we want to add Also, I think we should add ICAM1 based on Franzetti et al. I completely forgot about that paper until now! Here's an example of the report with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't mean it should be the gold standard, but I think if we want to isolate a specific "proliferating" population, then we can't just use EWS-FLI1 targets, since those should be high regardless.
Got it, thanks for the biology! Totally makes sense.
I did revisit Figure 2 from Goodspeed et al,. and I wonder if we want to add TOP2A to the list? I think if we want to add VRK1 it should be to the EWS-FLI high list.
If we're including the others from Goodspeed, then I agree TOP2A should be in there per that linked heatmap, and I agree with adding VRK1 to the high list.
Looking at the updated report, it all looks good to me at this point actually and I think we have a pretty solid gene set here to move forward with. The only changes to make are to add the additional DOI we have to the references/README.md
where the marker gene file is described, and we do want to get bullets in that one spot I commented on since otherwise it's all on the same line in the report -
Otherwise, good to go! 🐈
Sorry about that, I thought I had added the bullets but they should be there now! And then I updated the README to include the new references we're using. Since I made some small changes to functions that are used in other places used by some of the scripts run in CI, I'm going to make sure checks pass before merging this. |
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
Closes #940
What is the goal of this pull request?
Here I'm adding a notebook that looks at expression of marker genes for tumor cell states in 2 different Ewing libraries, SCPCL000822 and SCPCL000824. Generally I am using this notebook to answer the following questions:
Briefly describe the general approach you took to achieve this goal.
The first thing I did was pick out the tumor cells. To do this I'm looking at the output from
aucell-singler-annotation.sh
andevaluate-clusters.sh
. Theaucell-singler-annotation.sh
workflow identifies tumor cells using bothSingleR
andAUCell
and the clustering workflow looks at different cluster assignments and gene expression for tumor marker genes across the clusters. I use the output of both to pick out clusters of cells that are tumor cells.After getting the cell type assignments, I looked at the expression of the marker genes in the
tumor-cell-state-markers.tsv
file and the genes in the custom gene signatures we added as part of #971. To do this I calculated the sum of all marker genes in the gene set and used that for plotting throughout the notebook. I noticed that for both samples there was a cluster of cells that had high expression of EWS-FLI1 low targets, so I pulled those out as tumor cells too.Once I had the tumor cells, I re-did clustering on just those cells. Then I looked at expression of the gene sets in each of those clusters to see if I could assign clusters to different cell states (mainly EWS-FLI1 low/high) based on expression of that gene set. I was able to easily pick out low vs. high cells, but there looks like there might be some variation and potentially a group of "EWS-FLI1 middle" cells that express EWS-FLI1 high marker genes but to a lesser degree.
A few key observations that I made along the way:
If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes
I think the first thing we want to do is finalize cell type annotations using the SingleR/ clustering results and then classify cells as EWS-FLI1 high/low. I think we can add in the "middle" group later.
Result
What types of results does your code produce (e.g., table, figure)?
Here's the rendered copy of the notebook for review:
06-tumor-cell-state-assignment.html.zip
As part of the notebook I also saved the annotations to a TSV file.
What is your summary of the results?
Author checklists
Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.
Analysis module and review
README.md
has been updated to reflect code changes in this pull request.Reproducibility checklist
Dockerfile
.environment.yml
file.renv.lock
file.