Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve_07_annotation #994

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

maud-p
Copy link
Contributor

@maud-p maud-p commented Jan 17, 2025

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

This PR is linked to the issue 856.

What is the goal of this pull request?

I aim here to improve the workflow of the annotations performed in 07_combined_annotation_across_samples_exploration.Rmd.

The suggested changes aim to overcome some misclassification that have been spotted in the first version of the annotation while working on the integration of the data.

  1. In the first version of the annotation, I realized that a lot of immune cells were filtered out because of the cnv-threshold. See the expression of PTPRC = CD45 below:
    image

We requested immune and endothelial cells to have no infered cnv to be labeled as normal. I think that the few CNV that were infered for immune cells are cell type specific expression of some genes that colocalized on one chromosome. Thus I decided to allow a bit of flexibility in regards to CNV for immune and endothelial cells and only based the annotation of immune and endothelial cells on the predicted.cell-type and associated prediction.score.

  1. Similar to the immune cells, there is a high probability that we also infer false positive CNV in normal epithelial/stroma cells. I am not sure that it will be possible to differentiate between normal cells with false positive CNV from cancer cells. For that reason, I decided to label as normal cells with a cnv-score <= cnv-threshold and as cancer cells with a cnv-score > cnv-threshold + 2.
    The choice of +2 is a bit arbitrary, but based on the approximate number of CNV that can be detected in endothelial and immune cells.

  2. There is the possibility of cancer cell without any CNV. It would be extremly difficult in that case to differentiate them from normal cells. The strategy I decided to go with to reduce the risk of misclassification is to use the prediction score of label transfer. Indeed, we expect normal cells to resemble the cells from the fetal kidney reference and thus have a high prediction.score.

  3. While working on the integration (see umap reduction here colored by cnv_score), I realized that samples -177, -180, -181, -190, -197 have strikingly very low cnv_score. The cnv_score is simply the number of chromosome presenting a CNV for each patient. Of note, these 5 samples are the ones with no immune and/or endothelial cells for which we couldn’t run infercnv properly. We ran infercnv without any reference. In that case, the mean over all cellular expression profiles is taken as a reference. If the sample is mostly composed of cancer cell, the cancer-associated CNV are taken as the normal reference...

image

I am afraid not to have a solution for this, but I am quite sure that the cells for these 5 patients are mis-classified. This will affect the downstream differential expression analysis cancer versus normal and the potential finding of Wilms tumor histology specific markers. This is why I would strongly recommand forcing the annotations for these patients to unknown.
image

  1. I slightly changed the summarized CNV heatmap with do_Feature_mean. Initially, the feature ploted with do_Feature_mean was prop_cnv_chr[i], the proportion of the chromosome i affected by a CNV. The function do_Feature_mean took then the mean over all cells in a group. This can be misleading are it is then impossible to interpret the mean value. Example with a mean of 0.5, we could say:
  • half of the cells in the group gained/lossed the entire chromosome, half are unaffected
  • all cells gained/lossed half of the chromosome
  • ... all other combinations possible!

For that reason, I changed the feature to has_cnv_chr[i], which is a binary information of the presence/absence of CNV in the chromosome i for the given cell. While taking the mean over all cells in a group, we then have the information of the percentage of cells within the group having any kind of CNV in this chromosome.

If known, do you anticipate filing additional pull requests to complete this analysis module?

-[ ] one for the integration of the samples
-[ ] one for differential expression analysis, looking for Wilms tumor, histology specific markers

Results

The annotation file is generated automatically while running the script and will be saved in the results folder.
The notebook is saved in the notebook folder.

Provide directions for reviewers

What are the software and computational requirements needed to be able to run the code in this PR?

Are there particularly areas you'd like reviewers to have a close look at?

Is there anything that you want to discuss further?

Author checklists

Analysis module and review

Reproducibility checklist

  • Code in this pull request has been added to the GitHub Action workflow that runs this module.
  • The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
  • If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
  • If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

@maud-p maud-p requested a review from jaclyn-taroni as a code owner January 17, 2025 12:32
@jaclyn-taroni jaclyn-taroni requested review from sjspielman and removed request for jaclyn-taroni January 17, 2025 12:36
@sjspielman
Copy link
Member

Hi @maud-p, just wanted to give you a heads-up for time frame! I anticipate getting to review this next Tuesday 1/21, since we are out of the office Monday 1/20 for a holiday and I don't think I'll have a chance to get to this today. Have a good weekend in the meantime! :)

@maud-p
Copy link
Contributor Author

maud-p commented Jan 17, 2025

Hi @sjspielman ,
Thank you for letting me know. Please no rush for it, as I wanted to send the PR before leaving for vacations ;)
I'll be back Monday 26/01.
I also wish you a great weekend!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants