non-Rhabdomyosarcoma Soft Tissue Sarcoma Dataset Annotation (SCPCP000013) #604
yutarohtanaka
started this conversation in
Propose a new analysis
Proposed analysis
We plan to annotate the snRNA-seq samples from different non-rhabdomyosarcoma soft tissue sarcomas in the SCPCP000013 dataset (n=34). Our processing and cell type annotation will include removing ambient and background RNA, filtering out low-quality nuclei and doublets, annotating cell types, and annotating malignant cells.
Scientific goals
To share a validated, curated set of cell type annotations for the non-rhabdomyosarcoma soft tissue sarcoma samples in this dataset.
Methods or approach
Filtering for Ambient RNA
CellBender is a computational tool that removes ambient / background RNA from count matrices. We will compare CellBender against DropletUtils::emptyDropsCellRanger() (which we understand was used to produce the “filtered counts” provided) to determine which method performs better on this data, and then remove all potential background RNA.
(We are more than happy to skip this step if it would be preferable to use the emptyDrops-filtered matrices.)
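As a rough illustration of one possible first check in such a comparison, the sketch below counts how many barcodes each method retains and how much the two sets overlap. This is only an assumption about how the comparison could start, not the evaluation we have settled on; the function name and the barcode lists are hypothetical.

```python
def compare_barcode_sets(cellbender_barcodes, emptydrops_barcodes):
    """Summarize the overlap between barcodes retained by two filtering methods.

    Both arguments are iterables of barcode strings (hypothetical inputs,
    e.g. the observation names of each filtered matrix).
    """
    cb = set(cellbender_barcodes)
    ed = set(emptydrops_barcodes)
    shared = cb & ed
    union = cb | ed
    return {
        "shared": len(shared),
        "cellbender_only": len(cb - ed),
        "emptydrops_only": len(ed - cb),
        # Jaccard index: 1.0 means both methods kept exactly the same barcodes.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy example with made-up barcodes.
summary = compare_barcode_sets(["A", "B", "C"], ["B", "C", "D"])
print(summary)
```

Barcodes kept by only one method would be the natural place to look more closely, for example at their UMI counts and marker expression.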
Filtering for Low Quality Nuclei
Here, “low quality nuclei” are defined as nuclei with fewer than 300 genes or 500 UMI counts, or more than 6,000 genes or 50,000 UMI counts. Additionally, we filter out nuclei with no ribosomal gene expression, or with more than 20% mitochondrial or more than 5% hemoglobin gene expression as a fraction of total expression. We use built-in scanpy functions to perform this filtering.
We will also filter out sparsely expressed genes, defined as genes expressed in fewer than 5 cells.
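The thresholds above can be sketched as a pair of boolean masks. This is a minimal NumPy illustration rather than our actual scanpy code: it assumes a dense count matrix, assumes the mitochondrial and hemoglobin percentages are computed over total UMI counts, and uses hypothetical function and argument names. In scanpy, the per-nucleus metrics would typically come from `sc.pp.calculate_qc_metrics`.

```python
import numpy as np


def low_quality_mask(counts, is_mito, is_ribo, is_hb,
                     min_genes=300, max_genes=6000,
                     min_counts=500, max_counts=50_000,
                     max_pct_mito=20.0, max_pct_hb=5.0):
    """Return a boolean array that is True for nuclei flagged as low quality.

    counts: dense (n_nuclei, n_genes) array of raw UMI counts.
    is_mito, is_ribo, is_hb: boolean masks over genes (hypothetical inputs).
    """
    counts = np.asarray(counts)
    is_mito = np.asarray(is_mito, dtype=bool)
    is_ribo = np.asarray(is_ribo, dtype=bool)
    is_hb = np.asarray(is_hb, dtype=bool)

    n_genes = (counts > 0).sum(axis=1)      # genes detected per nucleus
    total = counts.sum(axis=1)              # UMI counts per nucleus
    safe_total = np.maximum(total, 1)       # avoid division by zero

    # Assumption: percentages are taken over total UMI counts.
    pct_mito = 100.0 * counts[:, is_mito].sum(axis=1) / safe_total
    pct_hb = 100.0 * counts[:, is_hb].sum(axis=1) / safe_total
    ribo_counts = counts[:, is_ribo].sum(axis=1)

    return (
        (n_genes < min_genes) | (n_genes > max_genes)
        | (total < min_counts) | (total > max_counts)
        | (ribo_counts == 0)
        | (pct_mito > max_pct_mito) | (pct_hb > max_pct_hb)
    )


def sparse_gene_mask(counts, min_cells=5):
    """Return True for genes detected in fewer than min_cells nuclei."""
    return (np.asarray(counts) > 0).sum(axis=0) < min_cells
```

Nuclei and genes would then be kept with `counts[~low_quality_mask(...)]` and `counts[:, ~sparse_gene_mask(counts)]`, respectively.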
Filtering for Doublets
We have primarily used scrublet in our prior work, and found that it is able to identify doublets (and multiplets) with reasonable confidence. Here, we plan to use scrublet to call and filter out any potential doublets in each sample.
Annotating Cell Types
We will use two separate methods of cell type annotation, a marker-based approach using manually curated markers and a supervised machine learning approach, to increase the confidence and granularity of the annotated cell types.
Existing modules
This processing and cell type annotation workflow largely follows the existing documentation in #292 (Ewing Sarcoma), with some adaptations. We expect to follow the same processes as described in #601 and #602.
Input data
We will start with the count matrices (the .h5ad “unfiltered counts file”) provided in the SCPCP000013 data repository. The analysis will be conducted using publicly available packages, and we will provide a final curated table of cell type markers, including the references used, along with the cell type annotation files.
Scientific literature
(CellBender) https://www.nature.com/articles/s41592-023-01943-7
(scanpy) https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1382-0
(decouplerpy) https://doi.org/10.1093/bioadv/vbac016
(CellTypist) https://www.cell.com/cell/fulltext/S0092-8674(23)01312-0
(CellxGene) https://doi.org/10.1101/2023.10.30.563174
(inferCNVpy) https://github.com/icbi-lab/infercnvpy
Other details
All of this analysis can be performed in our local and cloud environments, and will predominantly be conducted in Python.
We plan to complete this annotation and have it available to share within the next two months.