BF528 | Applications in Translational Bioinformatics Final Project
The pancreas is a complex organ composed of a diverse set of cell types. Proper pancreatic function is required to maintain healthy metabolism, and pancreatic dysfunction leads to serious illnesses. In their 2016 study, "A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure", Baron et al. performed single-cell RNA sequencing on pancreatic cells from four post-mortem human donors and two mouse models to better understand the cellular diversity of the pancreas. Their analysis identified previously known cell types as well as rare and novel cell type subpopulations, and produced a more detailed characterization of the diversity within those cell types. In this project, we attempt to replicate their primary findings using current analytical methodology and software packages.
This is a continuation of BF528 Project 4, which can be found here
- Process the barcode reads of a single cell sequencing dataset
- Perform cell-by-gene quantification of UMI counts
- Perform quality control on a UMI counts matrix
- Analyze the UMI counts to identify clusters and marker genes for distinct cell type populations
- Ascribe biological meaning to the clustered cell types and identify novel marker genes associated with them
Baron M, Veres A, Wolock SL, et al. A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure. Cell Syst. 2016;3(4):346-360.e4. doi:10.1016/j.cels.2016.08.011
salmon_prep.qsub
- Using the GENCODE v37 human genome, this file creates a reference index of the human transcriptome and a transcript-to-gene map, to be used by Salmon Alevin.
whitelist.qsub
- This file generates a whitelist of barcodes that meet a minimum sequencing depth threshold. The mean number of reads for each file was used as the threshold.
salmon_alevin.qusb
- Runs the Salmon Alevin program, both on individual files and on all files simultaneously. More information is available in the Salmon Alevin documentation.
data_curator.R
- Generates summary statistics on the UMI counts matrices produced by Salmon Alevin, including cumulative distribution plots of the distinct UMIs per barcode.
programmer.R
- The UMI count matrix generated above is loaded and processed using the Seurat standard pre-processing workflow. Low-quality reads are filtered out, and cells are clustered into subpopulations.
analyst.R
- The cell subpopulations are classified into distinct cell types using the marker genes provided in the Supplementary Data of Baron et al. Marker genes for each of these cell types are retained, and a list of novel marker genes is exported for further analysis.
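The barcode whitelisting and cumulative distribution steps described above can be illustrated with a small sketch. This is not the code from whitelist.qsub or data_curator.R (those are shell and R scripts); it is a standalone Python illustration using only the standard library. The function names and the toy barcodes are hypothetical. It also assumes that "mean number of reads" means the mean read count per barcode within one file, and that a barcode passes when its count is at least that mean.

```python
from collections import Counter
from itertools import accumulate


def barcode_counts(barcodes):
    """Count how many reads carry each cell barcode."""
    return Counter(barcodes)


def mean_threshold_whitelist(counts):
    """Return barcodes whose count is at least the mean count per barcode.

    Assumption: the threshold is the mean over all barcodes seen in one file.
    """
    if not counts:
        raise ValueError("no barcodes to threshold")
    mean_reads = sum(counts.values()) / len(counts)
    return sorted(bc for bc, n in counts.items() if n >= mean_reads)


def cumulative_fraction(counts):
    """Cumulative fraction of the total, with barcodes ranked from most to
    least abundant. The same function applies to reads or to distinct UMIs."""
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("counts sum to zero")
    return [running / total for running in accumulate(ranked)]


# Toy input: one barcode string per read (made-up data, not from the study).
reads = ["AAAC"] * 6 + ["CCGT"] * 3 + ["GGTA"] * 2 + ["TTAG"] * 1

counts = barcode_counts(reads)
whitelist = mean_threshold_whitelist(counts)  # mean is 12 / 4 = 3 reads
curve = cumulative_fraction(counts)

print(whitelist)
print(curve)
```

With real data, the curve usually rises steeply over the first barcodes and then flattens, which is one way to judge whether a mean-based cutoff separates genuine cells from low-depth barcodes.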
Final Report: A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure