snATAC-seq processing and manual annotation pipeline for part of the ENCODE single-cell data collection. Please see Experiment-ID.txt for more information.
Please contact Yu Fu (yu.fu2@umassmed.edu) if you have any questions.
- install conda
- install python
conda env create -f environment.yml
python downloadData.py $EXP $File $User $pswd
tar -xvf $EXP.tar.gz
(Fragment file path: encode_scatac_dcc_2/results/[EXP]-1/fragments/fragments.tsv.gz)
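Before running ArchR, it can help to confirm that the extracted archive contains the fragment file at the path given above. The following is a minimal sketch, not part of the pipeline. The helper names `fragment_path` and `check_fragments` and the placeholder experiment ID are hypothetical, and the sketch assumes the path is relative to the directory where the archive was extracted.

```shell
# Build the expected fragment-file path for an experiment ID
# (layout taken from the path pattern in this README).
fragment_path() {
    printf 'encode_scatac_dcc_2/results/%s-1/fragments/fragments.tsv.gz\n' "$1"
}

# Report whether the fragment file exists and is non-empty.
# Returns non-zero if it is missing, so callers can stop early.
check_fragments() {
    if [ -s "$(fragment_path "$1")" ]; then
        echo "found: $(fragment_path "$1")"
    else
        echo "missing or empty: $(fragment_path "$1")" >&2
        return 1
    fi
}

# Example with a placeholder experiment ID (not a real accession).
fragment_path "EXP_PLACEHOLDER"
```

A typical use would be `check_fragments "$EXP" && Rscript run-ArchR.R "$(fragment_path "$EXP")" "$EXP" "$biosampleID"`, assuming `$file` in the next command refers to this fragment file.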
Rscript run-ArchR.R $file $EXP $biosampleID
mkdir $EXP/MarkerFeatures
(You can also add "PanglaoDB" as a source of more canonical marker genes.)
awk -v tissue=$tissue '{if (($2==tissue) || ($2=="multiple")) print $1}' > markers
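To illustrate what the filter above keeps, here is a small sketch run on a made-up marker table. The command in this README does not name an input file, so the sketch passes one explicitly. The file name `example-marker-table.tsv`, its contents, and the tissue value `liver` are assumptions for illustration. Only the column layout (gene in column 1, tissue in column 2) is taken from the awk command.

```shell
# Hypothetical marker table: column 1 = gene, column 2 = tissue.
tmpdir=$(mktemp -d)
printf 'ALB\tliver\nPTPRC\tmultiple\nMYH6\theart\n' > "$tmpdir/example-marker-table.tsv"

tissue=liver

# Keep genes whose tissue equals $tissue or is marked "multiple".
awk -v tissue="$tissue" '{if (($2==tissue) || ($2=="multiple")) print $1}' \
    "$tmpdir/example-marker-table.tsv" > "$tmpdir/markers"

cat "$tmpdir/markers"
```

With this sample table, `markers` contains `ALB` and `PTPRC`; `MYH6` is dropped because its tissue is neither `liver` nor `multiple`.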
Two ways of cell type annotation:
Step 3A: Directly overlap the marker gene list with the marker features that pass a chosen FDR and Log2FC threshold
Rscript Pull-Marker-Gene.R $EXP/Save-ArchR-Project.rds $EXP/MarkerFeatures
./predict-cell-type-normax.sh $EXP/MarkerFeatures/ markers > overlap-results
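The overlap idea in Step 3A can be sketched as follows. This is not the implementation of `predict-cell-type-normax.sh`, whose internals are not described here. The per-cluster file `C1.tsv`, its three-column layout (gene, Log2FC, FDR), the sample values, and both cutoffs are assumptions chosen only to make the example self-contained.

```shell
tmpdir=$(mktemp -d)

# Hypothetical marker-feature table for one cluster: gene, Log2FC, FDR.
printf 'ALB\t2.1\t0.001\nMYH6\t0.3\t0.2\nPTPRC\t1.8\t0.04\nAPOA1\t3.0\t0.0005\n' \
    > "$tmpdir/C1.tsv"

# Hypothetical marker gene list, one gene per line.
printf 'ALB\nPTPRC\n' > "$tmpdir/markers"

fdr_max=0.01   # assumed FDR cutoff
lfc_min=1      # assumed Log2FC cutoff

# First file (NR==FNR): load the marker genes into a lookup table.
# Second file: print features that are known markers and pass both cutoffs.
overlap=$(awk -v fdr="$fdr_max" -v lfc="$lfc_min" '
    NR==FNR { marker[$1]=1; next }
    ($1 in marker) && ($3+0 <= fdr+0) && ($2+0 >= lfc+0) { print $1 }
' "$tmpdir/markers" "$tmpdir/C1.tsv")

echo "$overlap"
```

Here only `ALB` is reported: `PTPRC` is a marker but fails the FDR cutoff, and `APOA1` passes both cutoffs but is not in the marker list.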
Rscript Draw-UMAP.R $EXP/Save-ArchR-Project.rds $EXP
Rscript Draw-Marker-Heatmap.R $EXP/Save-ArchR-Project.rds markers $EXP
Rscript Draw-Dot-Plot $EXP/Save-ArchR-Project.rds markers $EXP
The labels are saved in proj$Labels and written to the master table.
Rscript Draw-UMAP-with-Label.R $EXP/Save-ArchR-Project.rds $EXP