- install openst
- clone local fork (https://github.com/saluic/openst), which contains a lot of fixes for the barcode preprocessing code
- create conda environment from <openst_repo>/environment.yaml
- install in editable mode
pip install -e <openst_repo>
- generate barcode coordinate files for each tile (tile = puck).
openst barcode_preprocessing \
    --in-fastq fc/barcode_registration_R1.fastq.gz \
    --out-path fc/raw_tiles \
    --out-prefix "L3_tile_" \
    --out-suffix ".txt.gz" \
    --crop-seq 5:30 \
    --rev-comp
- NB make sure --out-suffix is .txt or .txt.gz (because of this line - only hinted at in the openst docs)
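The suffix rule above can be checked before launching a long preprocessing run. This is only a sketch: `check_suffix` is a hypothetical helper, not part of openst, and it assumes the suffix must be exactly `.txt` or `.txt.gz`, as the note states.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openst): accept only the two suffixes
# named in the note above, and complain about anything else.
check_suffix() {
    case "$1" in
        .txt|.txt.gz) return 0 ;;
        *) echo "bad --out-suffix: $1 (expected .txt or .txt.gz)" >&2
           return 1 ;;
    esac
}

SUFFIX=".txt.gz"
check_suffix "$SUFFIX" && echo "suffix ok: $SUFFIX"
```

The same variable can then be passed on as `--out-suffix "$SUFFIX"`, so the checked value and the value actually used cannot drift apart.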
- post process the barcode files for a given sample
- a convenient way to do this is to make a folder of symlinks to the specific tiles for a sample
mkdir -p adult_mouse_hippocampus/data/raw_tiles
# use an absolute target: a relative one is resolved against the folder holding the link
for F in {tile1,tile2,[...],tileN}; do
    ln -s "$PWD/fc/raw_tiles/$F" "adult_mouse_hippocampus/data/raw_tiles/$F"
done
- run deduplication/filtering
openst filter_sample_barcodes \
    --sample-barcode-files adult_mouse_hippocampus/data/raw_tiles/* \
    --out-path adult_mouse_hippocampus/data/tiles
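One pitfall with a folder of symlinked tiles: `ln -s` stores the target path verbatim, so a relative target is resolved relative to the folder that holds the link, not the folder the command was run from. The sketch below links tiles with absolute targets and checks that every link resolves. `link_tiles` and the tile file names are made up for illustration; the demo runs on throwaway files in a temporary folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: link each named tile from SRC into DST using absolute
# targets, and fail if any resulting link does not point at an existing file.
link_tiles() {
    local src dst tile
    src="$(cd "$1" && pwd)" || return 1   # absolute path of the source folder
    dst="$2"
    shift 2
    mkdir -p "$dst" || return 1
    for tile in "$@"; do
        ln -sf "$src/$tile" "$dst/$tile" || return 1
        # -e follows the link, so it is false for a dangling symlink
        [ -e "$dst/$tile" ] || { echo "broken link: $dst/$tile" >&2; return 1; }
    done
}

# Demo on throwaway files; these tile names are invented.
demo="$(mktemp -d)"
mkdir -p "$demo/fc/raw_tiles"
touch "$demo/fc/raw_tiles/L3_tile_2157.txt.gz" "$demo/fc/raw_tiles/L3_tile_2158.txt.gz"
link_tiles "$demo/fc/raw_tiles" "$demo/sample/data/raw_tiles" \
    L3_tile_2157.txt.gz L3_tile_2158.txt.gz
ls "$demo/sample/data/raw_tiles"
```

Checking each link right after creating it means a mistyped tile name fails here rather than later, in the middle of the deduplication/filtering step.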
- install, initialize, and configure spacemake. spacemake processes fastqs into the expression matrix.
- spacemake >= v0.7.4 ships with openst configurations (PR #83), but this version is not on PyPI yet. need to install from source, clone repo: https://github.com/rajewsky-lab/spacemake
- update openst environment with <spacemake_repo>/environment.yaml
conda env update --name openst --file <spacemake_repo>/environment.yaml
- install pulp == v2.7.0 (issue with snakemake dependency)
pip install pulp==2.7.0
- download Dropseq-tools version 2.5.1. unzip somewhere
- download genome & annotation files
- install in editable mode
pip install -e <spacemake_repo>
- make a directory for spacemake runs. run all subsequent steps from within this directory.
- initialize spacemake
spacemake init \
    --dropseq_tools <unzipped_dropseq_tools>
- copies config.yaml and puck_data into the current directory. puck_data/openst_coordinate_system.csv has the coordinate offset for each tile relative to the flowcell, so tile coordinates can be translated to flowcell coordinates. replace openst_coordinate_system.csv as necessary. also disable hexagonal meshing...
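To make the offset idea concrete, here is a minimal sketch of translating a tile-local coordinate into a flowcell coordinate. Everything specific in it is an assumption: the column layout (`puck_id,x_offset,y_offset` with a header row), the tile names, and the numbers are invented, and it assumes the flowcell coordinate is simply the tile coordinate plus the offset. Check the real openst_coordinate_system.csv before relying on any of this.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look up a tile's offsets in a CSV and add them to a
# tile-local (x, y). Assumed columns: puck_id,x_offset,y_offset (with header).
tile_to_flowcell() {
    awk -F, -v id="$2" -v x="$3" -v y="$4" '
        NR > 1 && $1 == id { print x + $2, y + $3; found = 1 }
        END { exit found ? 0 : 1 }   # non-zero exit if the tile is unknown
    ' "$1"
}

# Made-up offsets file for the demo.
offsets="$(mktemp)"
printf 'puck_id,x_offset,y_offset\ntile_a,0,0\ntile_b,6500,0\n' > "$offsets"

tile_to_flowcell "$offsets" tile_b 1000 250   # prints: 7500 250
```

A tile missing from the offsets file makes the helper exit non-zero, which is the case to watch for when swapping in a coordinate file for a different flowcell.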
- configure spacemake with the species. I used GRCm39.genome.fa and gencode.vM34.annotation.gtf from gencode M34.
spacemake config add_species \
    --name mouse \
    --sequence <genome.fa> \
    --annotation <annotation.gtf>
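Since the setup above involves several separate installs, a quick preflight check can confirm that the command-line tools are on the PATH before adding samples. This is a generic sketch: `require_cmds` is a hypothetical helper, and the only assumption about the tools is that they are invoked as `openst` and `spacemake`, as in the commands in these notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every command from the argument list that is
# not found on PATH; returns non-zero if at least one is missing.
require_cmds() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

require_cmds openst spacemake \
    || echo "activate the openst conda environment or finish the installs first"
```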
- add samples to project and run spacemake, i.e. generate the expression matrix.
- add samples to project
spacemake projects add_sample \
    --project_id adult_mouse_hippocampus \
    --sample_id sample1 \
    --R1 adult_mouse_hippocampus/data/adult_mouse_hippocampus_R1_001.fastq.gz \
    --R2 adult_mouse_hippocampus/data/adult_mouse_hippocampus_R2_001.fastq.gz \
    --species mouse \
    --puck openst \
    --run_mode openst \
    --barcode_flavor openst \
    --puck_barcode_file adult_mouse_hippocampus/data/tiles/* \
    --map_strategy "STAR:genome:final"
- run spacemake
spacemake run \
    --cores <n_cores>
- stitch pucks together (openst spatial_stitch) to generate one expression matrix
openst spatial_stitch \
    --tiles adult_mouse_hippocampus/processed_data/sample1/illumina/complete_data/dge/dge.all.polyA_adapter_trimmed.mm_included.spatial_beads_*.h5ad \
    --tile-coordinates adult_mouse_hippocampus/data/fc_2_coordinate_system.csv \
    --output adult_mouse_hippocampus/spacemake/stitched.h5ad
- alignment
- use anndata.obsm["spatial"] for capture-area coordinates (vs anndata.obs["x/y_pos"])