stevens_notes.md

Documentation

  1. openst
  2. spacemake
  3. Flow cell layout and tile naming (pages 73-74)
  4. FASTQ file format

Steps

  1. install openst
    1. clone local fork (https://github.com/saluic/openst), which contains a lot of fixes for the barcode preprocessing code
    2. create conda environment from <openst_repo>/environment.yaml
    3. install in editable mode
      pip install -e <openst_repo>
      
  2. generate barcode coordinate files for each tile (tile = puck).
    openst barcode_preprocessing \
    --in-fastq fc/barcode_registration_R1.fastq.gz \
    --out-path fc/raw_tiles \
    --out-prefix "L3_tile_" \
    --out-suffix ".txt.gz" \
    --crop-seq 5:30 \
    --rev-comp
    
    1. NB make sure --out-suffix is .txt or .txt.gz (because of this line - only hinted at in the openst docs)
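
To make the effect of `--crop-seq 5:30` and `--rev-comp` concrete, here is a minimal sketch on a single sequence. `crop_revcomp` is a hypothetical helper, not part of openst. It assumes the crop is a 0-based, half-open slice (Python style) and that cropping happens before the reverse complement; check the openst docs if either matters for your barcodes.

```shell
# Hypothetical helper (not part of openst): mimic --crop-seq START:END
# followed by --rev-comp on a single sequence.
# Assumption: START:END is a 0-based, half-open slice, as in Python.
crop_revcomp() {
  seq=$1
  start=${2:-5}
  end=${3:-30}
  # cut is 1-based and inclusive, so [start, end) becomes (start+1)-end
  printf '%s\n' "$seq" \
    | cut -c "$((start + 1))-$end" \
    | rev \
    | tr 'ACGTNacgtn' 'TGCANtgcan'
}

# Example: positions 5..8 (0-based) of this read are AACC,
# and the reverse complement of AACC is GGTT
crop_revcomp AAAAAAACCG 5 9
```

With the defaults (5 and 30) the helper keeps 25 bases, which is handy for spot-checking a few reads from the registration FASTQ against the generated tile files.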
  3. post-process the barcode files for a given sample
    1. a convenient way to do this is to make a folder of symlinks to the specific tiles for a sample
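      mkdir -p adult_mouse_hippocampus/data/raw_tiles
      # use an absolute target: a relative symlink target is resolved relative to the link's own directory
      for F in {tile1,tile2,[...],tileN}; \
      do ln -s "$PWD/fc/raw_tiles/$F" "adult_mouse_hippocampus/data/raw_tiles/$F"; \
      done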
      
    2. run deduplication/filtering
      openst filter_sample_barcodes \
      --sample-barcode-files adult_mouse_hippocampus/data/raw_tiles/* \
      --out-path adult_mouse_hippocampus/data/tiles
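
Because the filtering step reads the tiles through symlinks, it can help to confirm that every link resolves before running it. A relative link target is resolved relative to the link's own directory, so it can easily end up dangling. The sketch below uses a hypothetical helper, `list_broken_links`, that prints any dangling links in a directory.

```shell
# Hypothetical helper: print symlinks in directory $1 whose targets do not resolve.
list_broken_links() {
  for link in "$1"/*; do
    # -L: the path is a symlink; ! -e: following it leads nowhere
    if [ -L "$link" ] && [ ! -e "$link" ]; then
      printf '%s\n' "$link"
    fi
  done
}

# Usage (prints nothing when all links are fine):
#   list_broken_links adult_mouse_hippocampus/data/raw_tiles
```

An empty output means every tile link points at an existing file.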
      
  4. install, initialize, and configure spacemake. spacemake processes fastqs into the expression matrix.
    1. spacemake >= v0.7.4 ships with openst configurations (PR #83), but this version is not on PyPI yet, so install from source. clone the repo: https://github.com/rajewsky-lab/spacemake
    2. update openst environment with <spacemake_repo>/environment.yaml
      conda env update --name openst --file environment.yaml
      
    3. install pulp == v2.7.0 (issue with snakemake dependency)
      pip install pulp==2.7.0
      
    4. download Dropseq-tools version 2.5.1 and unzip it somewhere
    5. download genome & annotation files
    6. install in editable mode
      pip install -e <spacemake_repo>
      
    7. make a directory for spacemake runs. run all subsequent steps from within this directory.
    8. initialize spacemake
      spacemake init \
      --dropseq_tools <unzipped_dropseq_tools>
      
      1. copies config.yaml and puck_data into the current directory. puck_data/openst_coordinate_system.csv has the coordinate offset for each tile relative to the flowcell, so tile coordinates can be translated to flowcell coordinates. replace openst_coordinate_system.csv as necessary. also disable hexagonal meshing...
    9. configure spacemake with the species. I used GRCm39.genome.fa and gencode.vM34.annotation.gtf from GENCODE M34.
      spacemake config add_species \
      --name mouse \
      --sequence <genome.fa> \
      --annotation <annotation.gtf>
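
The coordinate-offset idea from the `spacemake init` step (tile coordinates translated to flowcell coordinates) can be sketched as below. `to_flowcell_coords` is a hypothetical helper. The column layout (`puck_id,x_offset,y_offset` with a header row) is an assumption for illustration, so check the header of your own `openst_coordinate_system.csv` before relying on it.

```shell
# Hypothetical helper: add a tile's offset to tile-local coordinates.
# Assumption: the CSV has a header row and its first three columns are
# puck_id, x_offset, y_offset (verify against the real file).
to_flowcell_coords() {
  # $1 = offsets CSV, $2 = tile id, $3 = local x, $4 = local y
  awk -F, -v id="$2" -v x="$3" -v y="$4" '
    NR > 1 && $1 == id { print x + $2, y + $3; found = 1 }
    END { if (!found) exit 1 }   # non-zero exit when the tile id is missing
  ' "$1"
}

# Usage:
#   to_flowcell_coords puck_data/openst_coordinate_system.csv <tile_id> <x> <y>
```

The non-zero exit for an unknown tile id makes it easy to spot tiles that are missing from a replaced coordinate file.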
      
  5. add samples to the project and run spacemake, i.e. generate the expression matrix.
    1. add samples to project
      spacemake projects add_sample \
      --project_id adult_mouse_hippocampus \
      --sample_id sample1 \
      --R1 adult_mouse_hippocampus/data/adult_mouse_hippocampus_R1_001.fastq.gz \
      --R2 adult_mouse_hippocampus/data/adult_mouse_hippocampus_R2_001.fastq.gz \
      --species mouse \
      --puck openst \
      --run_mode openst \
      --barcode_flavor openst \
      --puck_barcode_file adult_mouse_hippocampus/data/tiles/* \
      --map_strategy "STAR:genome:final"
      
    2. run spacemake
      spacemake run \
      --cores <n_cores>
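
Before a long `spacemake run`, a quick sanity check on the inputs can save time. The sketch below counts reads in a gzipped FASTQ, assuming the standard four lines per record, so that R1 and R2 can be compared. `count_fastq_reads` is a hypothetical helper and not part of spacemake.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ file.
# Assumes unwrapped FASTQ, i.e. exactly 4 lines per record.
count_fastq_reads() {
  lines=$(gzip -dc "$1" | wc -l)
  echo $((lines / 4))
}

# Usage: the two numbers should match for paired-end data
#   count_fastq_reads adult_mouse_hippocampus/data/adult_mouse_hippocampus_R1_001.fastq.gz
#   count_fastq_reads adult_mouse_hippocampus/data/adult_mouse_hippocampus_R2_001.fastq.gz
```

A mismatch between the two counts usually points to a truncated download or copy.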
      
  6. stitch pucks together (openst spatial_stitch) to generate one expression matrix
    openst spatial_stitch \
    --tiles adult_mouse_hippocampus/processed_data/sample1/illumina/complete_data/dge/dge.all.polyA_adapter_trimmed.mm_included.spatial_beads_*.h5ad \
    --tile-coordinates adult_mouse_hippocampus/data/fc_2_coordinate_system.csv \
    --output adult_mouse_hippocampus/spacemake/stitched.h5ad
    
  7. alignment
    1. use anndata.obsm["spatial"] for capture-area coordinates (rather than anndata.obs["x_pos"] and anndata.obs["y_pos"])