
Commit

Merge pull request #32 from rajewsky-lab/fast_cmdline
Fast cmdline
danilexn authored Apr 12, 2024
2 parents 13ab58e + ea3ec41 commit a8687b1
Showing 28 changed files with 1,742 additions and 1,604 deletions.
30 changes: 13 additions & 17 deletions docs/computational/generate_expression_matrix.md
@@ -19,15 +19,15 @@ refer to the [cellpose](https://cellpose.readthedocs.io/en/latest/index.html) documentation

```sh
openst segment \
--h5-in <path_to_aligned_h5ad> \
--image-in <image_in_path> \
--mask-out <mask_out_path> \
--model <path>/HE_cellpose_rajewsky
# --device cuda \ # uses GPU for segmentation, if available
# --chunked \ # specify if you run out of GPU memory - segments in chunks
```
By default, segmentation is extended radially 10 pixels. This can be changed with the argument `--dilate-px`.
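For example, a sketch reusing the placeholders above to extend the mask by 20 pixels instead of the default 10:

```sh
# Same command as above, overriding the default dilation radius
openst segment \
    --h5-in <path_to_aligned_h5ad> \
    --image-in <image_in_path> \
    --mask-out <mask_out_path> \
    --model <path>/HE_cellpose_rajewsky \
    --dilate-px 20
```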

Make sure to replace the placeholders (`<...>`). For instance,
`<path_to_aligned_h5ad>` is the full path to the `h5ad` file [after pairwise alignment](pairwise_alignment.md#expected-output);
`<image_in_path>` is the path to the image - a path to a file, or a location inside the `h5ad` file,
@@ -45,13 +45,10 @@ for segmentation of H&E images. The rest of parameters can be checked with `openst segment --help`.

```sh
openst segment \
--h5-in <path_to_aligned_h5ad> \
--image-in <image_in_path> \
--mask-out <mask_out_path_larger> \
--model <path>/HE_cellpose_rajewsky \
--dilate-px 50 \
--diameter 50 # diameter for the larger cell type
```
@@ -65,7 +62,7 @@ for segmentation of H&E images. The rest of parameters can be checked with `openst segment --help`.

```sh
openst segment_merge \
--h5-in <path_to_aligned_h5ad> \
--mask-in <mask_a> <mask_b> \
--mask-out <mask_combined>
```
@@ -81,11 +78,10 @@ This step allows you to associate capture spots with segmented cells.

```sh
openst transcript_assign \
--h5-in <path_to_aligned_h5ad> \
--spatial-key spatial_pairwise_aligned_fine \
--mask-in <mask_out_path> \
--h5-out <path_to_sc_h5ad>
```

Replace the placeholders (`<...>`) as before; in this case, the placeholder `<mask_out_path>` must be set to
152 changes: 61 additions & 91 deletions docs/computational/pairwise_alignment.md
@@ -18,7 +18,7 @@ The alignment workflow consists of two steps, that can be performed [automatical
## Required input data
For automatic and manual alignment, two inputs are required: (1) a stitched tile-scan of
the staining image (see [Preprocessing of imaging](preprocessing_imaging.md)), and (2) a
single [h5ad] file containing all the [barcoded tiles](preprocessing_sequencing.md#flow-cell-related-terms) of a sample
(see [Preprocessing of sequencing](preprocessing_sequencing.md)).

!!! warning
@@ -56,135 +56,105 @@ spots.
[h5ad]: https://anndata.readthedocs.io/en/latest/fileformat-prose.html

## Automated workflow
If you want to save time, we provide a script that performs the coarse and fine steps of alignment
automatically, by leveraging computer vision algorithms. To do so, make sure that you have the [necessary
input data](#required-input-data); then, open a terminal and run the following command (just an example):

```bash
openst pairwise_aligner \
--image-in Image_Stitched_Composite.tif \
--h5-in spatial_stitched_spots.h5ad \
--metadata alignment_metadata.json
# --device cuda # by default, cpu. Only specify if you have a CUDA-compatible GPU
# --only-coarse # skips the fine fiducial registration, in case you want to do that manually
```

Make sure to specify a path where the metadata output file should be created via `--metadata`;
this will be useful for a later visual assessment of whether the automated alignment worked.

If you want to run only the coarse phase of the pairwise alignment (i.e., to run the fine
alignment [yourself](#manual-workflow)), you can specify the argument `--only-coarse`.

!!! note
    For aligning STS to H&E-stained tissues, **we recommend** leaving the arguments with the default values.
    In any case, you can get a full list of configurable parameters by running `openst pairwise_aligner --help`.

!!! tip
    Right after automatic alignment, and before proceeding to segmentation and
    aggregation into a cell-by-gene matrix, **we strongly recommend** visually
    assessing the alignment results: specifically, that the tissue is overall
    well-aligned in both modalities, and that the fiducial markers are
    overlapping across all tiles.


### Visual assessment with report
You can generate an HTML report that contains a qualitative summary of the alignment (images, parameters...)

```sh
openst report --metadata=alignment_metadata.json --html-out=alignment_report.html
```


### Visual assessment with GUI
Alternatively, you can visualize the images & ST data interactively with the GUI:
```sh
openst manual_pairwise_aligner_gui
```

We provide a Graphical User Interface (GUI) for selecting keypoints between imaging & ST modalities,
for visualization and refinement of automatic results. This GUI requires a single Open-ST h5 object,
the output of `openst pairwise_aligner`.

---
## Manual/semiautomatic workflow
We provide a Graphical User Interface (GUI) for selecting keypoints between imaging & ST modalities,
for full manual alignment or refinement of automatic results. This GUI requires a single Open-ST h5 object
(after spatial stitching). There are two kinds of workflow:

=== "Fully manual alignment"

    ``` sh
    # Add the image data to the Open-ST h5 object
    openst merge_modalities \
        --h5-in spatial_stitched_spots.h5ad \
        --image-in Image_Stitched_Composite.tif

    # Use the GUI to select the keypoints.json file
    openst manual_pairwise_aligner_gui

    # Compute a rigid transformation from keypoints.json and apply it to the data
    openst manual_pairwise_aligner \
        --keypoints-in keypoints.json \
        --h5-in spatial_stitched_spots.h5ad \
        --spatial-key-in 'obsm/spatial' \
        --spatial-key-out 'obsm/spatial_manual_transformed'
        ## --per-tile # when specified, there's one rigid transform per tile
    ```

=== "Semiautomatic alignment"

    ``` sh
    # Use the GUI to select the keypoints.json file
    openst manual_pairwise_aligner_gui

    # Compute a rigid transformation from keypoints.json and apply it to the data
    openst manual_pairwise_aligner \
        --keypoints-in keypoints.json \
        --h5-in spatial_stitched_spots.h5ad \
        --spatial-key-in 'obsm/spatial_pairwise_aligned_fine' \
        --spatial-key-out 'obsm/spatial_pairwise_aligned_refined'
        ## --per-tile # when specified, there's one rigid transform per tile
    ```

We provide a video showcasing the GUI, with an illustrative example of refinement from (coarse) automated alignment.

---

:fontawesome-brands-youtube:{ style="color: #EE0F0F" }
__[Walkthrough of the GUI for manual alignment]__ by @danilexn – :octicons-clock-24:
5m – Learn how to visualize and align STS and imaging data in a step-by-step guide.

[Walkthrough of the GUI for manual alignment]: https://www.youtube.com
---

## Expected output
After running the automatic or manual alignment, you must have a single `h5ad` file, containing the transformed spatial coordinates.
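As an optional sanity check — a sketch, not part of the official pipeline — you can list the `obsm` layers of the output to confirm the transformed coordinates were written. This assumes Python 3 with the `h5py` package, and that alignment wrote back into `spatial_stitched_spots.h5ad` (adjust the filename to your sample):

```sh
# Sketch: list the obsm layers of the aligned Open-ST h5 object.
# Assumes Python 3 with h5py installed; the filename is illustrative.
python - <<'EOF'
import h5py

with h5py.File("spatial_stitched_spots.h5ad", "r") as f:
    # After fine alignment, expect a key such as 'spatial_pairwise_aligned_fine'
    print(list(f["obsm"].keys()))
EOF
```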
4 changes: 2 additions & 2 deletions docs/computational/preprocessing_imaging.md
@@ -23,10 +23,10 @@ Open a terminal, and run the following command:
openst image_stitch \
--microscope='keyence' \
--imagej-bin=<path_to_fiji_or_imagej> \
--image-indir=<path_to_tiles> \
--tiles-prefix=<to_read> \
--tmp-dir=<tmp_dir> \
--image-out=<output_image>
```
Make sure to replace the placeholders (`<...>`). For instance,
`<path_to_fiji_or_imagej>` is the path where the [Fiji](https://imagej.net/software/fiji/downloads) executable is;
22 changes: 10 additions & 12 deletions docs/computational/preprocessing_sequencing.md
@@ -48,8 +48,8 @@ encoded in the `bcl` and `fastq` files. To obtain per-tile barcodes and coordinates:

```sh
openst barcode_preprocessing \
--fastq-in <fastq_of_tile> \
--tilecoords-out <out_path> \
--out-suffix <out_suffix> \
--out-prefix <out_prefix> \
--crop-seq <len_int> \
@@ -65,7 +65,7 @@ files will be written; `<out_suffix>` and `<out_prefix>` are suffixes and prefix
must be written into the `csv` as their reverse-complementary; `--single-tile` argument is provided when the `fastq` file only contains data for
a single tile (**our recommendation**).

The code above will generate a file in `<out_path>` per tile. Only a single fastq file can be provided at a time via `--fastq-in`. To
process this in parallel, you can run the following snippets (in Linux, assuming you start from the `fastq` files). We assume that
you have a file `lanes_and_tiles.txt`, that contains the tile identifiers that you want to process; you can generate this file with:
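For instance, assuming the standard Illumina `RunInfo.xml` layout — where tiles are listed as, e.g., `<Tile>1_1101</Tile>` — one possible sketch is:

```sh
# Sketch (assumes standard Illumina RunInfo.xml): extract "<lane>_<tile>"
# identifiers such as 1_1101 into lanes_and_tiles.txt, one per line
grep -oE '<Tile>[^<]+</Tile>' RunInfo.xml \
    | sed -E 's|</?Tile>||g' \
    > lanes_and_tiles.txt
```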

@@ -77,11 +77,10 @@ where `RunInfo.xml` is a file contained in the basecalls directory. *We don't ensure that
this code snippet works* 🙈. Then, you can process various `fastq` files in the basecalls directory as follows:

```sh
cat lanes_and_tiles.txt | xargs -n 1 -P <parallel_processes> -I {} \
sh -c 'openst barcode_preprocessing \
--fastq-in <fastq_dir>/{}/Undetermined_S0_R1_001.fastq.gz \
--tilecoords-out <out_path> \
--out-suffix .txt \
--out-prefix <out_prefix>"{}" \
--crop-seq <len_int> \
@@ -99,14 +98,13 @@ Otherwise, if you start from `bcl` files (raw basecalls), you can run demultiplexing
simultaneously to generating the barcode spatial coordinate file:

```sh
cat lanes_and_tiles.txt | xargs -n 1 -P <parallel_processes> -I {} \
sh -c 'bcl2fastq -R <bcl_in> --no-lane-splitting \
-o <bcl_out>/"{}" --tiles s_"{}"; \
openst barcode_preprocessing \
--fastq-in <bcl_out>/{}/Undetermined_S0_R1_001.fastq.gz \
--tilecoords-out <out_path> \
--out-suffix .txt \
--out-prefix <out_prefix>"{}" \
--crop-seq <len_int> \
@@ -161,7 +159,7 @@ To manually create 'puck_collection' files, you can run the following in a terminal:
openst spatial_stitch \
--tiles <space_separated_list_or_wildcards_to_h5ad> \
--tile-coordinates <path_to_coordinate_system> \
--h5-out <output_puck_collection_h5ad>
```
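For instance, a sketch assuming per-tile files named `fc_1_<lane>_<tile>.h5ad` under a `tiles/` directory (illustrative names, not from the original docs):

```sh
# Hypothetical example: stitch all per-tile h5ad files via a shell wildcard
openst spatial_stitch \
    --tiles tiles/fc_1_*.h5ad \
    --tile-coordinates coordinate_system.csv \
    --h5-out spatial_stitched_spots.h5ad
```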

This program has additional arguments that are explained when running `openst spatial_stitch --help`. Make sure to replace
15 changes: 7 additions & 8 deletions docs/examples/adult_mouse/generate_expression_matrix.md
@@ -20,16 +20,16 @@ contains the spatial transcriptome coordinates and staining image after coarse+fine alignment:

```sh
openst segment \
--h5-in alignment/openst_demo_adult_mouse_spatial_beads_puck_collection_aligned.h5ad \
--image-in 'uns/spatial_pairwise_aligned/staining_image_transformed' \
--mask-out 'uns/spatial_pairwise_aligned/mask_transformed_0px' \
--model models/HE_cellpose_rajewsky \
--chunked \
--device cuda \
--num-workers 8
```

After running this command, the segmentation mask is created and stored in the same `--h5-in` file, under
the dataset `uns/spatial_pairwise_aligned/mask_transformed_0px`.

## Assigning transcripts to segmented cells
Expand All @@ -44,11 +44,10 @@ This step allows you to aggregate capture spots by segmented cells:

```sh
openst transcript_assign \
--h5-in alignment/openst_demo_adult_mouse_spatial_beads_puck_collection_aligned.h5ad \
--spatial-key spatial_pairwise_aligned_fine \
--mask-in 'uns/spatial_pairwise_aligned/mask_transformed_0px' \
--h5-out alignment/openst_demo_adult_mouse_by_cell.h5ad
```

## Expected output
2 changes: 1 addition & 1 deletion docs/examples/adult_mouse/pairwise_alignment.md
@@ -66,7 +66,7 @@ coarse alignment, and the keypoints file, to perform the fine alignment:

```sh
openst manual_pairwise_aligner \
--keypoints-in alignment/openst_adult_demo_fine_keypoints.json \
--h5-in alignment/openst_demo_adult_mouse_spatial_beads_puck_collection_aligned.h5ad \
--fine
```
2 changes: 1 addition & 1 deletion docs/examples/adult_mouse/preprocessing_sequencing.md
@@ -162,4 +162,4 @@ If you specified options for *meshing* in the `run_mode`, there will be a file c
This contains *approximate* cell-by-gene information, as the transcripts are aggregated by a regular lattice and not by the true spatial arrangement of
cells. This might be already enough for some analyses.

Anyway... keep going with the tutorial if you want to unleash the full potential of Open-ST.
