documentation tweaks

nf-core · Oct 16, 2023 · 29ed89e
1 parent 04f286f
commit 29ed89e

Showing 3 changed files with 109 additions and 70 deletions.
diff --git a/docs/output.md b/docs/output.md
@@ -23,38 +23,28 @@ The pipeline consists of the following steps:
 
 ### Preprocessing
 
-The preprocessing step uses `pixelator single-cell concatenate` to create a full amplicon sequence from both single-end and paired-end data.
-It returns a single fastq per sample containing fixed length amplicons.
-This step will also calculate Q30 quality scores for different regions of the library.
-
 <details markdown="1">
 <summary>Output files</summary>
 
 - `pixelator`
 
-  - `concatenate`
+  - `amplicon`
 
     - `<sample-id>.merged.fastq.gz`:
       Combine R1 and R2 reads into full amplicon reads and calculate Q30 scores for the amplicon regions.
     - `<sample-id>.report.json`: Q30 metrics of the amplicon.
     - `<sample-id>.meta.json`: Command invocation metadata.
 
   - `logs`
-    - `<sample-id>.pixelator-concatenate.log`: pixelator log output.
+    - `<sample-id>.pixelator-amplicon.log`: pixelator log output.
 
 </details>
 
-### Quality control
-
-Quality control is performed using `pixelator single-cell preqc` and `pixelator single-cell adapterqc`.
-
-The preqc stage performs QC and quality filtering of the raw sequencing data.
-It also generates a QC report in HTML and JSON formats. It saves processed reads as well as reads that were
-discarded (i.e. were too short, had too many Ns, or too low quality, etc.). Internally `preqc`
-uses [Fastp](https://github.com/OpenGene/fastp), and `adapterqc`
-uses [Cutadapt](https://cutadapt.readthedocs.io/en/stable/).
+The preprocessing step uses `pixelator single-cell amplicon` to create full-length amplicon sequences from both single-end and paired-end data.
+It returns a single fastq file per sample containing fixed length amplicons.
+This step will also calculate Q30 quality scores for different regions of the library.
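Q30 is conventionally the fraction of bases whose Phred quality score is at least 30. The sketch below computes it for one region of a read from its FASTQ quality string, assuming the standard Phred+33 encoding. The function name and the `start`/`end` region arguments are hypothetical, and this is not pixelator's own implementation.

```python
def q30_fraction(quality, start=0, end=None):
    """Fraction of bases in quality[start:end] with a Phred score >= 30.

    Assumes Phred+33 encoding, where the character '?' encodes a score of 30.
    """
    region = quality[start:end]
    if not region:
        # An empty region has no bases to score.
        return 0.0
    passing = sum(1 for ch in region if ord(ch) - 33 >= 30)
    return passing / len(region)
```

Calling `q30_fraction(quality, start, end)` once per amplicon region would give per-region scores; where those regions begin and end depends on the assay design and is not specified here.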
 
-The `adapterqc` stage checks for the presence and correctness of the pixel binding sequences. It also generates a QC report in JSON format. It saves processed reads as well as discarded reads (i.e. reads that did not have a match for both pixel binding sequences).
+### Quality control
 
 <details markdown="1">
 <summary>Output files</summary>
@@ -78,11 +68,17 @@ The `adapterqc` stage checks for the presence and correctness of the pixel bindi
 
 </details>
 
-### Demultiplexing
+Quality control is performed using `pixelator single-cell preqc` and `pixelator single-cell adapterqc`.
 
-The `pixelator single-cell demux` command assigns a marker (barcode) to each read. It also generates QC report in
-JSON format. It saves processed reads (one per antibody) as well as discarded reads with no match to the
-given barcodes/antibodies.
+The preqc stage performs QC and quality filtering of the raw sequencing data.
+It also generates a QC report in HTML and JSON formats. It saves processed reads as well as reads that were
+discarded (e.g. reads that were too short, had too many Ns, or were of too low quality). Internally `preqc`
+uses [Fastp](https://github.com/OpenGene/fastp), and `adapterqc`
+uses [Cutadapt](https://cutadapt.readthedocs.io/en/stable/).
+
+The `adapterqc` stage checks for the presence and correctness of the pixel binding sequences. It also generates a QC report in JSON format. It saves processed reads as well as discarded reads (i.e. reads that did not have a match for both pixel binding sequences).
+
+### Demultiplexing
 
 <details markdown="1">
 <summary>Output files</summary>
@@ -101,16 +97,11 @@ given barcodes/antibodies.
 
 </details>
 
-### Duplicate removal and error correction
-
-This step uses the `pixelator single-cell collapse` command.
-
-The `collapse` command removes duplicate reads and performs error correction.
-This is achieved using the unique pixel identifier and unique molecular identifier sequences to check for
-uniqueness, collapse and compute a read count. The command generates a QC report in JSON format.
-Errors are allowed when collapsing reads if `--algorithm` is set to `adjacency` (this is the default option).
+The `pixelator single-cell demux` command assigns a marker (barcode) to each read. It also generates a QC report in
+JSON format. It saves processed reads (one per antibody) as well as discarded reads with no match to the
+given barcodes/antibodies.
 
-The output format of this command is an edge list in CSV format.
+### Duplicate removal and error correction
 
 <details markdown="1">
 <summary>Output files</summary>
@@ -128,17 +119,16 @@ The output format of this command is an edge list in CSV format.
 
 </details>
 
-### Compute connected components
+This step uses the `pixelator single-cell collapse` command.
 
-This step uses the `pixelator single-cell graph` command.
-The input is the edge list dataframe (CSV) generated in the collapse step and after filtering it
-by count (`--graph_min_count`), the connected components of the graph (graphs) are computed and
-added to the edge list in a column called "component".
+The `collapse` command removes duplicate reads and performs error correction.
+This is achieved using the unique pixel identifier and unique molecular identifier sequences to check for
+uniqueness, collapse and compute a read count. The command generates a QC report in JSON format.
+Errors are allowed when collapsing reads if `--algorithm` is set to `adjacency` (this is the default option).
 
-The graph command has the option to recover components (technical multiplets) into smaller
-components using community detection to find and remove problematic edges.
-(See `--multiplet_recovery`). The information to keep track of the original and
-new (recovered) components are stored in a file (components_recovered.csv).
+The output format of this command is an edge list in CSV format.
+
+### Compute connected components
 
 <details markdown="1">
 <summary>Output files</summary>
@@ -162,13 +152,17 @@ new (recovered) components are stored in a file (components_recovered.csv).
 
 </details>
 
-### Cell-calling, filtering, and annotation
+This step uses the `pixelator single-cell graph` command.
+The input is the edge list dataframe (CSV) generated in the collapse step. After filtering it
+by count (`--graph_min_count`), the connected components of the graph (graphs) are computed and
+added to the edge list in a column called "component".
 
-This step uses the `pixelator single-cell annotate` command.
+The graph command has the option to recover components (technical multiplets) into smaller
+components using community detection to find and remove problematic edges
+(see `--multiplet_recovery`). The information needed to keep track of the original and
+new (recovered) components is stored in a file (components_recovered.csv).
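As a rough illustration of the filter-then-label idea described for the graph step, the following Python sketch drops edges below a minimum count and assigns a `component` value to each remaining edge using union-find. It is not pixelator's implementation: the keys `node_a`, `node_b` and `count`, the function name, and the choice of union-find are assumptions made for this example. Only the `component` column name and the minimum-count filter come from the documentation.

```python
def label_components(edges, min_count):
    """Filter edges by count and add a 'component' label to each kept edge.

    `edges` is a list of dicts with the hypothetical keys
    'node_a', 'node_b' and 'count'.
    """
    kept = [e for e in edges if e["count"] >= min_count]

    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for e in kept:
        union(e["node_a"], e["node_b"])

    # Number components in order of first appearance in the edge list.
    ids = {}
    labelled = []
    for e in kept:
        root = find(e["node_a"])
        labelled.append({**e, "component": ids.setdefault(root, len(ids))})
    return labelled
```

The sketch does not attempt multiplet recovery; community detection on the resulting components is a separate step.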
 
-The annotate command takes as input the edge list (CSV) file generated in the graph command. It parses, and filters the
-edgelist to find putative cells, and it will generate a pxl file containing the edgelist, and an
-(AnnData object)[https://anndata.readthedocs.io/en/latest/] as well as some useful medatadata.
+### Cell-calling, filtering, and annotation
 
 <details markdown="1">
 <summary>Output files</summary>
@@ -186,18 +180,13 @@ edgelist to find putative cells, and it will generate a pxl file containing the
     - `<sample-id>.pixelator-annotate.log`: pixelator log output.
     </details>
 
-### Downstream analysis
-
-This step uses the `pixelator single-cell analysis` command.
-Downstream analysis is performed on the `pxl` file generated by the previous stage.
-The results of the analysis is added to the pxl file.
-
-Currently, the following analysis can be performed (if enabled):
+This step uses the `pixelator single-cell annotate` command.
 
-- polarization scores (enable with `--compute_polarization`)
-- co-localization scores (enable with `--compute_colocalization`)
+The annotate command takes as input the edge list (CSV) file generated by the graph command. It parses and filters the
+edge list to find putative cells, and generates a pxl file containing the edge list and an
+[AnnData object](https://anndata.readthedocs.io/en/latest/) as well as some useful metadata.
 
-This step can be skipped using the `--skip_analysis` option.
+### Downstream analysis
 
 <details markdown="1">
 <summary>Output files</summary>
@@ -215,13 +204,18 @@ This step can be skipped using the `--skip_analysis` option.
 
 </details>
 
-### Generate reports
+This step uses the `pixelator single-cell analysis` command.
+Downstream analysis is performed on the `pxl` file generated by the previous stage.
+The results of the analysis are added to the pxl file.
 
-This step uses the `pixelator single-cell report` command.
-This step will collect metrics and outputs generated by previous stages
-and generate a report in HTML format for each sample.
+Currently, the following analyses can be performed (if enabled):
 
-This step can be skipped using the `--skip_report` option.
+- polarization scores (enable with `--compute_polarization`)
+- co-localization scores (enable with `--compute_colocalization`)
+
+This step can be skipped using the `--skip_analysis` option.
+
+### Generate reports
 
 <details markdown="1">
 <summary>Output files</summary>
@@ -234,6 +228,14 @@ This step can be skipped using the `--skip_report` option.
 
 </details>
 
+This step uses the `pixelator single-cell report` command.
+This step will collect metrics and outputs generated by previous stages
+and generate a report in HTML format for each sample.
+
+This step can be skipped using the `--skip_report` option.
+
+More information on the report can be found in the [pixelator documentation](https://software.pixelgen.com/pixelator/outputs/web-report/).
+
 ### Pipeline information
 
 <details markdown="1">

diff --git a/docs/usage.md b/docs/usage.md
@@ -37,14 +37,13 @@ uropod_stimulated,D21,human-sc-immunology-spatial-proteomics,uropod_stimulated_S
 Columns not defined in the table below are ignored by the pipeline but can be useful
 to add extra information for downstream processing.
 
-| Column       | Description                                                                                                                                                                            |
-| ------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `sample`     | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
-| `design`     | The name of the pixelator design configuration.                                                                                                                                        |
-| `panel`      | Name of the panel to use.                                                                                                                                                              |
-| `panel_file` | Path to a CSV file containing a custom panel.                                                                                                                                          |
-| `fastq_1`    | Path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".                                                                  |
-| `fastq_2`    | Path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".                                                                  |
+| Column                              | Required | Description                                                                                                                                                                            |
+| ----------------------------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `sample`                            | Yes      | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
+| `design`                            | Yes      | The name of the pixelator design configuration.                                                                                                                                        |
+| `panel` <br />or<br /> `panel_file` | Yes      | Name of the panel to use. <br />or<br /> Path to a CSV file containing a custom panel.                                                                                                 |
+| `fastq_1`                           | Yes      | Path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".                                                                  |
+| `fastq_2`                           | No       | Path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". Parameter only used if you are running paired-end.               |
 
 The `panel` and `panel_file` options are mutually exclusive. If both are specified, the pipeline will throw an error.
 One of them has to be specified.
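The column rules in the table above can be sketched as a small pre-flight check. This is an illustrative Python example, not the pipeline's own samplesheet validation: the function name and error messages are hypothetical, and only the rules stated in the table (required columns, exactly one of `panel`/`panel_file`, gzipped FastQ extensions, spaces converted to underscores) are encoded.

```python
import csv
import io

REQUIRED = ("sample", "design", "fastq_1")
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def check_samplesheet(text):
    """Return a list of (normalised_row, errors), one entry per samplesheet row."""
    results = []
    for raw in csv.DictReader(io.StringIO(text)):
        # DictReader yields None for missing values; treat them as empty strings.
        row = {k: (v or "").strip() for k, v in raw.items() if k is not None}
        errors = []
        for col in REQUIRED:
            if not row.get(col):
                errors.append(f"missing required column '{col}'")
        # `panel` and `panel_file` are mutually exclusive, and one is required.
        if bool(row.get("panel")) == bool(row.get("panel_file")):
            errors.append("exactly one of 'panel' or 'panel_file' must be set")
        for col in ("fastq_1", "fastq_2"):
            value = row.get(col, "")
            if value and not value.endswith(FASTQ_SUFFIXES):
                errors.append(f"'{col}' must end with .fastq.gz or .fq.gz")
        # Mirror the documented conversion of spaces to underscores.
        row["sample"] = row.get("sample", "").replace(" ", "_")
        results.append((row, errors))
    return results
```

Running such a check before launching the pipeline surfaces malformed rows early, although the pipeline's own validation remains authoritative.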
@@ -56,10 +55,10 @@ The pipeline will auto-detect whether a sample is single- or paired-end based on
 The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:
 
 ```csv
-sample,design,panel,panel_file,fastq_1,fastq_2
-uropod_control_1,D21,human-sc-immunology-spatial-proteomics,,uropod_control_S1_L001_R1_001.fastq.gz,uropod_control_S1_L001_R2_001.fastq.gz
-uropod_control_1,D21,human-sc-immunology-spatial-proteomics,,uropod_control_S1_L002_R1_001.fastq.gz,uropod_control_S1_L002_R2_001.fastq.gz
-uropod_control_1,D21,human-sc-immunology-spatial-proteomics,,uropod_control_S1_L003_R1_001.fastq.gz,uropod_control_S1_L003_R2_001.fastq.gz
+sample,design,panel,fastq_1,fastq_2
+uropod_control_1,D21,human-sc-immunology-spatial-proteomics,uropod_control_S1_L001_R1_001.fastq.gz,uropod_control_S1_L001_R2_001.fastq.gz
+uropod_control_1,D21,human-sc-immunology-spatial-proteomics,uropod_control_S1_L002_R1_001.fastq.gz,uropod_control_S1_L002_R2_001.fastq.gz
+uropod_control_1,D21,human-sc-immunology-spatial-proteomics,uropod_control_S1_L003_R1_001.fastq.gz,uropod_control_S1_L003_R2_001.fastq.gz
 ```
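To make the grouping behaviour concrete, the sketch below collects the rows of a samplesheet by their `sample` value, which is the set of runs that would be concatenated for each sample. It is a hypothetical helper written for illustration and does not reproduce the pipeline's internals. Treating an empty `fastq_2` as a single-end run is an assumption based on the column description.

```python
import csv
import io
from collections import defaultdict


def group_runs_by_sample(text):
    """Map each sample name to the list of (fastq_1, fastq_2) runs listed for it."""
    runs = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # An empty or missing fastq_2 is recorded as None (assumed single-end).
        fastq_2 = (row.get("fastq_2") or "").strip() or None
        runs[row["sample"]].append((row["fastq_1"], fastq_2))
    return dict(runs)
```

For the three-lane example above, the returned dictionary would have a single key, `uropod_control_1`, holding three `(fastq_1, fastq_2)` pairs.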
 
 ### Relative paths
@@ -97,6 +96,41 @@ For example, using the same samplesheet as above, but with the samplesheet on th
 nextflow run nf-core/pixelator --input samplesheet.csv --input_basedir s3://my-company-data/experiment-1/
 ```
 
+### Design
+
+The `design` column specifies the name of the pixelator assay design configuration to use.
+
+Available designs can be listed by running the following command:
+
+```shell
+pixelator single-cell --list-designs
+```
+
+Currently, a single design is available:
+
+- `D21`
+
+### Panels
+
+The panel file contains all information used to link antibody barcodes to their respective targets.
+Panel files can be specified in two ways:
+
+- Using a predefined panel name to select one of the default built-in panels.
+- Passing a CSV file with a customized panel.
+
+Predefined panels can be passed in the `panel` field. Custom panels can be passed in the `panel_file` field.
+Every sample should have either `panel` or `panel_file` specified.
+
+Available panels can be listed by running the following command:
+
+```shell
+pixelator single-cell --list-panels
+```
+
+Currently, a single built-in panel is available:
+
+- `human-sc-immunology-spatial-proteomics`
+
 ## Running the pipeline
 
 The typical command for running the pipeline is as follows:

diff --git a/samplesheet.csv b/samplesheet.csv
@@ -0,0 +1,3 @@
+sample,design,panel,fastq_1,fastq_2
+uropod_control,D21,human-sc-immunology-spatial-proteomics,uropod_control_300k_R1_001.fastq.gz,uropod_control_300k_R2_001.fastq.gz
+pbmcs_unstimulated,D21,human-sc-immunology-spatial-proteomics,Sample01_human_pbmcs_unstimulated_200k_R1_001.fastq.gz,Sample01_human_pbmcs_unstimulated_200k_R2_001.fastq.gz