documentation tweaks
fbdtemme committed Oct 16, 2023
1 parent 04f286f commit 29ed89e
Showing 3 changed files with 109 additions and 70 deletions.
118 changes: 60 additions & 58 deletions docs/output.md
@@ -23,38 +23,28 @@ The pipeline consists of the following steps:

### Preprocessing

The preprocessing step uses `pixelator single-cell concatenate` to create a full amplicon sequence from both single-end and paired-end data.
It returns a single fastq per sample containing fixed length amplicons.
This step will also calculate Q30 quality scores for different regions of the library.

<details markdown="1">
<summary>Output files</summary>

- `pixelator`

- `concatenate`
- `amplicon`

- `<sample-id>.merged.fastq.gz`:
Combine R1 and R2 reads into full amplicon reads and calculate Q30 scores for the amplicon regions.
- `<sample-id>.report.json`: Q30 metrics of the amplicon.
- `<sample-id>.meta.json`: Command invocation metadata.

- `logs`
- `<sample-id>.pixelator-concatenate.log`: pixelator log output.
- `<sample-id>.pixelator-amplicon.log`: pixelator log output.

</details>

### Quality control

Quality control is performed using `pixelator single-cell preqc` and `pixelator single-cell adapterqc`.

The `preqc` stage performs QC and quality filtering of the raw sequencing data.
It also generates a QC report in HTML and JSON formats. It saves processed reads as well as reads that were
discarded (e.g. reads that were too short, had too many Ns, or were of too low quality). Internally, `preqc`
uses [Fastp](https://github.com/OpenGene/fastp), and `adapterqc`
uses [Cutadapt](https://cutadapt.readthedocs.io/en/stable/).
The preprocessing step uses `pixelator single-cell amplicon` to create full-length amplicon sequences from both single-end and paired-end data.
It returns a single fastq file per sample containing fixed-length amplicons.
This step will also calculate Q30 quality scores for different regions of the library.
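
As a rough illustration of what a Q30 score measures, the sketch below computes the fraction of bases in a FASTQ quality string whose Phred score is at least 30. It assumes Phred+33 encoding, and the function name `q30_fraction` is hypothetical; this is not how pixelator itself implements the calculation.

```python
def q30_fraction(quality: str, offset: int = 33) -> float:
    """Return the fraction of bases whose Phred score is >= 30.

    `quality` is the quality line of one FASTQ record; `offset` is the
    ASCII offset of the encoding (33 is assumed here).
    """
    if not quality:
        return 0.0
    scores = [ord(char) - offset for char in quality]
    return sum(score >= 30 for score in scores) / len(scores)


# 'I' encodes Q40, '?' encodes Q30 and '#' encodes Q2 under Phred+33.
print(q30_fraction("II?#"))
```

Applying the same function to slices of the quality string would give a per-region score, provided the region boundaries of the amplicon are known.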

The `adapterqc` stage checks for the presence and correctness of the pixel binding sequences. It also generates a QC report in JSON format. It saves processed reads as well as discarded reads (i.e. reads that did not have a match for both pixel binding sequences).
### Quality control

<details markdown="1">
<summary>Output files</summary>
@@ -78,11 +68,17 @@ The `adapterqc` stage checks for the presence and correctness of the pixel bindi

</details>

### Demultiplexing
Quality control is performed using `pixelator single-cell preqc` and `pixelator single-cell adapterqc`.

The `pixelator single-cell demux` command assigns a marker (barcode) to each read. It also generates a QC report in
JSON format. It saves processed reads (one per antibody) as well as discarded reads with no match to the
given barcodes/antibodies.
The `preqc` stage performs QC and quality filtering of the raw sequencing data.
It also generates a QC report in HTML and JSON formats. It saves processed reads as well as reads that were
discarded (e.g. reads that were too short, had too many Ns, or were of too low quality). Internally, `preqc`
uses [Fastp](https://github.com/OpenGene/fastp), and `adapterqc`
uses [Cutadapt](https://cutadapt.readthedocs.io/en/stable/).

The `adapterqc` stage checks for the presence and correctness of the pixel binding sequences. It also generates a QC report in JSON format. It saves processed reads as well as discarded reads (i.e. reads that did not have a match for both pixel binding sequences).

### Demultiplexing

<details markdown="1">
<summary>Output files</summary>
@@ -101,16 +97,11 @@ given barcodes/antibodies.

</details>

### Duplicate removal and error correction

This step uses the `pixelator single-cell collapse` command.

The `collapse` command removes duplicate reads and performs error correction.
This is achieved using the unique pixel identifier and unique molecular identifier sequences to check for
uniqueness, collapse and compute a read count. The command generates a QC report in JSON format.
Errors are allowed when collapsing reads if `--algorithm` is set to `adjacency` (this is the default option).
The `pixelator single-cell demux` command assigns a marker (barcode) to each read. It also generates a QC report in
JSON format. It saves processed reads (one per antibody) as well as discarded reads with no match to the
given barcodes/antibodies.

The output format of this command is an edge list in CSV format.
### Duplicate removal and error correction

<details markdown="1">
<summary>Output files</summary>
@@ -128,17 +119,16 @@ The output format of this command is an edge list in CSV format.

</details>

### Compute connected components
This step uses the `pixelator single-cell collapse` command.

This step uses the `pixelator single-cell graph` command.
The input is the edge list dataframe (CSV) generated in the collapse step. After filtering the edge list
by count (`--graph_min_count`), the connected components of the graph are computed and
added to the edge list in a column called "component".
The `collapse` command removes duplicate reads and performs error correction.
This is achieved using the unique pixel identifier and unique molecular identifier sequences to check for
uniqueness, collapse and compute a read count. The command generates a QC report in JSON format.
Errors are allowed when collapsing reads if `--algorithm` is set to `adjacency` (this is the default option).

The graph command can optionally split components that are technical multiplets into smaller
components, using community detection to find and remove problematic edges
(see `--multiplet_recovery`). The information needed to keep track of the original and
new (recovered) components is stored in a file (`components_recovered.csv`).
The output format of this command is an edge list in CSV format.
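
To make the idea of collapsing concrete, here is a minimal sketch that removes exact duplicates and computes a read count per unique identifier combination. It deliberately leaves out the error correction performed by the `adjacency` algorithm, and the `Read` type, its field names and `collapse_exact` are hypothetical names chosen for illustration, not part of pixelator.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Read(NamedTuple):
    upi: str  # unique pixel identifier sequence (hypothetical field name)
    umi: str  # unique molecular identifier sequence (hypothetical field name)


def collapse_exact(reads: Iterable[Read]) -> Dict[Read, int]:
    """Collapse identical (UPI, UMI) combinations and count supporting reads.

    Only exact matches are merged; no sequencing errors are tolerated here.
    """
    return dict(Counter(reads))


reads = [Read("AAGT", "CCA"), Read("AAGT", "CCA"), Read("TTGC", "GGA")]
for read, count in collapse_exact(reads).items():
    print(read.upi, read.umi, count)
```

An error-tolerant variant would additionally merge identifiers that differ by a small number of mismatches before counting.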

### Compute connected components

<details markdown="1">
<summary>Output files</summary>
@@ -162,13 +152,17 @@ new (recovered) components are stored in a file (components_recovered.csv).

</details>

### Cell-calling, filtering, and annotation
This step uses the `pixelator single-cell graph` command.
The input is the edge list dataframe (CSV) generated in the collapse step. After filtering the edge list
by count (`--graph_min_count`), the connected components of the graph are computed and
added to the edge list in a column called "component".

This step uses the `pixelator single-cell annotate` command.
The graph command can optionally split components that are technical multiplets into smaller
components, using community detection to find and remove problematic edges
(see `--multiplet_recovery`). The information needed to keep track of the original and
new (recovered) components is stored in a file (`components_recovered.csv`).
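
The filter-then-label procedure can be sketched with a small union-find structure. This is an illustrative sketch only: the column names `node_a`, `node_b` and `count` are assumptions (only the `component` column is named in the text), and multiplet recovery by community detection is not shown.

```python
def add_components(edges, min_count):
    """Drop edges below `min_count` and label the rest with a component id.

    `edges` is a list of dicts with the hypothetical keys
    'node_a', 'node_b' and 'count'. Returns new dicts with an added
    'component' key holding an integer id.
    """
    kept = [dict(edge) for edge in edges if edge["count"] >= min_count]
    parent = {}

    def find(node):
        # Return the representative of the set containing `node`.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Union the two endpoints of every remaining edge.
    for edge in kept:
        parent[find(edge["node_a"])] = find(edge["node_b"])

    # Map each set representative to a small integer id.
    labels = {}
    for edge in kept:
        root = find(edge["node_a"])
        edge["component"] = labels.setdefault(root, len(labels))
    return kept


edges = [
    {"node_a": "A1", "node_b": "B1", "count": 3},
    {"node_a": "A2", "node_b": "B1", "count": 2},
    {"node_a": "A3", "node_b": "B2", "count": 5},
    {"node_a": "A4", "node_b": "B2", "count": 1},  # removed by the filter
]
for edge in add_components(edges, min_count=2):
    print(edge)
```

Filtering before computing components matters: a single low-count edge can otherwise join two unrelated components into one.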

The annotate command takes as input the edge list (CSV) file generated by the graph command. It parses and filters the
edge list to find putative cells, and generates a pxl file containing the edge list and an
[AnnData object](https://anndata.readthedocs.io/en/latest/), as well as some useful metadata.
### Cell-calling, filtering, and annotation

<details markdown="1">
<summary>Output files</summary>
@@ -186,18 +180,13 @@ edgelist to find putative cells, and it will generate a pxl file containing the
- `<sample-id>.pixelator-annotate.log`: pixelator log output.
</details>

### Downstream analysis

This step uses the `pixelator single-cell analysis` command.
Downstream analysis is performed on the `pxl` file generated by the previous stage.
The results of the analysis are added to the pxl file.

Currently, the following analyses can be performed (if enabled):
This step uses the `pixelator single-cell annotate` command.

- polarization scores (enable with `--compute_polarization`)
- co-localization scores (enable with `--compute_colocalization`)
The annotate command takes as input the edge list (CSV) file generated by the graph command. It parses and filters the
edge list to find putative cells, and generates a pxl file containing the edge list and an
[AnnData object](https://anndata.readthedocs.io/en/latest/), as well as some useful metadata.

This step can be skipped using the `--skip_analysis` option.
### Downstream analysis

<details markdown="1">
<summary>Output files</summary>
@@ -215,13 +204,18 @@ This step can be skipped using the `--skip_analysis` option.

</details>

### Generate reports
This step uses the `pixelator single-cell analysis` command.
Downstream analysis is performed on the `pxl` file generated by the previous stage.
The results of the analysis are added to the pxl file.

This step uses the `pixelator single-cell report` command.
This step will collect metrics and outputs generated by previous stages
and generate a report in HTML format for each sample.
Currently, the following analyses can be performed (if enabled):

This step can be skipped using the `--skip_report` option.
- polarization scores (enable with `--compute_polarization`)
- co-localization scores (enable with `--compute_colocalization`)

This step can be skipped using the `--skip_analysis` option.

### Generate reports

<details markdown="1">
<summary>Output files</summary>
@@ -234,6 +228,14 @@ This step can be skipped using the `--skip_report` option.

</details>

This step uses the `pixelator single-cell report` command.
This step will collect metrics and outputs generated by previous stages
and generate a report in HTML format for each sample.

This step can be skipped using the `--skip_report` option.

More information on the report can be found in the [pixelator documentation](https://software.pixelgen.com/pixelator/outputs/web-report/).

### Pipeline information

<details markdown="1">
58 changes: 46 additions & 12 deletions docs/usage.md
@@ -37,14 +37,13 @@ uropod_stimulated,D21,human-sc-immunology-spatial-proteomics,uropod_stimulated_S
Columns not defined in the table below are ignored by the pipeline but can be useful
to add extra information for downstream processing.

| Column | Description |
| ------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
| `design` | The name of the pixelator design configuration. |
| `panel` | Name of the panel to use. |
| `panel_file` | Path to a CSV file containing a custom panel. |
| `fastq_1` | Path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `fastq_2` | Path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| Column | Required | Description |
| ----------------------------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sample` | Yes | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
| `design` | Yes | The name of the pixelator design configuration. |
| `panel` <br />or<br /> `panel_file` | Yes | Name of the panel to use. <br />or<br /> Path to a CSV file containing a custom panel. |
| `fastq_1` | Yes | Path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `fastq_2` | No | Path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". Parameter only used if you are running paired-end. |

The `panel` and `panel_file` options are mutually exclusive. If both are specified, the pipeline will throw an error.
One of them has to be specified.
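
The mutual-exclusion rule can be checked before launching a run. The following is a minimal sketch using only the Python standard library; `check_panel_columns` is a hypothetical helper, not part of the pipeline, and it covers only this one rule rather than the pipeline's full samplesheet validation.

```python
import csv
import io
from typing import List


def check_panel_columns(samplesheet_text: str) -> List[str]:
    """Return one message per row that breaks the panel/panel_file rule."""
    errors = []
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        has_panel = bool((row.get("panel") or "").strip())
        has_file = bool((row.get("panel_file") or "").strip())
        if has_panel and has_file:
            errors.append(f"line {line_no}: both panel and panel_file are set")
        elif not has_panel and not has_file:
            errors.append(f"line {line_no}: neither panel nor panel_file is set")
    return errors


sheet = (
    "sample,design,panel,panel_file,fastq_1\n"
    "s1,D21,human-sc-immunology-spatial-proteomics,,s1_R1.fastq.gz\n"
    "s2,D21,human-sc-immunology-spatial-proteomics,custom.csv,s2_R1.fastq.gz\n"
    "s3,D21,,,s3_R1.fastq.gz\n"
)
for message in check_panel_columns(sheet):
    print(message)
```

A missing column is treated the same as an empty cell, so a samplesheet that omits `panel_file` entirely passes as long as `panel` is filled in on every row.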
@@ -56,10 +55,10 @@ The pipeline will auto-detect whether a sample is single- or paired-end based on
The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:

```csv
sample,design,panel,panel_file,fastq_1,fastq_2
uropod_control_1,D21,human-sc-immunology-spatial-proteomics,,uropod_control_S1_L001_R1_001.fastq.gz,uropod_control_S1_L001_R2_001.fastq.gz
uropod_control_1,D21,human-sc-immunology-spatial-proteomics,,uropod_control_S1_L002_R1_001.fastq.gz,uropod_control_S1_L002_R2_001.fastq.gz
uropod_control_1,D21,human-sc-immunology-spatial-proteomics,,uropod_control_S1_L003_R1_001.fastq.gz,uropod_control_S1_L003_R2_001.fastq.gz
sample,design,panel,fastq_1,fastq_2
uropod_control_1,D21,human-sc-immunology-spatial-proteomics,uropod_control_S1_L001_R1_001.fastq.gz,uropod_control_S1_L001_R2_001.fastq.gz
uropod_control_1,D21,human-sc-immunology-spatial-proteomics,uropod_control_S1_L002_R1_001.fastq.gz,uropod_control_S1_L002_R2_001.fastq.gz
uropod_control_1,D21,human-sc-immunology-spatial-proteomics,uropod_control_S1_L003_R1_001.fastq.gz,uropod_control_S1_L003_R2_001.fastq.gz
```
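
To see which rows would be merged, the grouping by `sample` can be sketched as follows. This only illustrates the grouping logic; `group_runs_by_sample` is a hypothetical helper, and the actual concatenation of reads is done by the pipeline.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def group_runs_by_sample(
    samplesheet_text: str,
) -> Dict[str, List[Tuple[str, Optional[str]]]]:
    """Map each sample name to its list of (fastq_1, fastq_2) pairs.

    fastq_2 is None for rows without a second read file (single-end).
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        groups[row["sample"]].append((row["fastq_1"], row.get("fastq_2") or None))
    return dict(groups)


sheet = (
    "sample,design,panel,fastq_1,fastq_2\n"
    "uropod_control_1,D21,human-sc-immunology-spatial-proteomics,L001_R1.fastq.gz,L001_R2.fastq.gz\n"
    "uropod_control_1,D21,human-sc-immunology-spatial-proteomics,L002_R1.fastq.gz,L002_R2.fastq.gz\n"
    "uropod_control_1,D21,human-sc-immunology-spatial-proteomics,L003_R1.fastq.gz,L003_R2.fastq.gz\n"
)
for sample, runs in group_runs_by_sample(sheet).items():
    print(sample, len(runs))
```

The file names in the example are shortened placeholders; rows that share a `sample` value end up in the same group regardless of their order in the file.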

### Relative paths
@@ -97,6 +96,41 @@ For example, using the same samplesheet as above, but with the samplesheet on th
nextflow run nf-core/pixelator --input samplesheet.csv --input_basedir s3://my-company-data/experiment-1/
```

### Design

The `design` column specifies the name of the pixelator assay design configuration to use.

Available designs can be listed by running the following command:

```shell
pixelator single-cell --list-designs
```

Currently, a single design is available:

- `D21`

### Panels

The panel file contains all information used to link antibody barcodes to their respective targets.
Panel files can be specified in two ways:

- Using a predefined panel name to select one of the built-in panels.
- Passing a CSV file with a customized panel.

Predefined panels can be passed in the `panel` field. Custom panels can be passed in the `panel_file` field.
Every sample should have either `panel` or `panel_file` specified.

Available panels can be listed by running the following command:

```shell
pixelator single-cell --list-panels
```

Currently, a single built-in panel is available:

- `human-sc-immunology-spatial-proteomics`

## Running the pipeline

The typical command for running the pipeline is as follows:
3 changes: 3 additions & 0 deletions samplesheet.csv
@@ -0,0 +1,3 @@
sample,design,panel,fastq_1,fastq_2
uropod_control,D21,human-sc-immunology-spatial-proteomics,uropod_control_300k_R1_001.fastq.gz,uropod_control_300k_R2_001.fastq.gz
pbmcs_unstimulated,D21,human-sc-immunology-spatial-proteomics,Sample01_human_pbmcs_unstimulated_200k_R1_001.fastq.gz,Sample01_human_pbmcs_unstimulated_200k_R2_001.fastq.gz
