Phylogenetic Analysis for SARS-CoV-2.
nhhaidee/scovtree is a bioinformatics pipeline for SARS-CoV-2 phylogenetic analysis. Given a set of consensus sequences, the workflow outputs a phylogenetic tree and SNP information. The pipeline can also filter GISAID data to find the sequences most closely related to the input. The GISAID filtering workflow outputs the filtered sequences and metadata in the old metadata format (GISAID recently changed its metadata format), so the output can be used with Nextstrain locally.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
- Install nextflow
- Install either Docker or Singularity for full pipeline reproducibility (please only use Conda as a last resort; see docs)
- Download the pipeline and test it on a minimal dataset with one of the following commands:
    nextflow run nhhaidee/scovtree -profile test_gisaid_full,<docker/singularity/conda>
    nextflow run nhhaidee/scovtree -profile test_gisaid_drop_columns,<docker/singularity/conda>
    nextflow run nhhaidee/scovtree -profile test,<docker/singularity/conda>
- Start running your own analysis!
- A typical command for phylogenetic analysis is as follows:
    nextflow run nhhaidee/scovtree -profile <docker/singularity/conda> \
        --filter_gisaid false \
        --input '/path/to/consensus/consensus_sequences.fasta'
- A typical command for phylogenetic analysis with GISAID sequences is as follows:
    nextflow run nhhaidee/scovtree -profile <docker/singularity/conda> \
        --filter_gisaid true \
        --gisaid_sequences /path/to/sequences.fasta \
        --gisaid_metadata /path/to/metadata.tsv \
        --input '/path/to/consensus/consensus_sequences.fasta'
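The two invocations above differ only in the GISAID options, so a small wrapper can pick the right form and catch a missing input file before Nextflow starts. The following is a minimal sketch, not part of the pipeline: the function name `build_scovtree_cmd` and its argument order are hypothetical. It uses only the options shown above, and it prints the command rather than running it. It assumes the paths contain no whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: build_scovtree_cmd is a hypothetical helper, not shipped with nhhaidee/scovtree.
# It prints the nextflow command instead of executing it, so it can be reviewed first.
# Assumption: file paths contain no whitespace (the command is assembled as a plain string).

build_scovtree_cmd() {
    local profile="${1:-}"      # docker, singularity or conda
    local input="${2:-}"        # consensus sequences FASTA
    local gisaid_seqs="${3:-}"  # optional: GISAID sequences FASTA
    local gisaid_meta="${4:-}"  # optional: GISAID metadata TSV
    local f

    if [ -z "$profile" ] || [ -z "$input" ]; then
        echo "usage: build_scovtree_cmd PROFILE INPUT [GISAID_SEQUENCES GISAID_METADATA]" >&2
        return 2
    fi

    # GISAID filtering needs both the sequences and the metadata file.
    if [ -n "$gisaid_seqs$gisaid_meta" ]; then
        if [ -z "$gisaid_seqs" ] || [ -z "$gisaid_meta" ]; then
            echo "error: GISAID sequences and metadata must be given together" >&2
            return 1
        fi
    fi

    # Every file that was supplied must exist.
    for f in "$input" "$gisaid_seqs" "$gisaid_meta"; do
        if [ -n "$f" ] && [ ! -f "$f" ]; then
            echo "error: file not found: $f" >&2
            return 1
        fi
    done

    local cmd="nextflow run nhhaidee/scovtree -profile $profile"
    if [ -n "$gisaid_seqs" ]; then
        cmd="$cmd --filter_gisaid true --gisaid_sequences $gisaid_seqs --gisaid_metadata $gisaid_meta"
    else
        cmd="$cmd --filter_gisaid false"
    fi
    echo "$cmd --input $input"
}
```

After sourcing the file, `build_scovtree_cmd docker consensus_sequences.fasta` prints the plain phylogenetic-analysis command. Passing the GISAID sequences and metadata paths as the third and fourth arguments switches to the `--filter_gisaid true` form.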
nhhaidee/scovtree was originally written by Hai Nguyen.
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #scovtree channel (you can join with this invite).
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
In addition, references for the tools and data used in this pipeline are as follows: