The ProteoGenomics database generation workflow (pgdb) uses pypgatk and Nextflow to create different protein databases for ProteoGenomics data analysis.
nf-core/pgdb is a bioinformatics best-practise analysis pipeline for
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
- Install nextflow
- Install any of Docker, Singularity or Podman for full pipeline reproducibility (please only use Conda as a last resort; see docs)
- Download the pipeline and test it on a minimal dataset with a single command:
  nextflow run nf-core/pgdb -profile test,<docker/singularity/podman/conda/institute>
  Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
- Start running your own analysis!
  nextflow run nf-core/pgdb -profile <docker/singularity/podman/conda/institute> --ensembl_name homo_sapiens --ensembl false
See usage docs for all of the available options when running the pipeline.
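When the pipeline is launched from a script rather than typed by hand, the run command can be assembled as an argument list. The following is a minimal sketch, not part of nf-core/pgdb: the helper name `build_pgdb_command` is hypothetical, and it uses only the options that appear in the quick-start command (`-profile`, `--ensembl_name`, `--ensembl`).

```python
import shlex
from typing import List


def build_pgdb_command(profile: str, ensembl_name: str, ensembl: bool) -> List[str]:
    """Assemble the nextflow invocation as an argument list.

    Hypothetical helper for illustration; it only mirrors the options
    shown in the quick-start command.
    """
    return [
        "nextflow", "run", "nf-core/pgdb",
        "-profile", profile,
        "--ensembl_name", ensembl_name,
        # The quick-start command passes the boolean as lowercase text.
        "--ensembl", str(ensembl).lower(),
    ]


if __name__ == "__main__":
    cmd = build_pgdb_command("docker", "homo_sapiens", False)
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
    # To launch the run, pass the list to subprocess.run(cmd, check=True).
```

Passing a list to `subprocess.run` avoids shell quoting problems if a profile or species name ever contains unusual characters.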
By default, the pipeline currently performs the following:
- Download protein databases from ENSEMBL
- Translate from Genomics Variant databases into ProteoGenomics Databases (COSMIC, GNOMAD)
- Add to a Reference proteomics database, non-coding RNAs + pseudogenes.
- Compute decoys for a proteogenomics database
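To illustrate the decoy step, here is a minimal sketch of one common strategy: appending a reversed copy of every target sequence. This is an assumption chosen for illustration, not the method the pipeline or pypgatk implements. The `DECOY_` prefix and both function names are hypothetical.

```python
from typing import Dict

DECOY_PREFIX = "DECOY_"  # hypothetical prefix, chosen for illustration


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequences may span several lines; concatenate them.
            records[header] += line
    return records


def add_reversed_decoys(records: Dict[str, str]) -> Dict[str, str]:
    """Return the targets plus one reversed decoy per target."""
    combined = dict(records)
    for header, sequence in records.items():
        combined[DECOY_PREFIX + header] = sequence[::-1]
    return combined


if __name__ == "__main__":
    example = ">protA\nMKTAY\nIAK\n>protB\nMSLQ\n"
    for header, sequence in add_reversed_decoys(parse_fasta(example)).items():
        print(f">{header}\n{sequence}")
```

Reversal keeps the length and amino-acid composition of each target, which is why it is a frequent baseline for estimating false discovery rates. Other decoy strategies exist and may behave differently.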
The nf-core/pgdb pipeline comes with documentation about the pipeline: usage and output.
nf-core/pgdb was originally written by Husen M. Umer & Yasset Perez-Riverol.
We thank the following people for their extensive assistance in the development of this pipeline:
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #pgdb channel (you can join with this invite).
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
In addition, references of tools and data used in this pipeline are as follows: