Juno is a Nextflow pipeline for processing Illumina paired-end metagenomic sequencing data against Oropouche virus (OROV) reference genomes. It performs read QC, taxonomic classification, alignment, variant calling, and consensus generation.
Run the pipeline with Singularity:
$ nextflow run juno.nf -profile singularity -params-file params.yaml
Or, on HiPerGator, submit it as a Slurm job:
$ sbatch ./juno.sh
Requirements:
- Nextflow 23.04.0+
- Singularity or Docker
- Python 3.6+
- Slurm (only required when running on HiPerGator)
Installation:
git clone https://github.com/BPHL-Molecular/Juno.git
cd Juno
mkdir fastq
# move or copy your paired-end FASTQ files into this directory
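Before launching, it can help to confirm that every sample in fastq/ has both mates. The sketch below is a hypothetical helper, not part of Juno: it assumes common Illumina-style names such as `sample1_R1_001.fastq.gz` / `sample1_R2_001.fastq.gz` (also `.fq`), which may not match the naming rules Juno itself applies. The name `find_fastq_pairs` and the `READ_RE` pattern are illustrative only.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed naming: <sample>_R1[_001].fastq[.gz] and <sample>_R2[_001].fastq[.gz] (or .fq).
# This is a guess at common Illumina naming, not Juno's own rule; adjust as needed.
READ_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])(?:_\d+)?\.f(?:ast)?q(?:\.gz)?$")


def find_fastq_pairs(fastq_dir: str) -> Tuple[Dict[str, Tuple[Path, Path]], List[Path]]:
    """Group FASTQ files by sample; return complete R1/R2 pairs and any unpaired files."""
    mates = {}  # type: Dict[str, Dict[str, Path]]
    for path in sorted(Path(fastq_dir).iterdir()):
        match = READ_RE.match(path.name)
        if match:  # files that do not look like R1/R2 FASTQs are ignored
            mates.setdefault(match.group("sample"), {})[match.group("read")] = path

    pairs = {}  # type: Dict[str, Tuple[Path, Path]]
    unpaired = []  # type: List[Path]
    for sample, reads in sorted(mates.items()):
        if "1" in reads and "2" in reads:
            pairs[sample] = (reads["1"], reads["2"])
        else:
            # Only one mate was found for this sample
            unpaired.extend(reads.values())
    return pairs, unpaired
```

For example, `pairs, unpaired = find_fastq_pairs("fastq")` run from the Juno directory; a non-empty `unpaired` list points to a sample with a missing mate.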
Important: All pipeline parameters are set in the params.yaml file. Edit this file to provide the correct paths and values before running the pipeline.
You will also need to download the Kraken2/Bracken viral database from Ben Langmead's Index Zone and set kraken2_db to its location.
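After unpacking the database, a quick layout check can catch a wrong kraken2_db path before a long run. This is a minimal sketch, not a Juno feature: it relies on the fact that a Kraken2 database directory holds `hash.k2d`, `opts.k2d`, and `taxo.k2d`, and it assumes (from the "kraken2/bracken" wording above) that Juno also needs at least one Bracken `database*mers.kmer_distrib` file. The function name `check_kraken2_db` is hypothetical.

```python
from pathlib import Path
from typing import List

# Core files present in every Kraken2 database directory.
KRAKEN2_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def check_kraken2_db(db_dir: str) -> List[str]:
    """Return a list of problems found; an empty list means the layout looks usable."""
    db = Path(db_dir)
    if not db.is_dir():
        return ["not a directory: {}".format(db)]
    problems = ["missing {}".format(name) for name in KRAKEN2_FILES if not (db / name).is_file()]
    # Assumption: Bracken abundance estimation needs a k-mer distribution file
    # built for your read length (e.g. database150mers.kmer_distrib).
    if not list(db.glob("database*mers.kmer_distrib")):
        problems.append("no Bracken database*mers.kmer_distrib file found")
    return problems
```

Pass it the same path you put in kraken2_db, for example `check_kraken2_db("/path/to/kraken2_db")`.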
params.yaml template:
# Input/Output paths
input_dir: "/path/to/fastq"
output_dir: "/path/to/output_dir"
# Reference directory (default location); DO NOT change.
refs_dir: "${projectDir}/references"
# Database path
kraken2_db: "/path/to/kraken2_db"
# Resource configuration, default number of threads per process
threads: 32
# Human scrubber processing option, set to true for HPC environments
parallel_hrrt: false
# Quality control thresholds
qc_thresholds:
  min_coverage: 90
  min_depth: 15
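The README does not define exactly how qc_thresholds are applied. One plausible reading, shown here purely as an assumption, is that min_coverage is the percentage of reference positions covered by at least one read and min_depth is the mean read depth across the reference. The hypothetical `qc_status` function below applies both thresholds to a per-position depth list (such as the third column of `samtools depth -a` output); Juno's own calculation may differ.

```python
from typing import Dict, Sequence


def qc_status(depths: Sequence[int], thresholds: Dict[str, float], covered_depth: int = 1) -> str:
    """Hypothetical PASS/FAIL call from per-position depths across one reference."""
    if not depths:
        return "FAIL"  # nothing aligned to this reference
    # Assumed meaning of min_coverage: percent of positions with depth >= covered_depth
    covered = sum(1 for d in depths if d >= covered_depth)
    coverage_pct = 100.0 * covered / len(depths)
    # Assumed meaning of min_depth: mean depth over all reference positions
    mean_depth = sum(depths) / len(depths)
    passed = coverage_pct >= thresholds["min_coverage"] and mean_depth >= thresholds["min_depth"]
    return "PASS" if passed else "FAIL"
```

With the defaults above, `qc_status(depths, {"min_coverage": 90, "min_depth": 15})` passes a reference only when at least 90% of positions are covered and the mean depth is at least 15x.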
Please see the notes on the reference sequences used in this pipeline.
Pipeline overview:
- Quality Control
  - Human Read Removal - sra-human-scrubber
  - Read QC and trimming - fastp
- Taxonomic Classification
  - Read classification - kraken2
- Assembly
- Quality Assessment
Output structure:
output_dir/
├── dehosted/ # Cleaned reads
├── trimmed/ # Trimmed reads
├── kraken2/ # Classification results
├── alignments/ # SAM/BAM files & indices
├── stats/ # Alignment statistics
├── variants/ # Variant calls
├── consensus/ # Consensus sequences
├── quast/ # Assembly metrics
├── multiqc/ # Combined QC report
└── summary_report.tsv
summary_report.tsv includes:
- Sample and reference identifiers
- Cleaned read counts
- Classification read counts
- Mapping statistics
- Coverage metrics
- Variant counts
- Assembly quality metrics
- Overall QC status
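To triage a run, you can pull out samples that did not pass QC from summary_report.tsv. This is a minimal sketch using Python's csv module; the column names `sample` and `qc_status` and the literal value `PASS` are placeholders, since the README does not list the exact headers, so check the header row of your report and adjust the arguments.

```python
import csv
from typing import List


def failed_samples(report_path: str, status_col: str = "qc_status", sample_col: str = "sample") -> List[str]:
    """List samples whose QC status is not PASS (column names are hypothetical)."""
    with open(report_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")  # tab-separated report
        return [
            row[sample_col]
            for row in reader
            # A missing or empty status is treated as a failure
            if (row.get(status_col) or "").strip().upper() != "PASS"
        ]
```

For example, `failed_samples("output_dir/summary_report.tsv")`.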
Pipeline Errors:
Check the Nextflow execution log, .nextflow.log, in the directory the pipeline was launched from.
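.nextflow.log can be long, so it can help to pull out only the error lines. The sketch below assumes the usual Nextflow log layout, `<date> <time> [<thread>] <LEVEL> <logger> - <message>`. It returns matching lines and skips the multi-line stack traces that follow them. `log_problems` is a hypothetical helper name.

```python
import re
from typing import List, Sequence

# Assumed layout: "Jan-01 10:00:01.000 [Task monitor] ERROR nextflow.processor.TaskProcessor - ..."
LEVEL_RE = re.compile(r"\[[^\]]*\]\s+(ERROR|WARN)\s")


def log_problems(log_path: str = ".nextflow.log", levels: Sequence[str] = ("ERROR",)) -> List[str]:
    """Return log lines whose level is in `levels` (ERROR only by default)."""
    hits = []  # type: List[str]
    with open(log_path, errors="replace") as handle:
        for line in handle:
            match = LEVEL_RE.search(line)
            if match and match.group(1) in levels:
                hits.append(line.rstrip("\n"))
    return hits
```

Passing `levels=("ERROR", "WARN")` also includes warnings, which sometimes explain a later failure.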
Low Coverage Regions:
Positions with read depth below 10x are masked with 'N' in the consensus sequences.
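The masking rule above can be illustrated with a short sketch. It assumes a consensus already expressed in reference coordinates (one base per reference position, no indels) and a matching per-position depth list. It is not Juno's consensus step, which is handled by a dedicated tool that also deals with indels and ambiguous calls. `mask_low_depth` is a hypothetical name.

```python
from typing import Sequence


def mask_low_depth(consensus: str, depths: Sequence[int], min_depth: int = 10) -> str:
    """Replace bases at positions with depth below min_depth by 'N'."""
    if len(consensus) != len(depths):
        raise ValueError("consensus and depth profile must have the same length")
    # Keep a base only where at least min_depth reads support the position
    return "".join(base if depth >= min_depth else "N" for base, depth in zip(consensus, depths))
```

For example, `mask_low_depth("ACGTA", [12, 9, 10, 0, 30])` keeps positions 1, 3, and 5 and masks the two positions below 10x, giving `"ANGNA"`.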
Quality Thresholds:
The default QC thresholds (qc_thresholds in params.yaml) can be adjusted as needed.
We welcome contributions to make Juno better! Feel free to open issues or submit pull requests to suggest any additional features or enhancements!
Email: bphl-sebioinformatics@flhealth.gov
Juno is licensed under the MIT License.