Florida's BPHL Nextflow pipeline for OROV reference-based assembly from metagenomics reads.

BPHL-Molecular/Juno

Juno 🦟🦠🧬📊 - A Nextflow Pipeline for Reference-Based Assembly of Oropouche Virus (OROV) Genomes

Juno processes Illumina paired-end metagenomic sequencing data against OROV reference genomes. It performs QC, taxonomic classification, alignment, variant calling, and consensus generation.

⚡ Usage

$ nextflow run juno.nf -profile singularity -params-file params.yaml

🐊 HiPerGator Usage

$ sbatch ./juno.sh

📦 Dependencies

⚙️ Configuration

1. Clone this repository

git clone https://github.com/BPHL-Molecular/Juno.git
cd Juno

2. Create a directory for input FASTQ files

mkdir fastq
# move or copy your FASTQ files into this directory
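
Because the pipeline expects paired-end reads, it can help to confirm that every forward read file has a mate before launching a run. The sketch below is one way to do that. It assumes files are named with `_R1` and `_R2` tags and end in `.fastq.gz`; that naming pattern and the `check_pairs` helper are illustrative and not taken from the pipeline itself.

```shell
# Report R1 files in a directory that have no matching R2 file.
# ASSUMPTION: files follow a <sample>_R1*.fastq.gz / <sample>_R2*.fastq.gz
# naming pattern; adjust the pattern if your files are named differently.
check_pairs() (
    dir="$1"
    missing=0
    for r1 in "$dir"/*_R1*.fastq.gz; do
        [ -e "$r1" ] || continue                      # glob matched nothing
        base=$(basename "$r1")
        # Swap the first _R1 tag in the file name for _R2.
        r2="$dir/$(printf '%s' "$base" | sed 's/_R1/_R2/')"
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $base"
            missing=$((missing + 1))
        fi
    done
    # Exit status is 0 only when every R1 file has a mate.
    [ "$missing" -eq 0 ]
)
```

Calling `check_pairs fastq` prints one line per unpaired file and returns a non-zero status if any are found.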

3. Set required parameters:

Important: All pipeline parameters must be set in the params.yaml file. Make sure you edit this file to provide the correct paths and values before running the pipeline.

You will also need to download the Kraken2/Bracken viral database from Ben Langmead's Index Zone and point kraken2_db at the directory where you extract it.

# Input/Output paths
input_dir: "/path/to/fastq"
output_dir: "/path/to/output_dir"

# References path, default reference directory, DO NOT change.
refs_dir: "${projectDir}/references"

# Database path
kraken2_db: "/path/to/kraken2_db"

# Resource configuration, default number of threads per process
threads: 32

# Human scrubber processing option, set to true for HPC environments
parallel_hrrt: false

# Quality control thresholds
qc_thresholds:
    min_coverage: 90
    min_depth: 15

Please see the notes on the reference sequences used in this pipeline.
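
A quick pre-flight check on params.yaml can catch a missing key or a leftover placeholder before a run is submitted. The following is a minimal sketch, not part of Juno: `check_params` is a hypothetical helper that only greps for the top-level path keys shown above and does not parse YAML properly.

```shell
# Check that the path-like keys are present and that no "/path/to/..."
# placeholder is left in the file.
# ASSUMPTION: keys appear at the start of a line, as in the example above.
check_params() (
    file="$1"
    status=0
    for key in input_dir output_dir refs_dir kraken2_db; do
        # Require "key:" followed by a non-empty, non-comment value.
        if ! grep -Eq "^${key}:[[:space:]]*[^[:space:]#]" "$file"; then
            echo "missing or empty: $key"
            status=1
        fi
    done
    if grep -Eq '^[A-Za-z0-9_]+:[[:space:]]*"?/path/to/' "$file"; then
        echo "placeholder path still present"
        status=1
    fi
    [ "$status" -eq 0 ]
)
```

Running `check_params params.yaml` prints any problems it finds and returns a non-zero status if there are any.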

🛠️ Pipeline Steps

  1. Quality Control
  2. Taxonomic Classification
  3. Assembly
    • Reference alignment - bwa
    • SAM/BAM processing - samtools
    • Variant calling & consensus - ivar
  4. Quality Assessment

📂 Output Structure

output_dir/
├── dehosted/         # Cleaned reads
├── trimmed/          # Trimmed reads
├── kraken2/          # Classification results
├── alignments/       # SAM/BAM files & indices
├── stats/            # Alignment statistics
├── variants/         # Variant calls
├── consensus/        # Consensus sequences
├── quast/            # Assembly metrics
├── multiqc/          # Combined QC report
└── summary_report.tsv
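
After a run finishes, the layout above can be verified with a short script. This is a sketch only; `check_outputs` is a hypothetical helper, and it checks only that the listed directories and the summary file exist, not that their contents are complete.

```shell
# Verify that the expected output directories and summary file exist.
check_outputs() (
    out_dir="$1"
    status=0
    for sub in dehosted trimmed kraken2 alignments stats variants \
               consensus quast multiqc; do
        [ -d "$out_dir/$sub" ] || { echo "missing directory: $sub"; status=1; }
    done
    [ -f "$out_dir/summary_report.tsv" ] || {
        echo "missing file: summary_report.tsv"
        status=1
    }
    [ "$status" -eq 0 ]
)
```

For example, `check_outputs /path/to/output_dir` prints nothing and returns 0 when everything is in place.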

📋 Summary Report Metrics

  • Sample and reference identifiers
  • Cleaned read counts
  • Classification read counts
  • Mapping statistics
  • Coverage metrics
  • Variant counts
  • Assembly quality metrics
  • Overall QC status
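
The exact column names in summary_report.tsv are not listed here, so a simple first step is to print the header with column numbers before filtering or sorting the report. The sketch below assumes only that the file is tab-separated with a single header row; `list_columns` is a hypothetical helper name.

```shell
# Print each column name from the header row of a TSV file with its index.
list_columns() {
    awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$1"
}
```

The printed indices can then be used with tools such as `cut -f` or `sort -k` to pull out the metrics of interest.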

🐛 Troubleshooting

Pipeline Errors:
Check Nextflow execution logs in .nextflow.log

Low Coverage Regions:
Regions with low coverage (<10x) will be filled with 'N' in consensus sequences.
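
To see how much of a consensus sequence was masked, the fraction of 'N' bases can be computed directly from the FASTA file. The following is a minimal sketch; `n_fraction` is a hypothetical helper, and it pools all records in the file rather than reporting per segment.

```shell
# Print the fraction of N bases (case-insensitive) across all sequences
# in a FASTA file, to four decimal places.
n_fraction() {
    awk '
        /^>/ { next }                          # skip header lines
        {
            seq = toupper($0)
            total += length(seq)
            n += gsub(/N/, "", seq)            # gsub returns the number of Ns
        }
        END {
            if (total == 0) print "0.0000"
            else printf "%.4f\n", n / total
        }
    ' "$1"
}
```

A value near 0 means almost every position met the depth cutoff, while a high value points to a sample with large low-coverage regions.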

Quality Thresholds:
Default quality thresholds can be modified in params.yaml as needed.
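
As an illustration of how the two thresholds might combine, the sketch below labels a sample from its coverage and depth. It assumes that min_coverage is the percentage of the reference covered, that min_depth is the mean read depth, and that a sample passes when both values meet or exceed their thresholds. Those semantics, the `qc_status` helper, and the PASS/FAIL labels are assumptions for illustration, not the pipeline's confirmed logic.

```shell
# Label a sample PASS or FAIL from coverage (%) and mean depth.
# $1 = coverage, $2 = depth, $3/$4 = optional thresholds that default to
# the params.yaml values shown earlier (90 and 15).
# ASSUMPTION: both metrics must meet or exceed their thresholds to pass.
qc_status() {
    awk -v cov="$1" -v dep="$2" -v min_cov="${3:-90}" -v min_dep="${4:-15}" \
        'BEGIN { print ((cov >= min_cov && dep >= min_dep) ? "PASS" : "FAIL") }'
}
```

For example, `qc_status 95.2 30` uses the default thresholds, while `qc_status 95 12 90 10` applies a relaxed depth cutoff of 10.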

🤝 Contributing

We welcome contributions to make Juno better! Feel free to open issues or submit pull requests to suggest any additional features or enhancements!

📧 Contact

Email: bphl-sebioinformatics@flhealth.gov

⚖️ License

Juno is licensed under the MIT License.
