WES-Variant-Calling

Overview

The WES-Variant-Calling workflow is designed to process human Whole Exome Sequencing (WES) data following GATK4 best practices for germline variant calling. This pipeline encompasses data downloading, quality control, alignment, duplicate marking, base quality score recalibration (BQSR), variant calling, filtering, and VCF validation.

Prerequisites

Before running the WES-Variant-Calling pipeline, ensure you have the following:

Reference and Input Files:
- Human Reference Genome (downloaded during execution): GRCh38 (GCA_000001405.15_GRCh38_no_alt_analysis_set.fna)
- dbSNP VCF (downloaded during execution): Homo_sapiens_assembly38.dbsnp138.vcf
- Sample Metadata TSV: Contains URLs for FASTQ files and sample information
Software Dependencies:
- Anaconda: For environment management
- Bioinformatics Tools (installed via Conda during execution): e.g., GATK4, BWA-MEM, Samtools, FastQC, Picard, BCFtools, VCF-validator
Compute Resources:
- Sufficient memory and CPU cores for the heavier steps (alignment, duplicate marking, BQSR, and variant calling)
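Before a long run, it can help to confirm that the expected executables are on PATH. The sketch below is a hypothetical helper, not part of wes_pipeline.sh. The executable names in the example call (gatk, bwa, samtools, fastqc, bcftools) are assumptions based on the usual Bioconda packages.

```bash
# Hypothetical helper (not part of wes_pipeline.sh): report any tool missing from PATH.
# Returns 0 if every tool is found, 1 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (executable names assumed from the Bioconda packages):
# check_tools conda gatk bwa samtools fastqc bcftools
```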

Installation

1. Clone the Repository

git clone https://github.com/KavyaBanerj/WES-Variant-Calling.git
cd WES-Variant-Calling

Usage

1. Prepare Sample Metadata

Create a TSV file (e.g., igsr_HG00479.tsv) containing the URLs for your FASTQ files and sample information.
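The script reads the FASTQ URLs from this TSV. Here is a minimal sketch of that lookup. It assumes a tab-separated file whose header row contains a column named url, so check this against your own export. The function name fastq_urls is hypothetical.

```bash
# Hypothetical helper: print the non-empty values of the "url" column of a TSV.
# awk -F'\t' keeps empty fields, so column positions stay correct.
fastq_urls() {
  awk -F'\t' '
    { sub(/\r$/, "") }                      # tolerate Windows (CRLF) line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if (tolower($i) == "url") col = i
      if (!col) { print "no url column in header" > "/dev/stderr"; exit 1 }
      next
    }
    $col != "" { print $col }
  ' "$1"
}

# Example: fastq_urls igsr_HG00479.tsv
```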

2. Configure the Pipeline Script

Edit wes_pipeline.sh so that its paths and configuration match your directory structure and sample metadata.
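The variables below show the kind of settings worth checking. The names and values are hypothetical and may not match those in wes_pipeline.sh. A small guard such as require_paths (also hypothetical) stops the script early when an input is missing, rather than partway through a long run.

```bash
# Hypothetical settings; adapt the names and values to wes_pipeline.sh.
SAMPLE_NAME="HG00479"
METADATA_TSV="igsr_HG00479.tsv"
REF_DIR="$PWD/reference"
RESULTS_DIR="$PWD/results"
THREADS=4

# Return non-zero (and name each missing path) if any given path does not exist.
require_paths() {
  local path status=0
  for path in "$@"; do
    if [[ ! -e $path ]]; then
      echo "not found: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# Example: require_paths "$METADATA_TSV" || exit 1
#          mkdir -p "$REF_DIR" "$RESULTS_DIR"
```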

3. Run the Pipeline

Execute the pipeline script:

bash ./wes_pipeline.sh

Note: Running the script through bash, as above, does not require execute permissions. To run it directly as ./wes_pipeline.sh, first set them using:

chmod +x ./wes_pipeline.sh

Pipeline Steps

Data Downloading:
- Downloads FASTQ files based on URLs provided in the metadata TSV.
- Downloads and prepares the reference genome and dbSNP VCF.
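The download and preparation step can be sketched as two hypothetical functions. The caller supplies the URLs, and the example URL is a placeholder, not the one the script uses. index_reference assumes the FASTA has already been decompressed with gunzip, because samtools faidx can index bgzip-compressed files but not plain gzip ones.

```bash
# Hypothetical helper: download each URL into DEST, resuming partial files (-c).
download_all() {
  local dest=$1 url
  shift
  mkdir -p "$dest" || return 1
  for url in "$@"; do
    wget -c -P "$dest" "$url" || return 1
  done
}

# Hypothetical helper: build the indexes that BWA-MEM and GATK expect.
index_reference() {
  local ref=$1 dbsnp=$2
  samtools faidx "$ref" || return 1                                       # REF.fai
  gatk CreateSequenceDictionary -R "$ref" -O "${ref%.*}.dict" || return 1  # REF.dict
  bwa index "$ref" || return 1                                            # BWA index files
  gatk IndexFeatureFile -I "$dbsnp"                                       # dbSNP index for GATK
}

# Example (placeholder URL):
# download_all reference "https://example.org/path/to/reference.fna.gz"
```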
Quality Control (QC):
- Runs FastQC on raw FASTQ files to assess quality metrics.
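Here is a minimal FastQC call for the raw reads. It assumes a hypothetical THREADS variable, which defaults to 4 here. FastQC expects the output directory to exist, so the sketch creates it first.

```bash
# Hypothetical helper: run FastQC on one or more FASTQ files, writing reports to OUTDIR.
run_fastqc() {
  local outdir=$1
  shift
  mkdir -p "$outdir" || return 1
  fastqc -t "${THREADS:-4}" -o "$outdir" "$@"
}

# Example (hypothetical file names):
# run_fastqc results/fastqc HG00479_1.fastq.gz HG00479_2.fastq.gz
```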
Alignment:
- Aligns reads to the reference genome using BWA-MEM.
- Adds read group (@RG) information, which downstream GATK tools require.
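BWA-MEM takes the read group through its -R option. bwa expects each tab separator written as the two characters \t and converts them itself. In this sketch the read-group ID and SM fields are both set to the sample name, and the platform is assumed to be ILLUMINA. The function names and these field values are assumptions and are not taken from the script.

```bash
# Hypothetical helper: print a read-group line with literal "\t" separators for bwa mem -R.
read_group() {
  local id=$1 sample=$2
  printf '@RG\\tID:%s\\tSM:%s\\tPL:ILLUMINA\n' "$id" "$sample"   # PL:ILLUMINA is assumed
}

# Hypothetical helper: align a read pair and write SAM with the read group attached.
align_reads() {
  local ref=$1 r1=$2 r2=$3 sample=$4 out_sam=$5
  bwa mem -t "${THREADS:-4}" -R "$(read_group "$sample" "$sample")" \
    "$ref" "$r1" "$r2" > "$out_sam"
}
```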
Duplicate Marking:
- Marks PCR duplicates using GATK's MarkDuplicatesSpark.
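Here is a sketch of the duplicate-marking call. MarkDuplicatesSpark can read the query-grouped SAM that bwa mem writes directly and, by default, should write a coordinate-sorted BAM. The function name and the metrics file name are hypothetical.

```bash
# Hypothetical helper: mark duplicates and write a metrics file alongside the BAM.
mark_duplicates() {
  local in_sam=$1 out_bam=$2
  gatk MarkDuplicatesSpark -I "$in_sam" -O "$out_bam" -M "${out_bam%.bam}.dup_metrics.txt"
}
```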
Base Quality Score Recalibration (BQSR):
- Performs BQSR to correct systematic errors in base quality scores.
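In GATK4, BQSR is usually run as two tools. BaseRecalibrator builds a recalibration table from known variant sites, and ApplyBQSR then rewrites the base qualities using that table. This sketch assumes the downloaded dbSNP VCF serves as the known-sites resource. The function and table names are hypothetical.

```bash
# Hypothetical helper: model systematic base-quality errors, then apply the correction.
run_bqsr() {
  local ref=$1 in_bam=$2 known_sites=$3 out_bam=$4
  local table="${out_bam%.bam}.recal_data.table"
  gatk BaseRecalibrator -R "$ref" -I "$in_bam" \
    --known-sites "$known_sites" -O "$table" || return 1
  gatk ApplyBQSR -R "$ref" -I "$in_bam" \
    --bqsr-recal-file "$table" -O "$out_bam"
}
```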
Variant Calling:
- Calls variants using GATK's HaplotypeCaller.
- Generates raw VCF files with variant annotations.
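Here is a minimal single-sample HaplotypeCaller call on the recalibrated BAM, which produces a raw VCF such as the raw_variants.vcf listed under Outputs. The function name is hypothetical.

```bash
# Hypothetical helper: call SNPs and indels from a recalibrated BAM.
call_variants() {
  local ref=$1 in_bam=$2 out_vcf=$3
  gatk HaplotypeCaller -R "$ref" -I "$in_bam" -O "$out_vcf"
}

# Example: call_variants reference.fna recal.bam results/raw_variants.vcf
```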
Variant Extraction:
- Extracts SNPs and Indels into separate VCF files using SelectVariants.
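The split can be sketched with SelectVariants' --select-type-to-include option. The function name and the output names raw_snps.vcf and raw_indels.vcf are hypothetical.

```bash
# Hypothetical helper: split raw calls into SNP-only and indel-only VCFs.
split_variants() {
  local ref=$1 raw_vcf=$2 out_dir=$3
  gatk SelectVariants -R "$ref" -V "$raw_vcf" \
    --select-type-to-include SNP -O "$out_dir/raw_snps.vcf" || return 1
  gatk SelectVariants -R "$ref" -V "$raw_vcf" \
    --select-type-to-include INDEL -O "$out_dir/raw_indels.vcf"
}
```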
Variant Filtering:
- Applies filters based on quality metrics (QD, FS, MQ, SOR, MQRankSum, ReadPosRankSum) using VariantFiltration.
- Applies genotype-level filters (DP, GQ).
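The filtering step can be sketched with VariantFiltration. Site-level filters use --filter-name/--filter-expression pairs, and the DP and GQ genotype filters use --genotype-filter-name/--genotype-filter-expression pairs. The thresholds are assumptions: the site-level values are GATK's commonly cited generic hard-filtering starting points, and DP < 10 and GQ < 10 are only illustrative, so wes_pipeline.sh may use different values. The filter names (QD_filter, DP_filter, and so on) are also hypothetical.

```bash
# Assumed site-level thresholds (generic GATK hard-filter starting points).
SNP_FILTERS=("QD < 2.0" "FS > 60.0" "MQ < 40.0" "SOR > 3.0"
             "MQRankSum < -12.5" "ReadPosRankSum < -8.0")
INDEL_FILTERS=("QD < 2.0" "FS > 200.0" "SOR > 10.0" "ReadPosRankSum < -20.0")

# Hypothetical helper: hard-filter a VCF. Each expression gets a name such as
# "QD_filter" (first word + "_filter"); the DP/GQ thresholds are illustrative.
hard_filter() {
  local ref=$1 in_vcf=$2 out_vcf=$3 expr
  shift 3
  local -a args=()
  for expr in "$@"; do
    args+=(--filter-name "${expr%% *}_filter" --filter-expression "$expr")
  done
  gatk VariantFiltration -R "$ref" -V "$in_vcf" -O "$out_vcf" "${args[@]}" \
    --genotype-filter-name DP_filter --genotype-filter-expression "DP < 10" \
    --genotype-filter-name GQ_filter --genotype-filter-expression "GQ < 10"
}

# Examples:
# hard_filter reference.fna raw_snps.vcf filtered_snps.vcf "${SNP_FILTERS[@]}"
# hard_filter reference.fna raw_indels.vcf filtered_indels.vcf "${INDEL_FILTERS[@]}"
```

A pipeline step fails if a filter name is reused, so deriving each name from the annotation keeps them unique as long as no annotation appears twice in a list.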
VCF Processing:
- Removes variants whose genotypes failed the genotype-level filters, using grep to drop records that carry the genotype filter flags.
- Sorts, compresses, indexes, and concatenates SNP and Indel VCFs using bcftools.
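Here is one way to produce the analysis-ready and combined files. First keep only sites whose FILTER is PASS, using SelectVariants --exclude-filtered. Then use grep to drop records whose genotype carries a genotype filter flag. Finally, compress, index, and concatenate the files with bcftools. The flag names DP_filter and GQ_filter, the intermediate file names, and the function names are assumptions for illustration. Because grep works on whole lines, it also drops any ##FILTER header lines that mention those flags, which is harmless once no record uses them.

```bash
# Hypothetical helper: remove records whose genotype failed DP_filter or GQ_filter.
drop_genotype_filtered() {
  grep -v -E 'DP_filter|GQ_filter' "$1" > "$2"
}

# Hypothetical helper: keep PASS sites only, then apply the genotype-level cleanup.
make_analysis_ready() {
  local filtered_vcf=$1 out_vcf=$2
  local pass_vcf="${out_vcf%.vcf}.pass.vcf"
  gatk SelectVariants --exclude-filtered -V "$filtered_vcf" -O "$pass_vcf" || return 1
  drop_genotype_filtered "$pass_vcf" "$out_vcf"
}

# Hypothetical helper: sort, compress, index, and concatenate SNP and indel VCFs.
combine_vcfs() {
  local snps=$1 indels=$2 combined=$3 f
  for f in "$snps" "$indels"; do
    bcftools sort -Oz -o "$f.gz" "$f" || return 1
    bcftools index -t "$f.gz" || return 1
  done
  # -a (--allow-overlaps) lets the two indexed, sorted inputs overlap in coordinates
  bcftools concat -a -Oz -o "$combined" "$snps.gz" "$indels.gz" || return 1
  bcftools index -t "$combined"
}
```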
VCF Validation:
- Validates the final VCF file using GATK's ValidateVariants.
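Here is a sketch of the validation call. Passing the dbSNP VCF through the optional --dbsnp argument also checks variant IDs against it. The function name is hypothetical, and it is an assumption that the script validates against dbSNP at all.

```bash
# Hypothetical helper: validate a VCF against the reference, optionally checking IDs against dbSNP.
validate_vcf() {
  local ref=$1 vcf=$2 dbsnp=${3:-}
  if [[ -n $dbsnp ]]; then
    gatk ValidateVariants -R "$ref" -V "$vcf" --dbsnp "$dbsnp"
  else
    gatk ValidateVariants -R "$ref" -V "$vcf"
  fi
}
```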

Outputs

The pipeline generates several output files within the results/ directory:

Raw Variants:
- raw_variants.vcf: Unfiltered variant calls.
Filtered Variants:
- filtered_snps.vcf: SNPs after applying variant filters.
- filtered_indels.vcf: Indels after applying variant filters.
Analysis-Ready Variants:
- analysis_ready_snps.vcf: SNPs passing all filters.
- analysis_ready_indels.vcf: Indels passing all filters.
Final Combined VCF:
- combined_analysis_ready_{sample_name}.vcf.gz: Concatenated and compressed VCF file ready for downstream tools like wANNOVAR.
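For a quick sanity check of these outputs, the hypothetical helper below counts variant records (lines not starting with #) in a plain or compressed VCF. gzip can read bgzip-compressed files because BGZF is a series of standard gzip blocks.

```bash
# Hypothetical helper: count data records in a .vcf or .vcf.gz file.
count_variants() {
  local vcf=$1
  if [[ $vcf == *.gz ]]; then
    gzip -cd -- "$vcf"
  else
    cat -- "$vcf"
  fi | grep -vc '^#'
}

# Example:
# for f in results/analysis_ready_*.vcf; do printf '%s\t%s\n' "$f" "$(count_variants "$f")"; done
```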

Acknowledgements

This pipeline was inspired by WES-analysis and variant_calling.
