(c) 2017 Timothy Becker & Wan-Ping Lee
SVE is a Python-based execution engine for structural variation (SV) detection. It accepts input at any stage (raw FASTQs, aligned BAMs, or variant call format (VCF) files) and generates a unified VCF as its output. By default, SVE comprises alignment, realignment, and an ensemble of state-of-the-art SV-calling algorithms: BreakDancer, BreakSeq, cn.MOPS, CNVnator, DELLY, Hydra, and LUMPY. FusorSV is also embedded; it is a data-mining approach that assesses performance and merges callsets from an ensemble of SV-calling algorithms.
- python 2.7, HTSeq, numpy, scipy, subprocess32, bx-python, CrossMap and mygene
- gcc 4.8 or greater
- cmake 3.0 or greater
- ROOT
- R 3.3 or greater. You may type "make R-install" to install R-3.3.3.
Please set the ROOT environment variables.
export ROOTSYS=/ROOT_Build_Path
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ROOTSYS/lib
git clone --recursive https://github.com/TheJacksonLaboratory/SVE.git
cd SVE
make
Please locate the python2.7 header files and set "CFLAGS_FUSOR_SV" in the Makefile accordingly. For example, if the header files are in "/usr/include/python2.7", use "CFLAGS_FUSOR_SV=-I /usr/include/python2.7".
make FusorSV
Alternatively, you can install FusorSV with setup.py:
cd SVE/scripts/FusorSV/
python setup.py build_ext --inplace
tar -zxvf data.tar.gz
Alternatively, a Dockerfile and a Docker image are provided. Note that sudo may be required for docker commands, depending on your machine's settings.
cd SVE
docker build .
Pull the Docker image from the repository.
docker pull wanpinglee/sve
In the Docker image, SVE is built in /tools/SVE. To view the help, run:
/tools/SVE/bin/sve
Short reads in FASTQ format are mapped against the given FASTA, and a sorted BAM is generated.
bin/sve align [options] -r <FASTA> <FASTQ1 [FASTQ2]>
If the reads are given in BAM format, realign remaps them against the FASTA and generates a sorted BAM. SpeedSeq is used to perform the realignment.
bin/sve realign -r <FASTA> <BAM>
Seven algorithms are available for SV calling. The calls are written to a VCF.
bin/sve call -r <FASTA> -g <hg19|hg38|others> -a <breakdancer|breakseq|cnvnator|hydra|delly|lumpy|cnmops> <BAM [BAM ...]>
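Running every caller on the same BAMs means repeating the command above once per `-a` value. The following is a minimal sketch of how those command lines could be assembled in Python. The helper name `build_call_commands` and the example file names are hypothetical and not part of SVE; only the `call` subcommand and the `-r`, `-g` and `-a` options are taken from the usage line above.

```python
# Sketch only: builds the argument lists, it does not run SVE.
# The caller names are copied from the usage line above.
CALLERS = ["breakdancer", "breakseq", "cnvnator", "hydra",
           "delly", "lumpy", "cnmops"]


def build_call_commands(fasta, genome, bams, callers=CALLERS, sve="bin/sve"):
    """Return one 'sve call' argument list per caller.

    fasta  -- path to the reference FASTA (-r)
    genome -- genome label passed to -g, e.g. "hg19" or "hg38"
    bams   -- list of BAM paths appended as positional arguments
    """
    commands = []
    for caller in callers:
        cmd = [sve, "call", "-r", fasta, "-g", genome, "-a", caller]
        cmd.extend(bams)
        commands.append(cmd)
    return commands


# Hypothetical inputs, for illustration only.
for cmd in build_call_commands("ref.fa", "hg19", ["sample1.bam"]):
    print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.check_call` once the paths point at real files.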
After calling, each sample may have multiple VCFs, depending on how many callers were used. Please collect the VCFs of each sample in a folder.
The VCFs should use SVE IDs to indicate the callers.
SVE ID | Caller |
---|---|
4 | BreakDancer (v1.4.5) |
9 | cn.MOPS (v1.20) |
10 | CNVnator (v0.3.3) |
11 | DELLY (v2) |
14* | GenomeSTRiP |
17 | Hydra |
18 | LUMPY |
35 | BreakSeq (v2.2) |
0 | Truth (optional) |
Note*: Because of licensing restrictions, GenomeSTRiP is not embedded in SVE. However, the default FusorSV model is able to handle GenomeSTRiP VCFs.
Example input VCF files can be organized as follows. Please note that vcfFiles is the argument passed to -i for FusorSV.
- vcfFiles/sample1/sample1_S11.vcf
- vcfFiles/sample1/sample1_S10.vcf
- vcfFiles/sample1/sample1_S4.vcf
- vcfFiles/sample2/sample2_S11.vcf
- vcfFiles/sample2/sample2_S10.vcf
- vcfFiles/sample2/sample2_S4.vcf
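The layout above can be sketched in Python as follows. This is an illustration under assumptions: the naming pattern `<sample>_S<SVE ID>.vcf` is inferred from the example paths, and the helper names `vcf_path` and `caller_of` are hypothetical, not part of SVE or FusorSV. The ID-to-caller mapping is copied from the SVE ID table.

```python
import os
import re

# SVE IDs and caller names, copied from the SVE ID table.
SVE_IDS = {
    0: "Truth",
    4: "BreakDancer",
    9: "cn.MOPS",
    10: "CNVnator",
    11: "DELLY",
    14: "GenomeSTRiP",
    17: "Hydra",
    18: "LUMPY",
    35: "BreakSeq",
}

# Assumed file name pattern, inferred from the example paths.
_VCF_NAME = re.compile(r"^(?P<sample>.+)_S(?P<sve_id>\d+)\.vcf$")


def vcf_path(root, sample, sve_id):
    """Return <root>/<sample>/<sample>_S<sve_id>.vcf for a known SVE ID."""
    if sve_id not in SVE_IDS:
        raise ValueError("unknown SVE ID: {0}".format(sve_id))
    return os.path.join(root, sample, "{0}_S{1}.vcf".format(sample, sve_id))


def caller_of(path):
    """Return the caller name for a VCF path, or None if it cannot be parsed."""
    match = _VCF_NAME.match(os.path.basename(path))
    if match is None:
        return None
    return SVE_IDS.get(int(match.group("sve_id")))


print(vcf_path("vcfFiles", "sample1", 11))
print(caller_of("vcfFiles/sample1/sample1_S4.vcf"))
```

A check like `caller_of` can help catch misnamed files before FusorSV is run.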
python scripts/FusorSV/FusorSV.py -f scripts/FusorSV/data/models/default.pickle -L DEFAULT -r <FASTA> -i <vcfFiles>/ -p <THREADS> -o <OUT_DIR>
A new model will be generated from the truth calls in S0.vcf, and the VCFs will be merged using the new model.
python scripts/FusorSV/FusorSV.py -L DEFAULT -r <FASTA> -i <vcfFiles>/ -p <THREADS> -o <OUT_DIR>
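The two FusorSV invocations above differ only in whether `-f` is given. A minimal sketch of a wrapper that builds either form is shown below. The function name `fusorsv_command` and the example paths are hypothetical; the options are taken verbatim from the commands above and nothing else about the FusorSV command line is assumed.

```python
def fusorsv_command(fasta, vcf_dir, threads, out_dir, model=None,
                    script="scripts/FusorSV/FusorSV.py"):
    """Build the FusorSV argument list.

    If model is given it is passed with -f (merge with an existing model);
    if it is None, -f is omitted, as in the second command above.
    """
    cmd = ["python", script]
    if model is not None:
        cmd += ["-f", model]
    cmd += ["-L", "DEFAULT", "-r", fasta, "-i", vcf_dir,
            "-p", str(threads), "-o", out_dir]
    return cmd


# Hypothetical paths, for illustration only.
print(" ".join(fusorsv_command(
    "ref.fa", "vcfFiles/", 4, "out",
    model="scripts/FusorSV/data/models/default.pickle")))
print(" ".join(fusorsv_command("ref.fa", "vcfFiles/", 4, "out")))
```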
The project is licensed under the GPL-3.0 License. Please see LICENSE for details.