Pipeline scripts
zeyang-shen edited this page Jun 26, 2020 · 6 revisions
This script takes positive and negative sequences from FASTA files and conducts MAGGIE analysis.
python ./bin/maggie_fasta_input.py [posFile] [negFile] -o [directory]
argument | description | default |
---|---|---|
posFile | FASTA file(s) that contain positive sequences; multiple files should be separated by commas without spaces (e.g., file1,file2,file3,...) | required user input |
negFile | FASTA file(s) that contain negative sequences; these must use the same sequence identifiers as the positive sequences so that the two sets form pairs | required user input |
-o | directory to store output files; by default, a new folder will be created under the current path | ./maggie_output/ |
--motifPath | directory that stores motif files | ./data/JASPAR2020_CORE_vertebrates_motifs/ |
-m/--motifs | specify motifs to compute; multiple motifs should be separated by comma without space (e.g., SPI1,CEBPB) | all motifs from --motifPath |
-p | number of processors to use | 1 |
-R | Flag to overwrite the output folder specified by -o if it already exists | False |
-mCut | cutoff for merging similar motifs; should be a float value ranging from 0 (merge everything) to 1 (no merging at all) | 0.6 |
-sCut | cutoff for calling significance based on FDR values | 0.05 |
-T | number of top motif scores used to compute the representative motif score | 1 |
--saveDiff | Flag for saving motif score differences. This file can be large. | False |
--linear | Change to linear model for analysis | False |
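
The pairing requirement on posFile and negFile can be checked before the script is run. The following is a minimal sketch and not part of MAGGIE: the helper names `read_fasta_ids`, `unpaired_ids`, and `build_fasta_command` are hypothetical, and it assumes the sequence identifier is the first whitespace-delimited token of each FASTA header line. The sequences are made up for illustration.

```python
import os
import tempfile


def read_fasta_ids(path):
    """Return the identifiers (first token after '>') found in a FASTA file."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                ids.append(header.split()[0] if header else "")
    return ids


def unpaired_ids(pos_path, neg_path):
    """Return identifiers that appear in only one of the two files."""
    return sorted(set(read_fasta_ids(pos_path)) ^ set(read_fasta_ids(neg_path)))


def build_fasta_command(pos_files, neg_files, out_dir="./maggie_output/"):
    """Build the argument list for maggie_fasta_input.py (does not run it)."""
    return [
        "python", "./bin/maggie_fasta_input.py",
        ",".join(pos_files),  # multiple files: comma-separated, no spaces
        ",".join(neg_files),
        "-o", out_dir,
    ]


# Tiny demonstration with made-up sequences.
demo_dir = tempfile.mkdtemp()
pos_path = os.path.join(demo_dir, "pos.fa")
neg_path = os.path.join(demo_dir, "neg.fa")
with open(pos_path, "w") as f:
    f.write(">seq1\nACGTTGCA\n>seq2\nGGGTTTAA\n")
with open(neg_path, "w") as f:
    f.write(">seq1\nACGATGCA\n>seq3\nGGGTATAA\n")

print(unpaired_ids(pos_path, neg_path))  # identifiers lacking a partner
print(build_fasta_command([pos_path], [neg_path]))
```

An empty list from `unpaired_ids` means every positive sequence has a negative partner with the same identifier. The returned argument list can be passed to `subprocess.run` from the repository root.
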
This script takes a VCF file with allele and effect size information and conducts MAGGIE analysis.
python ./bin/maggie_vcf_input.py [vcfFile] [genome] -e [effect size column number] -o [directory]
argument | description | default |
---|---|---|
vcfFile | VCF file that contains testing variants | required user input |
genome | reference genome name, or a path to a genome FASTA file | required user input (currently supported: hg19, hg38, mm10, hg18) |
-o | directory to store output files | ./maggie_output/ |
-e/--effect | the column index in the input file for effect size that compares alternative vs. reference alleles. If not specified, assume alternative alleles are always associated with a higher signal | required user input |
-a1 | the column index for the reference allele. If not specified, use the 4th column | 4 |
-a2 | the column index for the alternative allele. If not specified, use the 5th column | 5 |
--saveSeq | Flag for saving intermediate sequences. Generates two files that correspond to the alleles associated with higher and lower signals | False |
-S/--size | size of sequences to test around variants | 100 |
--motifPath | path to the motif files | ./data/JASPAR2020_CORE_vertebrates_motifs/ |
-m/--motifs | specify motifs to compute; multiple motifs should be separated by comma without space (e.g., SPI1,CEBPB) | all motifs from --motifPath |
-p | number of processors to use | 1 |
-mCut | cutoff for merging similar motifs; should be a float value ranging from 0 (merge everything) to 1 (no merging at all) | 0.6 |
-sCut | cutoff for calling significance based on FDR values | 0.05 |
-T | number of top motif scores used to compute the representative motif score | 1 |
--saveDiff | Flag for saving motif score differences. This file can be large. | False |
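
The -e, -a1, and -a2 options together decide which allele of each variant is treated as the higher-signal one. The sketch below illustrates that idea only; it is not the code used by maggie_vcf_input.py. The function name `orient_alleles` is hypothetical, and it assumes tab-delimited input, 1-based column numbers (matching the "4th column" default above), and that a positive effect size means the alternative allele has the higher signal. The example variant line is made up.

```python
def orient_alleles(vcf_line, effect_col=None, ref_col=4, alt_col=5):
    """Return (higher_allele, lower_allele) for one tab-delimited variant line.

    Column numbers are 1-based. Assumption: a positive effect size means the
    alternative allele has the higher signal. Without an effect column, the
    alternative allele is treated as the higher one.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[ref_col - 1], fields[alt_col - 1]
    if effect_col is None:
        return alt, ref
    effect = float(fields[effect_col - 1])
    # An effect of exactly zero falls into the "reference is higher" branch
    # here; that tie-breaking choice is arbitrary.
    return (alt, ref) if effect > 0 else (ref, alt)


# Made-up variant with the effect size stored in column 9.
line = "chr1\t12345\trs1\tA\tG\t.\t.\t.\t-0.8"
print(orient_alleles(line, effect_col=9))  # negative effect: reference is higher
print(orient_alleles(line))                # no effect column: alternative is higher
```
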
This script splits variants in a VCF file based on genomic annotations into different categories (near TSS, intergenic, intronic).
python ./bin/splitVariants.py [vcfFile] [genome] -o [directory]
argument | description | default |
---|---|---|
vcfFile | VCF file that contains testing variants | required user input |
genome | reference genome name, or a path to a genome FASTA file | required user input (currently supported: hg19, hg38, mm10, hg18) |
-o | directory to store output files | ./maggie_output/ |
-L/--overlap | overlap size to count for annotation | 100 |
This script can be used to download the genomic data, including genome and annotations. Currently available genomes include hg19, hg38, mm10, hg18.
python ./bin/download_genomic_data.py [genome] -o [directory]
argument | description | default |
---|---|---|
genome | genome to download: hg19, hg38, mm10, hg18 | required user input |
-o | directory to store downloaded files | ./data/genomes/ |
--annot | Flag for downloading annotation files at the same time | False |
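
When several genomes are needed, the download command can be assembled programmatically. This is a minimal sketch built only from the options in the table above; the helper name `build_download_command` is hypothetical, and the function only builds the argument list without running anything.

```python
SUPPORTED_GENOMES = ("hg19", "hg38", "mm10", "hg18")


def build_download_command(genome, out_dir="./data/genomes/", with_annotations=False):
    """Build the argument list for download_genomic_data.py (does not run it)."""
    if genome not in SUPPORTED_GENOMES:
        raise ValueError(
            f"unsupported genome {genome!r}; choose from {', '.join(SUPPORTED_GENOMES)}"
        )
    cmd = ["python", "./bin/download_genomic_data.py", genome, "-o", out_dir]
    if with_annotations:
        cmd.append("--annot")  # also download annotation files
    return cmd


for name in ("hg38", "mm10"):
    print(" ".join(build_download_command(name, with_annotations=True)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to perform the download.
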