v0.3.2 - Much improved SNP genome preparation

@FelixKrueger FelixKrueger released this 29 Mar 10:15

SNPsplit


  • Changed the samtools command throughout SNPsplit to now correctly use the path supplied by the user with --samtools_path. Thanks to Kenzo Hillion for spotting this (see here).

  • Option --genome_build [NAME] should now work as intended (used to be --build only).

SNPsplit_genome_preparation

  • Relaxed SNP filtering criteria to now support multiple homozygous variants for the same position in the genome. This step should increase the number of usable SNPs slightly (but noticeably). See here.

  • Changed the SNP filtering for --dual_hybrid mode to only include positions where both strains had a high confidence call (irrespective of the nature of the call). This step should greatly reduce the number of false positive allele calls. See here for more details.

  • Added a check to SNPsplit_genome_preparation that produces a [FATAL ERROR] if the stored chromosome names are not the same as the ones in the VCF file. This is a rather common mistake when people use the Ensembl VCF file but get the genome from UCSC. This should change soon if and when Ensembl adopts the same standard used by NCBI/UCSC.

  • Added a new version of the genome preparation script that can deal with the latest version of the VCF file for the old NCBIM37 genome build ("mgp.v2.snps.annot.reformat.vcf.gz"). The script is called "SNPsplit_genome_preparation_v2VCF" and may be found in the folder "outdated_VCF_versions" on Github. Please note that this does not include the changes we made to the current version (see above).
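
For illustration only, here is a minimal Python sketch of two of the checks described above: the chromosome name check and the --dual_hybrid position filter. This is not the code used by SNPsplit_genome_preparation; the function names, the `StrainCall` class and its `high_confidence` flag are hypothetical and only stand in for however the script reads the genome and the VCF file. The name check shown here simply reports VCF chromosome names that are missing from the genome, which is an assumption about what "not the same" means.

```python
from dataclasses import dataclass


def check_chromosome_names(genome_names, vcf_names):
    """Raise an error if the VCF uses chromosome names the genome lacks.

    Hypothetical helper: both arguments are plain iterables of names,
    e.g. ["chr1", "chr2"] (UCSC style) or ["1", "2"] (Ensembl style).
    """
    unknown = sorted(set(vcf_names) - set(genome_names))
    if unknown:
        raise ValueError(
            "[FATAL ERROR] chromosome names in the VCF file were not found "
            "in the genome: " + ", ".join(unknown)
        )


@dataclass
class StrainCall:
    """Hypothetical per-strain call at a single position."""
    genotype: str          # e.g. "1/1"; the exact encoding is an assumption
    high_confidence: bool  # assumed flag marking a high confidence call


def keep_for_dual_hybrid(call_strain1, call_strain2):
    """Keep a position only if both strains have a high confidence call.

    The nature of the call (reference or variant) is deliberately ignored.
    """
    return call_strain1.high_confidence and call_strain2.high_confidence
```

With these assumptions, `check_chromosome_names(["chr1", "chr2"], ["1", "2"])` raises the error (the typical UCSC genome with Ensembl VCF mix-up), and a position is dropped by `keep_for_dual_hybrid` as soon as one of the two strains lacks a high confidence call.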