-
Notifications
You must be signed in to change notification settings - Fork 29
Circular mtDNA
Caleb Lareau
This page provides a vignette of how to align to a circular mtDNA reference and align reads using gsnap, the only tool that I've found to handle this feature of mtDNA.
Mammalian mtDNA is circular, so many reads from ATAC-seq and other assays will have issues with alignments near the junction (in the d-loop; reference positions in humans 16569<->1
). The Broad's GATK best practices has an approach for handling the circular mtDNA (by realigning to a modified contig and further processing both alignments). Another approach is to use an aligner that enables reference contigs to be circular. When I searched for an alignment tool that enables this, most popular ones (bowtie, bowtie2, bwa, STAR) do not have functionality that supports this. The only major alignment suite I could find that facilitated a circular reference contig was GMAP/GSNAP
. This page provides a crash course on how to set up these tools and align data with a circular mtDNA chromosome being specified.
However, after playing around with alignments over several datasets and then analyzing them downstream with mgatk
, I found bwa to be a sufficiently okay aligner to handle the circular nature of mtDNA. While it isn't optimal, bwa
is the default aligner in most ATAC-seq pipelines, including CellRanger-ATAC
. Importantly, bwa
is quite good at soft-clipping reads to enable alignments near the mtDNA junction, which facilitates reasonable coverage at the junction. Thus, it's still recommended to use bwa
upstream of mgatk
; nevertheless, this page provides a recipe for setting up alignment and processing with GMAP/GSNAP
, which may be of use when variants of interest fall in the d-loop, specifically at positions ~1-20
or ~16549-16569
. Otherwise, it is this author's opinion that the cost of switching from bwa
to GMAP/GSNAP
is not worth it.
available at http://research-pub.gene.com/gmap/
Recall: GMAP is for RNA; GSNAP is for DNA
For example:
# Download
wget http://research-pub.gene.com/gmap/src/gmap-gsnap-2019-09-12.tar.gz
tar -zxvf gmap-gsnap-2019-09-12.tar.gz
# Install
./configure # --prefix=`pwd`
make && make install
If we specify the output directory as $GSNAP_DIR
, this command builds the reference for a supplied reference fasta file:
gmap_build -d hg38 hg38.fasta -c chrM -D $GSNAP_DIR
Here the -c chrM
specifies that the chrM chromosome is circular.
For a sample ATAC-seq library, we can align using the following workflow:
gsnap --nthreads 4 --gunzip -D $GSNAP_DIR -d hg38 $fastq1 $fastq2 -A sam | \
samtools view -bS - | samtools sort -@ 4 - -o "${sample}.st.bam"
From the GSNAP documentation:
XC: Indicates whether the alignment crosses over the origin of a
circular chromosome. If so, the string XC:A:+ is printed.
We can verify that this indeed works--
samtools view miseq_bulk1.gsnap.bam | grep XC:A:+
Please raise an issue here