Skip to content

Circular mtDNA

Caleb Lareau edited this page Jan 27, 2020 · 3 revisions

Handling the circular mtDNA

Caleb Lareau

TLDR; this is not required for running mgatk but hopefully useful to know.

This page provides a vignette of how to align to a circular mtDNA reference and align reads using gsnap, the only tool that I've found to handle this feature of mtDNA.

Mammalian mtDNA is circular, so many reads from ATAC-seq and other assays will have issues with alignments near the junction (in the d-loop; reference positions in humans 16569<->1). The Broad's GATK best practices has an approach for handling the circular mtDNA (by realigning to a modified contig and further processing both alignments). Another approach is to use an aligner that enables reference contigs to be circular. When I searched for an alignment tool that enables this, most popular ones (bowtie, bowtie2, bwa, STAR) do not have functionality that supports this. The only major alignment suite I could find that facilitated a circular reference contig was GMAP/GSNAP. This page provides a crash course on how to set up these tools and align data with a circular mtDNA chromosome being specified.

However, after playing around with alignments over several datasets and then analyzing them downstream with mgatk, I found bwa to be a sufficiently okay aligner to handle the circular nature of mtDNA. While it isn't optimal, bwa is the default aligner in most ATAC-seq pipelines, including CellRanger-ATAC. Importantly, bwa is quite good at soft-clipping reads to enable alignments near the mtDNA junction, which facilitates reasonable coverage at the junction. Thus, it's still recommended to use bwa upstream of mgatk; nevertheless, this page provides a recipe for setting up alignment and processing with GMAP/GSNAP, which may be of use when variants of interest fall in the d-loop, specifically at positions ~1-20 or ~16549-16569. Otherwise, it is this author's opinion that the cost of switching from bwa to GMAP/GSNAP is not worth it.


0) Install GMAP/GSNAP

available at http://research-pub.gene.com/gmap/

Recall: GMAP is for RNA; GSNAP is for DNA

For example:

# Download
wget http://research-pub.gene.com/gmap/src/gmap-gsnap-2019-09-12.tar.gz
tar -zxvf gmap-gsnap-2019-09-12.tar.gz	

# Install
./configure # --prefix=`pwd`
make && make install

1) Build reference genome with the mitochondrial genome specified as circular

If we specify the output directory as $GSNAP_DIR, this command builds the reference for a supplied reference fasta file:

gmap_build -d hg38 hg38.fasta -c chrM -D $GSNAP_DIR

Here the -c chrM specifies that the chrM chromosome is circular.


2) Example alignment

For a sample ATAC-seq library, we can align using the following workflow:

gsnap --nthreads 4 --gunzip -D $GSNAP_DIR -d hg38 $fastq1 $fastq2 -A sam | \
     samtools view -bS - | samtools sort -@ 4 - -o "${sample}.st.bam"

3) Verify reads were aligned over the origins of the circular chroomosome;

From the GSNAP documentation:

XC: Indicates whether the alignment crosses over the origin of a
circular chromosome. If so, the string XC:A:+ is printed.

We can verify that this indeed works--

samtools view miseq_bulk1.gsnap.bam | grep XC:A:+