An online De Bruijn graph sequence assembler that will generate a visual and interactive graph. This tool is intended for educational use; it's probably too slow to use in production.
You can input raw seqencing reads directly into it, or upload a FASTQ or FASTA file. It will then generate a De Bruijn graph using d3-graphviz and will find all possible Eulerian paths (the solutions) from the only possible node, or a random node (that will work) if exists an Eulerian cycle. The core is written in Elm, a pure functional language that compiles to JavaScript.
There are five major components to the tool: generating the kmers, generating the graph, compressing the graph, finding all Eulerian paths from a starting kmer, and resolving repeats.
-
Generating kmers uses a sliding window and has a time complexity of
O(n^2)
wheren
is the number of reads andk
is the kmer size desired. -
Generating the graph involves identifying
k-1
overlaps for each kmer and compiling to dot for Graphviz and has a time complexity ofO(n^2)
wheren
is the number of kmers. I am unsure the time complexity for Graphviz to generate the given graph. -
Compressing the graph involves finding components in the graph that can be compressed and has a time complexity of
O(n+e)
wheren
is the number of kmers ande
is the number of overlaps. -
Finding Eulerian paths uses depth-first search and I am uncertain about the time complexity as each kmer potentially can have branching paths with cycles that each trigger their own DFS.
-
Resolving the repeats involves cutting out repeats found in the graph then uses a very similar algorithm to what's used in finding all the Eulerian paths to find each individual sequence. Finding repeats and cutting them out has a time complexity of
O(n)
wheren
is the number of kmers. Finding those individual paths in the cut graph has a time complexity ofO(n+e)
whereN
is the number of kmers ande
is the number of overlaps. It's much quicker than finding all the Eulerian paths due to having the repeats cut out removes the possibility for any branching paths.
More about De Bruijn graphs and their applications in genome assembly.
Check out the tool here!
To setup locally for development, you'll need a few dependencies:
To build the project, you just need to execute make
:
make
To build for production use:
make env=prod