Workshop: De novo assembly and mapping

Hands-on:

We continue with the FASTQ file data/ONT_R10-filtered_reads.fastq.gz, which should already be in your data folder. Remember that you already length-filtered these reads. Use this file as input for the de novo assembly. Activate your Conda environment first, or install the necessary tools if they are not available.
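Before starting, it can help to confirm that the command-line tools used below are actually on your PATH. The following is a minimal sketch; the helper name `check_tools` is made up for this example and is not part of any of the tools.

```shell
# Report which of the given command-line tools are missing from PATH.
# Returns non-zero if at least one tool is missing.
# (check_tools is a hypothetical helper for this workshop, not a standard command.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Command-line tools used in this hands-on (the GUI tools are not checked here)
check_tools flye minimap2 samtools || echo "Activate your Conda environment or install the missing tools."
```

If a tool is reported as missing, activate the Conda environment you used in the previous sessions and run the check again.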

1. De novo assembly (Flye)

- use your quality-checked and filtered reads as input
- note that we use --nano-hq for newer ONT R10 chemistry; --nano-raw is meant for older ONT R9 chemistry
- the output folder is called flye_output_R10
- we use --meta to activate the "expect metagenome/uneven coverage" mode, which can help to recover full plasmid sequences
- we tell the tool that the expected --genome-size is 5 Mbp

# run the assembly, this will take a while
flye --nano-hq data/ONT_R10-filtered_reads.fastq.gz -o flye_output_R10 -t 8 --meta --genome-size 5M

# the final output genome assembly will be in flye_output_R10/assembly.fasta
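Once Flye has finished, a quick sanity check is to count the contigs and sum their lengths, then compare the total with the expected genome size of about 5 Mbp. The sketch below uses only awk; `fasta_stats` is an illustrative helper name, not a Flye command. Flye also writes a per-contig summary to flye_output_R10/assembly_info.txt.

```shell
# Print the number of sequences and the total number of bases in a FASTA file.
# (fasta_stats is a hypothetical helper name chosen for this example.)
fasta_stats() {
    awk '
        /^>/ { contigs++; next }          # header line: one more contig
        { total += length($0) }           # sequence line: add its length
        END { printf "contigs\t%d\ntotal_bp\t%d\n", contigs, total }
    ' "$1"
}

# Only run on the assembly if it already exists
if [ -f flye_output_R10/assembly.fasta ]; then
    fasta_stats flye_output_R10/assembly.fasta
fi
```

A total length far from 5 Mbp, or a very large number of contigs, is a hint to look more closely at the assembly graph and the read coverage.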

While this is running, check the original publication and the GitHub repository of the tool:

Publication | Code

2. Visualization of the assembly (Bandage) -> not possible on HPC

# open the GUI
Bandage &

# load graph file generated by flye:
# ->  flye_output_R10/assembly_graph.gfa

# click "draw graph"

Publication | Code

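Tools with a graphical user interface can cause problems on a cluster machine, where usually no display is available. In that case, copy the graph file to your local computer and open it in Bandage there.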
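If no GUI is available, you can still get a rough impression of the graph on the command line. This is only a sketch and no replacement for Bandage: it assumes a GFA version 1 file in which sequences are stored in the S (segment) lines, and `gfa_summary` is a made-up helper name.

```shell
# Summarize a GFA file: number of segments, their total length, and number of links.
# (gfa_summary is a hypothetical helper; it assumes tab-separated GFA 1 records.)
gfa_summary() {
    awk -F'\t' '
        $1 == "S" { segments++; if ($3 != "*") bp += length($3) }   # segment records
        $1 == "L" { links++ }                                       # link records
        END { printf "segments\t%d\nsegment_bp\t%d\nlinks\t%d\n", segments, bp, links }
    ' "$1"
}

# Only run on the Flye graph if it already exists
if [ -f flye_output_R10/assembly_graph.gfa ]; then
    gfa_summary flye_output_R10/assembly_graph.gfa
fi
```

Few segments with few links suggest a well-resolved assembly; many links between segments point to unresolved repeats that are worth inspecting in Bandage.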

3. Mapping (minimap2)

Now we map the long reads back to the assembly you just calculated, so that we can visualize the alignments.

minimap2 -ax map-ont flye_output_R10/assembly.fasta data/ONT_R10-filtered_reads.fastq.gz > data/ONT_R10-mapping.sam

Publication | Code

Next, we convert the SAM file into a sorted, indexed BAM file so that we can load it into IGV.

# convert SAM to BAM, sort by coordinate (4 additional threads) and index the result
samtools view -b data/ONT_R10-mapping.sam | samtools sort -@ 4 -o data/ONT_R10-mapping.sorted.bam -
samtools index data/ONT_R10-mapping.sorted.bam

Also inspect the SAM file produced by minimap2 and compare its columns with the SAM format specification.
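As a starting point for inspecting the SAM file, the sketch below counts header lines and alignment records, using the FLAG column (column 2) to tell mapped from unmapped records: bit 0x4 set means unmapped. `sam_summary` is an illustrative helper name. Note that it counts alignment records, not reads, because a read can have secondary or supplementary alignments. For a full report, `samtools flagstat` is the usual tool.

```shell
# Count header lines, mapped records and unmapped records in a SAM file.
# (sam_summary is a hypothetical helper name chosen for this example.)
sam_summary() {
    awk -F'\t' '
        /^@/ { header++; next }                     # header lines start with @
        {
            # FLAG bit 0x4 (value 4) marks an unmapped record
            if (int($2 / 4) % 2 == 1) unmapped++
            else mapped++
        }
        END {
            printf "header_lines\t%d\nmapped_records\t%d\nunmapped_records\t%d\n",
                   header, mapped, unmapped
        }
    ' "$1"
}

# Only run on the mapping if it already exists
if [ -f data/ONT_R10-mapping.sam ]; then
    sam_summary data/ONT_R10-mapping.sam
fi
```

Since the reads were used to build the assembly, you would expect most records to be mapped; a large unmapped fraction is worth investigating.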

3.1. Visualization of the mapping (IGV)

# start IGV browser and load the assembly (FASTA) and BAM file, inspect the output
igv &

# load assembly file as 'Genomes'
# ->  flye_output_R10/assembly.fasta

# load mapping file as 'File'
# ->  data/ONT_R10-mapping.sorted.bam

3.2. Alternative: Visualization of mapping (Tablet)

# open the GUI
tablet &

# load mapping file as 'primary assembly'
# ->  data/ONT_R10-mapping.sorted.bam

# load assembly file as 'Reference/consensus file'
# ->  flye_output_R10/assembly.fasta

Publication | Code

Commercial software such as Geneious or CLC Genomics Workbench offers alternative ways to visualize such a mapping.

Next: Polishing with long reads
