Workshop: De novo assembly and mapping

Hands-on:

We start with the FASTQ data, which should be located at data/ONT_R10-filtered_reads.fastq.gz. Remember that you already length-filtered these reads; use them as input for the de novo assembly. Remember to activate your Conda environment, or install the necessary tools if they are not available.
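Before assembling, it can help to confirm that the length filtering worked. The following is a minimal sketch using only gzip and awk. The function name fastq_stats is hypothetical, and it assumes standard 4-line FASTQ records without line-wrapped sequences. Dedicated tools such as seqkit stats report the same numbers more robustly.

```shell
# Hypothetical helper: summarize a gzipped FASTQ file (read count, total bases,
# shortest, longest and mean read length).
# Assumes 4-line FASTQ records (no line-wrapped sequences).
fastq_stats() {
    gzip -dc "$1" | awk '
        NR % 4 == 2 {                  # the 2nd line of every record is the sequence
            len = length($0)
            n++; total += len
            if (n == 1 || len < min) min = len
            if (len > max) max = len
        }
        END {
            if (n == 0) { print "reads=0"; exit }
            printf "reads=%d bases=%d min=%d max=%d mean=%.1f\n", n, total, min, max, total / n
        }'
}

# usage (only runs if the workshop file is present)
if [ -f data/ONT_R10-filtered_reads.fastq.gz ]; then
    fastq_stats data/ONT_R10-filtered_reads.fastq.gz
fi
```

If the reported minimum length is below your filtering threshold, the filtering step did not work as intended.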

1. De novo assembly (Flye)

  • use your quality-checked and filtered reads as input
  • we use --nano-hq for newer ONT R10 chemistry; for older ONT R9 chemistry, use --nano-raw instead
  • the output folder is called flye_output_R10 (-o)
  • -t 8 runs Flye with 8 threads
  • we use --meta to activate the "metagenome/uneven coverage" mode, which can help to recover complete plasmid sequences (plasmids often have a different copy number than the chromosome, so their coverage differs)
  • we tell the tool that the expected --genome-size is 5 Mbp
# run the assembly; this will take some time
flye --nano-hq data/ONT_R10-filtered_reads.fastq.gz -o flye_output_R10 -t 8 --meta --genome-size 5M

# the final output genome assembly will be in flye_output_R10/assembly.fasta
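Once Flye has finished, a quick numeric summary helps to judge the assembly before visualizing it. The sketch below is hypothetical: the function names assembly_stats and circular_contigs are made up for illustration. The first computes the number of contigs, the total length, and the N50 (the contig length at which the running sum, longest contigs first, reaches half of the total) from assembly.fasta. The second lists circular contigs from Flye's assembly_info.txt, assuming its usual tab-separated column order (seq_name, length, cov., circ., ...), where circ. is Y or N.

```shell
# Hypothetical helper: contig count, total length and N50 of a FASTA assembly.
assembly_stats() {
    # 1st awk: print the length of every FASTA record (sequences may span lines)
    awk '
        /^>/ { if (seq_len) print seq_len; seq_len = 0; next }
        { seq_len += length($0) }
        END { if (seq_len) print seq_len }
    ' "$1" | sort -rn | awk '
        # 2nd awk: lengths arrive longest first; walk until half the total is reached
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                run += len[i]
                if (run * 2 >= total) { n50 = len[i]; break }
            }
            printf "contigs=%d total=%d N50=%d\n", NR, total, n50
        }'
}

# Hypothetical helper: print name, length and coverage of circular contigs
# (assumed columns of assembly_info.txt: 1 seq_name, 2 length, 3 cov., 4 circ.)
circular_contigs() {
    awk -F'\t' '!/^#/ && $4 == "Y" { print $1 "\t" $2 "\t" $3 }' "$1"
}

# usage (only runs if the Flye output folder exists)
if [ -d flye_output_R10 ]; then
    assembly_stats flye_output_R10/assembly.fasta
    circular_contigs flye_output_R10/assembly_info.txt
fi
```

Short circular contigs with a coverage clearly different from the chromosome are good plasmid candidates to look at in the following steps.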

While this is running, check the original publication and the GitHub repository of the tool:

Publication | Code

2. Visualization of the assembly (Bandage): not possible on the HPC

# open the GUI
Bandage &

# load graph file generated by flye:
# ->  flye_output_R10/assembly_graph.gfa

# click "draw graph"

Publication | Code

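Tools that have a graphical user interface (GUI) can cause problems on a cluster machine, for example when no display is available. In that case, copy flye_output_R10/assembly_graph.gfa to your local computer and open it in Bandage there.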
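If you cannot run Bandage at all, a text-based look at the graph file still gives a rough idea of its structure. In GFA 1, "S" lines are segments (graph nodes) and "L" lines are links (edges) between segment ends. The sketch below assumes that segment sequences are stored inline in column 3, as in Flye's assembly_graph.gfa; the function names gfa_summary and gfa_self_loops are hypothetical.

```shell
# Hypothetical helper: count segments, links and segment bases in a GFA 1 file.
gfa_summary() {
    awk -F'\t' '
        $1 == "S" { segments++; bases += length($3) }   # column 3 = segment sequence
        $1 == "L" { links++ }
        END { printf "segments=%d links=%d bases=%d\n", segments, links, bases }
    ' "$1"
}

# Hypothetical helper: list segments that are linked to themselves; such
# self-loops often correspond to circular sequences (e.g. plasmids).
gfa_self_loops() {
    awk -F'\t' '$1 == "L" && $2 == $4 { print $2 }' "$1" | sort -u
}

# usage (only runs if the Flye graph exists)
if [ -f flye_output_R10/assembly_graph.gfa ]; then
    gfa_summary flye_output_R10/assembly_graph.gfa
    gfa_self_loops flye_output_R10/assembly_graph.gfa
fi
```

This does not replace the visual inspection: it shows how many nodes and edges there are, but not how tangled the graph is.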

3. Mapping (minimap2)

Now we map the long reads back to the assembly you just calculated, so that we can visualize how they align to it.

minimap2 -ax map-ont flye_output_R10/assembly.fasta data/ONT_R10-filtered_reads.fastq.gz > data/ONT_R10-mapping.sam

Publication | Code

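Next, convert the SAM file into a sorted and indexed BAM file so that it can be loaded into IGV.

# sort the alignments by position and write them as BAM (samtools sort reads SAM directly)
samtools sort -@ 4 -o data/ONT_R10-mapping.sorted.bam data/ONT_R10-mapping.sam
# create the index (.bai) that IGV needs to jump to any region quickly
samtools index data/ONT_R10-mapping.sorted.bam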

Inspect the SAM file produced by minimap2 (for example with less -S data/ONT_R10-mapping.sam) and compare its columns with the SAM format specification.
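The second column of every SAM record is the bitwise FLAG. Decoding a few of its bits shows how many reads mapped at all and how many alignments are secondary or supplementary (split long reads often produce supplementary alignments). The sketch below uses the bit values from the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the function name sam_flag_summary is hypothetical, and samtools flagstat data/ONT_R10-mapping.sorted.bam gives a more complete report.

```shell
# Hypothetical helper: classify SAM records by their FLAG field (column 2).
# Bits are tested with integer division because POSIX awk has no bitwise operators.
sam_flag_summary() {
    awk -F'\t' '
        /^@/ { next }                                   # skip header lines
        {
            flag = $2
            if (int(flag / 4) % 2)         unmapped++        # 0x4
            else if (int(flag / 256) % 2)  secondary++       # 0x100
            else if (int(flag / 2048) % 2) supplementary++   # 0x800
            else                           primary++
        }
        END {
            printf "primary=%d secondary=%d supplementary=%d unmapped=%d\n",
                primary, secondary, supplementary, unmapped
        }' "$1"
}

# usage (only runs if the mapping file exists)
if [ -f data/ONT_R10-mapping.sam ]; then
    sam_flag_summary data/ONT_R10-mapping.sam
fi
```

Note that the reverse-strand bit (0x10) is ignored here, so reads mapped to either strand count as primary.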

3.1. Visualization of the mapping (IGV)

# start the IGV browser, load the assembly (FASTA) and the BAM file, and inspect the mapping
igv &

# load the assembly file via 'Genomes' > 'Load Genome from File...'
# ->  flye_output_R10/assembly.fasta

# load the mapping file via 'File' > 'Load from File...'
# ->  data/ONT_R10-mapping.sorted.bam
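IGV shows the read depth as a coverage track. To compare contigs numerically, for example to spot small contigs with a much higher depth than the chromosome (possible multi-copy plasmids), you can compute the mean depth per contig. The sketch below assumes the three tab-separated columns written by samtools depth -a (contig, position, depth); the function name mean_depth is hypothetical. Newer samtools versions also provide samtools coverage, which reports a mean depth per contig directly.

```shell
# Hypothetical helper: mean read depth per contig from "samtools depth -a" output
# (columns: contig, 1-based position, depth), read from standard input.
mean_depth() {
    awk -F'\t' '
        {
            if (!($1 in sites)) order[++n] = $1     # remember the order of contigs
            sites[$1]++                              # number of positions seen
            depth[$1] += $3                          # summed depth over those positions
        }
        END {
            for (i = 1; i <= n; i++) {
                c = order[i]
                printf "%s\t%.1f\n", c, depth[c] / sites[c]
            }
        }'
}

# usage (only runs if the sorted BAM exists; requires samtools)
if [ -f data/ONT_R10-mapping.sorted.bam ]; then
    samtools depth -a data/ONT_R10-mapping.sorted.bam | mean_depth
fi
```

The -a option makes samtools depth also report positions with zero depth, so uncovered regions lower the mean instead of being skipped.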

3.2. Alternative: Visualization of the mapping (Tablet)

# open the GUI
tablet &

# load mapping file as 'primary assembly'
# ->  data/ONT_R10-mapping.sorted.bam

# load assembly file as 'Reference/consensus file'
# ->  flye_output_R10/assembly.fasta

Publication | Code

Commercial software such as Geneious or CLC Genomics Workbench offers alternative ways to visualize such a mapping.

Next: Polishing with long reads