We will continue with the FASTQ data, which should be located in your folder at data/ONT_R10-filtered_reads.fastq.gz. Remember that you already length-filtered these reads; use them as input for the de novo assembly. Remember to activate your Conda environment, or install the necessary tools if they are not available.
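Before starting the assembly, it can be useful to know how much data goes into it. The following is a minimal sketch that counts reads and bases with gzip and awk. It then estimates the mean coverage against an assumed genome size of 5 Mbp, which is the value passed to --genome-size in the Flye command. The function name fastq_stats is hypothetical and not part of any tool. The sketch assumes standard four-line FASTQ records.

```shell
# fastq_stats (hypothetical helper): read count, total bases and mean coverage
# Usage: fastq_stats <reads.fastq[.gz]> <genome_size_in_bp>
# Assumes 4-line FASTQ records (header, sequence, '+', qualities).
fastq_stats() {
  local fastq=$1 genome_size=$2
  # gzip -cdf decompresses .gz input and passes plain-text input through unchanged
  gzip -cdf "$fastq" | awk -v g="$genome_size" '
    NR % 4 == 2 { reads++; bases += length($0) }   # 2nd line of each record = sequence
    END {
      printf "reads\t%d\n", reads
      printf "bases\t%.0f\n", bases
      printf "coverage\t%.1fx\n", bases / g        # mean depth = total bases / genome size
    }'
}

# Example: filtered R10 reads against an expected genome size of 5 Mbp
if [ -f data/ONT_R10-filtered_reads.fastq.gz ]; then
  fastq_stats data/ONT_R10-filtered_reads.fastq.gz 5000000
fi
```

A very low estimated coverage is a hint to expect a fragmented assembly.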
- use your quality-checked and filtered reads as input
- note that we're using --nano-hq for the newer ONT R10 chemistry and --nano-raw for the older ONT R9 chemistry
- the output folder is called flye_output_R10
- we use -t 8 to run Flye with 8 threads
- we use --meta to activate the "expect metagenome/uneven coverage" mode, which can help to recover full plasmid sequences
- we tell the tool that the expected --genome-size is 5 Mbp
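The choice between --nano-hq and --nano-raw can be captured in a small helper. This is handy if you later repeat the assembly with R9 data. This is only a sketch. The names flye_read_flag, CHEM, READS and FLAG are hypothetical. The R9 file name assumes the same naming scheme as the R10 file. The script only prints the command as a dry run instead of starting Flye.

```shell
# flye_read_flag (hypothetical helper): map the ONT chemistry to Flye's read-type flag
flye_read_flag() {
  case "$1" in
    R10) printf '%s\n' "--nano-hq" ;;   # newer ONT R10 chemistry
    R9)  printf '%s\n' "--nano-raw" ;;  # older ONT R9 chemistry
    *)   printf 'unknown chemistry: %s\n' "$1" >&2; return 1 ;;
  esac
}

CHEM=R10                                         # set to R9 for older data
READS=data/ONT_${CHEM}-filtered_reads.fastq.gz   # assumed file naming scheme
if FLAG=$(flye_read_flag "$CHEM"); then
  # dry run: print the full command; remove 'echo' to actually start the assembly
  echo flye "$FLAG" "$READS" -o "flye_output_${CHEM}" -t 8 --meta --genome-size 5M
fi
```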
# run the assembly; this will take some time
flye --nano-hq data/ONT_R10-filtered_reads.fastq.gz -o flye_output_R10 -t 8 --meta --genome-size 5M
# the final output genome assembly will be in flye_output_R10/assembly.fasta
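Once the assembly has finished, a quick numeric summary of assembly.fasta complements the graph view. The sketch below uses only awk, cut and sort, and the function names are hypothetical. It reports three values:

- the contig count
- the total assembly length
- the N50, which is the contig length L such that contigs of length L or longer contain at least half of all assembled bases

Flye also writes per-contig statistics to assembly_info.txt in the same output folder.

```shell
# fasta_lengths (hypothetical helper): print "<name>\t<length>" for every FASTA record
fasta_lengths() {
  awk '
    /^>/ { if (name != "") print name "\t" len; name = substr($1, 2); len = 0; next }
    { len += length($0) }                       # sequence lines may be wrapped
    END { if (name != "") print name "\t" len }
  ' "$1"
}

# assembly_stats (hypothetical helper): contig count, total length and N50
assembly_stats() {
  fasta_lengths "$1" | cut -f2 | sort -nr | awk '
    { len[NR] = $1; total += $1 }
    END {
      printf "contigs\t%d\ntotal_bp\t%.0f\n", NR, total
      # N50: walk contigs from longest to shortest until half of all bases are covered
      for (i = 1; i <= NR; i++) {
        cum += len[i]
        if (2 * cum >= total) { printf "N50\t%d\n", len[i]; break }
      }
    }'
}

if [ -f flye_output_R10/assembly.fasta ]; then
  assembly_stats flye_output_R10/assembly.fasta
fi
```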
While this is running, check the original publication and the GitHub repository of the tool:
# open the Bandage GUI to inspect the assembly graph
Bandage &
# load the graph file generated by Flye:
# -> flye_output_R10/assembly_graph.gfa
# click "draw graph"
Tools with a graphical user interface, such as Bandage, IGV or Tablet, can cause problems on a cluster machine, for example when no graphical display is forwarded to your session.
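On a cluster without a forwarded display, Bandage can also render the graph to an image file from the command line with Bandage image <graph> <output image>. You can then copy the image to your own machine. The sketch below checks the DISPLAY environment variable to decide between the GUI and the command-line mode. An empty DISPLAY is only a heuristic for "no graphical session". The name has_display and the output file assembly_graph.png are hypothetical choices.

```shell
# has_display (hypothetical helper): succeed if an X display is set,
# e.g. on a desktop session or after connecting with `ssh -X`
has_display() {
  [ -n "${DISPLAY:-}" ]
}

GFA=flye_output_R10/assembly_graph.gfa
if command -v Bandage >/dev/null 2>&1 && [ -f "$GFA" ]; then
  if has_display; then
    Bandage &                                              # interactive GUI; load $GFA manually
  else
    # headless: render the graph to a PNG file
    # (assumption: if this fails with a display error, prefixing the command with
    #  QT_QPA_PLATFORM=offscreen may help, depending on the Qt build)
    Bandage image "$GFA" flye_output_R10/assembly_graph.png
  fi
fi
```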
Next, map the long reads back to the assembly you just computed so that you can visualize the alignments.
minimap2 -ax map-ont flye_output_R10/assembly.fasta data/ONT_R10-filtered_reads.fastq.gz > data/ONT_R10-mapping.sam
Before visualizing, convert the SAM file into a sorted and indexed BAM file, which IGV requires:
samtools sort -@ 4 -o data/ONT_R10-mapping.sorted.bam data/ONT_R10-mapping.sam
samtools index data/ONT_R10-mapping.sorted.bam
Inspect the SAM file written by minimap2 (data/ONT_R10-mapping.sam), for example with less -S, and compare its columns with the SAM format specification.
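To get a feel for the SAM format, it helps to decode the bitwise FLAG in column 2 yourself. The sketch below uses awk to skip header lines starting with @. It counts four things:

- primary mapped reads
- unmapped reads (flag bit 0x4)
- secondary alignments (0x100)
- supplementary alignments (0x800)

The name sam_flag_summary is hypothetical. Bits are tested with integer division instead of a bitwise AND, because and() is a gawk extension. For an established summary of the BAM file, samtools flagstat data/ONT_R10-mapping.sorted.bam reports related counts.

```shell
# sam_flag_summary (hypothetical helper): count alignment types from the SAM FLAG field
#   0x4   (4)    read unmapped
#   0x100 (256)  secondary alignment
#   0x800 (2048) supplementary alignment (part of a chimeric/split alignment)
sam_flag_summary() {
  awk -F '\t' '
    /^@/ { next }                                   # skip header lines (@HD, @SQ, @PG, ...)
    {
      flag = $2 + 0
      if (int(flag / 256) % 2 || int(flag / 2048) % 2) { other++; next }
      if (int(flag / 4) % 2) unmapped++; else mapped++
    }
    END {
      printf "primary_mapped\t%d\n", mapped
      printf "unmapped\t%d\n", unmapped
      printf "secondary_or_supplementary\t%d\n", other
    }
  ' "$1"
}

if [ -f data/ONT_R10-mapping.sam ]; then
  sam_flag_summary data/ONT_R10-mapping.sam
fi
```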
# start the IGV browser, load the assembly (FASTA) and the BAM file, and inspect the alignments
igv &
# load the assembly file via the 'Genomes' menu
# -> flye_output_R10/assembly.fasta
# load the mapping file via the 'File' menu
# -> data/ONT_R10-mapping.sorted.bam
# alternatively, open the Tablet GUI
tablet &
# load the mapping file as 'primary assembly'
# -> data/ONT_R10-mapping.sorted.bam
# load the assembly file as 'Reference/consensus file'
# -> flye_output_R10/assembly.fasta
Such mappings can also be visualized with commercial software such as Geneious or CLC Genomics Workbench.