Extremely Small Assembly Size Using Herro-Corrected Reads #49

NancyChoudhary28 · 2025-01-23T17:38:39Z

Hello,
I have ONT R9.4.1 flow cell reads that were Dorado basecalled and Herro corrected. The corrected reads file is approximately 72GB in size and contains around 2 million reads. The estimated genome size is 2.7–2.8 Gb.
When I use the Nanopore-r10.4.1_e8.2-400bps_sup-Herro-Sep2024.conf configuration file, the resulting assembly is only 337 kb in size. However, using the same reads with the older Nanopore-May2022.conf file produces a much larger assembly of 2.7 Gb, which matches the estimated genome size.

Here are the associated AssemblySummary reports:

Herro_conf_AssemblySummary.html file: Herro.pdf
May2022_conf_AssemblySummary.html file: May2022.pdf

I observed that the number of alignments in the read graph is extremely low (only 4393) when using the Herro-specific configuration file.

Why is the assembly size so small when using the Herro-specific configuration file? Are there specific parameters in the configuration file that might be causing this issue? Could you help me in resolving this and improving the assembly?
Thanks.

kokyriakidis · 2025-01-23T18:09:13Z

Hey @NancyChoudhary28!

We will release the next version of Shasta in a few days.

Is it possible to share the data you used to test and optimize Shasta?

colindaven · 2025-01-24T07:21:54Z

@NancyChoudhary28 This is older data: R9.4.1 reads have much lower accuracy than R10 reads before Herro correction (maybe 93-94% vs. 97%, though this varies per dataset). To my knowledge, this is why dorado correct does not perform error correction on R9.4.1 data. AFAIK (I'm not a Shasta dev) the newer Dorado-corrected R10 reads will be about 99-100% identical to the genome, and so fit Shasta's expectations for input read quality. I would guess your reads might have a mode of 96% identity to the genome.

You can test this by aligning say 100k reads vs a genome (say your assembly from the May2022 config), then using a tool like cramino to test the actual aligned read identity distribution.

Why not continue with your May2022 assembly and just intensively polish the contigs?
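The identity check suggested above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for cramino: it assumes the read subsample was aligned to the May2022 assembly with minimap2 using base-level alignment (its `-c` option) and PAF output, where column 10 is the number of matching bases and column 11 is the alignment block length. The function names and the example records are hypothetical.

```python
from collections import Counter


def parse_paf(lines):
    """Yield (matches, block_length) for each non-secondary PAF record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        if "tp:A:S" in fields[12:]:
            continue  # skip secondary alignments
        matches = int(fields[9])      # column 10: matching bases
        block_len = int(fields[10])   # column 11: alignment block length
        if block_len > 0:
            yield matches, block_len


def identity_histogram(lines):
    """Count alignments per whole-percent identity bin (rounded down)."""
    hist = Counter()
    for matches, block_len in parse_paf(lines):
        hist[100 * matches // block_len] += 1
    return hist


def modal_identity_percent(hist):
    """Return the most populated identity bin, or None if there is no data."""
    return max(hist, key=hist.get) if hist else None


# Made-up records, for illustration only.
example = [
    "r1\t1000\t0\t1000\t+\tctg1\t5000\t0\t1000\t960\t1000\t60\ttp:A:P",
    "r2\t1000\t0\t1000\t+\tctg1\t5000\t0\t1000\t965\t1000\t60\ttp:A:P",
    "r3\t1000\t0\t1000\t-\tctg2\t5000\t0\t1000\t930\t1000\t60\ttp:A:P",
    "r3\t1000\t0\t1000\t+\tctg3\t5000\t0\t1000\t500\t1000\t0\ttp:A:S",
]
print(modal_identity_percent(identity_histogram(example)))
```

If the modal bin for the real reads sits well below 99%, that would be consistent with the explanation that the reads do not meet the accuracy the Herro configuration expects.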

paoloshasta · 2025-01-24T15:40:31Z

@colindaven is correct that the assembly configuration for Herro-corrected reads requires the latest ONT reads, which have much higher accuracy than your old R9 reads. For the same reason, the new Shasta release that is coming up (as mentioned by @kokyriakidis) will not help you.

Your assembly with the Nanopore-May2022 configuration is usable. The low N50 (2.8 Mb) is a consequence of low coverage (75 Gb or 30X for a 3 Gb genome).
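The coverage figure is simple arithmetic: total sequenced bases divided by genome size. A minimal sketch using the numbers quoted in this thread (about 75 Gb of reads, and a genome size between the 2.7 Gb estimate and the rounded 3 Gb); the function name is illustrative. With these inputs the result is roughly 25X to 28X, in the same range as the approximate 30X mentioned above.

```python
def coverage(total_bases, genome_size):
    """Return average depth of coverage (X) as total bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


GB = 1e9  # one gigabase

# Values taken from the thread; genome size shown at both ends of the range.
for genome_gb in (2.7, 3.0):
    depth = coverage(75 * GB, genome_gb * GB)
    print(f"{genome_gb} Gb genome: {depth:.1f}X")
```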

It seems to me that you have 3 options:

1. Stay with your current assembly with the Nanopore-May2022 assembly configuration, optionally polishing with one of the available polishing tools, as suggested by @colindaven.
2. Try to get more coverage in R9 reads. This is only possible if these reads are already available somewhere, because I don't think ONT supports new R9 sequencing.
3. Repeat sequencing with the latest from ONT. Only in that case can you use the newer assembly configurations.

paoloshasta added the discussion label Feb 5, 2025
