Extremely Small Assembly Size Using Herro-Corrected Reads #49
Comments
Hey @NancyChoudhary28! We will release the next version of Shasta in a few days. Is it possible to share the data you used to test and optimize Shasta?
@NancyChoudhary28 This is older data. R9.4.1 has much lower accuracy than R10 before Herro correction (maybe 93-94% vs 97%, though it varies per dataset). This is why, to my knowledge, dorado correct will not perform error correction on R9.4.1 data. AFAIK (I'm not a Shasta dev) the newer dorado-corrected R10 reads will be about 99-100% identical to the genome, and so fit Shasta's expectations for input read quality. I would guess your reads might have a mode of 96% identity to the genome. You can test this by aligning, say, 100k reads against a genome (for example your assembly from the May2022 config), then using a tool like cramino to check the actual aligned read identity distribution. Why not continue with your May2022 assembly and just intensively polish the contigs?
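For anyone who wants to eyeball the identity distribution without extra tooling, here is a minimal Python sketch of the check described above. It assumes the alignments are available as SAM text lines that carry an `NM:i:` tag, and it uses a simple BLAST-style identity (matching columns over all alignment columns), which is not necessarily the same definition cramino reports. The function names `alignment_identity`, `identities_from_sam`, and `identity_mode` are made up for this illustration.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar, nm):
    """BLAST-style identity: (alignment columns - edit distance) / alignment columns.

    Alignment columns are the M, =, X, I and D operations; NM is the edit
    distance (mismatches plus inserted and deleted bases).
    """
    columns = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MID=X")
    if columns == 0:
        raise ValueError("CIGAR has no aligned columns")
    return (columns - nm) / columns


def identities_from_sam(lines):
    """Yield one identity per primary alignment found in SAM text lines."""
    result = []
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag, cigar = int(fields[1]), fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800) or cigar == "*":
            continue
        nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), None)
        if nm is None:  # no edit distance tag, identity cannot be computed
            continue
        result.append(alignment_identity(cigar, nm))
    return result


def identity_mode(identities, bin_width=1.0):
    """Return the lower edge, in percent, of the most populated identity bin."""
    bins = Counter(int(100 * x // bin_width) for x in identities)
    best_bin, _ = bins.most_common(1)[0]
    return best_bin * bin_width
```

Feeding it the SAM lines from a subsample of reads aligned to the May2022 assembly (for example with an aligner such as minimap2) and calling `identity_mode` on the result gives a rough mode to compare against the 99-100% that the Herro config appears to expect.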
@colindaven is correct that the assembly configuration for Herro-corrected reads requires the latest ONT reads, which have much higher accuracy than your old R9 reads. For the same reason, the new Shasta release that is coming up (as mentioned by @kokyriakidis) will not help you. Your assembly with the It seems to me that you have 3 options:
Hello,
I have ONT R9.4.1 flow cell reads that were Dorado basecalled and Herro corrected. The corrected reads file is approximately 72GB in size and contains around 2 million reads. The estimated genome size is 2.7–2.8 Gb.
When I use the Nanopore-r10.4.1_e8.2-400bps_sup-Herro-Sep2024.conf configuration file, the resulting assembly is only 337 kb in size. However, using the same reads with the older Nanopore-May2022.conf file produces a much larger assembly of 2.7 Gb, which matches the estimated genome size.
Here are the associated AssemblySummary reports:
I observed that the number of alignments in the read graph is extremely low (only 4393) when using the Herro-specific configuration file.
Why is the assembly size so small when using the Herro-specific configuration file? Are there specific parameters in the configuration file that might be causing this issue? Could you help me in resolving this and improving the assembly?
Thanks.