Options for higher protein-to-genome alignment sensitivity #82
Comments
Upon visual inspection of the alignments on the Genome Data Viewer, we saw a pattern consistent with Figure 2a-c of the
Here are some GDV examples of the wasp genome chr16 (all URLs live for ~90 days since the last visit)
... I will now look into what happened with the "missing" BUSCOs only in the annotation using In the meantime, it would be great if there were Thanks!
Dear Dong-Ha,

Thank you for your reports, which are generally consistent with my experience. The primary purpose of Spaln is to predict the complete structure (protein-coding region only) of the gene orthologous to the query. To improve the mapping sensitivity, including for paralogs, the following options might be effective. The -yX2 option is intended to find weaker homologs than the default (-yX1). However, I have not yet confirmed whether this intention is actually realized.

> Examples where miniprot had some alignments where spaln did not, or spaln missed the starting exons:

I have tried a few methods to improve the mapping sensitivity. Unfortunately, I have not yet found one that consistently outperforms the current method. As for the missing starting exons, there seems to be room for improvement. I would be very glad if you could send me a few such examples.

Finally, you can limit the maximum intron length with the -yM_n_ option (e.g., -yM100K).

Osamu
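For readers who want to try the two suggestions together, here is a minimal sketch that only assembles the extra flags. The flags `-yX2` and `-yM100K` come from the reply above, and `-Q7 -M25` from the original report; the helper name `spaln_sensitivity_flags` is hypothetical, and the genome and query arguments are deliberately left out because the thread does not show them.

```python
from typing import List, Optional


def spaln_sensitivity_flags(weaker_homologs: bool = True,
                            max_intron: Optional[str] = "100K") -> List[str]:
    """Return the extra spaln flags discussed in this thread (hypothetical helper)."""
    flags = []
    if weaker_homologs:
        # -yX2: intended to find weaker homologs than the default (-yX1)
        flags.append("-yX2")
    if max_intron is not None:
        # -yM<n>: limits the maximum intron length, e.g. -yM100K
        flags.append(f"-yM{max_intron}")
    return flags


# Base options taken from the original report; genome/query arguments
# still have to be appended by the user.
base = ["spaln", "-Q7", "-M25"]
cmd = base + spaln_sensitivity_flags()
print(" ".join(cmd))  # prints: spaln -Q7 -M25 -yX2 -yM100K
```

Whether `-yX2` actually increases the number of aligned proteins is, as noted above, not yet confirmed, so it is worth comparing runs with and without it.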
Dear Osamu, @ogotoh
Hello again!
Continuing from issues #78 and #80, we have been testing `spaln` (v.3.0.6d) to generate protein-to-genome alignments that can provide evidence for our genome annotation pipeline.

For the test, we tried aligning ~217K (mostly) Hymenoptera proteins to a wasp genome and comparing the results with `miniprot` (v.0.13). Below is the code we tried:

Here is the summary of the results:
- `miniprot` aligned ~169K (77.7%) of the input proteins. Among them, ~22K were aligned to multiple locations.
- `spaln -Q7 -M25` aligned ~146K (67.2%) of the input proteins. Among them, ~8.8K were aligned to multiple locations.
- `spaln -Q7 -M25 -T InsectAp` aligned slightly more (by a couple hundred) but still not as many as `miniprot`.
- `spaln -Q4` did not add much compared to `spaln -Q7`.

...
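The figures above (proteins aligned, percentage of input, proteins aligned to multiple locations) can be derived from any aligner's output once it is reduced to (protein ID, locus) pairs. The following is a rough sketch under that assumption; the function name `summarize_alignments` and the toy IDs and loci are hypothetical and not taken from our data.

```python
def summarize_alignments(input_ids, hits):
    """Summarize alignment coverage.

    input_ids: iterable of all query protein IDs.
    hits: iterable of (protein_id, locus) pairs, one per reported alignment.
    """
    input_ids = set(input_ids)
    loci_per_protein = {}
    for protein_id, locus in hits:
        if protein_id in input_ids:
            # Collect the distinct loci each protein was aligned to.
            loci_per_protein.setdefault(protein_id, set()).add(locus)

    aligned = len(loci_per_protein)
    multi = sum(1 for loci in loci_per_protein.values() if len(loci) > 1)
    percent = 100.0 * aligned / len(input_ids) if input_ids else 0.0
    return {"aligned": aligned, "percent_aligned": percent, "multi_location": multi}


# Toy data (hypothetical): four query proteins, three reported alignments.
inputs = ["p1", "p2", "p3", "p4"]
hits = [("p1", "chr16:100-900"), ("p1", "chr2:50-700"), ("p2", "chr16:2000-2600")]
summary = summarize_alignments(inputs, hits)
print(summary)  # {'aligned': 2, 'percent_aligned': 50.0, 'multi_location': 1}
```

Counting distinct loci rather than raw records avoids inflating the multi-location number when a tool reports the same locus more than once.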
Looking at the `miniprot` code, we relaxed some parameters to allow less homologous alignments and longer introns:

With these modifications, `miniprot` produced more homologous protein alignment evidence for gene model predictions. The complete BUSCO content (`hymenoptera_odb10`) was 0.2% higher when using `miniprot` alignments than when using `spaln` alignments, with all other processes in our genome annotation pipeline identical:

We are now comparing the gene models annotated and the protein alignments produced by `miniprot` and `spaln` to see differences not captured in BUSCO evaluations.

...
My questions are:

- Are there `spaln` parameters that allow reporting less homologous secondary alignments and longer introns, similar to the `miniprot` command we used?
- Are there other options for higher `spaln` sensitivity (i.e., more proteins aligned) you would recommend?

Thanks a lot!
Cheers,
Dong-Ha