Glitches in GFF3 match format output (-O2) #78
Dear Dong-Ha,
Such an anomaly can occur at the first or the last exon when a predicted codon does not align with any query residue. You can see what happens by running Spaln with the -O1 option (alignment mode). For your example, the following command (the -T option is added for potentially better performance) produces the corresponding alignment:
$ spaln -Q7 -d gnm_test -T InsectAp -O1 ' toy_example.fa (2)'
Actually, I have no idea how to properly represent this situation in GFF3 format. In the default (-O4) output format, you can see that the genomic sequence does not match any part of the query. I guess one possibility is that the query amino acid sequence is not full length and lacks a C-terminal region, so that Spaln forcibly finds a nearby termination codon in this example. I thought that the -LS or -LC (local similarity) option could prevent such anomalies, but the current implementation does not work as expected. I will check the behavior of Spaln when the -LS (or -LC) option is given.
Osamu
Dear Osamu, @ogotoh
I followed your instruction from issue 77 to successfully map and align protein sequences to a genome. However, the "match format" GFF3 appears to have some glitches.
The input was toy_example.fa, and the version was 3.0.6a <240916>.
The output, toy_example.gff3, includes multiple places where the start and end locations appear swapped. For example, there is a line where the end (2108443) is smaller than the start (2108444). The coordinates in the 9th column of the same line are also out of order (181 > 180).
This pattern happens at the end of each alignment for these sequences.
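To make the pattern easier to locate in a larger file, here is a minimal Python sketch that scans GFF3 lines and reports records whose start exceeds their end. It is not part of Spaln; the function name `find_inverted` is hypothetical, and the check on the 9th column assumes the coordinates there are given in the standard GFF3 `Target=<id> <start> <end> [strand]` attribute. Note that in both reported cases start = end + 1, so `end - start + 1` evaluates to 0 (2108443 - 2108444 + 1 = 0 and 180 - 181 + 1 = 0), which is consistent with a zero-length interval rather than a truly swapped one.

```python
def find_inverted(lines):
    """Yield (line_no, kind, start, end) for records where start > end.

    `kind` is "genomic" for columns 4/5, or "target" for the Target
    attribute in column 9 (assumed to follow the GFF3 specification).
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines, directives, and comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        start, end = int(cols[3]), int(cols[4])
        if start > end:
            yield line_no, "genomic", start, end
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key != "Target":
                continue
            parts = value.split()
            # Expected layout: <id> <start> <end> [strand]
            if len(parts) >= 3:
                t_start, t_end = int(parts[1]), int(parts[2])
                if t_start > t_end:
                    yield line_no, "target", t_start, t_end
```

It could be applied by opening toy_example.gff3 and iterating over `find_inverted(handle)`, printing each tuple it yields; every reported line would then be a candidate for the zero-length terminal segment described in this issue.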
Please let me know if you have an issue reproducing this result or anything else. Thanks again!
Cheers,
Dong-Ha