-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
short gene for spaln output #39
Comments
Dear Fangdm, Thank you for your comment. Because of its intrinsic algorithmic nature, Spaln is rather poor at mapping short queries; Spaln is tuned to map relatively long queries, so that short queries are often escape from detection. I consider it difficult to run Spaln at the present speed for all range of query lengths. If your queries consist of only short sequences, it might deserve to try either or both of the following options: spaln -O0 -Q7 -pw -T homosapi -t12 -M4 -dHsap -XS -Xf0.2 ENSP00000248150, ENSP00000480507, … > Out.gff3 will retry to map the failed queries. In a future release, I would like to make Spaln automatically adjust proper options depending on the query length. Osamu, |
Hi, thanks for your reply and updates. I've tried as you suggested with Ver2.4.4, but I get short genes with false coordinates, whether the short genes with short exons and/or long introns caused higher false positives. I've also tried changing -Xf 0.1 to -Xf 0.4, but it actually produced same result (no result when -Xf 0.1). Were there other parameters that I can do to improve the result for short genes? Meanwhile, I had tried to annotate my de novo genome with spaln, but I cann’t find a close relative species in table/gnm2tab for -T parameter, how to generate my own species-specific subdirectory with the published relative species. Thank you. |
Thank you for your rapid response. I was not much surprised with your negative report. Probably more fundamental change in the Spaln’s algorithm would be necessary to properly deal with short queries. For a while, I suggest you to use Tblastn or Blat to find relevant genomic loci, then apply Spaln to the found loci in the alignment only mode (-Q0-3). To generate a species-specific parameter set, we need a considerable number (at least 1000) of transcript (cDNA, EST, TSA (Transcriptome Shotgun Assembly)) sequences, together with the genomic sequence, of that species. Then,
The third part is somewhat messy and I have not enough time to write a manual. If you tell me the public site of the genomic and transcript sequences of the relative species, I will make the parameter set of that species. Osamu, |
Hi, many thanks for your kind help. I've tried as you suggested with the commands: Spaln -O0 -Q3 -pw -T homosapi -dHsap ‘$chromesome left right’ query. But I'm very sure there was corresponding chromosomes in my genome sequence. And I used another way that I extracted the sequences near the target gene, and successfully predicted the corresponding genes. For this way, in my project, it also had great difficulties if the prediction with tblastn or blat were not accurate. For a species-specific parameter set, for me, it's a complex process that involves a series of steps, and it takes some experience to correct it. I'll trouble you later, HaHa. Anyway, thanks again for your help! |
Hi, many thanks for making and maintaining this great program. I've run spaln for a performance test, using human genome and its own proteins set with different size from 30 to 5000 (amino acids). It's a great tool! It runs quickly and less gene error messages. However, I noticed a problem that short genes (< 80 aa) had no result except for the following report:
ENSP00000248150 < 0 67 12 11607 11606 0.00 6.59 24 30 2 2 6 5
ENSP00000480507 > 0 40 12 12387 12388 0.00 0.00 18 11 1 0 3 3
ENSP00000300873 > 0 70 13 19963 19963 10.54 6.68 32 37 2 2 6 6
ENSP00000477297 > 0 50 10 3146 3146 17.06 0.00 35 23 1 1 5 3
ENSP00000412536 > 0 30 13 21852 21852 12.02 6.68 15 5 1 0 3 1
…
And I had tried to modify some parameters, for example: -LS, -Xk3, -Xs3 and -yx0, but there is still no any result, so how to solve the problem. Here are my commands:
makeidx.pl -ip Hsap.mfa.gz
spaln -O0 -Q7 -pw -T homosapi -t12 -M4 -dHsap Hsap_test1.pep > Hsap_test1.spaln.gff3
Thank you.
The text was updated successfully, but these errors were encountered: