Question on job parallelization #42
Comments
Thank you for your interest in Spaln. Frankly speaking, I have not used spaln and sortgrcd in the way you describe since spaln began supporting multi-threaded operation; in my environment, I cannot easily use cluster machines. So you probably know better than I do how the combined use of spaln and sortgrcd performs in multi-machine environments. However, please wait a few days before you start your large-scale calculation. I have found a few bugs that can cause segmentation faults in rare situations (see issue #41). I have fixed them and am now testing the modified version on real data. I will announce it here when I release the fixed version. Osamu
Thanks a lot, Osamu. I will wait for your new/modified version of spaln.
Although it took an unexpectedly long time, I have finished modifying spaln. Tested on more than 100 pairs of genomic and assembled transcript DNA sequences from the DDBJ database, covering various levels of sequence similarity, the new version (Ver.2.4.6) runs without segmentation faults. For protein queries, I have not tested in the same detail, but it works fine on a few examples. I therefore did not want to delay the release of this version any further. Thank you for your patience. If you encounter any problems with this or previous versions of spaln, please let me know at your convenience. Osamu
Hello Osamu
First off, thanks a lot for the great program and continued support/enhancement on this.
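I have a question on job parallelization. Assume I have a protein query file of 15K sequences and I use 2 approaches:
Approach 1: run spaln on the whole query file as a single job, then run sortgrcd to get gff3 files
Approach 2: split the query file into 3 files, run spaln on each file as a separate job, then run sortgrcd once on all the results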
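Approach 2 relies on the query file being split into pieces before the per-file spaln jobs start. A minimal sketch of one way to do that in Python, assuming the query is a plain FASTA file. The function name `split_fasta` and the round-robin distribution are illustrative choices, not part of spaln; the `Query_1`, `Query_2`, `Query_3` output names simply mirror the file names used in this post.

```python
from pathlib import Path


def split_fasta(path, n_chunks):
    """Split a FASTA file into n_chunks files named <stem>_1, <stem>_2, ...

    Records are dealt out round-robin, so chunk sizes differ by at most
    one record. This is a hypothetical helper, not a spaln utility.
    """
    path = Path(path)
    records, current = [], []
    with path.open() as handle:
        for line in handle:
            # A header line starts a new record; flush the previous one.
            if line.startswith(">") and current:
                records.append("".join(current))
                current = []
            current.append(line)
    if current:
        records.append("".join(current))

    outputs = []
    for i in range(n_chunks):
        # Keep the original suffix (if any) after the chunk number.
        out = path.with_name(f"{path.stem}_{i + 1}{path.suffix}")
        out.write_text("".join(records[i::n_chunks]))
        outputs.append(out)
    return outputs
```

Calling `split_fasta("Query", 3)` would write `Query_1`, `Query_2` and `Query_3` next to the input file. Round-robin is used here only because it keeps the chunks the same size to within one record; it does not balance total sequence length.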
In both cases, spaln was called as follows after formatting the database:
spaln -t20 -Q7 -O12 -M1 [other options] -dDatabase Query
for Job 1a, and
spaln -t20 -Q7 -O12 -M1 [other options] -dDatabase Query_1[2,3]
for Job 2a, Job 2b and Job 2c, each with the appropriate query file.
The question: will there be any major differences between the 2 outputs?
sortgrcd -P40 -C50 -O0 Query.grd > spaln_single_job.gff3
sortgrcd -P40 -C50 -O0 Query_1.grd Query_2.grd Query_3.grd > spaln_multi_job.gff3
-- this is done after ensuring that all the relevant *.{erd, qrd} files are in the same directory, and that the *.{ent, idx, grp, seq} files of the database are also present in the directory where the sortgrcd job is running.
I did look through both outputs in many different ways and could not find any differences. I am going to productionize a pipeline, and I felt I should ask you whether there are any specific caveats I should be aware of if I use Approach 2.
Thanks in advance,
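One way to check that the single-job and multi-job outputs agree, independent of record order, is to compare them as multisets of feature lines. The sketch below is only an assumption about how such a check could look: `gff3_features` and `gff3_diff` are hypothetical helper names, and the `ignore_attributes` switch is there in case ID numbering in column 9 differs between runs, which has not been verified for sortgrcd.

```python
from collections import Counter


def gff3_features(path, ignore_attributes=False):
    """Return a multiset (Counter) of feature lines from a GFF3 file.

    Comment/directive lines (starting with '#') and blank lines are skipped.
    With ignore_attributes=True only the first 8 columns are compared.
    """
    features = Counter()
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line or line.startswith("#"):
                continue
            columns = line.split("\t")
            if ignore_attributes:
                columns = columns[:8]
            features[tuple(columns)] += 1
    return features


def gff3_diff(path_a, path_b, ignore_attributes=False):
    """Return (only_in_a, only_in_b) as Counters of feature tuples."""
    a = gff3_features(path_a, ignore_attributes)
    b = gff3_features(path_b, ignore_attributes)
    # Counter subtraction keeps only positive counts.
    return a - b, b - a
```

Calling `gff3_diff("spaln_single_job.gff3", "spaln_multi_job.gff3")` returns two Counters; both are empty when the files contain the same feature lines, regardless of order.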