Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault in command line parsing #46

Open
tillea opened this issue Jan 23, 2022 · 7 comments
Open

Segmentation fault in command line parsing #46

tillea opened this issue Jan 23, 2022 · 7 comments

Comments

@tillea
Copy link

tillea commented Jan 23, 2022

Hi,
I realised that spaln was moved to github and thus I wanted to release an updated Debian package. We have created a CI test for version 2.4.1, which basically calls:

perl makeidx.pl -v -inp dictdisc_g.gf.gz
spaln -Q7 -d dictdisc_g -T dictdisc dictdisc.faa.gz
spaln -Q7 -d dictdisc_g -yS -T dictdisc -O12 -g dictdisc.cf.gz
sortgrcd -O15 -F2 dictdisc.grd.gz

I realised that I need to apply this patch since it can not assumed that the current directory is inside PATH. (BTW, it would probably be better to not ship the same code twice.)

That way the test is progressing, however the line

spaln -Q7 -d dictdisc_g -yS -T dictdisc -O12 -g dictdisc.cf.gz

segfaults and the actual code line is marked in this debug patch. I guess there is somewhere an issue in the command line parsing code which just crashes here. You can see a full build log to see how spaln was build and here you find the call that leads to segfault.

Kind regards, Andreas.

@ogotoh
Copy link
Owner

ogotoh commented Jan 28, 2022

Dear Andreas,

Thank you very much for your reports. I have just uploaded a bug-fix version (ver.2.4.7) to the GitHub site.

Compared with ver.2.4.1, the latest version shows not dramatically but significantly better performance in speed and mappability.

Use of “makeidx.pl -inp input” is somewhat obsolete. Use of “spaln -W -K[A|D|P] input” would be more convenient in most cases.

The meaning of -yS option has slightly changed from the original one; species-specific parameter set will be used whenever -T option is specified. So, you usually do not need to set this option (that is why I could not notice the bug).

Osamu,

@tillea
Copy link
Author

tillea commented Jan 28, 2022

Dear Osamu,

Thank you very much for your reports. I have just uploaded a bug-fix version (ver.2.4.7) to the GitHub site.

Ahhh, I just realise now that version 2.4.11 is older than 2.4.2 etc. Our automatic download tool has fetched this one as last release. Now I have tweaked it to consider this 2.4.1.1 (I recommend separating even micro and nano version by a '.' for clarification.)

Compared with ver.2.4.1, the latest version shows not dramatically but significantly better performance in speed and mappability.

Sounds good!

Use of “makeidx.pl -inp input” is somewhat obsolete. Use of “spaln -W -K[A|D|P] input” would be more convenient in most cases.

I do not insist in using makeidx.pl - I simply continued to use the test that was once invented for a certain version. In other words: I'd be super happy if you suggest a better way to test than what we are using in our CI test. Feel free to suggest a total rewrite of what you can read starting at line 18 in this file.

The meaning of -yS option has slightly changed from the original one; species-specific parameter set will be used whenever -T option is specified. So, you usually do not need to set this option (that is why I could not notice the bug).

Thanks a lot for the quick fix.

Unfortunately the test we used to work keeps on failing Any hint how to rewrite that test sensibly would be welcome.
Kind regards, Andreas.

@ogotoh
Copy link
Owner

ogotoh commented Jan 29, 2022

Dear Andreas,

Change the last line of CItest script from
sortgrcd -O15 -F2 dictdisc.grd.gz
to
sortgrcd -O15 -F2 dictdisc.cf.grd.gz

and then test again.

Osamu,

@tillea
Copy link
Author

tillea commented Jan 30, 2022 via email

@ogotoh
Copy link
Owner

ogotoh commented Jan 31, 2022

Dear Andreas,

Thank you again for your continued efforts.

I admit that the error message is not much informative. I will change it in the next release.

As mentioned earlier, I prefer to use “spaln -W” rather than to use “makeidx.pl”, because the latter no longer depends on “make” so that it runs in any directory without any care about the location of “Makefile”. One minor disadvantage is that we must run spaln twice if we want to format the genomic sequence for both DNA and protein queries. Thus, the recommended alternative to Line 18 would be:

spaln -W -KD [-t4] dictdisc_g.gf.gz
spaln -W -KP [-t4] dictdisc_g.gf.gz

The optional -t4 option would approximately halve the calculation time, if your system supports up to four-fold multi-thread operation. (On my PC with 8 cores, 4 was the best; more than 5 did not further accelerate calculation).

Likewise, -tN option would be useful in the map-and-search mode. In this mode, the acceleration rate is nearly linier with respect to N.

Osamu,

@tillea
Copy link
Author

tillea commented Jan 31, 2022 via email

@ogotoh
Copy link
Owner

ogotoh commented Feb 1, 2022

Dear Andreas,

Thanks. I have an additional comment. The -t${NCPUS} option would be more beneficial in the map/align mode rather than in the format mode.

spaln -Q7 -d dictdisc_g -T dictdisc -t${NCPUS} dictdisc.faa.gz
spaln -Q7 -d dictdisc_g -T dictdisc -t${NCPUS} -O12 -g dictdisc.cf.gz

Osamu,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants