R-SAIGE on bioconda #272

Closed
matuskosut opened this issue Nov 11, 2020 · 26 comments

@matuskosut
Contributor

matuskosut commented Nov 11, 2020

Hi @weizhou0 @weizhouUMICH and others,

I managed to revise the Saige dependencies and build conda packages for Saige (https://anaconda.org/bioconda/r-saige) to make installation as easy as possible.

Guide

You will need bioconda and conda-forge channels (added in this order):

conda config --add channels bioconda
conda config --add channels conda-forge
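
`conda config --add channels` prepends to the channel list, so running the two commands in that order leaves conda-forge with the highest priority. A minimal sketch for checking the resulting order follows. The helper name `first_channel` is hypothetical, and the sketch assumes the YAML list that `conda config --show channels` usually prints.

```shell
# Print the highest-priority channel from `conda config --show channels` output.
# Assumes one "  - name" list entry per channel, highest priority first.
first_channel() {
    awk '$1 == "-" { print $2; exit }'
}

# Only query conda when it is actually installed.
if command -v conda >/dev/null 2>&1; then
    conda config --show channels | first_channel
fi
```

If the printed channel is not conda-forge, re-running the two `conda config --add channels` commands in the order shown should restore the intended priority.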

Feel free to try out the new packages and share feedback:

conda create -n saige -c bioconda r-saige=0.44.5
conda activate saige

step1_fitNULLGLMM.R --help
step2_SPAtests.R --help
createSparseGRM.R --help
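
The three `--help` calls act as a smoke test. As a quicker check, the sketch below only verifies that the scripts are on `PATH` in the activated environment. The helper name `missing_commands` is hypothetical and is not part of r-saige.

```shell
# Print every command name from the arguments that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Run inside the activated environment; no output means all three are present.
missing_commands step1_fitNULLGLMM.R step2_SPAtests.R createSparseGRM.R
```

This only confirms that the scripts are installed; running them with `--help` additionally confirms that R can load the SAIGE package.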

Available versions

Latest version: r-saige latest_date

  • r-saige 0.44.5 (R r-base=3.6.3,r-base=4.0)
conda create -n saige -c conda-forge -c bioconda r-base=4.0 r-saige=0.44.5
conda activate saige

Plaintext dosage files

  • r-saige 0.35.8.8 (R r-base=3.6.3,r-base=4.0.3)
conda create -n saige -c conda-forge -c bioconda r-base=4.0 r-saige=0.35.8.8
conda activate saige

Other versions previously released

  • r-saige 0.39 (R r-base=3.5.1, r-base=3.6.1,r-base=4.0)
  • r-saige 0.42 (R r-base=3.6.1,r-base=4.0)
  • r-saige 0.42.1 (R r-base=3.6.1,r-base=4.0)
  • r-saige 0.43.0 (R r-base=3.6.1,r-base=4.0)
  • r-saige 0.44.0 (R r-base=3.6.3,r-base=4.0)
  • r-saige 0.44.1 (R r-base=3.6.3,r-base=4.0)
  • r-saige 0.44.2 (R r-base=3.6.3,r-base=4.0)
  • r-saige 0.44.5 (R r-base=3.6.3,r-base=4.0)

To create a conda environment with a specific R version:

conda create -n saige -c conda-forge -c bioconda r-base=3.6.1 r-saige=0.43.0 savvy=1.3.0
conda activate saige

Versions 0.44.0 or older might require savvy=1.3.0. If you see the error below, install it as shown in the example above:

... libsavvy.so: cannot open shared object file: No such file or directory
Execution halted
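
The version rule above can be sketched as a small helper that prints (but does not run) a `conda create` command and adds `savvy=1.3.0` for r-saige 0.44.0 or older. Both function names are hypothetical. The guide only says older versions "might" need the pin, so always adding it is an assumption, and the sketch relies on `sort -V` from GNU coreutils.

```shell
# Succeed when the given r-saige version is 0.44.0 or older.
# Relies on `sort -V` (version sort), available in GNU coreutils.
needs_savvy_pin() {
    newest=$(printf '%s\n' "$1" "0.44.0" | sort -V | tail -n 1)
    [ "$newest" = "0.44.0" ]
}

# Print (not run) a conda create command: $1 = r-saige version, $2 = r-base version.
saige_create_cmd() {
    cmd="conda create -n saige -c conda-forge -c bioconda r-base=$2 r-saige=$1"
    if needs_savvy_pin "$1"; then
        cmd="$cmd savvy=1.3.0"   # assumption: pin for every version <= 0.44.0
    fi
    printf '%s\n' "$cmd"
}

saige_create_cmd 0.43.0 3.6.1
saige_create_cmd 0.44.5 4.0
```

Printing the command instead of running it keeps the sketch safe to try and lets you review the pins before creating the environment.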

For those installing r-saige with R 3.6.1, I recommend installing a lower version of optparse (this doesn't apply to R 4.0):

conda install -c conda-forge "r-optparse=1.6.2=r36h6115d3f_1"
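
The quoted spec has the form name=version=build, and the build string here starts with `r36`, which appears to tag the R series the package was built for (the r-saige builds for R 4.0 mentioned in this thread carry `r40`). A hedged sketch that checks a build string against an R version follows. The function name `build_matches_r` is hypothetical, and the prefix convention is an assumption drawn from those two examples.

```shell
# Succeed when a build string starts with the tag for the given R version,
# e.g. build "r36h6115d3f_1" and R "3.6.1" (tag "r36").
# Assumption: R package builds are prefixed with "r" + major + minor.
build_matches_r() {
    tag=$(printf '%s\n' "$2" | awk -F. '{ printf "r%s%s", $1, $2 }')
    case "$1" in
        "$tag"*) return 0 ;;
        *) return 1 ;;
    esac
}

if build_matches_r "r36h6115d3f_1" "3.6.1"; then
    echo "build string matches R 3.6"
fi
```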

Related

Some of you may also be interested in other packages that were added:

@hannahpky

Hi @matuskosut, thank you very much for sharing these conda packages for saige. I've been attempting to run step 1 of saige using step1_fitNULLGLMM.R with the optparse R command-line parser package; however, I keep receiving segfault errors: " *** caught segfault *** address 0x19, cause 'memory not mapped' " (saige_error.txt). This error makes me wonder whether it has something to do with my install configuration (i.e. saige is unable to access a necessary memory location). I followed the GitHub instructions to create a conda environment (https://github.com/weizhouUMICH/SAIGE/wiki/Genetic-association-tests-using-SAIGE#installing-saige) and then used your conda r-saige=0.42.1 version with r-base=3.6.1. Any ideas on how to troubleshoot? I'm a newbie when it comes to these errors, so any help would be appreciated. Thank you!

@matuskosut
Contributor Author

matuskosut commented Nov 16, 2020

Hi @hannahpky, thanks for the feedback. Based on the saige_error.txt that you shared, it looks like an incompatibility between the optparse and r-base versions. Could you share the exact versions of r-optparse and r-base? There is a special hash string that identifies the build of each version, for example: "r-base 3.6.1 h8900bf8_2". You can get them using these commands in your conda environment:

conda list | grep optparse
conda list | grep r-base
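
A plain `grep r-base` also matches other packages whose names contain that text, such as r-base64enc. The sketch below matches the package name exactly and prints only the name, version, and build string. The helper name `pkg_build` is hypothetical, and it assumes the usual whitespace-separated columns of `conda list`.

```shell
# Print "name version build" for exact package-name matches in `conda list` output.
# Assumes whitespace-separated columns: name, version, build, channel.
pkg_build() {
    awk -v pkg="$1" '$1 == pkg { print $1, $2, $3 }'
}

# Only query conda when it is actually installed.
if command -v conda >/dev/null 2>&1; then
    conda list | pkg_build r-base
    conda list | pkg_build r-optparse
fi
```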

I think that if we manage to pinpoint the matching version of optparse, we could make it work and perhaps also set a requirement in the package so that it works for others.

EDIT: getting full conda list and conda info could actually help debugging

@matuskosut
Contributor Author

@hannahpky could you maybe try installing a lower version of optparse? E.g.:

conda install -c conda-forge "r-optparse=1.6.2=r36h6115d3f_1"

@hannahpky

hannahpky commented Nov 16, 2020

Thanks Matúš - I installed the lower version of optparse (1.6.2). I no longer receive the following message; however, I am still receiving the same " *** caught segfault *** address 0x19, cause 'memory not mapped'" message.

Loading required package: optparse 
Warning message:
package ‘optparse’ was built under R version 3.6.3

I included my conda info and list details to help troubleshoot (conda_info.txt, conda_list.txt). I'm wondering if I need a different r-base version or some other package version?

@matuskosut
Contributor Author

@hannahpky interesting, I would definitely try r-base=4.0 or check another version of saige, e.g. 0.42 is available. I will try to have a closer look at this. Is there any other info about the inputs (types, parameters, ...) that you could share?

@hannahpky

I tried r-base=4.0 with r-saige=0.42.1, but received the same error. These are the inputs I've been using:

--plinkFile=/vgipiper05/hannah/stuttering/population_analysis/merged/potential_ctrls/selected_controls_taketwo_09092020/ALL_stut_merged_casesmatchedctrls_LDprunednoHLAinv \
--phenoFile=/vgipiper05/hannah/stuttering/population_analysis/merged/potential_ctrls/selected_controls_taketwo_09092020/PHENO.txt \
--phenoCol=STUT \
--covarColList=PC1,PC2,PC3,PC4,PC5,PC6 \
--sampleIDColinphenoFile=IID \
--traitType=binary \
--nThreads=4 \
--LOCO=FALSE \
--outputPrefix=/vgipiper05/hannah/stuttering/population_analysis/merged/potential_ctrls/selected_controls_taketwo_09092020/SAIGE_11162020/Stut_SAIGE_fitNullGLMM

All the inputs appear to be read in correctly, and then immediately afterward I receive the "*** caught segfault *** address 0x19, cause 'memory not mapped'" error.
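
Before digging into package versions, it can help to rule out input problems. The sketch below is a generic pre-flight check and is not a diagnosis of this segfault. It lists missing PLINK files for a `--plinkFile` prefix and missing columns in the phenotype file header. Both helper names are hypothetical, and a whitespace-delimited phenotype file is an assumption.

```shell
# Print each requested column name that is missing from the header of a
# whitespace-delimited phenotype file (the delimiter is an assumption).
missing_pheno_columns() (
    file=$1
    shift
    header=$(head -n 1 "$file" | tr -d '\r')
    for col in "$@"; do
        found=no
        for name in $header; do
            [ "$name" = "$col" ] && found=yes
        done
        [ "$found" = yes ] || printf '%s\n' "$col"
    done
)

# Print each PLINK file (.bed, .bim, .fam) that is missing for a given prefix.
missing_plink_files() {
    for ext in bed bim fam; do
        [ -f "$1.$ext" ] || printf '%s\n' "$1.$ext"
    done
}

# Example call with the column names from the command above (path is a placeholder):
# missing_pheno_columns PHENO.txt IID STUT PC1 PC2 PC3 PC4 PC5 PC6
```

No output from either function means the files and columns named on the command line were found.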

@matuskosut
Contributor Author

matuskosut commented Nov 18, 2020

@hannahpky I have got the new version in. Could you try to upgrade, or maybe recreate your conda environment so that the new version is installed?

conda update r-saige

@Esther19960123

createSparseGRM.R --help

It works well, so I have installed saige successfully, after spending two days trying to install it and failing. Thanks a lot!

@matuskosut
Contributor Author

It works well, so I have installed saige successfully, after spending two days trying to install it and failing. Thanks a lot!

@Esther19960123 Great! Happy to hear that everything is working well 😊 If you encounter any bugs, feel free to open a new issue and tag me.

@hannahpky

Yep, thanks @matuskosut the new r-saige version seems to be working well! I'm no longer receiving segfault errors :)

matuskosut changed the title from "Saige on bioconda" to "R-SAIGE on bioconda" Nov 20, 2020
@weizhou0
Contributor

Hi @matuskosut,

Thank you so much for sharing the bioconda SAIGE!!!

Wei

@alecmchiu

@matuskosut

Works like a charm! Spent several days trying to install this on the cluster that I work on to no avail. However, your bioconda installation was able to get me to all the help messages successfully! Hopefully it runs without a hitch too!

@matuskosut
Contributor Author

@alecmchiu awesome! You probably installed 0.42.1 which now seems pretty stable. But as I write this a new release 0.43.0 is being uploaded in case you are interested.

@huanghe0220

Hi @matuskosut, I have followed your steps to install saige:

conda create -n saige -c conda-forge -c bioconda r-base=3.6.1 r-saige=0.43.0
conda install -c conda-forge "r-optparse=1.6.2=r36h6115d3f_1"

However, when I came to this step:

step1_fitNULLGLMM.R --help

I received this error:

Error: package or namespace load failed for ‘SAIGE’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/home/huanghe/miniconda3/lib/R/library/SAIGE/libs/SAIGE.so':
  libsavvy.so: cannot open shared object file: No such file or directory
Execution halted

Do you know how to solve this problem? I have been stuck with this package for 1 week, so any help would be appreciated. Thank you!

@matuskosut
Contributor Author

Hi @huanghe0220, thanks for reporting. I updated the guide. There was a major change in the new version of the savvy library.

You will need to use savvy=1.3.0 to get the correct match to your saige version. Example:

conda create -n saige -c conda-forge -c bioconda r-base=3.6.1 r-saige=0.43.0 savvy=1.3.0

Or you can add it to your existing environment:

conda install -n saige -c conda-forge -c bioconda savvy=1.3.0
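
To see which shared libraries the SAIGE binary cannot resolve, `ldd` output can be filtered for "not found" entries. This is a minimal sketch. The helper name `missing_libs` is hypothetical, and the library path mirrors the one in the error message above, so it is an assumption about the environment layout.

```shell
# Print the names of shared libraries that `ldd` reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# CONDA_PREFIX is set by `conda activate`; the path below is an assumption
# based on the location shown in the error message.
so_file="${CONDA_PREFIX:-}/lib/R/library/SAIGE/libs/SAIGE.so"
if [ -f "$so_file" ]; then
    ldd "$so_file" | missing_libs
fi
```

If `libsavvy.so` is listed, the savvy version in the environment probably does not match the one r-saige was built against, and pinning `savvy=1.3.0` as described above is the fix suggested in this thread.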

@matuskosut
Contributor Author

@weizhou0 @weizhouUMICH does it seem OK to include info about the r-saige package in the README? I made pull request #343. Feel free to comment or change it if anything is off.

@huanghe0220

@matuskosut Thanks for your reply. I suppose saige has been installed successfully. But when I came to step 2 with the example dataset:

Rscript step2_SPAtests.R \
--dosageFile=./input/dosage_10markers.txt \
--dosageFileNrowSkip=1 \
--dosageFileNcolSkip=6 \
--dosageFilecolnamesSkip=CHR,SNP,CM,POS,EFFECT_ALLELE,ALT_ALLELE \
--minMAF=0.0001 \
--sampleFile=./input/sampleIDindosage.txt \
--GMMATmodelFile=./output/example.rda \
--varianceRatioFile=./output/example.varianceRatio.txt \
--SAIGEOutputFile=./output/example.plainDosage.SAIGE.txt \
--numLinesOutput=2 \
--IsOutputAFinCaseCtrl=TRUE

I ran into this error:

Error in getopt_options(object, args) : Error in getopt(spec = spec, opt = args) : long flag "dosageFile" is invalid

If I use my own data, I run into another error in step 1 (Rscript step1_fitNULLGLMM.R):

Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars

Do you know why? Looking forward to your reply. Thanks!

Best wishes,
Huang He

@matuskosut
Contributor Author

@huanghe0220 are you using plain-text dosage files? If so, you need to use this version: r-saige=0.38.8.8.

@huanghe0220

@matuskosut I think you mean r-saige=0.35.8.8? I couldn't find 0.38.8.8, and installed version 0.35.8.8, but I still got the bug in step 2 with example dataset, and the same bug in step 1 with my own dataset. I got really confused :(

@matuskosut
Contributor Author

matuskosut commented May 12, 2021

@huanghe0220 yes, thanks for correcting. I fixed the version typo in the issue text.

I see that your step 2 error was: long flag "dosageFile" is invalid. Looking at the source code of 0.35.8.8, --dosageFile is present in step 2: https://github.com/weizhouUMICH/SAIGE/blob/0.35.8.8/extdata/step2_SPAtests.R#L10 (it is not present in newer versions).

Could you open an issue for this if you still have the problem? https://github.com/weizhouUMICH/SAIGE/issues/new
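
One thing worth checking is which copy of the script is actually being run: `Rscript step2_SPAtests.R` uses the file in the current directory, which may differ from the script installed with the conda package. The sketch below searches a script's source for a quoted long flag. The helper name `script_has_flag` is hypothetical, and a plain text search only suggests that the option is declared.

```shell
# Succeed when the script source mentions the given long flag in double quotes,
# e.g. "--dosageFile". This is a plain text search, not an option parser.
script_has_flag() {
    grep -q -F -e "\"--$2\"" "$1"
}

# Check the script that the activated environment puts on PATH, if any.
script=$(command -v step2_SPAtests.R 2>/dev/null || true)
if [ -n "$script" ]; then
    if script_has_flag "$script" dosageFile; then
        echo "--dosageFile is declared in $script"
    else
        echo "--dosageFile is not declared in $script"
    fi
fi
```

Running the same check against `./step2_SPAtests.R` shows whether the local copy and the installed copy disagree.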

@huanghe0220

@matuskosut Thanks for your reply. I will make an issue.

@weizhouUMICH
Owner

Hi @matuskosut,

There seems to be a query issue with bgen files in the current bioconda version. I wonder whether you used the bgenix version in the /thirdparty folder when you compiled SAIGE. If not, that would cause the issue. Thanks!

Wei

@matuskosut
Contributor Author

matuskosut commented Jan 25, 2022

Hi @weizhouUMICH,

Is this related to issue #393? Were you also able to reproduce it? Would it be possible to reproduce it with some non-sensitive data, since I don't have access to any datasets?

I built bgenix based on the same version as in /thirdparty, applying the patches that I identified in your repo. I noticed only one that was missing. Although I think it is unrelated, I rebuilt bgenix and saige and published them on conda in case you want to try them (e.g.: linux-64/r-saige-0.44.6.5-r40h6d4de14_3.tar.bz2 - https://anaconda.org/bioconda/r-saige).

It is also possible to try bgenix separately to isolate the issue: https://anaconda.org/conda-forge/bgenix

@weizhou0
Contributor

weizhou0 commented Jan 26, 2022 via email

@matuskosut
Contributor Author

Hi @weizhouUMICH (@weizhou0 ),

One thing to explain is a principle of conda-forge and bioconda: a package should contain only one tool/library, which is also implied by the rule that multiple licenses should not be packaged together. So each package has to be built separately and declared clearly as a dependency. Hence we have to build bgenix from a separate repo.
Maybe you could have a look? This is the branch for version 1.1.4, which is used for R-SAIGE:

https://github.com/huntdatacenter/rbgen/tree/build-v1.1.4

Let me know if you spot any difference. I was aware of some of your earlier changes and passed the patches into that branch, and I was also patching waf to run with python3. Since this problem appeared only now, I think it might still have some other cause.

In the meantime, a conda package for 0.45 was released; check here: https://anaconda.org/bioconda/r-saige

@weizhouUMICH
Owner

Hi @matuskosut,

We have just released a new version 1.0.0. It no longer depends on the bgen library.

It has computational efficiency improvements for both Step 1 and Step 2 for single-variant and set-based tests. We have created a new GitHub page for the program, https://github.com/saigegit/SAIGE, with documentation at https://saigegit.github.io/SAIGE-doc/

Thanks!
Wei
