Framework for efficient high-dimensional association analyses.
The run_ExampleStudy.sh script runs an example association study with 20,000 SNPs, 1000 phenotypes and 1000 subjects.
It runs the analysis in chunks of 5000 SNPs (the chunk size can be set in the config.py file). The standard output looks like this:
START regression mode...
reading file example_study.csv
There are 1000 ids and 1000 columns
reading file example_study.csv
There are 1000 ids and 3 columns
There are 1000 ids
There are 1000 common ids
...
...
...
time to compute GWAS for 1000 phenotypes and 5000 SNPs .... 0.681949138641 sec
Read 15000, processed 15000, total 20000
...
time to compute GWAS for 1000 phenotypes and 5000 SNPs .... 0.565479040146 sec
Read 20000, processed 20000, total 20000
...
experiment finished in 10.0326929092 s
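The log shows the SNPs being processed in chunks of 5000, with each chunk tested against all 1000 phenotypes in one step. The numpy sketch below illustrates that chunked, vectorized pattern under simplifying assumptions: plain simple linear regression (one SNP, no covariates, no missing values) on simulated, scaled-down data. It is not HASE's actual algorithm; `regression_slopes`, `chunked_gwas` and its `chunk_size` argument are hypothetical names (in HASE the chunk size is set in config.py).

```python
import numpy as np


def regression_slopes(genotypes, phenotypes):
    """Simple linear regression slope for every (SNP, phenotype) pair.

    genotypes:  (n_subjects, n_snps) array
    phenotypes: (n_subjects, n_phenotypes) array
    Returns an (n_snps, n_phenotypes) array of slopes cov(g, y) / var(g).
    Monomorphic SNPs (zero variance) would divide by zero; a real
    pipeline filters them out first.
    """
    g = genotypes - genotypes.mean(axis=0)   # center each SNP column
    y = phenotypes - phenotypes.mean(axis=0)  # center each phenotype column
    # One matrix product yields every cross-product sum_i g[i, j] * y[i, k].
    cross = g.T @ y
    # Sum of squares of each centered SNP column.
    ss_g = (g ** 2).sum(axis=0)
    return cross / ss_g[:, None]


def chunked_gwas(genotypes, phenotypes, chunk_size=5000):
    """Process SNP columns chunk by chunk and stack the per-chunk results."""
    n_snps = genotypes.shape[1]
    results = []
    for start in range(0, n_snps, chunk_size):
        stop = min(start + chunk_size, n_snps)
        results.append(regression_slopes(genotypes[:, start:stop], phenotypes))
        # Progress line modeled on the log output above.
        print(f"Read {stop}, processed {stop}, total {n_snps}")
    return np.vstack(results)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Scaled-down toy data: allele dosages 0/1/2 and random phenotypes.
    G = rng.integers(0, 3, size=(200, 2000)).astype(float)
    Y = rng.standard_normal((200, 50))
    betas = chunked_gwas(G, Y, chunk_size=500)
    print(betas.shape)  # (2000, 50)
```

Because only one chunk of genotypes is centered and multiplied at a time, peak memory grows with the chunk size rather than with the total number of SNPs, which is the trade-off a chunk-size setting controls.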
Navigate to the directory where you want to install HASE and clone this repository:
git clone https://github.com/roshchupkin/hase.git
You can update HASE to the newest version using git. Navigate to your HASE folder (where you cloned the git repository) and run:
git pull
Your system might already satisfy the requirements, so we suggest first trying to run the test example from the Testing section below.
- HDF5 software (the python packages tables and h5py require this installation). If it is not installed on your system, you can download the HDF5 source code to your home directory (the commands below use version 1.8.16) and build it:
  tar -xf ~/hdf5-1.8.16.tar.gz
  cd ~/hdf5-1.8.16/
  ./configure
  make
  make install
  Then add one line to the .bashrc or .bash_profile file in your home directory:
  export HDF5_DIR=~/hdf5-1.8.16/hdf5/
- BLAS and LAPACK linear algebra libraries for scipy and numpy. On Debian or Ubuntu systems you can install them with:
  sudo apt-get install gfortran libopenblas-dev liblapack-dev
  If this does not work or raises errors, you might need to follow the instructions on the scipy website.
- Python. You can download Python from the official Python website or install one of the Python distributions for scientific computing, such as Anaconda, Enthought Canopy or Python(x,y). Then install (or first uninstall and then reinstall) the scipy and numpy python libraries:
  pip install scipy
  pip install numpy
  To check which BLAS and LAPACK libraries numpy is linked against:
  python
  >>> import numpy as np
  >>> np.__config__.show()
  You should see something like this (the exact format differs between numpy versions):
  lapack_opt_info:
      libraries = ['openblas', 'openblas']
      library_dirs = ['/cm/shared/apps/openblas/0.2.9-rc2/lib']
      language = f77
  blas_opt_info:
      libraries = ['openblas', 'openblas']
      library_dirs = ['/cm/shared/apps/openblas/0.2.9-rc2/lib']
      language = f77
  openblas_info:
      libraries = ['openblas', 'openblas']
      library_dirs = ['/cm/shared/apps/openblas/0.2.9-rc2/lib']
      language = f77
  blas_mkl_info:
    NOT AVAILABLE
- Install the python packages listed in the requirements.txt file:
  - bitarray
  - argparse
  - cython
  - matplotlib
  - scipy
  - numpy
  - pandas
  - h5py
  - tables
  You can use the package manager that comes with your Python distribution (pip or conda), for example:
  pip install -r requirements.txt
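The requirements above can also be checked from Python before running HASE. The script below is a hypothetical helper, not part of HASE, written for Python 3: `missing_modules` reports which listed packages cannot be found, `check_hdf5_dir` checks that HDF5_DIR points to a directory with include/ and lib/ subdirectories (an assumption about the layout that make install produces for the build above), and `lapack_works` solves a small random linear system with numpy as a basic LAPACK sanity check.

```python
import importlib.util
import os

# Import names of the packages from requirements.txt
# (the pip package "cython" is imported as "Cython").
REQUIRED_MODULES = ("bitarray", "argparse", "Cython", "matplotlib", "scipy",
                    "numpy", "pandas", "h5py", "tables")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_hdf5_dir(hdf5_dir=None):
    """Return a list of problems with the HDF5 install prefix (empty if none).

    Falls back to the HDF5_DIR environment variable when no path is given.
    Assumes the prefix contains include/ and lib/ subdirectories.
    """
    hdf5_dir = hdf5_dir or os.environ.get("HDF5_DIR")
    if not hdf5_dir:
        return ["HDF5_DIR is not set"]
    hdf5_dir = os.path.expanduser(hdf5_dir)
    problems = []
    for sub in ("include", "lib"):
        if not os.path.isdir(os.path.join(hdf5_dir, sub)):
            problems.append(f"missing {sub}/ in {hdf5_dir}")
    return problems


def lapack_works(n=200, seed=0):
    """Solve a random linear system with numpy and check the residual."""
    try:
        import numpy as np  # imported here so a missing numpy is reported, not fatal
    except ImportError:
        return False
    rng = np.random.RandomState(seed)
    # Shift the diagonal to keep the system well conditioned.
    a = rng.standard_normal((n, n)) + n * np.eye(n)
    b = rng.standard_normal(n)
    x = np.linalg.solve(a, b)  # numpy uses the LAPACK routine gesv here
    return bool(np.allclose(a @ x, b))


if __name__ == "__main__":
    print("missing packages:", missing_modules() or "none")
    print("HDF5_DIR problems:", check_hdf5_dir() or "none")
    print("numpy linear algebra OK:", lapack_works())
```

Importing numpy inside `lapack_works` rather than at the top lets the script still report the other checks when numpy itself is the missing package.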
Testing:
- Navigate to the HASE directory and type
  python hase.py -h
  You should see the help message.
- Navigate to the HASE directory and type
  sh run_ExampleStudy.sh
  It should start running a toy example of a high-dimensional GWAS.
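The two test steps can be automated, for example in a scheduled job after git pull. The sketch below assumes both commands exit with status 0 on success; `run_check` and the `hase_dir` path are hypothetical, and `sys.executable` means hase.py runs with the same interpreter as the script, so launch it with the python you use for HASE.

```python
import subprocess
import sys


def run_check(command, cwd=None, timeout=600):
    """Run a command (no shell); return True if it exits with status 0."""
    try:
        result = subprocess.run(command, cwd=cwd, timeout=timeout,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except (OSError, subprocess.TimeoutExpired):
        # Missing executable, missing working directory, or a hung run.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    hase_dir = "hase"  # hypothetical: the folder where you cloned HASE
    checks = {
        "help message": [sys.executable, "hase.py", "-h"],
        "example study": ["sh", "run_ExampleStudy.sh"],
    }
    for name, command in checks.items():
        status = "OK" if run_check(command, cwd=hase_dir) else "FAILED"
        print(f"{name}: {status}")
```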
User guide: see the HASE wiki.
Requirements:
- HDF5 software.
- BLAS and LAPACK linear algebra libraries.
- Python.
- Python packages:
  - bitarray
  - argparse
  - cython
  - matplotlib
  - scipy
  - numpy
  - pandas
  - h5py
  - tables
- Git.
If you use the HASE framework, please cite:
This project is licensed under Apache-2.0 License.
Gennady V. Roshchupkin (Department of Epidemiology, Radiology and Medical Informatics, Erasmus MC, Rotterdam, Netherlands)
Hieab H. Adams (Department of Epidemiology, Erasmus MC, Rotterdam, Netherlands)
If you have any questions, suggestions, comments or problems, do not hesitate to contact us!