A collection of modules to process and analyze IMGT-HLA sequences.

WansonChoi/HATK

HLA Analysis Toolkit (HATK; v2.0(beta))

(1) Introduction

HATK (HLA Analysis Toolkit) is a collection of tools and modules for HLA fine-mapping analysis, which identifies the HLA allele or amino acid position of an HLA gene that drives a disease. HLA fine-mapping is indispensable in studies of autoimmune diseases.

In a GWAS (genome-wide association study) and its fine-mapping analysis, researchers can obtain candidate causal variants for the target disease. However, association tests on variants in the HLA (Human Leukocyte Antigen) region, on chromosome 6p21, often give unreliable results because this region is extremely polymorphic. Consequently, a conventional association test based on a SNP array panel may generate inaccurate signals in the HLA region.

On the other hand, IPD-IMGT/HLA, a specialist database, provides the official and most detailed information on the HLA region. Updated four times a year, it maintains the full set of HLA allele information and names alleles according to the nomenclature defined by the 'WHO Nomenclature Committee for Factors of the HLA System'. Furthermore, it provides (1) the amino acid and (2) the DNA sequence of each HLA allele. Using these data requires the exact HLA alleles of each patient, which may force researchers to employ expensive HLA typing technologies. However, thanks to recent developments in HLA imputation and inference, researchers can now obtain HLA allele information for hundreds to thousands of patients and avoid the cost of HLA typing services.

Ultimately, HATK aims to perform association tests targeted at the HLA region. Based on patients' HLA type information and the corresponding amino acid and DNA sequence information distributed by the IMGT-HLA database, HATK builds a marker panel that includes not only typical intergenic variant markers (i.e. SNPs) but also variants within the HLA region. HATK also provides an additional association test method so that researchers can analyze signals arising at amino acid sequence positions.

Figure: HATK main pipeline workflow (README_Main_Pipeline_WorkFlow).



(2) Installation

First, prepare an OS X (Mac) or Linux system. HATK currently does not support Windows.

Then, download this project into a directory on your OS X or Linux system. We assume that the 'git' command is already installed.

$ git clone https://github.com/WansonChoi/HATK.git
$ cd HATK

We strongly recommend using 'Anaconda(or Miniconda)' to set up HATK.

  1. Install Anaconda or Miniconda.

    Miniconda is a minimal version of Anaconda with fewer default packages. We recommend Miniconda if you want to use less disk space.


  2. Create a new independent Python virtual environment for HATK with the given YML file.

    Create a new Python virtual environment using the 'HATK_OSX.yml' or 'HATK_LINUX.yml' file in the project folder, depending on your operating system.

    $ conda env create -f HATK_OSX.yml          ## OS X(Mac)
    $ conda env create -f HATK_LINUX.yml        ## Linux


    The command above creates a new Python virtual environment named 'HATK', which contains the required Python packages, R, and R libraries, independent of your system Python installation.

    For a more detailed explanation of how Anaconda manages Python virtual environments, please check this reference (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#create-env-file-manually).


  3. Activate the HATK environment.

    If the new virtual environment has been successfully installed, activate it.

    $ conda activate HATK    # HATK will run in this virtual environment.
    

(Tip) Type 'conda deactivate' to go back to the previous environment. (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#deactivating-an-environment)

(Tip) Type 'conda env remove -n HATK' to permanently remove the virtual environment created for HATK. (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#removing-an-environment)
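(Tip) To confirm from Python that the environment is active before running HATK, you can inspect the `CONDA_DEFAULT_ENV` variable, which conda sets on activation. The following is only a minimal sketch: the helper name `is_hatk_env_active` is hypothetical and not part of HATK, and it assumes the environment was activated by its default name 'HATK'.

```python
import os


def is_hatk_env_active(environ=None, env_name="HATK"):
    """Return True if the active conda environment is named `env_name`.

    Hypothetical helper (not part of HATK). conda sets CONDA_DEFAULT_ENV
    when an environment is activated by name.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == env_name


if __name__ == "__main__":
    if is_hatk_env_active():
        print("The 'HATK' conda environment is active.")
    else:
        print("Not in the 'HATK' environment; run 'conda activate HATK' first.")
```

Passing a plain dictionary as `environ` makes the check easy to try without changing your shell.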



(3) Usage example

# Note: "--bfile" replaces the "--variants" argument of v1.
python HATK.py \
    --hg 18 \
    --hped example/wtccc_filtered_58C_RA.hatk.300+300.hped2 \
    --bfile example/wtccc_filtered_58C_RA.hatk.300+300.hg18.chr6.29-34mb \
    --pheno example/wtccc_filtered_58C_RA.hatk.300+300.phe \
    --pheno-name RA \
    --imgt 3320 \
    --imgt-dir example/IMGTHLA3320/ \
    --multiprocess 8 \
    --out HATKv2_wholeImple/wtccc_58C+RA.hg18.chr6.29-34mb.ALL \
    --java-mem 4g \
    --nthreads 4

This command runs (1) IMGT2Seq, (2) NomenCleaner, (3) bMarkerGenerator, (4) Association Test (e.g. logistic regression), (5) Manhattan Plot, (6) Heatmap Plot, (7) Phasing, and (8) Omnibus test.
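Because the full pipeline can take a while, it may help to verify that every input path exists before launching it. The sketch below is an illustration under stated assumptions, not part of HATK: the function name `find_missing_inputs` is hypothetical, and it assumes that '--bfile' takes a PLINK binary fileset prefix (i.e. '.bed', '.bim', and '.fam' files sharing that prefix).

```python
from pathlib import Path


def find_missing_inputs(bfile_prefix, hped, pheno, imgt_dir):
    """Return the expected input paths that do not exist.

    Hypothetical pre-flight helper (not part of HATK). Assumes that
    '--bfile' is a PLINK binary fileset prefix (.bed/.bim/.fam).
    """
    # Append extensions by string concatenation: the prefix itself
    # contains dots, so Path.with_suffix() would cut part of it off.
    expected = [Path(bfile_prefix + ext) for ext in (".bed", ".bim", ".fam")]
    expected += [Path(hped), Path(pheno), Path(imgt_dir)]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = find_missing_inputs(
        "example/wtccc_filtered_58C_RA.hatk.300+300.hg18.chr6.29-34mb",
        "example/wtccc_filtered_58C_RA.hatk.300+300.hped2",
        "example/wtccc_filtered_58C_RA.hatk.300+300.phe",
        "example/IMGTHLA3320/",
    )
    for path in missing:
        print("Missing input:", path)
```

An empty result only means the paths exist; it does not validate their contents.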

You can run each module independently. The README for each module is available in the 'Wiki' section of this repository, which includes more detailed explanations and usage examples. (The v1 README files are also in the 'docs/' folder. The Wiki for v2 is currently under construction (2022. 10. ~).)

Check which Human Genome version (i.e. hg18, hg19, or hg38) is used in your study. HATK is not responsible for misuse or mismatch of Human Genome versions (e.g. passing hg19 genotype data to '--bfile' and '18' to '--hg').
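One lightweight safeguard against such a mismatch is to compare the '--hg' value with any 'hgNN' token in the genotype file name, as in the example prefix above ('...hg18.chr6.29-34mb'). The sketch below is hypothetical and not part of HATK: the names `build_from_prefix` and `check_hg_consistency` are invented for illustration, and a file name is only a hint, so passing this check does not prove that the genotype coordinates match the declared build.

```python
import re
from pathlib import Path

SUPPORTED_BUILDS = ("18", "19", "38")


def build_from_prefix(bfile_prefix):
    """Return the number in the first 'hgNN' token of the file name, or None."""
    match = re.search(r"hg(\d+)", Path(bfile_prefix).name)
    return match.group(1) if match else None


def check_hg_consistency(hg, bfile_prefix):
    """Raise ValueError if `hg` is unsupported or contradicts the file name.

    Hypothetical helper (not part of HATK). Only the file name is
    inspected; the genotype data themselves are not read.
    """
    hg = str(hg)
    if hg not in SUPPORTED_BUILDS:
        raise ValueError(f"Unsupported genome build: hg{hg}")
    named = build_from_prefix(bfile_prefix)
    if named is not None and named != hg:
        raise ValueError(f"'--hg {hg}' given, but the file name suggests hg{named}")
    return hg


if __name__ == "__main__":
    prefix = "example/wtccc_filtered_58C_RA.hatk.300+300.hg18.chr6.29-34mb"
    print("Build accepted: hg" + check_hg_consistency(18, prefix))
```

If the file name carries no build token, the check passes silently, so the responsibility for choosing the right build remains with the user.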



(4) Version 2.0

In version 2, HATK also supports HLA fine-mapping of non-classical HLA genes (e.g. HLA-MICA/B, -V, -E, or -G).

For CookHLA users, HATK v2 also provides a module that generates a custom reference panel that can be used directly in CookHLA (https://github.com/WansonChoi/CookHLA).

(Citation) S. Cook, W. Choi, H. Lim, Y. Luo, K. Kim, X. Jia, S. Raychaudhuri and B. Han, CookHLA: Accurate Imputation of Human Leukocyte Antigens. (https://www.nature.com/articles/s41467-021-21541-5)

For more details, please refer to the Wiki.



(5) Citation

HATK: HLA analysis toolkit - Wanson Choi, Yang Luo, Soumya Raychaudhuri, Buhm Han (https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa684/5879278)



(6) License

The HATK Software Code is freely available for non-commercial academic research use. If you would like to obtain a license to the Code for commercial use, please contact Wanson Choi (WC) at wansonchoi@snu.ac.kr and Buhm Han (BH) at buhm.han@snu.ac.kr. WE (WC and BH) MAKE NO REPRESENTATIONS OR WARRANTIES WHATSOEVER, EITHER EXPRESS OR IMPLIED, WITH RESPECT TO THE CODE PROVIDED HERE UNDER. IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO CODE ARE EXPRESSLY DISCLAIMED. THE CODE IS FURNISHED "AS IS" AND "WITH ALL FAULTS" AND DOWNLOADING OR USING THE CODE IS UNDERTAKEN AT YOUR OWN RISK. TO THE FULLEST EXTENT ALLOWED BY APPLICABLE LAW, IN NO EVENT SHALL WE BE LIABLE, WHETHER IN CONTRACT, TORT, WARRANTY, OR UNDER ANY STATUTE OR ON ANY OTHER BASIS FOR SPECIAL, INCIDENTAL, INDIRECT, PUNITIVE, MULTIPLE OR CONSEQUENTIAL DAMAGES SUSTAINED BY YOU OR ANY OTHER PERSON OR ENTITY ON ACCOUNT OF USE OR POSSESSION OF THE CODE, WHETHER OR NOT FORESEEABLE AND WHETHER OR NOT WE HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES, INCLUDING WITHOUT LIMITATION DAMAGES ARISING FROM OR RELATED TO LOSS OF USE, LOSS OF DATA, DOWNTIME, OR FOR LOSS OF REVENUE, PROFITS, GOODWILL, BUSINESS OR OTHER FINANCIAL LOSS.
