This is the updated version of the diHMM model (Marco et al. 2017, Nat Commun). The original package was written in MATLAB and can be accessed at gcyuan/diHMM. In this updated version, we have increased computational efficiency in two major ways:
1. Implementing the code in C++ with a Python wrapper to increase computational speed.
2. Using an ensemble approach to aggregate information from multiple models, each trained on a different sample.

The details of these changes are described in Kai et al. 2020 (bioRxiv).
diHMM stands for Hierarchical Hidden Markov Model. diHMM is a computational method for finding chromatin states at multiple scales. The model takes as input a multidimensional set of histone modifications for several cell types and classifies the genome into a preselected number of nucleosome-level and domain-level hidden states.
The diHMM model was originally developed by Eugenio Marco, with assistance from Wouter Meuleman, Jialiang Huang, Luca Pinello, Manolis Kellis and Guo-Cheng Yuan. The method was originally implemented in MATLAB. The code and sample data are available at (https://github.com/gcyuan/diHMM).
In this updated version, computational efficiency is significantly improved by implementing the model in C++. Additional improvement is achieved by using an ensemble clustering approach. The development of this newer version was led by Stephanos Tsoucas and Yan Kai, with assistance from Shengbao Suo, Xuan Cao, and Guo-Cheng Yuan.
References: Marco E*, Meuleman W*, Huang J*, Glass K, Pinello L, Wang J, Kellis M†, Yuan GC†. Multi-scale chromatin state annotation using a hierarchical hidden Markov model. Nature Commun. 2017 Apr 7;8:15011. (https://www.nature.com/articles/ncomms15011).
Kai Y, Tsoucas S, Suo S, Yuan GC. Multi-scale annotations of chromatin states in 127 human cell-types. bioRxiv. (https://www.biorxiv.org/content/10.1101/2020.12.22.424078v1).
We applied diHMM v1.0 to generate multi-scale chromatin state annotations for the 127 human reference epigenomes from the Roadmap and ENCODE consortia. Detailed information on the 127 epigenomes can be found in Roadmap Epigenomics Consortium and Kundaje et al. or at this site.
We generated the chromatin state maps at the nucleosome level (200 bp resolution) and the domain level (4 kb resolution). A snapshot of diHMM states across the 127 epigenomes is shown below.
Our multi-scale annotations can be freely downloaded from the table below. After unzipping, the maps can be loaded directly into genome browsers (e.g., IGV) for visualization. Please note that our state annotations are based on the hg19 reference genome.
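To make the relationship between the two resolutions concrete, here is a minimal sketch that maps a genomic position to its nucleosome-level bin and domain-level bin. It assumes 0-based positions and bins that tile each chromosome starting at position 0; the function names are illustrative and are not part of diHMM-cpp, and the coordinate convention of the downloadable files should be checked against the files themselves.

```python
# Resolutions stated for the 127-epigenome annotations.
NUCLEOSOME_BIN_BP = 200   # nucleosome-level bin width in bp
DOMAIN_BIN_BP = 4000      # domain-level bin width in bp


def bins_per_domain(bin_bp=NUCLEOSOME_BIN_BP, domain_bp=DOMAIN_BIN_BP):
    """Number of nucleosome-level bins contained in one domain-level bin."""
    if domain_bp % bin_bp != 0:
        raise ValueError("domain width must be a multiple of the bin width")
    return domain_bp // bin_bp


def locate(position, bin_bp=NUCLEOSOME_BIN_BP, domain_bp=DOMAIN_BIN_BP):
    """Return (bin_index, domain_index) for a 0-based position.

    Assumption: bins tile the chromosome starting at position 0.
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    return position // bin_bp, position // domain_bp


print(bins_per_domain())   # 20
print(locate(1000000))     # (5000, 250)
```

With these resolutions, each domain-level bin spans 20 nucleosome-level bins.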
To see full meta information about each reference epigenome, please visit here.
Epigenome ID (EID) | Nucleosome | Domain | GROUP | Standardized Epigenome name | ANATOMY |
---|---|---|---|---|---|
E017 | download | download | IMR90 | IMR90 fetal lung fibroblasts Cell Line | LUNG |
E002 | download | download | ESC | ES-WA7 Cells | ESC |
E008 | download | download | ESC | H9 Cells | ESC |
E001 | download | download | ESC | ES-I3 Cells | ESC |
E015 | download | download | ESC | HUES6 Cells | ESC |
E014 | download | download | ESC | HUES48 Cells | ESC |
E016 | download | download | ESC | HUES64 Cells | ESC |
E003 | download | download | ESC | H1 Cells | ESC |
E024 | download | download | ESC | ES-UCSF4 Cells | ESC |
E020 | download | download | iPSC | iPS-20b Cells | IPSC |
E019 | download | download | iPSC | iPS-18 Cells | IPSC |
E018 | download | download | iPSC | iPS-15b Cells | IPSC |
E021 | download | download | iPSC | iPS DF 6.9 Cells | IPSC |
E022 | download | download | iPSC | iPS DF 19.11 Cells | IPSC |
E007 | download | download | ES-deriv | H1 Derived Neuronal Progenitor Cultured Cells | ESC_DERIVED |
E009 | download | download | ES-deriv | H9 Derived Neuronal Progenitor Cultured Cells | ESC_DERIVED |
E010 | download | download | ES-deriv | H9 Derived Neuron Cultured Cells | ESC_DERIVED |
E013 | download | download | ES-deriv | hESC Derived CD56+ Mesoderm Cultured Cells | ESC_DERIVED |
E012 | download | download | ES-deriv | hESC Derived CD56+ Ectoderm Cultured Cells | ESC_DERIVED |
E011 | download | download | ES-deriv | hESC Derived CD184+ Endoderm Cultured Cells | ESC_DERIVED |
E004 | download | download | ES-deriv | H1 BMP4 Derived Mesendoderm Cultured Cells | ESC_DERIVED |
E005 | download | download | ES-deriv | H1 BMP4 Derived Trophoblast Cultured Cells | ESC_DERIVED |
E006 | download | download | ES-deriv | H1 Derived Mesenchymal Stem Cells | ESC_DERIVED |
E062 | download | download | Blood & T-cell | Primary mononuclear cells from peripheral blood | BLOOD |
E034 | download | download | Blood & T-cell | Primary T cells from peripheral blood | BLOOD |
E045 | download | download | Blood & T-cell | Primary T cells effector/memory enriched from peripheral blood | BLOOD |
E033 | download | download | Blood & T-cell | Primary T cells from cord blood | BLOOD |
E044 | download | download | Blood & T-cell | Primary T regulatory cells from peripheral blood | BLOOD |
E043 | download | download | Blood & T-cell | Primary T helper cells from peripheral blood | BLOOD |
E039 | download | download | Blood & T-cell | Primary T helper naive cells from peripheral blood | BLOOD |
E041 | download | download | Blood & T-cell | Primary T helper cells PMA-I stimulated | BLOOD |
E042 | download | download | Blood & T-cell | Primary T helper 17 cells PMA-I stimulated | BLOOD |
E040 | download | download | Blood & T-cell | Primary T helper memory cells from peripheral blood 1 | BLOOD |
E037 | download | download | Blood & T-cell | Primary T helper memory cells from peripheral blood 2 | BLOOD |
E048 | download | download | Blood & T-cell | Primary T CD8+ memory cells from peripheral blood | BLOOD |
E038 | download | download | Blood & T-cell | Primary T helper naive cells from peripheral blood | BLOOD |
E047 | download | download | Blood & T-cell | Primary T CD8+ naive cells from peripheral blood | BLOOD |
E029 | download | download | HSC & B-cell | Primary monocytes from peripheral blood | BLOOD |
E031 | download | download | HSC & B-cell | Primary B cells from cord blood | BLOOD |
E035 | download | download | HSC & B-cell | Primary hematopoietic stem cells | BLOOD |
E051 | download | download | HSC & B-cell | Primary hematopoietic stem cells G-CSF-mobilized Male | BLOOD |
E050 | download | download | HSC & B-cell | Primary hematopoietic stem cells G-CSF-mobilized Female | BLOOD |
E036 | download | download | HSC & B-cell | Primary hematopoietic stem cells short term culture | BLOOD |
E032 | download | download | HSC & B-cell | Primary B cells from peripheral blood | BLOOD |
E046 | download | download | HSC & B-cell | Primary Natural Killer cells from peripheral blood | BLOOD |
E030 | download | download | HSC & B-cell | Primary neutrophils from peripheral blood | BLOOD |
E026 | download | download | Mesench | Bone Marrow Derived Cultured Mesenchymal Stem Cells | STROMAL_CONNECTIVE |
E049 | download | download | Mesench | Mesenchymal Stem Cell Derived Chondrocyte Cultured Cells | STROMAL_CONNECTIVE |
E025 | download | download | Mesench | Adipose Derived Mesenchymal Stem Cell Cultured Cells | FAT |
E023 | download | download | Mesench | Mesenchymal Stem Cell Derived Adipocyte Cultured Cells | FAT |
E052 | download | download | Myosat | Muscle Satellite Cultured Cells | MUSCLE |
E055 | download | download | Epithelial | Foreskin Fibroblast Primary Cells skin01 | SKIN |
E056 | download | download | Epithelial | Foreskin Fibroblast Primary Cells skin02 | SKIN |
E059 | download | download | Epithelial | Foreskin Melanocyte Primary Cells skin01 | SKIN |
E061 | download | download | Epithelial | Foreskin Melanocyte Primary Cells skin03 | SKIN |
E057 | download | download | Epithelial | Foreskin Keratinocyte Primary Cells skin02 | SKIN |
E058 | download | download | Epithelial | Foreskin Keratinocyte Primary Cells skin03 | SKIN |
E028 | download | download | Epithelial | Breast variant Human Mammary Epithelial Cells (vHMEC) | BREAST |
E027 | download | download | Epithelial | Breast Myoepithelial Primary Cells | BREAST |
E054 | download | download | Neurosph | Ganglion Eminence derived primary cultured neurospheres | BRAIN |
E053 | download | download | Neurosph | Cortex derived primary cultured neurospheres | BRAIN |
E112 | download | download | Thymus | Thymus | THYMUS |
E093 | download | download | Thymus | Fetal Thymus | THYMUS |
E071 | download | download | Brain | Brain Hippocampus Middle | BRAIN |
E074 | download | download | Brain | Brain Substantia Nigra | BRAIN |
E068 | download | download | Brain | Brain Anterior Caudate | BRAIN |
E069 | download | download | Brain | Brain Cingulate Gyrus | BRAIN |
E072 | download | download | Brain | Brain Inferior Temporal Lobe | BRAIN |
E067 | download | download | Brain | Brain Angular Gyrus | BRAIN |
E073 | download | download | Brain | Brain_Dorsolateral_Prefrontal_Cortex | BRAIN |
E070 | download | download | Brain | Brain Germinal Matrix | BRAIN |
E082 | download | download | Brain | Fetal Brain Female | BRAIN |
E081 | download | download | Brain | Fetal Brain Male | BRAIN |
E063 | download | download | Adipose | Adipose Nuclei | FAT |
E100 | download | download | Muscle | Psoas Muscle | MUSCLE |
E108 | download | download | Muscle | Skeletal Muscle Female | MUSCLE |
E107 | download | download | Muscle | Skeletal Muscle Male | MUSCLE |
E089 | download | download | Muscle | Fetal Muscle Trunk | MUSCLE |
E090 | download | download | Muscle | Fetal Muscle Leg | MUSCLE_LEG |
E083 | download | download | Heart | Fetal Heart | HEART |
E104 | download | download | Heart | Right Atrium | HEART |
E095 | download | download | Heart | Left Ventricle | HEART |
E105 | download | download | Heart | Right Ventricle | HEART |
E065 | download | download | Heart | Aorta | VASCULAR |
E078 | download | download | Sm. Muscle | Duodenum Smooth Muscle | GI_DUODENUM |
E076 | download | download | Sm. Muscle | Colon Smooth Muscle | GI_COLON |
E103 | download | download | Sm. Muscle | Rectal Smooth Muscle | GI_RECTUM |
E111 | download | download | Sm. Muscle | Stomach Smooth Muscle | GI_STOMACH |
E092 | download | download | Digestive | Fetal Stomach | GI_STOMACH |
E085 | download | download | Digestive | Fetal Intestine Small | GI_INTESTINE |
E084 | download | download | Digestive | Fetal Intestine Large | GI_INTESTINE |
E109 | download | download | Digestive | Small Intestine | GI_INTESTINE |
E106 | download | download | Digestive | Sigmoid Colon | GI_COLON |
E075 | download | download | Digestive | Colonic Mucosa | GI_COLON |
E101 | download | download | Digestive | Rectal Mucosa Donor 29 | GI_RECTUM |
E102 | download | download | Digestive | Rectal Mucosa Donor 31 | GI_RECTUM |
E110 | download | download | Digestive | Stomach Mucosa | GI_STOMACH |
E077 | download | download | Digestive | Duodenum Mucosa | GI_DUODENUM |
E079 | download | download | Digestive | Esophagus | GI_ESOPHAGUS |
E094 | download | download | Digestive | Gastric | GI_STOMACH |
E099 | download | download | Other | Placenta Amnion | PLACENTA |
E086 | download | download | Other | Fetal Kidney | KIDNEY |
E088 | download | download | Other | Fetal Lung | LUNG |
E097 | download | download | Other | Ovary | OVARY |
E087 | download | download | Other | Pancreatic Islets | PANCREAS |
E080 | download | download | Other | Fetal Adrenal Gland | ADRENAL |
E091 | download | download | Other | Placenta | PLACENTA |
E066 | download | download | Other | Liver | LIVER |
E098 | download | download | Other | Pancreas | PANCREAS |
E096 | download | download | Other | Lung | LUNG |
E113 | download | download | Other | Spleen | SPLEEN |
E114 | download | download | ENCODE2012 | A549 EtOH 0.02pct Lung Carcinoma Cell Line | LUNG |
E115 | download | download | ENCODE2012 | Dnd41 TCell Leukemia Cell Line | BLOOD |
E116 | download | download | ENCODE2012 | GM12878 Lymphoblastoid Cells | BLOOD |
E117 | download | download | ENCODE2012 | HeLa-S3 Cervical Carcinoma Cell Line | CERVIX |
E118 | download | download | ENCODE2012 | HepG2 Hepatocellular Carcinoma Cell Line | LIVER |
E119 | download | download | ENCODE2012 | HMEC Mammary Epithelial Primary Cells | BREAST |
E120 | download | download | ENCODE2012 | HSMM Skeletal Muscle Myoblasts Cells | MUSCLE |
E121 | download | download | ENCODE2012 | HSMM cell derived Skeletal Muscle Myotubes Cells | MUSCLE |
E122 | download | download | ENCODE2012 | HUVEC Umbilical Vein Endothelial Primary Cells | VASCULAR |
E123 | download | download | ENCODE2012 | K562 Leukemia Cells | BLOOD |
E124 | download | download | ENCODE2012 | Monocytes-CD14+ RO01746 Primary Cells | BLOOD |
E125 | download | download | ENCODE2012 | NH-A Astrocytes Primary Cells | BRAIN |
E126 | download | download | ENCODE2012 | NHDF-Ad Adult Dermal Fibroblast Primary Cells | SKIN |
E127 | download | download | ENCODE2012 | NHEK-Epidermal Keratinocyte Primary Cells | SKIN |
E128 | download | download | ENCODE2012 | NHLF Lung Fibroblast Primary Cells | LUNG |
E129 | download | download | ENCODE2012 | Osteoblast Primary Cells | BONE |
These maps can be freely downloaded from here.
- Create a conda environment (Python version: 2.7)
conda create -y -n dihmm python=2.7
conda activate dihmm
- Download diHMM-cpp
git clone https://github.com/gcyuan/diHMM-cpp.git
- Install: go into the build directory and run
cd diHMM-cpp/build
cmake ..
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /sc/arion/projects/YuanLab/gcproj/xuan/anaconda3/envs/dihmm/bin/x86_64-conda-linux-gnu-cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /sc/arion/projects/YuanLab/gcproj/xuan/anaconda3/envs/dihmm/bin/x86_64-conda-linux-gnu-c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Armadillo: /sc/arion/projects/YuanLab/gcproj/xuan/anaconda3/envs/dihmm/include (found version "11.2.0")
-- Found PythonLibs: /sc/arion/projects/YuanLab/gcproj/xuan/anaconda3/envs/dihmm/lib/libpython2.7.so (found suitable version "2.7.18", minimum required is "2.7")
-- Found Boost: /sc/arion/projects/YuanLab/gcproj/xuan/anaconda3/envs/dihmm/lib/cmake/Boost-1.72.0/BoostConfig.cmake (found version "1.72.0") found components: python numpy filesystem
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /sc/arion/projects/YuanLab/gcproj/xuan/anaconda3/envs/dihmm/lib/libopenblas.so
-- Configuring done
-- Generating done
-- Build files have been written to: /sc/arion/projects/YuanLab/gcproj/xuan/dihmm-cpp/build
make
Consolidate compiler generated dependencies of target dihmm
[ 14%] Building CXX object CMakeFiles/dihmm.dir/Model.cpp.o
[ 28%] Building CXX object CMakeFiles/dihmm.dir/Emissions.cpp.o
[ 42%] Building CXX object CMakeFiles/dihmm.dir/Forward_Backward.cpp.o
[ 57%] Linking CXX shared library libdihmm.so
[ 71%] Built target dihmm
Consolidate compiler generated dependencies of target dihmm_ext
[ 85%] Building CXX object CMakeFiles/dihmm_ext.dir/dihmm_ext.cpp.o
[100%] Linking CXX shared library dihmm_ext.so
[100%] Built target dihmm_ext
- Install the dependencies in your environment
bedtools, wigToBigWig, fetchChromSizes, and bigWigToBedGraph are required.
conda install -c bioconda bedtools
wget https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/wigToBigWig
wget https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/fetchChromSizes
wget https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/bigWigToBedGraph
- Set path
export PYTHONPATH=${your_dihmm_dir}/diHMM-cpp/build
Then open a Python shell in the same directory and run:
>>> import dihmm_ext
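Before training, it can help to confirm that the setup above is complete. The following is a minimal sketch, not part of diHMM-cpp, that checks whether the four required command-line tools are found on `PATH` and whether `dihmm_ext` can be imported. The helper names are illustrative. Note that binaries fetched with `wget` usually need execute permission (`chmod +x`) and must sit in a directory on `PATH` before they are found.

```python
import importlib
import os

# Tools listed as required in the installation steps.
REQUIRED_TOOLS = ["bedtools", "wigToBigWig", "fetchChromSizes", "bigWigToBedGraph"]


def find_on_path(name, path=None):
    """Return the first executable file called `name` on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if find_on_path(tool, path) is None]


def can_import(module_name):
    """Return True if `module_name` imports cleanly, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("Missing tools: %s" % missing_tools())
    print("dihmm_ext importable: %s" % can_import("dihmm_ext"))
```

An empty list of missing tools and a successful import suggest the environment is ready. If the import fails, check that `PYTHONPATH` points at the build directory containing `dihmm_ext.so`.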
A diHMM model can be trained with the script Train_diHMM.py, after setting the input data path and other parameters as needed.
Training also includes applying the diHMM model for chromatin state annotation. Annotation is performed by the script annotation.py, which is used in Train_diHMM.py.
python diHMM-cpp/Train_diHMM.py -h
usage: Train_diHMM.py [-h] -i INPUT_DIR --clusters CLUSTERS --chroms CHROMS
-o OUT_DIR [--n_bin_states N_BIN_STATES]
[--n_domain_states N_DOMAIN_STATES]
[--domain_size DOMAIN_SIZE] [--tolerance TOLERANCE]
[--max_iter MAX_ITER] [--bin_res BIN_RES]
Train diHMM runner.
optional arguments:
-h, --help show this help message and exit
-i INPUT_DIR The input binarized files dir. File name:
X1_chr1_binary.txt.
--clusters CLUSTERS Clusters/cell_types names used to train model.
Example: X1,X2 .
--chroms CHROMS chrs used to train model. Example: chr1,chr2 .
-o OUT_DIR Output dir.
--n_bin_states N_BIN_STATES
Number of bin states. Default=2.
--n_domain_states N_DOMAIN_STATES
Number of domain states. Default=4.
--domain_size DOMAIN_SIZE
Number of bins to merge as domain. Default=8.
--tolerance TOLERANCE
Tolerance. Default=1e-6.
--max_iter MAX_ITER Max iter number. Default=500.
--bin_res BIN_RES bin length used to generate binarized files.
Default=500.
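As an illustration of how the options above fit together, here is a minimal sketch that assembles a `Train_diHMM.py` command line and lists the binarized input files it would be expected to read. It uses only the flags shown in the help text. The directory names `binarized` and `dihmm_out` are hypothetical, and the `<cluster>_<chrom>_binary.txt` naming is an assumption generalized from the single example `X1_chr1_binary.txt` in the help output.

```python
import os


def expected_input_files(input_dir, clusters, chroms):
    """List input files following the X1_chr1_binary.txt naming pattern.

    Assumption: one file per (cluster, chromosome) pair.
    """
    return [
        os.path.join(input_dir, "%s_%s_binary.txt" % (cluster, chrom))
        for cluster in clusters
        for chrom in chroms
    ]


def build_train_command(input_dir, clusters, chroms, out_dir,
                        n_bin_states=2, n_domain_states=4,
                        domain_size=8, bin_res=500,
                        script="Train_diHMM.py"):
    """Build the argument list for Train_diHMM.py from the documented flags.

    Keyword defaults mirror the defaults printed by `-h`.
    """
    return [
        "python", script,
        "-i", input_dir,
        "--clusters", ",".join(clusters),
        "--chroms", ",".join(chroms),
        "-o", out_dir,
        "--n_bin_states", str(n_bin_states),
        "--n_domain_states", str(n_domain_states),
        "--domain_size", str(domain_size),
        "--bin_res", str(bin_res),
    ]


# Hypothetical directories and the example names from the help text.
clusters = ["X1", "X2"]
chroms = ["chr1", "chr2"]
files = expected_input_files("binarized", clusters, chroms)
cmd = build_train_command("binarized", clusters, chroms, "dihmm_out")

for path in files:
    print(path)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.call(cmd)` once the input files exist. With the defaults, a domain covers 8 bins of 500 bp, or 4 kb.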
Here is a tutorial for applying diHMM-cpp to H3K4me3 in hESC H1 cells.