`DEbPeak` aims to explore, visualize and interpret multi-omics data, and to unravel the regulation of gene expression by combining RNA-seq with peak-related data (e.g., ChIP-seq, ATAC-seq, m6A-seq). It contains eleven functional modules:
- Parse GEO: extract study information, the raw count matrix and metadata from the GEO database.
- Quality Control (QC): QC on the count matrix and on samples.
  - QC on count matrix: proportion of genes detected in different samples under different CPM thresholds, and saturation of the number of detected genes.
  - QC on samples: Euclidean distance and Pearson correlation coefficient of samples across different conditions, sample similarity on selected principal components (check batch information and conduct batch correction), and outlier detection with robust PCA.
- Principal Component Analysis (PCA): this module is divided into three submodules: basic info, loading related, and 3D visualization.
  - Basic info: scree plot (helps select the useful PCs), biplot (sample similarity together with the genes that have the largest loadings), and PC pairs plot (sample similarity under different PC combinations).
  - Loading related: visualize the genes with the largest positive and negative loadings on selected PCs, and conduct GO enrichment analysis on these genes.
  - 3D visualization: visualize samples on three selected PCs.
- Differential Analysis and Visualization: seven visualization methods (volcano plot, scatter plot, MA plot, rank plot, gene/peak plot, heatmap, and pie plot for peak-related data).
- Functional Enrichment Analysis (FEA): GO enrichment analysis, KEGG enrichment analysis, and Gene Set Enrichment Analysis (GSEA).
  - GO (Biological Process, Molecular Function, Cellular Component) and KEGG enrichment on differentially expressed genes or accessible/binding peaks.
  - GSEA on all genes (note: GSEA is not available for peak-related data).
- Predict transcription factors (PredictTFs): identify transcription factors from differentially expressed genes; `DEbPeak` provides three methods (BART, ChEA3 and TFEA.ChIP).
- Motif analysis:
  - de novo motif discovery
  - motif enrichment
- Integrate RNA-seq with peak-related data:
  - Get consensus peaks: for multiple peak files, get consensus peaks; for a single peak file, use it directly (used in consensus integration mode).
  - Peak profile plots: heatmap of peaks binding to TSS regions, average profile of ChIP peaks binding to TSS regions, and profile of ChIP peaks binding to different regions (used in consensus integration mode).
  - Peak annotation (used in consensus integration mode).
  - Integrate RNA-seq with peak-related data (consensus mode): find direct targets, both up-regulated and down-regulated.
  - Integrate RNA-seq with peak-related data (differential mode): integrate RNA-seq and peak-related data based on their differential analysis results.
  - Integration summary: Venn diagram and quadrant diagram (differential mode).
  - GO enrichment on integrated results.
  - Find motifs on integrated results: because ATAC-seq measures chromatin accessibility rather than the binding of a specific factor, we usually need to find motifs on the integrated results to obtain potential regulatory factors.
- Integrate RNA-seq with RNA-seq:
  - Integration summary: Venn diagram and quadrant diagram.
  - GO enrichment on integrated results.
- Integrate peak-related data with peak-related data:
  - Integration summary: Venn diagram and quadrant diagram (differential mode).
  - GO enrichment on integrated results.
- Utils: useful functions, including creating enrichment plots for selected enrichment terms, gene ID conversion, and count normalization (DESeq2's median of ratios, TMM, CPM, TPM, RPKM).
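To make the two RNA-seq/peak integration modes more concrete, here is a minimal base-R sketch of the underlying idea. It is not the `DEbPeak()` implementation: the helper names `integrate_consensus()` and `quadrant()`, the column names (`gene`, `log2FoldChange`, `padj`), the cutoffs (|log2FC| > 1, adjusted p < 0.05) and the category labels are all illustrative assumptions. In consensus mode, a differentially expressed gene that is also annotated to a consensus peak is treated as a candidate direct target; in differential mode, each gene is placed in a quadrant by the signs of its RNA-seq and peak log2 fold changes.

```r
# Sketch of the two integration modes (hypothetical helpers, not the DEbPeak API)

# Consensus mode: classify DE genes and flag those annotated to a consensus peak.
# de: data.frame with columns gene, log2FoldChange, padj (assumed names)
# peak_genes: genes assigned to consensus peaks by peak annotation
integrate_consensus <- function(de, peak_genes, lfc_cutoff = 1, padj_cutoff = 0.05) {
  sig <- !is.na(de$padj) & de$padj < padj_cutoff &
    abs(de$log2FoldChange) > lfc_cutoff
  de$Regulation <- ifelse(sig, ifelse(de$log2FoldChange > 0, "Up", "Down"), "Not")
  de$HasPeak <- de$gene %in% peak_genes
  is_de <- de$Regulation != "Not"
  # DE + peak = candidate direct target (split into up- and down-regulated)
  de$Category <- ifelse(is_de & de$HasPeak, paste0(de$Regulation, "_target"),
                 ifelse(is_de, "DE_only",
                 ifelse(de$HasPeak, "Peak_only", "None")))
  de
}

# Differential mode: quadrant of each gene from the signs of its RNA-seq and
# peak log2 fold changes (a log2FC of exactly 0 falls on the "down" side here)
quadrant <- function(rna_lfc, peak_lfc) {
  ifelse(rna_lfc > 0,
         ifelse(peak_lfc > 0, "RNA up / peak up", "RNA up / peak down"),
         ifelse(peak_lfc > 0, "RNA down / peak up", "RNA down / peak down"))
}

# Toy data
de <- data.frame(gene = c("A", "B", "C", "D"),
                 log2FoldChange = c(2.1, -1.8, 0.2, 3.0),
                 padj = c(0.001, 0.01, 0.5, 0.2))
res <- integrate_consensus(de, peak_genes = c("A", "C", "E"))
res[, c("gene", "Regulation", "Category")]
quad <- quadrant(c(1.5, -2), c(0.8, 1.1))
```

For the toy data, gene A becomes an up-regulated target, B is differentially expressed without a peak, C has a peak but is not differentially expressed, and D passes neither filter (its adjusted p-value is 0.2).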
To make the tool easier to use, we have also developed a web server for `DEbPeak` that allows users to upload files and set parameters through a web page to get the desired results. The web server has `DESeq2` built in for differential analysis, whereas the standalone R package accepts differential analysis results from either `DESeq2` or `edgeR`, which is more flexible.
All generated plots are publication-ready, and most of them are based on `ggplot2`, so users can easily modify them according to their needs. We also provide various color palettes, including discrete and continuous palettes, color-blind-friendly palettes, and palettes for multiple categorical variables.
You can install the package from the GitHub repository:

```r
# install.packages("devtools") # in case you have not installed it
# install prerequisites for enrichplot and ChIPseeker
devtools::install_version("ggfun", version = "0.0.6", repos = "https://cran.r-project.org")
devtools::install_version("aplot", version = "0.1.6", repos = "https://cran.r-project.org")
devtools::install_version("scatterpie", version = "0.1.7", repos = "https://cran.r-project.org")
# On macOS, you may need to install XQuartz: brew install --cask xquartz
# install DEbPeak
devtools::install_github("showteeth/DEbPeak")
```
In general, installing from the GitHub repository is recommended, because it is updated more promptly.
For other installation issues, please refer to the Installation guide.
Install additional command-line tools:

```bash
# install MSPC --- consensus peaks
wget --quiet https://github.com/Genometric/MSPC/releases/latest/download/linux-x64.zip -O MSPC_linux_x64.zip && unzip -q MSPC_linux_x64.zip -d mspc && cd mspc && chmod +x mspc
# install MEME --- motif analysis
## install from source (installs into /opt/meme-5.5.5/meme)
cd /opt && wget --quiet https://meme-suite.org/meme/meme-software/5.5.5/meme-5.5.5.tar.gz -O meme-5.5.5.tar.gz && tar -zxf meme-5.5.5.tar.gz && cd meme-5.5.5 && ./configure --prefix=`pwd`/meme --enable-build-libxml2 --enable-build-libxslt && make && make install
## install from conda: conda install -c bioconda meme
# install HOMER --- motif enrichment
## install from source
mkdir homer && cd homer && wget --quiet http://homer.ucsd.edu/homer/configureHomer.pl -O configureHomer.pl && chmod +x configureHomer.pl && perl configureHomer.pl -install
## install from conda: conda install -c bioconda homer
## download HOMER species/genome packages: http://homer.ucsd.edu/homer/introduction/install.html
# install deepTools and BART
pip install deeptools numpy pandas scipy tables scikit-learn matplotlib
wget --quiet https://virginia.box.com/shared/static/031noe820hk888qzcxvw1cazol1gdhi0.gz -O bart_v2.0.tar.gz && tar -zxf bart_v2.0.tar.gz
## download the resources and set up the configuration file:
## https://zanglab.github.io/bart/index.htm#install
```
We also provide a Docker image:

```bash
# pull the image
docker pull soyabean/debpeak:1.2
# run the image
docker run --rm -p 8888:8787 -e PASSWORD=passwd -e ROOT=TRUE -it soyabean/debpeak:1.2
```
Notes:

- After running the above commands, open a browser and go to `http://localhost:8888/`. The user name is `rstudio` and the password is `passwd` (set by `-e PASSWORD=passwd`).
- If port `8888` is in use, change the host port in `-p 8888:8787` (for example, `-p 8889:8787`) and open the matching URL.
- The MEME suite path: `/opt/meme-5.5.5/meme/bin`.
- The HOMER suite path: `/opt/homer/bin`.
- The `configureHomer.pl` path: `/opt/homer`.
- The BART path: `/opt/bart_v2.0/bin`.
- You still need to download the resources and set up the configuration file for BART, and download the species packages for HOMER.
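Before running motif analysis, consensus peak calling or TF prediction, it can help to confirm that the external tools are reachable from R. The base-R sketch below prepends the directories listed above to `PATH` for the current R session and checks each executable with `Sys.which()`. The helper name `check_tools()` is hypothetical, and the executable names (`streme` and `tomtom` from MEME, `findMotifsGenome.pl` from HOMER, `bart2` from BART, `mspc` from MSPC) are assumptions about what you have installed; how `DEbPeak` itself locates these tools may differ.

```r
# Hypothetical helper: make tool directories visible to this R session and
# report which executables can be found on PATH.
check_tools <- function(tools, extra_dirs = character()) {
  # only prepend directories that actually exist
  extra_dirs <- extra_dirs[dir.exists(extra_dirs)]
  if (length(extra_dirs) > 0) {
    Sys.setenv(PATH = paste(c(extra_dirs, Sys.getenv("PATH")),
                            collapse = .Platform$path.sep))
  }
  # Sys.which() returns "" for executables that are not found
  setNames(nzchar(Sys.which(tools)), tools)
}

# Paths from the Docker image notes; add the directory where you unzipped MSPC
tool_dirs <- c("/opt/meme-5.5.5/meme/bin", "/opt/homer/bin", "/opt/bart_v2.0/bin")
status <- check_tools(c("streme", "tomtom", "findMotifsGenome.pl", "bart2", "mspc"),
                      extra_dirs = tool_dirs)
print(status)
```

Note that `Sys.setenv()` changes `PATH` only for the running R session, so the helper has to be rerun (or the directories added to your shell profile) in a new session.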
Detailed usage is available here. We divide these vignettes into five categories:

- For parsing GEO data
- For analyzing RNA-seq
- For analyzing peak-related data
- Integrating RNA-seq (differential expression analysis) with peak-related data (consensus peaks)
- Integrating RNA-seq (differential expression analysis) with peak-related data (differential accessibility/binding analysis)
| Type | Function | Description | Key packages |
|---|---|---|---|
| Parse GEO | ParseGEO | Extract study information, raw count matrix and metadata from the GEO database | GEOquery |
| Quality Control | CountQC | Quality control on the count matrix (gene detection sensitivity and sequencing depth saturation) | NOISeq |
| | SampleRelation | Quality control on samples (sample clustering based on Euclidean distance and Pearson correlation coefficient) | stats |
| | OutlierDetection | Detect outliers with robust PCA | rrcov |
| | QCPCA | PCA-related functions used in quality control (batch detection and correction, outlier detection) | stats, sva, rrcov |
| Principal Component Analysis | PCA | Conduct principal component analysis | stats |
| | PCABasic | Generate basic PCA plots, including scree plot, biplot and pairs plot | PCAtools |
| | ExportPCGenes | Export genes of selected PCs | tidyverse |
| | LoadingPlot | PCA loading plot, including bar plot and heatmap | ggplot2, ComplexHeatmap |
| | LoadingGO | GO enrichment on PC loading genes | clusterProfiler |
| | PCA3D | Create 3D PCA plot | plot3D |
| Differential Analysis | ExtractDA | Extract differential analysis results | tidyverse |
| | VolcanoPlot | Volcano plot for differential analysis results | ggplot2 |
| | ScatterPlot | Scatter plot for differential analysis results | ggplot2 |
| | MAPlot | MA plot for differential analysis results | ggplot2 |
| | RankPlot | Rank plot for differential analysis results | ggplot2 |
| | GenePlot | Gene expression or peak accessibility/binding plot | ggplot2 |
| | DEHeatmap | Heatmap for differential analysis results | ComplexHeatmap |
| | DiffPeakPie | Summarize genomic regions of differential peaks with a pie plot | ggpie |
| | ConductDESeq2 | Conduct differential analysis with DESeq2 | NOISeq, stats, sva, rrcov, PCAtools, DESeq2, ggplot2, ComplexHeatmap, clusterProfiler, plot3D, tidyverse |
| Functional Enrichment Analysis | ConductFE | Conduct functional enrichment analysis (GO and KEGG) | clusterProfiler |
| | ConductGSEA | Conduct gene set enrichment analysis (GSEA) | clusterProfiler |
| | VisGSEA | Visualize GSEA results | enrichplot |
| Predict Transcription Factors | InferRegulator | Predict TFs from RNA-seq data with ChEA3, BART2 and TFEA.ChIP | ChEA3, BART2, TFEA.ChIP |
| | VizRegulator | Visualize the identified TFs | ggplot2 |
| Motif Analysis | MotifEnrich | Motif enrichment for differentially accessible/binding peaks | HOMER |
| | MotifDiscovery | De novo motif discovery with STREME | MEME |
| | MotifCompare | Map motifs against a motif database with Tomtom | MEME |
| Peak-related Analysis | PeakMatrix | Prepare count matrix and sample metadata for peak-related data | DiffBind, ChIPseeker |
| | GetConsensusPeak | Get consensus peaks from replicates | MSPC |
| | PeakProfile | Visualize peak accessibility/binding profile | ChIPseeker |
| | AnnoPeak | Assign peaks to genomic regions and nearby genes | ChIPseeker |
| | PeakAnnoPie | Visualize peak annotation results with a pie plot | ggpie |
| Integrate RNA-seq with Peak-related Data | DEbPeak | Integrate differential expression results with peak annotation/differential analysis results | tidyverse |
| | DEbPeakFE | GO enrichment on integrated results | clusterProfiler |
| | DEbCA | Integrate differential expression results and peak annotation results (two kinds of peak-related data) | tidyverse |
| | ProcessEnhancer | Get genes near differential peaks | IRanges |
| | InteVenn | Create a Venn diagram for integrated results (supports DEbPeak, DEbDE, PeakbPeak) | ggvenn |
| | InteDiffQuad | Create a quadrant diagram for differential analysis results of RNA-seq and peak-related data | ggplot2 |
| | NetViz | Visualize enhancer-gene network results | igraph, ggnetwork |
| | FindMotif | Find motifs on integrated results | HOMER |
| Integrate RNA-seq with RNA-seq | DEbDE | Integrate two differential expression results | tidyverse |
| | DEbDEFE | GO enrichment on the integration results of two differential expression analyses | clusterProfiler |
| Integrate Peak-related Data with Peak-related Data | PeakbPeak | Integrate two peak annotation/differential analysis results | tidyverse |
| | PeakbPeakFE | GO enrichment on the integration results of two peak annotation/differential analyses | clusterProfiler |
| Utils | EnrichPlot | Create a bar or dot plot for selected functional enrichment results (GO and KEGG) | ggplot2 |
| | IDConversion | Gene ID conversion between ENSEMBL, ENTREZID and SYMBOL | clusterProfiler |
| | GetGeneLength | Get gene length from GTF | GenomicFeatures, GenomicRanges |
| | NormalizedCount | Perform count normalization (DESeq2's median of ratios, TMM, CPM, RPKM, TPM) | DESeq2, edgeR, tidyverse |
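The normalization methods listed for `NormalizedCount` follow standard formulas. The base-R sketch below re-derives CPM, RPKM, TPM and DESeq2-style median-of-ratios size factors so the differences between them are easy to see; it is an illustration, not the `NormalizedCount()` code, and the helper names are hypothetical. TMM is more involved and is omitted here (edgeR implements it in `calcNormFactors()`).

```r
# Illustrative base-R versions of common count normalizations
# (hypothetical helpers, not the DEbPeak API).
# counts: genes x samples matrix of raw counts
# gene_length: gene lengths in bp, in the same order as the rows of counts

# CPM: counts scaled by each sample's library size, per million
cpm <- function(counts) {
  t(t(counts) / colSums(counts)) * 1e6
}

# RPKM: CPM further divided by gene length in kilobases
rpkm <- function(counts, gene_length) {
  cpm(counts) / (gene_length / 1e3)
}

# TPM: divide by gene length first, then rescale each sample to one million
tpm <- function(counts, gene_length) {
  rate <- counts / (gene_length / 1e3)
  t(t(rate) / colSums(rate)) * 1e6
}

# Median of ratios (DESeq2-style size factors): for each sample, the median
# ratio of its counts to each gene's geometric mean across samples; genes with
# a zero count in any sample get a non-finite log geometric mean and are skipped
size_factors_mor <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(x) {
    keep <- is.finite(log_geo_means) & x > 0
    exp(median((log(x) - log_geo_means)[keep]))
  })
}

# Toy example: sample s2 is sequenced twice as deeply as s1
counts <- matrix(c(10, 20, 30, 20, 40, 60), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
gene_length <- c(1000, 2000, 500)

sf <- size_factors_mor(counts)
normalized <- t(t(counts) / sf)  # median-of-ratios normalized counts
```

In the toy example, the size factor of `s2` is twice that of `s1`, so the normalized counts of the two samples become identical. CPM and TPM columns both sum to one million, but only TPM accounts for gene length before that rescaling.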
- The KEGG API has changed; to perform KEGG enrichment, please update `clusterProfiler` to version 4.7.1 or later.
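A quick way to check this requirement from R is sketched below. `kegg_ready()` is a hypothetical helper built on base R's `requireNamespace()` and `utils::packageVersion()`; the suggested update command assumes `clusterProfiler` is installed from Bioconductor via `BiocManager`.

```r
# Hypothetical helper: TRUE if `pkg` is installed and at least `min_version`,
# FALSE otherwise (including when the package is not installed at all)
kegg_ready <- function(pkg = "clusterProfiler", min_version = "4.7.1") {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  utils::packageVersion(pkg) >= min_version
}

if (!kegg_ready()) {
  message("clusterProfiler >= 4.7.1 is needed for KEGG enrichment; ",
          "try BiocManager::install(\"clusterProfiler\")")
}
```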
For any questions, feature requests or bug reports, please email songyb0519@gmail.com.
Please note that the DEbPeak project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.