The msigdbr R package provides Molecular Signatures Database (MSigDB) gene sets typically used with the Gene Set Enrichment Analysis (GSEA) software:
- in an R-friendly "tidy" format with one gene per row
- for multiple frequently studied model organisms, such as mouse, rat, pig, zebrafish, fly, and yeast, in addition to the original human genes
- as gene symbols as well as NCBI Entrez and Ensembl IDs
- without accessing external resources or requiring an active internet connection
The package can be installed from CRAN.
install.packages("msigdbr")
Recent releases are not available on CRAN but can be installed from GitHub (a specific version can be selected with the ref argument):
remotes::install_github("igordot/msigdbr", ref = "v2022.1.1")
The package data can be accessed using the msigdbr() function, which returns a data frame of gene sets and their member genes. For example, you can retrieve mouse genes from the C2 (curated) CGP (chemical and genetic perturbations) gene sets.
library(msigdbr)
genesets = msigdbr(species = "mouse", category = "C2", subcategory = "CGP")
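Many enrichment tools expect gene sets as a named list rather than a data frame. The following is a minimal sketch of that conversion using base R only. To keep it runnable without the package, it uses a small toy data frame as a stand-in for the msigdbr() output. The set names and the rows are made up for illustration; the gs_name and gene_symbol column names are assumed to match those in the real output, which contains additional columns.

```r
# Toy stand-in for the data frame returned by msigdbr() (illustrative rows only).
genesets_toy <- data.frame(
  gs_name = c("SET_A", "SET_A", "SET_B"),
  gene_symbol = c("Trp53", "Myc", "Myc"),
  stringsAsFactors = FALSE
)

# Build a named list with one character vector of gene symbols per gene set.
genesets_list <- split(genesets_toy$gene_symbol, genesets_toy$gs_name)

# Number of genes in each set.
set_sizes <- lengths(genesets_list)
```

With real data, the same split() call can be applied to the genesets object from the example above, assuming those two columns are present.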
Check the documentation website for more information.
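To see which organisms and gene set collections are available in the installed version, the package's helper functions can be queried directly. This is a short sketch that assumes msigdbr_species() and msigdbr_collections() are exported by the installed version. The requireNamespace() guard is added here only so the snippet can be sourced safely when the package is missing.

```r
# Check whether msigdbr is installed before calling into it.
has_msigdbr <- requireNamespace("msigdbr", quietly = TRUE)

if (has_msigdbr) {
  # Organisms for which gene sets are provided.
  print(msigdbr::msigdbr_species())
  # Available collections and sub-collections.
  print(msigdbr::msigdbr_collections())
}
```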