
UserGuide: GO Enrichment Analysis

S.Alves edited this page Sep 2, 2021 · 12 revisions

GO Enrichment Analysis

We use the GO annotation file to perform enrichment analysis on differentially expressed (DE) genes. To do so, you define several parameters:

1) for enrichment

  • parameters$GO_threshold: the significance threshold used to filter p-values
  • parameters$GO_min_num_genes: the minimum number of genes in the genome that must be annotated with a GO term for the term to be considered
  • parameters$GO: the gene set used for the analysis, "up", "down" or "both" (up + down)
  • parameters$GO_algo: the algorithm passed to the topGO runTest function ("classic", "elim", "weight", "weight01", "lea", "parentchild")
  • parameters$GO_stats: the statistical test passed to the topGO runTest function ("fisher", "ks", "t", "globaltest", "sum", "ks.ties")

2) for visualization

  • parameters$Ratio_threshold: the minimum enrichment ratio required to display a GO term in the graphs
  • parameters$GO_max_top_terms: the maximum number of GO terms plotted for each GO category
  • parameters$GO_min_sig_genes: the minimum number of significant genes annotated with an enriched GO term for the term to be displayed in the graphs
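The three visualization parameters act as filters on the enriched GO terms. The base-R sketch below shows one plausible way to apply them to a data frame with the columns of the statistics table described under Tabular outputs. `filter_for_display` is a hypothetical helper, not part of the pipeline, and both the inclusive thresholds (>=) and the ranking by p-value are assumptions.

```r
# Hypothetical helper (not part of the pipeline): keep the GO terms that would
# be eligible for the graphs, given the visualization parameters.
# Assumes inclusive thresholds (>=) and ranking by p-value within each category.
filter_for_display <- function(res, parameters) {
  keep <- res$Ratio >= parameters$Ratio_threshold &
    res$Significant >= parameters$GO_min_sig_genes
  res <- res[keep, , drop = FALSE]
  # Most significant terms first
  res <- res[order(res$statisticTest), , drop = FALSE]
  # At most GO_max_top_terms terms per GO category (BP, CC, MF)
  per_cat <- lapply(split(res, res$GO_cat), head, n = parameters$GO_max_top_terms)
  do.call(rbind, per_cat)
}

# Usage with the three MF rows of the example statistics table
res <- data.frame(
  GO.ID = c("GO:0003735", "GO:0004812", "GO:0019843"),
  Significant = c(51, 24, 13),
  statisticTest = c(0.000086, 0.000150, 0.000570),
  Ratio = c(1.628873, 2.028741, 2.439024),
  GO_cat = c("MF", "MF", "MF"),
  stringsAsFactors = FALSE
)
params <- list(Ratio_threshold = 1, GO_max_top_terms = 2, GO_min_sig_genes = 5)
filter_for_display(res, params)
```

With GO_max_top_terms set to 2, only the two terms with the smallest p-values (GO:0003735 and GO:0004812) are kept for the MF category.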

When the user provides a GO annotation file (see the Input files description section), genes are linked to Gene Ontology (GO) terms, which makes GO enrichment analysis possible. Each gene can be annotated with several terms describing its functions and the biological processes in which it is involved. GO terms are classified into three categories: MF (molecular function) describes the molecular activity of a gene, BP (biological process) describes a broader process in which the gene acts in coordination with other genes, and CC (cellular component) describes the cellular location where the gene performs its function.

GO enrichment is performed automatically with the topGO package on the DE genes (up, down, or both, depending on parameters$GO) of each contrast of the experiment. The GOenrichment function generates numerous tables and plots.
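The selection of the up, down or both gene sets can be pictured as follows. This is a minimal base-R sketch under assumptions: `select_de_genes` is a hypothetical helper, the 1 / -1 / 0 coding of the DE status is inferred from the example gene table at the end of this page, and the final named 0/1 factor follows the gene-list idiom of the topGO vignette rather than the internals of GOenrichment.

```r
# Hypothetical helper: pick the DE genes of one contrast according to
# parameters$GO. Assumed coding of the DE status: 1 = up, -1 = down, 0 = not DE.
select_de_genes <- function(de_status, go_set = c("up", "down", "both")) {
  go_set <- match.arg(go_set)
  keep <- switch(go_set,
    up   = de_status == 1,
    down = de_status == -1,
    both = de_status != 0
  )
  names(de_status)[which(keep)]  # which() silently drops NA statuses
}

# Toy DE status for one contrast (hypothetical gene names)
status <- c(Gene_1 = 1, Gene_2 = 0, Gene_3 = -1, Gene_4 = 1)
selected <- select_de_genes(status, "both")

# topGO-style gene list: every gene of the universe, flagged 1 if selected
gene_list <- factor(as.integer(names(status) %in% selected), levels = c(0, 1))
names(gene_list) <- names(status)
gene_list
```

Keeping every gene of the universe in the list, not only the selected ones, matters: the enrichment test compares the selected genes against all annotated genes.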

Be careful: because enrichment tests are based on proportions, the fewer genes in the list, the less reliable the test, and the interpretation of the enrichment becomes hazardous. Do not draw hasty conclusions from gene lists of fewer than 100 genes.
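To see why small lists are risky, the sketch below compares two toy gene lists with the same enrichment ratio (3), using base R's `fisher.test` as a stand-in for the test selected in parameters$GO_stats. The gene counts are invented for illustration: a universe of 10,000 genes, 200 of which carry the GO term.

```r
# One-sided Fisher exact test for over-representation of one GO term.
# Toy universe: 10,000 genes, 200 annotated with the term (2%).
enrichment_p <- function(list_size, hits, universe = 10000, annotated = 200) {
  tab <- matrix(
    c(hits, list_size - hits,                                       # genes in the list
      annotated - hits, universe - list_size - (annotated - hits)), # rest of the universe
    nrow = 2,
    dimnames = list(c("term", "no_term"), c("in_list", "not_in_list"))
  )
  fisher.test(tab, alternative = "greater")$p.value
}

# Both lists have Expected = 200 * size / 10000 and Ratio = hits / Expected = 3
p_small <- enrichment_p(list_size = 50,  hits = 3)   # Expected = 1
p_large <- enrichment_p(list_size = 500, hits = 30)  # Expected = 10
c(small = p_small, large = p_large)
```

With the same ratio, the 50-gene list gives a p-value above 0.05 while the 500-gene list is highly significant: when Expected is close to 1, one or two genes more or less change the ratio dramatically.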

The commands for running GO enrichment analysis are:

# Parameters for GO enrichment
parameters$GO_threshold = 0.05
parameters$GO_min_num_genes = 10
parameters$GO = "both" 
parameters$GO_algo = "weight01" 
parameters$GO_stats = "fisher" 

# Parameters for GO enrichment graphs
parameters$Ratio_threshold = 1
parameters$GO_max_top_terms = 10
parameters$GO_min_sig_genes = 5  

# run analysis
GOenrichment(resDEG, data, parameters)

A "DEG_test/GOenrichment/OnContrasts/" directory is created, containing all GO images and statistics tables.

Graphical outputs

Global graphs

For each contrast, global graphs are created that highlight either the p-value or the enrichment ratio across the three GO categories.

Example of global graphs:

[Figures: Pvalue_BUBBLESgraph, Ratio_BUBBLESgraph]

Detailed graphs for each GO category

To suit as many users as possible, separate graphs are also created for each GO category, combining the p-value and enrichment ratio information.

Example of detailed graphs:

[Figures: BP_GOgraph, CC_GOgraph, MF_GOgraph]


Tabular outputs

Global output

Example of one statistical table:

| GO.ID | Term | Annotated | Significant | Expected | statisticTest | Ratio | GO_cat |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GO:0003735 | structural constituent of ribosome | 135 | 51 | 31.31 | 0.000086 | 1.628873 | MF |
| GO:0004812 | aminoacyl-tRNA ligase activity | 51 | 24 | 11.83 | 0.000150 | 2.028741 | MF |
| GO:0019843 | rRNA binding | 23 | 13 | 5.33 | 0.000570 | 2.439024 | MF |
...

Explanation of some columns:

  • Annotated: number of genes in your genome (the gene universe) annotated with the GO term.
  • Significant: number of genes in the list annotated with the GO term.
  • Expected: number of annotated genes expected in the list if the GO term were represented in the list in the same proportion as in the gene universe (i.e. no enrichment).
  • statisticTest: p-value of the chosen statistical test (parameters$GO_stats).
  • Ratio: enrichment ratio, Significant / Expected.
  • GO_cat: GO category of the term (BP, CC or MF).
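The Expected and Ratio columns can be checked by hand. Expected is Annotated multiplied by the fraction of the gene universe that belongs to the gene list, and the Ratio values of the example table match Significant / Expected. The snippet below reproduces them from the three example rows; it only verifies the arithmetic and does not show how GOenrichment builds the table internally.

```r
# Three rows copied from the example statistics table
tab_stats <- data.frame(
  GO.ID = c("GO:0003735", "GO:0004812", "GO:0019843"),
  Annotated = c(135, 51, 23),
  Significant = c(51, 24, 13),
  Expected = c(31.31, 11.83, 5.33),
  stringsAsFactors = FALSE
)
# Enrichment ratio: observed over expected number of annotated genes
tab_stats$Ratio <- tab_stats$Significant / tab_stats$Expected
# Fraction of the gene universe implied for the DE gene list
# (Expected = Annotated * list size / universe size); only approximate,
# because Expected is rounded to two decimals in the table
tab_stats$ListFraction <- tab_stats$Expected / tab_stats$Annotated
print(tab_stats, digits = 7)
```

The computed ratios agree with the Ratio column up to rounding, and the implied list fraction is about 0.232 for all three terms, as it should be when every term is compared with the same gene list.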

Tabular outputs for each enriched GO

For each contrast, a sub-directory is created in the "DEG_test/GOenrichment/OnContrasts/" directory. It contains tables summarizing information on the genes behind each enriched GO term: the gene names, their descriptions (if an annotation file is provided), their DE status in all contrasts, and their normalized expression (in CPM) in all experimental conditions.

Example:

| Gene | Gene_description | AC1vsAC2 | AC1vsAC3 | AC2vsAC3 | BC1vsBC2 | BC1vsBC3 | BC2vsBC3 | AC1vsBC1 | AC2vsBC2 | AC3vsBC3 | AC1 | AC2 | AC3 | BC1 | BC2 | BC3 | GO_ID | GO_term | GO_cat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gene_002357 | phenylalanine-trna ligase beta subunit | 1 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | 266.349500 | 210.51620 | 221.547300 | 254.519300 | 266.1183000 | 300.3326000 | GO:0000162 | tryptophan biosynthetic process | BP |
| Gene_002384 | tyrosine-trna ligase | 1 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 64.172870 | 51.79736 | 51.085950 | 60.415040 | 71.3212400 | 61.5877900 | GO:0000162 | tryptophan biosynthetic process | BP |
| Gene_003773 | tyrosyl-trna synthetase | 1 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | 375.692400 | 300.71430 | 306.424000 | 333.957000 | 380.2730000 | 402.9832000 | GO:0000162 | tryptophan biosynthetic process | BP |
...
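As an illustration of how these tables can be used downstream, the sketch below reads a few columns of the example above as tab-separated text and keeps the genes of the enriched term whose status is -1 in AC3vsBC3. The tab separator, the reduced column set and the reading of -1 as down-regulation are assumptions; the CPM comparison at the end is only a sanity check of that reading.

```r
# A few columns of the example table, written as tab-separated text.
# Assumption: the real output files are tab-separated; file names are not shown here.
txt <- paste(
  "Gene\tAC3vsBC3\tAC3\tBC3\tGO_ID",
  "Gene_002357\t-1\t221.547300\t300.3326000\tGO:0000162",
  "Gene_002384\t0\t51.085950\t61.5877900\tGO:0000162",
  "Gene_003773\t-1\t306.424000\t402.9832000\tGO:0000162",
  sep = "\n"
)
genes <- read.delim(text = txt, stringsAsFactors = FALSE)

# Genes behind the GO term with status -1 (assumed: down) in AC3vsBC3
down <- genes[genes$AC3vsBC3 == -1, , drop = FALSE]
# log2 ratio of the CPM values: negative if expression is lower in AC3 than in BC3
down$log2_AC3_BC3 <- log2(down$AC3 / down$BC3)
down[, c("Gene", "AC3vsBC3", "log2_AC3_BC3")]
```

For both selected genes the CPM value is lower in AC3 than in BC3, which is consistent with -1 denoting down-regulation in the first condition of the contrast.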