We design and implement a framework for benchmarking and comparing Graph Neural Network (GNN) architectures in a robust and reproducible way, built on Nextflow, a scientific workflow system popular with computational biologists. The framework supports nine GNN architectures on binary node classification tasks. To demonstrate its versatility, we consider a task of significant biological importance: identifying cancer-driver genes (CDG) in a protein-protein interaction (PPI) network. Data were sourced from the Pan-Cancer Analysis of Whole Genomes (PCAWG), the Pathway Indicated Drivers (PID), the COSMIC Cancer Gene Census (COSMIC-CGC), and the STRING and BioGRID PPI databases. On this task, GNNs were able to make effective use of the network structure of the data. Nevertheless, the different architectures performed remarkably similarly, emphasising the importance of the quality of the training data for such tasks. We make our pipeline publicly available to enable other researchers to perform similar investigations in other areas of computational biology. We believe that this will lead to improved benchmarking standards in the GNN literature.
The following models are included:
- Graph Convolutional Networks (GCN)
- Graph Attention Networks (GAT)
- Hierarchical Graph Convolutional Networks (HGCN)
- Parallel Hierarchical Graph Convolutional Networks (PHGCN)
- Graph SAmpling and aggreGatE (GraphSAGE)
- Graph Transformer Networks (GTN)
- Graph Isomorphism Networks (GIN)
- Graph Convolutional Networks II (GCNII)
nextflow pull stracquadaniolab/gnn-suite
nextflow run stracquadaniolab/gnn-suite -profile docker,test
nextflow run stracquadaniolab/gnn-suite -profile docker,<experiment_file>
The results of the experiment will be stored in the results/data/<experiment_file>/ and results/figures/<experiment_file>/ directories.
For more information on Nextflow, you can visit the official documentation at nextflow.io/docs.
View the gnn-suite Docker image on GitHub Container Registry. You can also download it using:
docker pull ghcr.io/stracquadaniolab/gnn-suite:latest
- Create a Config File: Create a new configuration file <experiment_file>.config with the parameters for the experiment:

      // profile to test the string workflow
      params {
        resultsDir = "${baseDir}/results/"
        networkFile = "${baseDir}/data/<network_file>.tsv"
        geneFile = "${baseDir}/data/<feature_file>.csv"
        epochs = [300]
        models = ["gcn2", "gcn", "gat", "gat3h", "hgcn", "phgcn", "sage", "gin", "gtn"]
        replicates = 10
        verbose_interval = 1
        dropout = 0.2
        alpha = 0.1
        theta = 1
        dataSet = "<experiment_file_tag>"
      }
- Update base.config: Add a new profile for your experiment in base.config:

      profiles {
        // existing profiles...

        // test profile for the biogrid cosmic network defining some data
        <config_file> {
          includeConfig '<experiment_file>.config'
        }
      }
- Run the Experiment: Execute the pipeline with the new profile using:

      nextflow run main.nf -profile docker,<experiment_file>

  or

      nextflow run stracquadaniolab/gnn-suite -profile docker,<experiment_file>
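When several experiments differ only in their input files or tag, the config file from the first step can be generated instead of edited by hand. The following is a minimal sketch and is not part of gnn-suite: the helper `render_experiment_config` and its arguments are hypothetical, and it only fills the parameter names from the example config above into a `params` block.

```python
# Hypothetical helper, not part of gnn-suite: renders a params block that
# mirrors the parameter names of the example experiment config.
def render_experiment_config(network_file, gene_file, data_set,
                             models=("gcn", "gat"), epochs=(300,),
                             replicates=10, verbose_interval=1,
                             dropout=0.2, alpha=0.1, theta=1):
    def groovy_list(values):
        # Groovy list literal: strings are double-quoted, numbers are bare.
        items = (f'"{v}"' if isinstance(v, str) else str(v) for v in values)
        return "[" + ", ".join(items) + "]"

    lines = [
        "params {",
        '  resultsDir = "${baseDir}/results/"',
        # Doubled braces keep the literal ${baseDir} for Nextflow to resolve.
        f'  networkFile = "${{baseDir}}/data/{network_file}"',
        f'  geneFile = "${{baseDir}}/data/{gene_file}"',
        f"  epochs = {groovy_list(epochs)}",
        f"  models = {groovy_list(models)}",
        f"  replicates = {replicates}",
        f"  verbose_interval = {verbose_interval}",
        f"  dropout = {dropout}",
        f"  alpha = {alpha}",
        f"  theta = {theta}",
        f'  dataSet = "{data_set}"',
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # File names and tag below are placeholders for illustration only.
    print(render_experiment_config("my_network.tsv", "my_features.csv", "my_tag"))
```

The printed text can be saved as `<experiment_file>.config` and then registered as a profile in base.config as described above.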
- Create Model: Implement the new model class in models.py:

      import torch

      class NewModel(torch.nn.Module):
          def __init__(self, num_features, num_classes, num_hidden=16, num_layers=2, dropout=0.5):
              super(NewModel, self).__init__()
              # Define layers

          def forward(self, data):
              # Define forward pass
              raise NotImplementedError
- Import Model: Add your model to the imports in gnn.py:

      from models import GCN, GAT, ..., NewModel
- Update build_model: Add your model to the build_model function in gnn.py:

      elif name == "new_model":
          return NewModel(num_features, num_classes, dropout=dropout)
- Include in Experiment: Add the new model name to the models list in your experiment config (<experiment_file>.config):

      models = ["gcn", "gat", ..., "new_model"]
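How the name in the `models` list reaches the new class can be illustrated with a small dispatch sketch. This is not the actual gnn.py code: the stand-in classes replace the real torch modules so the example runs without dependencies, the `build_model` signature is assumed from the snippet above, and `build_all` and the `ValueError` on unknown names are hypothetical additions.

```python
# Stand-in classes so the sketch runs without torch; in gnn-suite these
# would be the torch.nn.Module subclasses defined in models.py.
class _StandInModel:
    def __init__(self, num_features, num_classes, dropout=0.5):
        self.num_features = num_features
        self.num_classes = num_classes
        self.dropout = dropout


class GCN(_StandInModel):
    pass


class NewModel(_StandInModel):
    pass


# Assumed signature: the real build_model in gnn.py may take other arguments.
def build_model(name, num_features, num_classes, dropout=0.5):
    if name == "gcn":
        return GCN(num_features, num_classes, dropout=dropout)
    elif name == "new_model":
        return NewModel(num_features, num_classes, dropout=dropout)
    # Failing loudly on unknown names is an assumption of this sketch.
    raise ValueError(f"unknown model name: {name!r}")


# Hypothetical helper: build one model per entry of the config's models list.
def build_all(model_names, num_features, num_classes, dropout=0.5):
    return {
        name: build_model(name, num_features, num_classes, dropout=dropout)
        for name in model_names
    }
```

With this shape, a name that appears in the `models` list but has no matching branch fails immediately instead of partway through an experiment.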
To run the hyperparameter optimization workflow, which uses optuna and is defined in hyperopt.py, use the hyperopt entry point:
nextflow run main.nf -profile docker,<experiment_file> -entry hyperopt
The results of the search will be stored in the results/hyperparameters/<experiment_file>/ directory. You can find the best trial information in the best_trial_<model>_<experiment>.txt file.
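To inspect all best-trial files of one experiment at once, a short script along the following lines may help. It is a sketch that assumes only the directory layout and file-name pattern stated above; the helper `find_best_trials` is hypothetical, and the file contents are returned as raw text because their format is not specified here.

```python
from pathlib import Path


# Hypothetical helper, not part of gnn-suite.
def find_best_trials(results_dir, experiment):
    """Map each best-trial file name (without .txt) to its raw text."""
    folder = Path(results_dir) / "hyperparameters" / experiment
    return {
        path.stem: path.read_text()
        for path in sorted(folder.glob("best_trial_*.txt"))
    }


if __name__ == "__main__":
    # "results" and "my_experiment" are placeholders for illustration.
    for name, text in find_best_trials("results", "my_experiment").items():
        print(f"== {name} ==")
        print(text)
```

If the directory does not exist yet, the helper returns an empty dictionary.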
For more information on optuna, you can visit the official documentation at https://optuna.readthedocs.io.
If you encounter the following error message when attempting to execute the script:
Command error:
.command.sh: line 2: ../gnn-suite/bin/plot.py: Permission denied
You need to grant execute permission to the affected Python scripts. For example, for plot.py, run:
chmod +x /home/<path>/code/gnn-suite/bin/plot.py
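If several scripts are affected, the same fix can be applied to a whole directory. The sketch below is roughly equivalent to running `chmod a+x` on every Python file in one folder; the helper `make_scripts_executable` is hypothetical and not part of gnn-suite, and the path in the usage example is a placeholder.

```python
import stat
from pathlib import Path


# Hypothetical helper, not part of gnn-suite.
def make_scripts_executable(bin_dir, pattern="*.py"):
    """Add execute permission for user, group, and others to matching files.

    Returns the names of the files whose mode was changed.
    """
    changed = []
    for script in sorted(Path(bin_dir).glob(pattern)):
        mode = script.stat().st_mode
        new_mode = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
        if new_mode != mode:
            script.chmod(new_mode)
            changed.append(script.name)
    return changed


if __name__ == "__main__":
    # Placeholder path: point this at the bin/ directory of your checkout.
    print(make_scripts_executable("gnn-suite/bin"))
```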
- Forthcoming
- Sebestyén Kamp
- Ian Simpson
- Giovanni Stracquadanio