A Research-oriented Federated Learning Library and Benchmark Platform for Graph Neural Networks. Accepted to ICLR-DPML and MLSys21 - GNNSys'21 workshops.
After `git clone`-ing this repository, please run the following commands to install our dependencies.
```bash
conda create -n fedgraphnn python=3.8.3
conda activate fedgraphnn
bash install.sh
```
Datasets are organized by task level:

- Graph-level
  - MoleculeNet -> We provide preprocessed versions of the MoleculeNet datasets. To use them, first run `bash download_and_unzip.sh` located under each dataset folder in `data/moleculenet`.
  - Social Networks -> We use PyTorch Geometric datasets for our social network datasets. For details, please see this link (a loading snippet is shown after this list).
- Sub-graph Level
  - Knowledge Graphs -> Please first run the bash file inside `data/subgraph-level`.
  - Recommendation Systems -> We provide preprocessed versions of the Ciao and Epinions datasets.
- Node-level
  - Coauthor & Citation Networks (Ego Networks) -> Details
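As a pointer for the social network datasets, PyTorch Geometric can fetch graph classification benchmarks directly. The snippet below uses `TUDataset` with COLLAB purely as an illustration; the exact datasets and storage paths used in this repository may differ.

```python
# Illustrative only: fetch a social-network graph classification dataset via
# PyTorch Geometric. The datasets/paths actually used by FedGraphNN may differ.
from torch_geometric.datasets import TUDataset

dataset = TUDataset(root="data/social_networks", name="COLLAB")
print(len(dataset), dataset.num_classes)  # number of graphs and label classes
```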
Experiment entry points are grouped the same way:

- Graph Level
  - MoleculeNet: Centralized Experiments, Federated Experiments
  - Social Networks: Federated Experiments
- Sub-graph Level
  - Recommendation Systems: Federated Experiments
- Node Level
  - Ego Networks (Citation & Coauthor Networks): Federated Experiments
Our framework supports PyTorch and PyTorch Geometric based models. To add your own model:

- Create a PyTorch/PyG based model and place it under the `model` folder.
- Prepare a trainer module (example) by inheriting the base class in `FedML/fedml-core/trainer/fedavg_trainer.py` (see the sketch after this list).
- Prepare an experiment file similar to the files in the `experiments/` folder.
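As a rough illustration (not the library's exact API), a custom trainer might look like the sketch below. The class layout, the `train` method, and the returned `state_dict()` are assumptions, so follow the actual base class in `FedML/fedml-core/trainer/fedavg_trainer.py`.

```python
# A minimal sketch of a custom GNN trainer. Names are hypothetical; in practice,
# inherit the base class in FedML/fedml-core/trainer/fedavg_trainer.py and match
# its method signatures.
import torch


class GNNTrainer:
    def __init__(self, model, args):
        self.model = model
        self.args = args

    def train(self, train_data, device):
        """Run local epochs on one client's graph batches (e.g., a PyG DataLoader)."""
        model = self.model.to(device)
        model.train()
        optimizer = torch.optim.Adam(model.parameters(), lr=self.args.lr)
        criterion = torch.nn.BCEWithLogitsLoss()
        for _ in range(self.args.epochs):
            for batch in train_data:
                batch = batch.to(device)
                optimizer.zero_grad()
                logits = model(batch.x, batch.edge_index, batch.batch)
                loss = criterion(logits, batch.y.float())
                loss.backward()
                optimizer.step()
        return model.state_dict()
```

Returning the local `state_dict()` reflects the general FedAvg pattern of sending updated weights back for server-side aggregation; the real base class defines the exact hooks the federated optimizer expects.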
To add your own dataset: if it is a PyTorch Geometric dataset, please see this link. Otherwise, do the following:

- Create a new folder under the `data_preprocessing` folder and re-define `data_preprocessing/data_loader.py` based on your new dataset.
- Rewrite the `data_loader.py` file under the `data_preprocessing` folder (a skeleton is sketched after this list).
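As a rough guide only, a new `data_loader.py` usually needs to parse the raw files and return per-client partitions. The skeleton below is hypothetical: the function name `load_partition_data`, its return values, and the helpers are illustrative assumptions, so mirror whatever the existing MoleculeNet loaders actually return.

```python
# Hypothetical skeleton for data_preprocessing/<your_dataset>/data_loader.py.
# Function and return names are illustrative; match the existing loaders.
import random


def load_partition_data(data_dir, client_number, batch_size):
    """Read the raw dataset and split it across `client_number` clients."""
    samples = read_raw_dataset(data_dir)          # dataset-specific parsing
    random.shuffle(samples)

    train_data_local_dict = {}
    data_local_num_dict = {}
    chunk = len(samples) // client_number
    for client_idx in range(client_number):
        local = samples[client_idx * chunk:(client_idx + 1) * chunk]
        train_data_local_dict[client_idx] = make_batches(local, batch_size)
        data_local_num_dict[client_idx] = len(local)

    train_data_global = make_batches(samples, batch_size)
    return (len(samples), train_data_global,
            data_local_num_dict, train_data_local_dict)


def read_raw_dataset(data_dir):
    raise NotImplementedError  # your parsing logic goes here


def make_batches(samples, batch_size):
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
```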
Splits and non-I.I.D.-ness methods are located under the `data_preprocessing` library. By default, we provide I.I.D. and non-I.I.D. sampling (`create_non_uniform_split.py`, Dirichlet distribution sampling) based on the sample size of the dataset. To create a custom splitting method based on the sample size, you can create a new function or modify the `create_non_uniform_split.py` function. A simplified sketch of Dirichlet-based splitting follows.
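For orientation, the snippet below shows the general idea behind Dirichlet-based non-uniform splitting over sample indices: each client's share of samples is drawn from a Dirichlet distribution, so a smaller `alpha` yields more skewed client sizes. This is a simplified sketch, not the exact logic in `create_non_uniform_split.py`.

```python
# Simplified sketch of a Dirichlet-based non-uniform split over sample indices;
# not the exact implementation in create_non_uniform_split.py.
import numpy as np


def dirichlet_split(num_samples, client_number, alpha=0.5, seed=42):
    """Return one index array per client; smaller alpha -> more skewed sizes."""
    rng = np.random.default_rng(seed)
    proportions = rng.dirichlet(alpha * np.ones(client_number))
    # Convert proportions to cut points over a shuffled index list.
    indices = rng.permutation(num_samples)
    cut_points = (np.cumsum(proportions)[:-1] * num_samples).astype(int)
    return np.split(indices, cut_points)


if __name__ == "__main__":
    parts = dirichlet_split(num_samples=1000, client_number=8, alpha=0.2)
    print([len(p) for p in parts])  # skewed client sample counts
```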
The code structure of FedGraphNN is as follows:

- `FedML`: A soft repository link generated using `git submodule add https://github.com/FedML-AI/FedML`.
- `data`: Provides data downloading scripts and stores the downloaded datasets. Note that `FedML/data` also contains datasets for research, but those are used for evaluating federated optimizers (e.g., FedAvg) and platforms. FedGraphNN supports more advanced datasets and models for federated training of graph neural networks. Currently, we have molecular machine learning datasets.
- `data_preprocessing`: Domain-specific PyTorch/PyG data loaders for centralized and distributed training.
- `model`: GNN models written in PyTorch/PyG.
- `trainer`: Please define your own `trainer.py` by inheriting the base class in `FedML/fedml-core/trainer/fedavg_trainer.py`. Some tasks can share the same trainer.
- `experiments/distributed`: `experiments` is the entry point for training. It contains experiments on different platforms; we start from `distributed`.
  - Every experiment integrates FOUR building blocks: `FedML` (federated optimizers), `data_preprocessing`, `model`, and `trainer`.
  - To develop new experiments, please refer to the code at `experiments/distributed/text-classification`.
- `experiments/centralized`:
  - Please provide a centralized training script in this directory.
  - This is used to get the reference model accuracy for FL.
  - You may need to accelerate your training through distributed training on multiple GPUs and machines; please refer to the code at `experiments/centralized/DDP_demo` (a minimal DDP sketch follows this list).
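For reference, here is a minimal sketch of single-node multi-GPU training with PyTorch `DistributedDataParallel`, in the spirit of `experiments/centralized/DDP_demo`; the actual demo's structure, model, and arguments may differ.

```python
# Minimal single-node DistributedDataParallel sketch; launch with
#   torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
# The actual DDP_demo in this repository may be organized differently.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets LOCAL_RANK and the rendezvous environment variables for us.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = torch.device("cuda", local_rank)

    model = torch.nn.Linear(128, 2).to(device)        # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):                               # dummy training loop
        x = torch.randn(32, 128, device=device)
        y = torch.randint(0, 2, (32,), device=device)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launching with `torchrun --nproc_per_node=<num_gpus>` starts one process per GPU, and the same pattern extends to multiple machines with torchrun's multi-node options.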
To update the `FedML` submodule to the latest version, run:

```bash
cd FedML
git checkout master && git pull
cd ..
git add FedML
git commit -m "updating submodule FedML to latest"
git push
```
Please cite our FedML and FedGraphNN papers if this repository helps your research. You can describe our framework in your paper like this: "We develop our experiments based on FedML."
```bibtex
@misc{he2021fedgraphnn,
  title={FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks},
  author={Chaoyang He and Keshav Balasubramanian and Emir Ceyani and Carl Yang and Han Xie and Lichao Sun and Lifang He and Liangwei Yang and Philip S. Yu and Yu Rong and Peilin Zhao and Junzhou Huang and Murali Annavaram and Salman Avestimehr},
  year={2021},
  eprint={2104.07145},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

```bibtex
@misc{he2020fedml,
  title={FedML: A Research Library and Benchmark for Federated Machine Learning},
  author={Chaoyang He and Songze Li and Jinhyun So and Xiao Zeng and Mi Zhang and Hongyi Wang and Xiaoyang Wang and Praneeth Vepakomma and Abhishek Singh and Hang Qiu and Xinghua Zhu and Jianzong Wang and Li Shen and Peilin Zhao and Yan Kang and Yang Liu and Ramesh Raskar and Qiang Yang and Murali Annavaram and Salman Avestimehr},
  year={2020},
  eprint={2007.13518},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```