Supplement of code for "ReLMole: Molecular Representation Learning based on Two-Level Graph Similarities"
We implement our model in Python 3.7. The main packages used are:
torch 1.7.1
torch-cluster 1.5.8
torch-geometric 1.6.3
torch-scatter 2.0.5
torch-sparse 0.6.8
torch-spline-conv 1.2.0
numpy 1.19.5
scikit-learn 0.24.1
rdkit 2020.09.1.0
deepchem 2.5.0
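The check below may help confirm that a local environment matches these versions. It is a sketch and is not part of the repository. It assumes the PyPI distribution names shown. rdkit in particular is often installed via conda or as `rdkit-pypi`, so its name lookup may differ. It uses `importlib.metadata`, which ships with Python 3.8+. On Python 3.7, the `importlib_metadata` backport provides the same interface.

```python
"""Compare installed package versions with the ones listed in this README (sketch)."""
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: pip install importlib_metadata
    import importlib_metadata as metadata

# Versions listed in the README; the keys are assumed PyPI distribution names.
EXPECTED = {
    "torch": "1.7.1",
    "torch-cluster": "1.5.8",
    "torch-geometric": "1.6.3",
    "torch-scatter": "2.0.5",
    "torch-sparse": "0.6.8",
    "torch-spline-conv": "1.2.0",
    "numpy": "1.19.5",
    "scikit-learn": "0.24.1",
    "rdkit": "2020.09.1.0",
    "deepchem": "2.5.0",
}


def find_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not found. Local version labels
    such as "+cu101" are stripped before comparing.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except metadata.PackageNotFoundError:
            have = None
        base = have.split("+")[0] if have is not None else None
        if base != want:
            mismatches[name] = (want, have)
    return mismatches
```

For example, `find_mismatches(EXPECTED)` returns an empty dict when every listed version is installed. The `lookup` parameter exists only so the comparison can be tested without the real packages.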
We use 250k "lead-like" compounds from ZINC15, which are available through the DeepChem package. You can obtain them by calling the deepchem.molnet.load_zinc15() function. We converted the dataset into text format and saved it to data/ZINC15/zinc15_250k.
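The exact layout of data/ZINC15/zinc15_250k is not described here. The sketch below assumes a plain-text file with one SMILES string per line, with duplicates and blank entries dropped. `write_smiles_file` and `read_smiles_file` are hypothetical helpers. The sketch leaves open how the SMILES are obtained from the DeepChem download, because `load_zinc15()` returns featurized datasets rather than a text file.

```python
"""Write SMILES strings to a one-per-line text file (hypothetical format)."""
from pathlib import Path
from typing import Iterable, List


def write_smiles_file(smiles: Iterable[str], path: str) -> int:
    """Write unique, non-empty SMILES to `path`, one per line, in first-seen order.

    Returns the number of lines written. Parent directories are created if needed.
    Deduplication is an assumption, not a documented step of the original pipeline.
    """
    seen = set()
    lines = []
    for s in smiles:
        s = s.strip()
        if s and s not in seen:
            seen.add(s)
            lines.append(s)
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)


def read_smiles_file(path: str) -> List[str]:
    """Read the file back, skipping blank lines."""
    return [line for line in Path(path).read_text().splitlines() if line.strip()]
```

Usage would look like `write_smiles_file(smiles_list, "data/ZINC15/zinc15_250k")`, where `smiles_list` is whatever list of SMILES you extracted.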
We load the MoleculeNet datasets using the DeepChem package.
We downloaded the DDI datasets from CASTER. The data splits used in ReLMole are available in the directory data/DDI.
Run gen_fg_corpus.py to generate the FG (functional group) corpus; the corpus file will be saved to data/ZINC15/fg_corpus.txt.
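gen_fg_corpus.py defines how functional groups are actually detected, and this sketch does not reproduce that logic. It illustrates only the corpus-building step. It assumes each molecule has already been decomposed into a list of FG SMILES strings, and that the corpus is a frequency-sorted list with one FG and its count per line. The input shape, the `min_count` filter, and the `fg<TAB>count` layout are all assumptions, not the documented format of fg_corpus.txt.

```python
"""Build a functional-group (FG) frequency corpus from pre-extracted FGs (sketch)."""
from collections import Counter
from typing import Iterable, List, Tuple


def build_fg_corpus(fgs_per_molecule: Iterable[List[str]], min_count: int = 1) -> List[Tuple[str, int]]:
    """Count every FG occurrence and sort by descending count, then by FG string.

    An FG appearing twice in one molecule is counted twice. FGs seen fewer than
    `min_count` times are dropped (a hypothetical filter).
    """
    counts = Counter()
    for fgs in fgs_per_molecule:
        counts.update(fgs)
    kept = [(fg, c) for fg, c in counts.items() if c >= min_count]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def format_corpus(corpus: List[Tuple[str, int]]) -> str:
    """Serialize as one 'fg<TAB>count' line per FG."""
    return "".join(f"{fg}\t{count}\n" for fg, count in corpus)
```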
Run sim_cdf.py to sample molecule pairs from the pre-training dataset and plot the CDF curves of the two-level similarities. The figures will be saved in the directory data/ZINC15.
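The CDF plotted by sim_cdf.py is an empirical distribution over sampled pairs. The NumPy sketch below shows that computation. The uniform pair-sampling scheme, the sample size, and the Jaccard similarity on sets are all assumptions. `jaccard` is only a placeholder for either of ReLMole's two similarity levels, which the repository computes with its own code.

```python
"""Sample random molecule pairs and compute an empirical CDF of their similarity (sketch)."""
import numpy as np


def jaccard(a: frozenset, b: frozenset) -> float:
    """Placeholder similarity: |a & b| / |a | b| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def sample_pair_similarities(items, sim_fn, n_pairs, seed=0):
    """Draw `n_pairs` pairs of distinct indices uniformly at random and score them.

    Requires at least two items.
    """
    rng = np.random.default_rng(seed)
    n = len(items)
    i = rng.integers(0, n, size=n_pairs)
    # Adding an offset in [1, n-1] modulo n guarantees j != i.
    j = (i + rng.integers(1, n, size=n_pairs)) % n
    return np.array([sim_fn(items[a], items[b]) for a, b in zip(i, j)])


def empirical_cdf(values):
    """Return sorted values and the cumulative fraction k/n at each position (step-plot ECDF)."""
    x = np.sort(np.asarray(values, dtype=float))
    f = np.arange(1, len(x) + 1) / len(x)
    return x, f
```

To draw the curve, pass the result to a step plot. With matplotlib (not in the package list above), that is `plt.step(x, f, where="post")`.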
Run pretrain_cl.py to pre-train ReLMole; the pre-trained model will be saved in the directory pretrained_model_cl_zinc15_250k.
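The `_cl` suffix presumably refers to contrastive learning. For readers new to such objectives, the NumPy sketch below shows a generic InfoNCE loss over a batch of paired embeddings. It is not ReLMole's pre-training objective, which is defined in pretrain_cl.py. The function name, the temperature of 0.1, and the in-batch-negatives setup are assumptions chosen for illustration.

```python
"""A generic InfoNCE-style contrastive loss in NumPy (illustration only)."""
import numpy as np


def info_nce_loss(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.1) -> float:
    """Mean InfoNCE loss where row k of `positives` is the positive for row k of
    `anchors` and all other rows of `positives` act as negatives.
    """
    # L2-normalize rows so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # shape (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_probs)))
```

Matching each anchor with itself gives a low loss. Pairing it with an unrelated row gives a higher one.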
For the molecular property prediction task, run task_property/run_${dataset}.py to fine-tune the pre-trained model.
For the DDI prediction task, run task_ddi/run_ddi.py to fine-tune the pre-trained model.
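A small driver can run these fine-tuning scripts one after another. The sketch assumes it is launched from the repository root. It passes no command-line arguments, because the scripts' options are not documented here. Dataset names such as "bbbp" are hypothetical placeholders. Check task_property/ for the run_*.py files that actually exist.

```python
"""Run the fine-tuning scripts for several datasets in sequence (sketch)."""
import subprocess
import sys
from typing import Iterable, List


def fine_tune_commands(datasets: Iterable[str], include_ddi: bool = True) -> List[List[str]]:
    """Build one command per property dataset, plus the DDI script if requested."""
    cmds = [[sys.executable, f"task_property/run_{name}.py"] for name in datasets]
    if include_ddi:
        cmds.append([sys.executable, "task_ddi/run_ddi.py"])
    return cmds


def run_all(cmds: List[List[str]]) -> None:
    """Run each command in order, stopping at the first non-zero exit code."""
    for cmd in cmds:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `run_all(fine_tune_commands(["bbbp", "tox21"]))` would run two hypothetical property scripts followed by the DDI script. Setting `check=True` stops the sequence as soon as one script fails.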
We provide the fine-tuned models for each dataset in the directory finetuned.