- Clone the repository and move into the directory:
git clone https://github.com/BioSystemsUM/deepmol_case_studies.git
cd deepmol_case_studies
- Create a conda environment and activate it:
conda create -n deepmol_case_studies python=3.10
conda activate deepmol_case_studies
- Install the dependencies:
pip install -r requirements.txt
pip install --no-deps "deepmol[all]==1.1.7"
- Install the package:
pip install .
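As a quick sanity check after the steps above, the following minimal sketch reports whether the two top-level packages can be found in the active environment. The helper `check_installed` is hypothetical (it is not part of this repository), and the import names `deepmol` and `dcs` are assumed from the install commands and the usage example in this README.

```python
from importlib.util import find_spec


def check_installed(packages=("deepmol", "dcs")):
    """Map each top-level package name to True if it can be located.

    Hypothetical helper: it only looks the package up and does not import it,
    so it cannot detect broken or partially installed dependencies.
    """
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_installed().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If either package is reported as missing, the conda environment is probably not activated or one of the `pip install` steps did not complete.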
AutoML experiments can be found here.
We used Podman/Docker for the experiments; the Dockerfile can be found in this repository.
The "run" file can be found here.
AutoML experiments comparing DeepMol and QSARTuna can be found here.
We used Podman/Docker for these experiments; the Dockerfile can be found in this repository.
The "run_automl_benchmark" file can be found here.
The scripts to evaluate computational resources and runtimes of each method in DeepMol are here.
All the dataframes with this information are available here.
To train and evaluate DeepMol models:
from dcs.evaluation import get_results
results = get_results(tdc_dataset_name="Bioavailability_Ma", pipeline="bioavailability_optimal")
The tdc_dataset_name parameter selects which Therapeutics Data Commons (TDC) benchmark dataset to download. Available datasets:
- "AMES"
- "BBB_Martins"
- "Bioavailability_Ma"
- "Caco2_Wang"
- "Clearance_Hepatocyte_AZ"
- "Clearance_Microsome_AZ"
- "HIA_Hou"
- "Pgp_Broccatelli"
- "Solubility_AqSolDB"
- "Lipophilicity_AstraZeneca"
- "VDss_Lombardo"
- "CYP2C9_Veith"
- "CYP2D6_Veith"
- "CYP3A4_Veith"
- "CYP2C9_Substrate_CarbonMangels"
- "CYP2D6_Substrate_CarbonMangels"
- "CYP3A4_Substrate_CarbonMangels"
- "DILI"
- "Half_Life_Obach"
- "hERG"
- "LD50_Zhu"
- "PPBR_AZ"
The pipeline parameter selects which predefined pipeline to load. The pipelines are named as in the paper: some were created from the first AutoML experiment, and others were further optimized (the "optimal" variants). Available pipelines:
- "ames"
- "bbb"
- "bioavailability"
- "bioavailability_optimal"
- "caco"
- "clearance_hepatocyte"
- "clearance_microsome"
- "hia"
- "pgp"
- "solubility"
- "lipophilicity"
- "lipophilicity_optimal"
- "vdss"
- "cyp2c9"
- "cyp2d6"
- "cyp3a4"
- "cyp2c9_substrate"
- "cyp2d6_substrate"
- "cyp3a4_substrate"
- "dili"
- "half_life"
- "herg"
- "ld50"
- "ppbr"
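Because the pipeline names are plain strings, a typo is easy to make. The sketch below checks a name against the list above before any training starts. `AVAILABLE_PIPELINES` and `check_pipeline_name` are hypothetical names introduced for illustration; they are not part of the `dcs` package, and how `get_results` itself reacts to an unknown name is not covered here.

```python
from difflib import get_close_matches

# Pipeline names copied from the list above (illustrative constant, not a dcs API).
AVAILABLE_PIPELINES = (
    "ames", "bbb", "bioavailability", "bioavailability_optimal", "caco",
    "clearance_hepatocyte", "clearance_microsome", "hia", "pgp", "solubility",
    "lipophilicity", "lipophilicity_optimal", "vdss", "cyp2c9", "cyp2d6",
    "cyp3a4", "cyp2c9_substrate", "cyp2d6_substrate", "cyp3a4_substrate",
    "dili", "half_life", "herg", "ld50", "ppbr",
)


def check_pipeline_name(name):
    """Return `name` if it is listed; otherwise raise ValueError with near matches."""
    if name in AVAILABLE_PIPELINES:
        return name
    suggestions = get_close_matches(name, AVAILABLE_PIPELINES, n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown pipeline {name!r}.{hint}")
```

The returned value can then be passed as the `pipeline` argument of `get_results`.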
To use the default pipeline for a dataset (every pipeline except the "optimal" variants), omit the pipeline argument:
from dcs.evaluation import get_results
results = get_results(tdc_dataset_name="Bioavailability_Ma")
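To evaluate several datasets with their default pipelines in one go, the call above can be wrapped in a loop. This is a minimal sketch: `evaluate_datasets` is a hypothetical helper, not part of `dcs`, and it makes no assumption about what `get_results` returns beyond storing the value. The function is passed in as an argument so the loop can be tried with a stand-in before launching real runs.

```python
def evaluate_datasets(dataset_names, get_results_fn):
    """Call get_results_fn once per dataset name and collect the outcomes.

    Returns two dicts keyed by dataset name: the returned results, and the
    exceptions raised for datasets that failed.
    """
    results, failures = {}, {}
    for name in dataset_names:
        try:
            results[name] = get_results_fn(tdc_dataset_name=name)
        except Exception as exc:
            # Keep going so that one failing dataset does not stop the batch.
            failures[name] = exc
    return results, failures


# Intended usage with the real function (requires the dcs package):
# from dcs.evaluation import get_results
# results, failures = evaluate_datasets(["AMES", "hERG"], get_results)
```

Catching the exception per dataset is a design choice for long batches; remove the `try`/`except` if a failure should stop the run immediately.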