DRPLLM: Drug Response Prediction Large Language Model

DRPLLM (Drug Response Prediction Large Language Model) is a framework that uses large language models (LLMs) to predict drug responses in cancer from multi-omic data. It encodes complex biological and chemical relationships as natural language prompts and uses the feature extraction capabilities of LLMs, specifically the 8-billion-parameter Llama-3 model, to offer improved predictive performance over traditional models. This approach has the potential to advance personalized medicine and streamline drug discovery.

Project Overview

The DRPLLM framework integrates genomic data, transcriptomic data, and drug properties into a single model that predicts how cancer cells respond to different treatments. This method captures the complexity of tumor biology and also handles therapies such as monoclonal antibodies, where specific compound information may be limited.

Methodology

The DRPLLM architecture involves several key steps:

Data Integration: Multi-omic features along with drug compound characteristics are collated.
Prompt Engineering: These features are transformed into carefully designed prompts that simulate a natural language understanding task for the LLM.
Embedding Extraction: Llama-3 processes these prompts, and embeddings from the last hidden layer are extracted as rich, nuanced features for drug response prediction.
Model Training: The embeddings are then used as inputs to train four types of regression models: Linear Regression, Multi-Layer Perceptron (MLP), XGBoost, and a Deep Neural Network (DNN).
Evaluation: The models are evaluated on how well they predict drug response, measured as area under the dose-response curve (AUC) in datasets such as CCLE and GDSCv2, using metrics like the Spearman rank correlation coefficient (SCC).
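
The prompt engineering and evaluation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the prompt template, the function names (build_prompt, average_ranks, spearman_scc), and the feature and sample names are all hypothetical. Embedding extraction with Llama-3 and regressor training are omitted because they depend on model weights and libraries not described here.

```python
from statistics import mean


def build_prompt(cell_line, drug, omic_features):
    """Render one (cell line, drug) pair as a natural-language prompt.

    The template wording and feature names are hypothetical placeholders.
    """
    feature_text = "; ".join(
        f"{name} = {value:.2f}" for name, value in sorted(omic_features.items())
    )
    return (
        f"Cancer cell line {cell_line} has the following molecular profile: "
        f"{feature_text}. It is treated with the drug {drug}. "
        "How strongly does the cell line respond?"
    )


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every value tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_scc(predicted, observed):
    """Spearman rank correlation: Pearson correlation of the two rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rank_p, rank_o = average_ranks(predicted), average_ranks(observed)
    mean_p, mean_o = mean(rank_p), mean(rank_o)
    covariance = sum((p - mean_p) * (o - mean_o) for p, o in zip(rank_p, rank_o))
    spread_p = sum((p - mean_p) ** 2 for p in rank_p) ** 0.5
    spread_o = sum((o - mean_o) ** 2 for o in rank_o) ** 0.5
    return covariance / (spread_p * spread_o)


if __name__ == "__main__":
    # Hypothetical cell line, drug, and expression values for illustration only.
    print(build_prompt("CELL_A", "DRUG_X", {"TP53_expr": 1.25, "EGFR_expr": 0.4}))
    # Compare made-up predicted and observed response values.
    print(spearman_scc([0.2, 0.5, 0.9, 0.7], [0.1, 0.4, 0.8, 0.9]))
```

In practice a library routine such as scipy.stats.spearmanr would likely replace the hand-written correlation; the standard-library version is used here only to keep the sketch dependency-free.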

Results

Initial results indicate that DRPLLM can predict drug responses with a high degree of accuracy. The DNN model in particular showed the strongest performance, suggesting that deep learning combined with LLM-derived features can improve predictive accuracy and generalizability.

Contributions and Feedback

This project is the result of a collaborative effort during a two-day hackathon at Argonne National Laboratory, Illinois.

Setup and Usage

# Clone the repository
git clone https://github.com/dimi-lab/drpllm.git
cd drpllm

# Install dependencies
pip install -r requirements.txt

# Generate embeddings
python create_llama_emb_multigpu.py

# Evaluate the models
python DDRPM_model.py
