The official implementation of DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks.
git clone https://github.com/TencentAILabHealthcare/DNAGPT.git
You can download the weights from
and save the model weights to the checkpoints directory:
cd DNAGPT/checkpoints
# download or copy the model weights to this default directory
- dna_gpt0.1b_h.pth: DNAGPT 0.1B-parameter model pretrained on human genomes
- dna_gpt0.1b_m.pth: DNAGPT 0.1B-parameter model pretrained on multi-organism genomes
- dna_gpt3b_m.pth: DNAGPT 3B-parameter model pretrained on multi-organism genomes
- regression.pth: Human RNA expression level regression model
- classification.pth: Human AATAAA GSR classification model
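Before running any task, it can help to confirm which of the weight files listed above are actually in place. The following is a minimal sketch, not part of the repository: the helper name `missing_weights` is hypothetical, and it assumes the files keep the names listed above and live in the default `checkpoints` directory.

```python
from pathlib import Path

# File names exactly as listed in this README.
EXPECTED_WEIGHTS = [
    "dna_gpt0.1b_h.pth",
    "dna_gpt0.1b_m.pth",
    "dna_gpt3b_m.pth",
    "regression.pth",
    "classification.pth",
]


def missing_weights(checkpoint_dir="checkpoints"):
    """Return the listed weight files that are not present in checkpoint_dir.

    Hypothetical helper: it only checks that each file exists,
    not that its contents are valid.
    """
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]
```

Calling `missing_weights()` from the DNAGPT directory returns an empty list when all five files are present. You only need the weights for the tasks you intend to run, so a non-empty result is not necessarily a problem.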
- Python >= 3.8
cd DNAGPT
pip install -r requirements.txt
python test.py --task=<task type> --input=<your dna data> --weight=<path to the pre-trained weight> --name=<the model you want to use> --num_samples=<number of samples seq>
See the "scripts" directory for more test examples.
# gpt 0.1b human genomes model
python test.py --task 'generation' --input '<R>AGAGAAAAGAGT' --name 'dna_gpt0.1b_h' --weight 'checkpoints/dna_gpt0.1b_h.pth' --num_samples 10 --max_len 256
# gpt 0.1b multi-organism model
python test.py --task 'generation' --input '<R>AGAGAAAAGAGT' --name 'dna_gpt0.1b_m' --weight 'checkpoints/dna_gpt0.1b_m.pth' --num_samples 10 --max_len 256
# gpt 3b multi-organism model
python test.py --task 'generation' --input '<R>AGAGAAAAGAGT' --name 'dna_gpt3b_m' --weight 'checkpoints/dna_gpt3b_m.pth' --num_samples 10 --max_len 256
python test.py --task 'regression' --input xxxxx --numbers xxxxx --name 'dna_gpt0.1b_h' --weight 'checkpoints/regression.pth'
python test.py --task 'classification' --input xxxxx --name 'dna_gpt0.1b_m' --weight 'checkpoints/classification.pth'
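If you want to launch these commands from Python rather than from a shell, the argument list can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_test_command` is a hypothetical helper, it uses only the flags that appear in the commands above, and it treats the value of `--numbers` as an opaque string because its format is not described here.

```python
import shlex
import sys


def build_test_command(task, input_seq, name, weight,
                       num_samples=None, max_len=None, numbers=None):
    """Build the argument list for test.py (hypothetical helper).

    Only flags shown in this README are used. Optional flags are
    appended only when a value is given.
    """
    cmd = [sys.executable, "test.py",
           "--task", task,
           "--input", input_seq,
           "--name", name,
           "--weight", weight]
    if num_samples is not None:
        cmd += ["--num_samples", str(num_samples)]
    if max_len is not None:
        cmd += ["--max_len", str(max_len)]
    if numbers is not None:
        cmd += ["--numbers", str(numbers)]
    return cmd


# Mirror the 0.1B human-genome generation example above.
cmd = build_test_command(
    task="generation",
    input_seq="<R>AGAGAAAAGAGT",
    name="dna_gpt0.1b_h",
    weight="checkpoints/dna_gpt0.1b_h.pth",
    num_samples=10,
    max_len=256,
)
print(shlex.join(cmd))  # shell-quoted form, for inspection only
```

To execute the command, pass the list to `subprocess.run(cmd, check=True)` from the DNAGPT directory. Passing a list avoids shell quoting problems with the `<` and `>` characters in the input.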
- 'dna_gpt0.1b_m' supports a maximum input length of 24564 bp; 'dna_gpt0.1b_s' and 'dna_gpt3b_m' support a maximum input length of 3060 bp.
- The spec_token defaults to 'R', which denotes human. A special token must be wrapped in "<" and ">", like "<R>".
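The two notes above can be checked before calling test.py. The sketch below is illustrative only: `prepare_input` is a hypothetical helper, the length limits are copied from the note above, and it assumes the limit counts bases only, excluding the special token. Models not listed in the table are passed through without a length check.

```python
# Maximum input lengths in base pairs, as stated in the notes above.
MAX_INPUT_BP = {
    "dna_gpt0.1b_m": 24564,
    "dna_gpt0.1b_s": 3060,
    "dna_gpt3b_m": 3060,
}


def prepare_input(sequence, model_name, spec_token="R"):
    """Check the sequence length and prepend the special token.

    Hypothetical helper. Assumes the limit applies to the bases
    alone, not to the "<R>" prefix.
    """
    seq = sequence.strip()
    limit = MAX_INPUT_BP.get(model_name)
    if limit is not None and len(seq) > limit:
        raise ValueError(
            f"{model_name} accepts at most {limit} bp, got {len(seq)} bp"
        )
    # Special tokens are wrapped in "<" and ">".
    return f"<{spec_token}>{seq}"


print(prepare_input("AGAGAAAAGAGT", "dna_gpt0.1b_m"))  # <R>AGAGAAAAGAGT
```

The returned string can be passed directly as the `--input` value.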
DNAGPT
@article{zhang2023dnagpt,
title={DNAGPT: A Generalized Pretrained Tool for Multiple DNA Sequence Analysis Tasks},
author={Zhang, Daoan and Zhang, Weitong and He, Bing and Zhang, Jianguo and Qin, Chenchen and Yao, Jianhua},
journal={bioRxiv},
pages={2023--07},
year={2023},
publisher={Cold Spring Harbor Laboratory}
}