promSEMBLE: Hard Pattern Mining and Ensemble Learning for Detecting DNA Promoter Sequences

Bindi M. Nagda, Van Minh Nguyen, Ryan T. White

Motivation

Accurate identification of DNA promoter sequences is crucial to unraveling the mechanisms that regulate gene transcription. Initiation of transcription is controlled by regulatory transcription factors binding to promoter core regions in the DNA sequence. Detecting promoter regions is therefore necessary for building genetic regulatory networks for biomedical and clinical applications. We propose a novel ensemble learning technique that uses deep recurrent neural networks with convolutional feature extraction and hard negative pattern mining to detect several types of promoter sequences, both with and without the TATA-box, within human and mouse DNA sequences. Comparisons against previously published results, together with extensive independent tests, demonstrate that our method sets a new state of the art in all four categories for accurately and precisely recognizing the stretch of base pairs that makes up the promoter region within the DNA sequences.
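The two ideas named above, ensemble scoring and hard negative mining, can be illustrated with a minimal sketch. This is not the code in this repository: `ensemble_score`, `mine_hard_negatives`, and the two toy scorers are hypothetical stand-ins for trained networks. The sketch assumes each model maps a DNA string to a promoter probability in [0, 1].

```python
from typing import Callable, List, Optional, Sequence

Scorer = Callable[[str], float]  # hypothetical: DNA string -> promoter probability


def ensemble_score(models: Sequence[Scorer], seq: str) -> float:
    """Average the promoter probabilities of all ensemble members."""
    return sum(model(seq) for model in models) / len(models)


def mine_hard_negatives(
    negatives: Sequence[str],
    models: Sequence[Scorer],
    threshold: float = 0.5,
    top_k: Optional[int] = None,
) -> List[str]:
    """Return known non-promoters that the ensemble wrongly scores at or
    above `threshold`, hardest (highest score) first."""
    scored = [(ensemble_score(models, seq), seq) for seq in negatives]
    hard = [pair for pair in scored if pair[0] >= threshold]
    hard.sort(key=lambda pair: pair[0], reverse=True)
    if top_k is not None:
        hard = hard[:top_k]
    return [seq for _, seq in hard]


# Toy scorers standing in for trained networks (illustration only).
def toy_tata_scorer(seq: str) -> float:
    return 0.9 if "TATAAA" in seq else 0.1


def toy_at_content_scorer(seq: str) -> float:
    return sum(base in "AT" for base in seq) / len(seq)


if __name__ == "__main__":
    models = [toy_tata_scorer, toy_at_content_scorer]
    negatives = ["GGCCGGCC", "GGTATAAAGG", "ATATATAT"]
    # Sequences returned here would be added back to the training set
    # as emphasized negatives in the next training round.
    print(mine_hard_negatives(negatives, models))
```

In a real training loop, the mined sequences would typically be re-weighted or oversampled before retraining, so the model sees more of the non-promoters it currently mistakes for promoters.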

Data

EPDnew database

Results

Our method outperforms 4 other state-of-the-art models by reducing the rates of both false positives and false negatives. It leads on multiple measures of performance, including Matthews Correlation Coefficient (MCC), precision, sensitivity, and specificity. The model yields the best MCC values across all organisms, scoring above 99% for every organism except fruit fly (both with and without the TATA-box), where it scores 98.1%. It achieves $\geq$ 98% on all 4 performance metrics evaluated for all 8 organisms.
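For reference, the four metrics named above can all be computed from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions. The function name `classification_metrics` and the example counts are illustrative assumptions, not values or code taken from this repository.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute precision, sensitivity, specificity, and MCC from
    true/false positive and negative counts.

    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall, true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in classification_metrics(tp=90, fp=10, tn=90, fn=10).items():
        print(f"{name}: {value:.3f}")
```

MCC ranges from -1 to 1, so a score reported as a percentage corresponds to the coefficient multiplied by 100. Unlike precision or sensitivity alone, it only approaches 1 when both false positives and false negatives are rare.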

Contact

Go to contact information

