ebarnell/geneoscopy_dev

geneoscopy_dev

This project applies feature selection to high-confidence differentially expressed (DE) transcripts and uses supervised machine learning algorithms (e.g. Random Forest, Support Vector Machine, Neural Network, Gradient Boosting) to train models and predict patient labels (i.e. cancer, polyps, normal).

Requirements

Install pip:
python get-pip.py
Install Python packages:
sudo pip install numpy
sudo pip install scipy
sudo pip install matplotlib
sudo pip install scikit-learn
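
To confirm that the packages above are importable before running the pipeline, a small check along these lines may help. It is a sketch and not part of the repository; the function name `missing_packages` is hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "scipy", "matplotlib", "sklearn"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported missing, rerun the matching `pip install` command with the same Python interpreter that will run the pipeline.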

Data Resources

Put the normalized expression matrix, quality control file, and sample sheet file into

data/<project name>
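
Before launching the pipeline, it can be useful to verify that the three input files are actually present in the project directory. The following is a minimal sketch. The helper `check_project_files`, the project name `my_project`, and the file names are hypothetical placeholders, since the README does not specify them.

```python
from pathlib import Path


def check_project_files(data_dir, filenames):
    """Return the names in `filenames` that are missing from `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in filenames if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Hypothetical project name and file names; substitute your own.
    expected = ["normalized_chipdata.txt", "qc_table.txt", "sample_sheet.csv"]
    missing = check_project_files("data/my_project", expected)
    print("Missing files:", missing if missing else "none")
```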

Run pipeline

This pipeline assesses sample quality, splits the samples into training and testing sets, identifies significantly DE genes in the training set, trains ML models, and tests prediction quality. Set the following input arguments in scripts/run_pipeline.sh:

DIR_SCRIPTS=<complete path to script directory>
DIR_DATA=<complete path to data resources>
NUM_SAMPLES=<number of samples>
GROUP=<label comparison>
THLD_PVAL=<p-value threshold>
THLD_FC=<fold-change threshold>
NORMALIZED_CHIPDATA=<filename of normalized expression matrix>
QC_TABLE=<filename of quality control table>
SAMPLE_SHEET=<filename of sample sheet>
Example:
bash scripts/run_pipeline.sh
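
The README does not show the internals of the pipeline scripts, so the following is only an illustrative sketch of the split, DE filtering, training, and testing steps on synthetic data. Several things here are assumptions rather than the repository's method: Welch's t-test as the DE test, expression values on a log2 scale (so the difference of group means serves as the log2 fold change), a two-class label, and a Random Forest as the classifier. The function names `select_de_features` and `run_sketch` are hypothetical; the `thld_pval` and `thld_fc` parameters mirror `THLD_PVAL` and `THLD_FC` above.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def select_de_features(X, y, thld_pval=0.05, thld_fc=1.0):
    """Return indices of features that pass both the p-value and fold-change thresholds."""
    group_a, group_b = X[y == 0], X[y == 1]
    # Welch's t-test per feature (column). Assumption: the README names no test.
    _, pvals = stats.ttest_ind(group_a, group_b, axis=0, equal_var=False)
    # Assumption: values are log2-scale, so the mean difference is the log2 fold change.
    log2_fc = group_b.mean(axis=0) - group_a.mean(axis=0)
    keep = (pvals < thld_pval) & (np.abs(log2_fc) >= thld_fc)
    return np.flatnonzero(keep)


def run_sketch(seed=0):
    """Run split, DE selection, training, and testing on synthetic data."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = 80, 200
    y = np.repeat([0, 1], n_samples // 2)
    X = rng.normal(loc=8.0, scale=1.0, size=(n_samples, n_features))
    X[y == 1, :10] += 3.0  # the first 10 features are truly DE in this synthetic data

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Select features on the training set only, so the test set stays unseen.
    selected = select_de_features(X_train, y_train, thld_pval=0.01, thld_fc=1.0)
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train[:, selected], y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test[:, selected]))
    return selected, accuracy


if __name__ == "__main__":
    selected, accuracy = run_sketch()
    print("Selected feature indices:", selected.tolist())
    print("Test accuracy:", accuracy)
```

Running the DE analysis on the training set alone, as the pipeline description states, keeps information from the test samples out of feature selection and avoids an optimistic estimate of prediction quality.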
