FAhtisham/DCNN-YY1
***************************************************
****************** Word2vec.ipynb *****************
***************************************************

1- How to generate w2vec embeddings?

Open w2vec.ipynb
- Run all the cells that define the functions used to create the w2vec model, i.e., (getWord_model)
- Run the helper functions that convert DNA sequences to k-mers (DNAToWord)
- Split the two sequences of each pair into separate lists (getDNA_split)
- Use (getAvgFeatureVecs) to get the mean of the embeddings
- The remaining functions only write the results to the appropriate npy and csv files

Cell 6: (the cell where these functions are used)
- Use the appropriate "KMER SIZE" (here it is 6)
    kmer = 6
- Declare the path to the train or test data
    seqfile = 'cross_test_data.fa'
- Read the data (on pandas >= 2.0, replace error_bad_lines=False with on_bad_lines='skip')
    DNAseq = pd.read_csv(seqfile,sep = "\t",error_bad_lines=False)
- Define the number of positive samples in the file
    # pos_number = 2096 # train pos samples
    pos_number = 1658 # cross test
- Split each pair of DNA sequences
    words1,words2 = getDNA_split(DNAseq,kmer)
- Get the w2vec model, where Unfile represents the k-mers present in the sequences
    word_model = getWord_model(kmer,fea_num,min_fea,model,Unfile)

Cell 7:
- After training or loading the model, average the features as described above
    dataDataVecs = getAvgFeatureVecs(words1,words2,word_model,fea_num)
- Convert the embeddings to csv
- The csv file contains the label and the corresponding feature vector of each sequence pair
    fea_svm = '%d_test_vecs.svm'%(kmer)
    fea_csv = '%d_test.csv'%(kmer)
    npyTosvm(fea_npy, fea_svm,pos_number)
    SVMtoCSV(fea_svm, fea_csv)

***************************************************
****************** CNN_model.ipynb ****************
***************************************************

Cell 1:
- Load all libraries

Cell 2:
- Load the default evaluation measures
    METRICS = [
        keras.metrics.TruePositives(name='tp'),
        keras.metrics.FalsePositives(name='fp'),
        keras.metrics.TrueNegatives(name='tn'),
        keras.metrics.FalseNegatives(name='fn'),
        keras.metrics.BinaryAccuracy(name='accuracy'),
        keras.metrics.Precision(name='precision'),
        keras.metrics.Recall(name='recall'),
        keras.metrics.AUC(name='auc'),
    ]
- Function to create the deep learning model (dnn_model())
    - Inputs
    - Conv1(inputs)
    - cc1 = concatenate(inputs, Conv1)
    - Conv2(Conv1)
    - Conv3(cc1)
    - cc2 = concatenate(inputs, Conv3)
    - dropout on cc1, cc2
    - Dense1(cc1)
    - Dense2(cc2)
    - cc3 = concatenate(Dense1, Dense2)
    - Dense3(Conv2)
    - cc4 = concatenate(cc3, Dense3)
    - Dense4, sigmoid => final prediction (Adam optimizer, binary cross-entropy loss)

Cell 3:
- 5-Fold Validation

Cell 4:
- Independent testing
  (NOTE: PLEASE DECLARE THE NUMBER OF POSITIVE SAMPLES BY CROSS-CHECKING WITH THE RELEVANT FILES PRESENT IN THE SUBSEQUENT DIRECTORIES)
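The dnn_model() wiring listed under CNN_model.ipynb can be sketched with the Keras functional API. This is only an illustration of the connection pattern, not the notebook's code: the input length, filter count, kernel size, dense width, dropout rate, the use of padding="same", the Flatten before each dense layer, and the reading of "Dense4" as the sigmoid output layer are all assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers


def dnn_model(input_len=200, filters=16, kernel_size=3, units=32, dropout=0.3):
    """Densely connected wiring as listed in the README; all sizes are assumed."""
    inputs = keras.Input(shape=(input_len, 1))

    def conv(x):
        # padding="same" keeps the sequence length, so tensors can be
        # concatenated along the channel axis (assumption).
        return layers.Conv1D(filters, kernel_size, padding="same",
                             activation="relu")(x)

    def dense(x):
        # Flatten before the dense layer (assumption).
        return layers.Dense(units, activation="relu")(layers.Flatten()(x))

    conv1 = conv(inputs)
    cc1 = layers.concatenate([inputs, conv1])
    conv2 = conv(conv1)
    conv3 = conv(cc1)
    cc2 = layers.concatenate([inputs, conv3])

    # Dropout on both concatenated branches
    cc1 = layers.Dropout(dropout)(cc1)
    cc2 = layers.Dropout(dropout)(cc2)

    dense1 = dense(cc1)
    dense2 = dense(cc2)
    cc3 = layers.concatenate([dense1, dense2])
    dense3 = dense(conv2)
    cc4 = layers.concatenate([cc3, dense3])

    # "Dense4, sigmoid" is read here as a single sigmoid output unit
    outputs = layers.Dense(1, activation="sigmoid")(cc4)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC(name="auc")])
    return model
```

Calling dnn_model().summary() prints the resulting layer graph, which can be compared against the list above.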
About
This repository contains the code for predicting YY1 chromatin loops using a densely connected neural network based on k-mer embeddings.
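To make the k-mer embedding features concrete, here is a minimal, dependency-free sketch of the idea: split each sequence into k-mers, average the k-mer vectors, and join the two averages of a sequence pair. It is not the notebook's implementation. The function names, the stride of 1, the skipping of unknown k-mers, the concatenation of the two means, and the toy vectors (standing in for a trained word2vec model) are all assumptions.

```python
def dna_to_kmers(seq, k):
    """Overlapping k-mers of a DNA string (stride 1 is an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mean_vector(kmers, vectors, dim):
    """Average the vectors of the k-mers present in `vectors`.

    Unknown k-mers are skipped; a zero vector is returned if none are known.
    """
    total = [0.0] * dim
    hits = 0
    for kmer in kmers:
        vec = vectors.get(kmer)
        if vec is None:
            continue
        total = [t + v for t, v in zip(total, vec)]
        hits += 1
    return [t / hits for t in total] if hits else total


def pair_features(seq1, seq2, vectors, k, dim):
    """Concatenate the mean k-mer vectors of both sequences (assumed layout)."""
    first = mean_vector(dna_to_kmers(seq1, k), vectors, dim)
    second = mean_vector(dna_to_kmers(seq2, k), vectors, dim)
    return first + second


# Toy 2-dimensional "embeddings" standing in for a trained word2vec model
toy_vectors = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0], "GTA": [1.0, 1.0]}
features = pair_features("ACGT", "CGTA", toy_vectors, k=3, dim=2)
print(features)
```

With a real model, the dictionary lookup would be replaced by a lookup in the trained word2vec vocabulary.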