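Categories = ["人文社會", "心理成長", "史地傳記", "冒險推理", "科幻靈異", "科普百科", "旅遊紀實", "商業理財", "感性抒情", "詩詞國學", "運動休閒", "醫療健康", "藝術設計"]
(English: Humanities & Society; Psychology & Personal Growth; History, Geography & Biography; Adventure & Mystery; Sci-Fi & Supernatural; Popular Science & Encyclopedias; Travel & Reportage; Business & Finance; Sentimental & Lyrical; Poetry & Chinese Classics; Sports & Leisure; Medicine & Health; Art & Design. Folder names must use the original Chinese strings.)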
Create the following folders under //BERT-LSTM_Book_Classification/:
- val
  - One folder per category, named exactly as in Categories.
- train
  - One folder per category, named exactly as in Categories.
- test5
  - One folder per category, named exactly as in Categories.
- data
  - Leave empty. It stores the vectors created by BERT, which are then used by the LSTM.
- val
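The folder layout above can be created with a short script. This is a minimal sketch, not part of the repository: `create_layout` is a hypothetical helper, the root directory is passed in as an argument (adjust it to wherever your copy of BERT-LSTM_Book_Classification lives), and it only creates the val, train, test5 and data folders described above.

```python
import os

# Category folder names; they must match the Categories list exactly.
CATEGORIES = ["人文社會", "心理成長", "史地傳記", "冒險推理", "科幻靈異", "科普百科",
              "旅遊紀實", "商業理財", "感性抒情", "詩詞國學", "運動休閒", "醫療健康", "藝術設計"]
SPLITS = ["val", "train", "test5"]


def create_layout(root: str) -> None:
    """Hypothetical helper: create <root>/<split>/<category>/ for every split, plus <root>/data/."""
    for split in SPLITS:
        for category in CATEGORIES:
            # exist_ok=True makes the function safe to re-run on an existing tree
            os.makedirs(os.path.join(root, split, category), exist_ok=True)
    # data/ starts empty; BERT writes its vectors here for the LSTM to read
    os.makedirs(os.path.join(root, "data"), exist_ok=True)
```

Usage: `create_layout("BERT-LSTM_Book_Classification")`. Because every call uses `exist_ok=True`, running it again on a tree that already holds books leaves the existing files untouched.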
Place the *.txt file of each book in the folder of its category:
- val: books for validation
- train: books for training
- test5: books for testing
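Before training, it can help to check that every category folder actually received books. The sketch below is a hypothetical sanity check, not something the scripts require: `count_books` counts the *.txt files per category folder of one split, so an empty category shows up with a count of 0.

```python
import os
from collections import Counter


def count_books(root: str, split: str) -> Counter:
    """Hypothetical helper: count the *.txt files in each category folder of <root>/<split>/."""
    counts = Counter()
    split_dir = os.path.join(root, split)
    for category in sorted(os.listdir(split_dir)):
        category_dir = os.path.join(split_dir, category)
        if not os.path.isdir(category_dir):
            continue  # ignore stray files sitting next to the category folders
        # only *.txt files count as books; other files are skipped
        counts[category] = sum(1 for f in os.listdir(category_dir) if f.endswith(".txt"))
    return counts
```

For example, `count_books("BERT-LSTM_Book_Classification", "train")` returns one entry per category folder under train.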
Alternatively, you can download the prepared data from Google Drive.
Run BERT_LSTM.py
Output:
- A model checkpoint for each epoch: //Output/BertLSTM_for_LSTM_epoch*.bin
- History.txt: contains the loss and accuracy for each epoch
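To pick the most recent checkpoint, sort by the epoch number rather than by file name: a plain string sort puts epoch10 before epoch9. This is a minimal sketch assuming the part after `epoch` in BertLSTM_for_LSTM_epoch*.bin is a plain integer; `latest_checkpoint` is a hypothetical helper.

```python
import glob
import os
import re
from typing import Optional

# Assumed checkpoint naming: BertLSTM_for_LSTM_epoch<integer>.bin
_EPOCH_RE = re.compile(r"BertLSTM_for_LSTM_epoch(\d+)\.bin$")


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Hypothetical helper: return the checkpoint with the highest epoch number, or None."""
    best_epoch, best_path = -1, None
    for path in glob.glob(os.path.join(output_dir, "BertLSTM_for_LSTM_epoch*.bin")):
        match = _EPOCH_RE.search(os.path.basename(path))
        # compare epochs numerically, not as strings
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

For example, `latest_checkpoint("Output")` returns None if training has not written any checkpoint yet.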
Run BERT_LSTM_Test.py
Output:
- Per-book prediction details in CSV format, e.g. //Output/Test_Output_10_epoch9.txt, with the columns:
  - name: book name
  - ans: real category
  - pre: predicted category
  - one column per category name: how many times (votes) the model predicted that category
  - top3: whether the real category is among the top 3 by vote count (true: 1, false: 0)
- A list of real and predicted categories, plus the accuracy over every N sentences. This is NOT the accuracy of the model.
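The pre and top3 columns can be understood as a majority vote over per-chunk predictions. The sketch below illustrates that reading under stated assumptions: each book is split into chunks, each chunk's predicted category counts as one vote, and ties go to the category listed first. `summarize_book` and `accuracy_from_csv` are hypothetical helpers, not functions from BERT_LSTM_Test.py, and `accuracy_from_csv` additionally assumes the output file has a header row with the column names listed above (`name`, `ans`, `pre`, the category names, `top3`).

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def summarize_book(
    chunk_predictions: Iterable[str], categories: List[str], answer: str
) -> Tuple[str, Dict[str, int], int]:
    """Majority vote over the per-chunk predictions of one book.

    Returns (pre, votes per category, top3 flag as 1 or 0).
    """
    votes = Counter(chunk_predictions)
    counts = {c: votes.get(c, 0) for c in categories}
    # sorted() is stable, so ties keep the order of `categories` (an assumption;
    # the original script may break ties differently).
    ranked = sorted(categories, key=lambda c: -counts[c])
    return ranked[0], counts, int(answer in ranked[:3])


def accuracy_from_csv(text: str) -> Tuple[float, float]:
    """Book-level accuracy and top-3 accuracy from Test_Output CSV text.

    Assumes a header row containing the columns ans, pre and top3.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return 0.0, 0.0
    correct = sum(row["ans"] == row["pre"] for row in rows)
    in_top3 = sum(int(row["top3"]) for row in rows)
    return correct / len(rows), in_top3 / len(rows)
```

Reading the file with `encoding="utf-8"` is also an assumption; the category names are Chinese, so the actual encoding must match what the test script writes. The book-level figures computed here are derived from the per-book rows and are not the per-N-sentence accuracy reported in the list output.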
Setup
1. In the test5 folder, delete all folders but one. The remaining folder must be named after one of the categories.
2. Put the books you want to predict in that one remaining folder.
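These two setup steps can be scripted. This is a minimal sketch: `prepare_prediction_folder` is a hypothetical helper that refuses category names not in the given list, then deletes every other folder under test5 and copies the given book files into the kept one. It is destructive, so run it on a copy of the project if test5 still holds your test set.

```python
import os
import shutil
from typing import Iterable, List


def prepare_prediction_folder(
    root: str, category: str, books: Iterable[str], categories: List[str]
) -> str:
    """Hypothetical helper: keep only <root>/test5/<category>/ and copy `books` into it.

    WARNING: deletes every other folder under <root>/test5.
    """
    # Validate first, so nothing is deleted when the name is wrong.
    if category not in categories:
        raise ValueError(f"{category!r} is not one of the categories")
    test_dir = os.path.join(root, "test5")
    os.makedirs(test_dir, exist_ok=True)
    for entry in os.listdir(test_dir):
        path = os.path.join(test_dir, entry)
        if entry != category and os.path.isdir(path):
            shutil.rmtree(path)  # step 1: remove all other category folders
    target = os.path.join(test_dir, category)
    os.makedirs(target, exist_ok=True)
    for book in books:
        shutil.copy(book, target)  # step 2: copy the books to predict
    return target
```

Books already present in the kept folder are left in place, so they will be predicted as well.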
Run BERT_LSTM_Test.py
Output: same as in Test.