To run this project, clone this repository onto your local machine and run the main.py file. The project uses Python 3.7.
This is an AI school project in which my two group members, Che Shao Chen and Yuan Zhang, and I implemented a Naive Bayes algorithm from scratch.
This project contains 4 Python files for the implementation.
- CountVectorizer : Responsible for computing the word-frequency vector for each document.
- Main : Contains the main method, where program execution begins.
- NaiveBayes : Responsible for training and prediction.
- PreProcess : Responsible for processing the texts. Folds all characters to lowercase, tokenizes them using a regular expression, and uses the set of resulting words as the vocabulary.
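The pipeline described by these files (preprocess, count-vectorize, train, predict) can be sketched in a single self-contained script. This is a minimal illustration, not the project's code: the function and class names are hypothetical, the token regex `[a-z0-9]+` is an assumption (the actual pattern is not stated here), and the classifier is assumed to be multinomial Naive Bayes with add-one (Laplace) smoothing.

```python
import math
import re
from collections import Counter

# Assumed token pattern (hypothetical): runs of lowercase letters or digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Fold to lowercase, then split into tokens with a regular expression."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(documents):
    """Use the set of all resulting words as the vocabulary (sorted for a stable index)."""
    vocab = set()
    for doc in documents:
        vocab.update(tokenize(doc))
    return sorted(vocab)


def count_vector(text, vocabulary):
    """Word-frequency vector for one document, aligned with the vocabulary order."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with additive smoothing (alpha=1.0 is an assumption)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        n_features = len(vectors[0])
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            # log P(c): fraction of training documents labelled c.
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            # Total count of each vocabulary word across all documents of class c.
            totals = [sum(col) for col in zip(*rows)]
            denom = sum(totals) + self.alpha * n_features
            # log P(w_i | c) with add-alpha smoothing so unseen words are not zero.
            self.log_likelihood[c] = [math.log((t + self.alpha) / denom) for t in totals]
        return self

    def predict(self, vector):
        # Choose the class maximising log P(c) + sum_i x_i * log P(w_i | c).
        def score(c):
            return self.log_prior[c] + sum(
                x * lp for x, lp in zip(vector, self.log_likelihood[c])
            )
        return max(self.classes, key=score)


# Toy usage with made-up documents.
docs = ["Free prize money now", "Meeting agenda for Monday",
        "Win free money", "Monday project meeting"]
labels = ["spam", "ham", "spam", "ham"]
vocab = build_vocabulary(docs)
X = [count_vector(d, vocab) for d in docs]
model = MultinomialNaiveBayes().fit(X, labels)
print(model.predict(count_vector("free money", vocab)))  # → spam
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that never appears in one class from forcing that class's score to negative infinity.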
The main findings of the code can be found in the report folder.