This project implements a Naive Bayes classifier for detecting spam emails. The model is trained on a dataset of emails labeled as either spam or ham (non-spam), and it uses the frequency of specific words in each email to estimate how likely that email is to be spam.
Email spam detection is a common application of machine learning: it filters out unwanted emails so they never reach the inbox. This project takes a Naive Bayes approach, classifying each email from the frequency of certain words it contains.
- Implements a Naive Bayes classifier for spam detection.
- Trains on a dataset of labeled emails.
- Calculates, for each word, the probability of it appearing in a spam email and in a ham email.
- Classifies new emails based on the trained model.
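The training and classification steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in spamDetect.cpp or Email.cpp: the names `Sample`, `NaiveBayesModel`, `train`, and `classify` are hypothetical, and the choice of a multinomial model with add-one (Laplace) smoothing and log-probabilities is an assumption about one reasonable way to do it. Labels follow the dataset convention of 0 for spam and 1 for ham.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not taken from the project's sources.
struct Sample {
    std::vector<int> counts; // counts[j] = occurrences of tracked word j
    int label;               // 0 = spam, 1 = ham (dataset convention)
};

struct NaiveBayesModel {
    double logPrior[2];                   // log P(class)
    std::vector<double> logLikelihood[2]; // log P(word j | class)
};

// Estimate class priors and per-word likelihoods from labeled samples.
// Assumption: add-one smoothing, so unseen words never yield log(0).
NaiveBayesModel train(const std::vector<Sample>& data, std::size_t vocabSize) {
    std::vector<double> wordTotals[2] = {std::vector<double>(vocabSize, 0.0),
                                         std::vector<double>(vocabSize, 0.0)};
    double classTotals[2] = {0.0, 0.0}; // total word occurrences per class
    double emails[2] = {0.0, 0.0};      // number of emails per class

    for (const Sample& s : data) {
        assert(s.label == 0 || s.label == 1);
        assert(s.counts.size() == vocabSize);
        emails[s.label] += 1.0;
        for (std::size_t j = 0; j < vocabSize; ++j) {
            wordTotals[s.label][j] += s.counts[j];
            classTotals[s.label] += s.counts[j];
        }
    }

    NaiveBayesModel model;
    for (int c = 0; c < 2; ++c) {
        // The prior is smoothed as well, so an empty class stays finite.
        model.logPrior[c] = std::log((emails[c] + 1.0) / (data.size() + 2.0));
        model.logLikelihood[c].resize(vocabSize);
        for (std::size_t j = 0; j < vocabSize; ++j) {
            model.logLikelihood[c][j] =
                std::log((wordTotals[c][j] + 1.0) / (classTotals[c] + vocabSize));
        }
    }
    return model;
}

// Return 0 (spam) or 1 (ham), whichever class has the higher log-score.
// Ties are resolved as ham.
int classify(const NaiveBayesModel& model, const std::vector<int>& counts) {
    double score[2];
    for (int c = 0; c < 2; ++c) {
        assert(counts.size() == model.logLikelihood[c].size());
        score[c] = model.logPrior[c];
        for (std::size_t j = 0; j < counts.size(); ++j) {
            score[c] += counts[j] * model.logLikelihood[c][j];
        }
    }
    return score[0] > score[1] ? 0 : 1;
}
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow when an email contains many words. Calling `train` once on the training samples and then `classify` on each test email mirrors the train-then-predict flow described above.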
To get started with this project, follow these steps:
- Clone the repository:
git clone https://github.com/amritBskt/email-spam-detection.git
- Navigate to the project directory:
cd email-spam-detection
- Ensure you have a C++ compiler (such as g++) installed.
- Compile the source files:
g++ spamDetect.cpp Email.cpp -o spamDetect
- Run the program:
./spamDetect
The dataset consists of two files: training_set.data and test_set.data. Each file contains emails represented by the frequency of certain words and a label indicating whether the email is spam (0) or ham (1).
The program outputs the following statistics after running the classification:
- Total number of emails in the training and test sets.
- Number of spam and ham emails in the training and test sets.
- Number of correct and incorrect classifications.
- Prediction accuracy.
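As a rough illustration of how the statistics above could be derived from the true labels and the predicted labels, here is a small sketch. The `EvalStats` struct and `evaluate` function are hypothetical names chosen for this example and are not taken from the project's sources; accuracy is assumed to mean correct predictions divided by total emails.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one evaluation run (labels: 0 = spam, 1 = ham).
struct EvalStats {
    std::size_t total = 0;
    std::size_t spam = 0;
    std::size_t ham = 0;
    std::size_t correct = 0;
    std::size_t incorrect = 0;
    double accuracy = 0.0; // correct / total, or 0 for an empty set
};

// Compare true labels with predictions and tally the statistics.
EvalStats evaluate(const std::vector<int>& actual,
                   const std::vector<int>& predicted) {
    assert(actual.size() == predicted.size());
    EvalStats stats;
    stats.total = actual.size();
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (actual[i] == 0) {
            ++stats.spam;
        } else {
            ++stats.ham;
        }
        if (actual[i] == predicted[i]) {
            ++stats.correct;
        } else {
            ++stats.incorrect;
        }
    }
    if (stats.total > 0) {
        stats.accuracy = static_cast<double>(stats.correct) / stats.total;
    }
    return stats;
}
```

The spam and ham counts here are taken from the true labels, so running `evaluate` on either the training set or the test set gives that set's class breakdown alongside the classification results.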
This project is licensed under the MIT License.
- GitHub: amritBskt
- Email: your-amritbskt7@gmail.com