Naive Bayes Bag-of-Words Sentiment Classifier

Description

Trains a Naive Bayes classifier to predict the sentiment (positive or negative) of a movie review. The assignment code has been cleaned up and streamlined so that it is easier to read and use. As a result, the complete solution to the assignment is not here, only the parts I considered most relevant to share.

Instructor Implementations

tokenize_doc
train
report_statistics_after_training

Modifications to Instructor Implementations

__init__: Added feature_extractor member that defaults to tokenize_doc
tokenize_and_update_model: Switched to use feature_extractor member rather than tokenize_doc
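
The extractor hook described above could look roughly like the following. This is a minimal sketch, not the repository's code: the class `TinyModel`, the whitespace tokenizer, the small stopword set, and the `feature_extractor` keyword argument are assumptions made for illustration. Only the names `tokenize_doc`, `tokenize_doc_stopwords`, `feature_extractor`, and `tokenize_and_update_model` come from this README.

```python
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "is"}

def tokenize_doc(doc):
    """Assumed behavior: lowercase, split on whitespace, count tokens."""
    return Counter(doc.lower().split())

def tokenize_doc_stopwords(doc):
    """Same as tokenize_doc, but drops tokens found in STOPWORDS."""
    return Counter(t for t in doc.lower().split() if t not in STOPWORDS)

class TinyModel:
    """Hypothetical stand-in for the classifier; shows only the extractor hook."""

    def __init__(self, feature_extractor=tokenize_doc):
        # The extractor is stored once and reused for every document.
        self.feature_extractor = feature_extractor
        self.word_counts = {"pos": Counter(), "neg": Counter()}

    def tokenize_and_update_model(self, doc, label):
        # Delegates to whichever extractor was supplied at construction time.
        self.word_counts[label].update(self.feature_extractor(doc))

default_model = TinyModel()
default_model.tokenize_and_update_model("The plot is great", "pos")

filtered_model = TinyModel(feature_extractor=tokenize_doc_stopwords)
filtered_model.tokenize_and_update_model("The plot is great", "pos")
```

Storing the extractor on the instance means the training and classification paths use the same features without either one naming a specific tokenizer.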

Implementations I provided

tokenize_doc_stopwords
tokenize_doc_stopwords_custom
tokenize_doc_stopwords_and_stemming
update_model
p_word_given_label
log_likelihood
p_word_given_label_and_psuedocount
log_prior
unnormalized_log_posterior
classify
likelihood_ratio
evaluate_classifier_accuracy
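
For readers unfamiliar with how these pieces fit together, here is a minimal sketch of add-alpha (pseudocount) smoothing and classification by unnormalized log posterior. The toy counts, the function signatures, and the name `p_word_given_label_smoothed` are assumptions for illustration; the functions in `nb_sentiment_classify.py` may be organized differently.

```python
import math
from collections import Counter

# Toy training counts, made up for illustration.
word_counts = {"pos": Counter(great=3, fun=2), "neg": Counter(boring=4, fun=1)}
doc_counts = {"pos": 2, "neg": 2}
vocab = set(word_counts["pos"]) | set(word_counts["neg"])

def p_word_given_label_smoothed(word, label, alpha):
    """P(word | label) with pseudocount alpha added to every vocabulary word."""
    total = sum(word_counts[label].values())
    return (word_counts[label][word] + alpha) / (total + alpha * len(vocab))

def log_prior(label):
    """log P(label), estimated from the share of training documents."""
    return math.log(doc_counts[label] / sum(doc_counts.values()))

def unnormalized_log_posterior(bow, label, alpha):
    """log P(label) + sum over words of count * log P(word | label)."""
    log_lik = sum(
        n * math.log(p_word_given_label_smoothed(w, label, alpha))
        for w, n in bow.items()
    )
    return log_prior(label) + log_lik

def classify(bow, alpha):
    """Pick the label with the larger unnormalized log posterior."""
    return max(("pos", "neg"),
               key=lambda lab: unnormalized_log_posterior(bow, lab, alpha))
```

With a pseudocount above zero, a word never seen with a label still gets nonzero probability, so the logarithm is always defined.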

Demo

To train a Naive Bayes classifier on the large_movie_review_dataset data using a feature extractor that stems tokens and removes both standard and custom stopwords:

python nb_sentiment_classify.py

This command trains the model with every pseudocount from 1 to 25 (inclusive), creates a graph of pseudocount vs. accuracy, and reports the best pseudocount along with the accuracy associated with it.
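
The sweep itself can be sketched as below. This is an illustration under stated assumptions, not the script's actual code: `best_pseudocount` is a hypothetical helper, and `stand_in_accuracy` is a made-up curve used only so the example runs without the dataset. It is not a measured result. Plotting is omitted.

```python
def best_pseudocount(evaluate, candidates=range(1, 26)):
    """Evaluate every candidate and return (best_pair, all_pairs).

    `evaluate` is any callable mapping a pseudocount to an accuracy.
    On ties, max() keeps the first maximum, i.e. the smallest pseudocount.
    """
    results = [(alpha, evaluate(alpha)) for alpha in candidates]
    best = max(results, key=lambda pair: pair[1])
    return best, results

def stand_in_accuracy(alpha):
    # Made-up curve peaking at alpha = 7; NOT a real measurement.
    return 0.80 - 0.001 * (alpha - 7) ** 2

best, results = best_pseudocount(stand_in_accuracy)
```

In real use, `evaluate` would wrap a call such as `nb.evaluate_classifier_accuracy(alpha)` on a trained model.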

Usage

from nb_sentiment_classify import NaiveBayes

# Initialize model with default feature extractor
nb = NaiveBayes()

# Train model on large_movie_review_dataset
nb.train_model()

# Evaluate accuracy given a pseudocount (1 used in this example)
nb.evaluate_classifier_accuracy(1)

A project for CS585 - Introduction to Natural Language Processing

Assignment Description

Starter code

Data

Instructor: Brendan T. O'Connor
