Naive Bayes Bag-of-Words Sentiment Classifier

Description

Trains a Naive Bayes classifier to predict the sentiment of a movie review (positive or negative). The original assignment code has been cleaned up and streamlined for readability and ease of use. As a result, this is not the complete assignment solution: only the parts I considered most relevant for sharing are included.
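For reference, the standard multinomial Naive Bayes decision rule with additive (pseudocount) smoothing is shown below. It is assumed, based only on the function names listed further down (log_prior, log_likelihood, unnormalized_log_posterior, p_word_given_label_and_psuedocount, likelihood_ratio), that the code computes these quantities; the symbols are introduced here for illustration. c(w, d) is the number of times word w occurs in review d, count(w, y) is the number of times w occurs in training reviews labeled y, V is the training vocabulary, and α is the pseudocount.

$$
\hat{y} = \arg\max_{y \in \{\mathrm{pos},\ \mathrm{neg}\}} \Big[ \log P(y) + \sum_{w \in d} c(w, d) \, \log P(w \mid y) \Big]
$$

$$
P(w \mid y) = \frac{\mathrm{count}(w, y) + \alpha}{\sum_{w' \in V} \mathrm{count}(w', y) + \alpha \, |V|}
$$

The bracketed term is the unnormalized log posterior (log prior plus log likelihood), and the likelihood ratio of a word is P(w | pos) / P(w | neg).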

Instructor Implementations

  • tokenize_doc
  • train
  • report_statistics_after_training

Modifications to Instructor Implementations

  • __init__: Added a feature_extractor member that defaults to tokenize_doc
  • tokenize_and_update_model: Switched to the feature_extractor member instead of calling tokenize_doc directly
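A minimal sketch of the pluggable feature-extractor pattern described above. The class name NaiveBayesSketch, the helper crude_stem, the extractor tokenize_stopwords_and_stem, and the STOPWORDS set are all hypothetical; tokenize_doc here is a plausible whitespace tokenizer, not the instructor's actual code. A real stemming extractor would more likely use a library stemmer (for example NLTK's PorterStemmer) than the toy suffix stripper below.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical stopword list; the repository's actual (and custom) lists are not shown here.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "was", "it"}


def tokenize_doc(doc: str) -> Dict[str, int]:
    """Plausible default extractor: lowercase, split on whitespace, count tokens."""
    return dict(Counter(token.lower() for token in doc.split()))


def crude_stem(token: str) -> str:
    """Toy suffix stripper standing in for a real stemmer (assumption, not Porter)."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def tokenize_stopwords_and_stem(doc: str) -> Dict[str, int]:
    """Alternative extractor: drop stopwords, then stem the remaining tokens."""
    tokens = (token.lower() for token in doc.split())
    return dict(Counter(crude_stem(t) for t in tokens if t not in STOPWORDS))


class NaiveBayesSketch:
    """Only the parts of a model relevant to the pluggable feature extractor."""

    def __init__(self, feature_extractor: Callable[[str], Dict[str, int]] = tokenize_doc):
        # Any function mapping a document to a token -> count dict can be plugged in.
        self.feature_extractor = feature_extractor
        self.word_counts = {"pos": Counter(), "neg": Counter()}

    def tokenize_and_update_model(self, doc: str, label: str) -> None:
        # Use the configured extractor rather than a hard-coded tokenize_doc call.
        self.word_counts[label].update(self.feature_extractor(doc))
```

Usage: `NaiveBayesSketch(feature_extractor=tokenize_stopwords_and_stem)` swaps in the stopword-and-stemming extractor without touching the training code, which is the point of storing the extractor as a member.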

Implementations I Provided

  • tokenize_doc_stopwords
  • tokenize_doc_stopwords_custom
  • tokenize_doc_stopwords_and_stemming
  • update_model
  • p_word_given_label
  • log_likelihood
  • p_word_given_label_and_psuedocount
  • log_prior
  • unnormalized_log_posterior
  • classify
  • likelihood_ratio
  • evaluate_classifier_accuracy
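The following is an illustrative sketch of how the core functions listed above typically fit together in a multinomial Naive Bayes classifier. It is not the repository's code: the class TinyNaiveBayes, the method accuracy (standing in for evaluate_classifier_accuracy), and the pos/neg label strings are hypothetical, and documents are assumed to arrive already converted to bag-of-words dicts.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

POS, NEG = "pos", "neg"
Bow = Dict[str, int]  # bag-of-words: token -> count


class TinyNaiveBayes:
    """Illustrative multinomial Naive Bayes over bag-of-words counts."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()                 # training documents per label
        self.word_counts = {POS: Counter(), NEG: Counter()}  # token counts per label
        self.vocab: set = set()                              # every token seen in training

    def update_model(self, bow: Bow, label: str) -> None:
        # Add one labeled document's counts to the model.
        self.doc_counts[label] += 1
        self.word_counts[label].update(bow)
        self.vocab.update(bow)

    def p_word_given_label_and_pseudocount(self, word: str, label: str, alpha: float) -> float:
        # Additive smoothing: (count(w, y) + alpha) / (tokens in y + alpha * |V|)
        total = sum(self.word_counts[label].values())
        return (self.word_counts[label][word] + alpha) / (total + alpha * len(self.vocab))

    def log_likelihood(self, bow: Bow, label: str, alpha: float) -> float:
        # Sum of log P(w | y), weighted by how often w occurs in the document.
        return sum(
            count * math.log(self.p_word_given_label_and_pseudocount(word, label, alpha))
            for word, count in bow.items()
        )

    def log_prior(self, label: str) -> float:
        # Log of the fraction of training documents that carry this label.
        return math.log(self.doc_counts[label] / sum(self.doc_counts.values()))

    def unnormalized_log_posterior(self, bow: Bow, label: str, alpha: float) -> float:
        return self.log_prior(label) + self.log_likelihood(bow, label, alpha)

    def classify(self, bow: Bow, alpha: float) -> str:
        # Pick the label with the larger unnormalized log posterior (ties go to POS).
        return max((POS, NEG), key=lambda label: self.unnormalized_log_posterior(bow, label, alpha))

    def likelihood_ratio(self, word: str, alpha: float) -> float:
        # Greater than 1 means the word is more probable under the positive class.
        return (self.p_word_given_label_and_pseudocount(word, POS, alpha)
                / self.p_word_given_label_and_pseudocount(word, NEG, alpha))

    def accuracy(self, labeled_docs: List[Tuple[Bow, str]], alpha: float) -> float:
        # Fraction of (bow, gold label) pairs that classify() gets right.
        correct = sum(self.classify(bow, alpha) == label for bow, label in labeled_docs)
        return correct / len(labeled_docs)
```

Working in log space avoids the floating-point underflow that multiplying many small word probabilities would cause on long reviews, and any pseudocount above zero keeps unseen words from producing log(0).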

Demo

To train a Naive Bayes classifier on the large_movie_review_dataset data using a feature extractor that applies stemming and removes both standard and custom stopwords, run:

python nb_sentiment_classify.py

This command trains and evaluates the model with every pseudocount from 1 to 25 (inclusive), plots pseudocount against accuracy, and reports the best pseudocount along with its accuracy.
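The pseudocount sweep can be sketched as a simple loop that scores each candidate and keeps the best one. The function best_pseudocount and its evaluate argument are hypothetical; with the repository's class, evaluate would presumably be nb.evaluate_classifier_accuracy, assuming that method returns the accuracy as a number.

```python
from typing import Callable, Iterable, List, Tuple


def best_pseudocount(
    evaluate: Callable[[float], float],
    pseudocounts: Iterable[float] = range(1, 26),
) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Score every pseudocount; return (best pseudocount, its accuracy, all results)."""
    results = [(pc, evaluate(pc)) for pc in pseudocounts]
    # max() keeps the first maximum, so ties resolve to the smallest pseudocount.
    best_pc, best_acc = max(results, key=lambda pair: pair[1])
    return best_pc, best_acc, results
```

The (pseudocount, accuracy) pairs in results are what one would pass to a plotting library such as matplotlib (for example `plt.plot(xs, ys)`) to produce the pseudocount vs. accuracy graph.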

Usage

from nb_sentiment_classify import NaiveBayes

# Initialize model with default feature extractor
nb = NaiveBayes()

# Train model on large_movie_review_dataset
nb.train_model()

# Evaluate accuracy given a pseudocount (1 used in this example)
nb.evaluate_classifier_accuracy(1)