A project to predict a Yelp user's elite status based on user activity, popularity/social network size, review sentiment/content, and review structure. Investigates several classification models, including random forest, naive Bayes, logistic regression, and SVM. The final model achieves an accuracy of 98% and an AUC of 0.99.
For more information, see my blog post.
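As a rough illustration of the model comparison described above, the sketch below fits the four model families on a train split and reports held-out accuracy and AUC. It is not the code from the notebooks: the data is synthetic (generated with make_classification), the function name compare_classifiers and all hyperparameters are assumptions chosen for the example, and it imports from sklearn.model_selection, which requires scikit-learn 0.18 or later.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def compare_classifiers(X, y, random_state=0):
    """Fit each model on a train split; return held-out accuracy and AUC.

    Hypothetical helper for illustration, not part of this repo.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state)

    # Hyperparameters are placeholders, not the tuned values from the project.
    models = {
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state),
        "naive_bayes": GaussianNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm": SVC(probability=True, random_state=random_state),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predicted = model.predict(X_test)
        # Probability of the positive class, used to compute AUC.
        positive_prob = model.predict_proba(X_test)[:, 1]
        results[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            "auc": roc_auc_score(y_test, positive_prob),
        }
    return results


# Synthetic stand-in for the engineered user features (NOT Yelp data).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0)
scores = compare_classifiers(X, y)
for name in sorted(scores):
    print("%-20s accuracy=%.3f  auc=%.3f"
          % (name, scores[name]["accuracy"], scores[name]["auc"]))
```

Scores on the synthetic data say nothing about the Yelp results; the point is only the shape of the comparison loop.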
yelp-text-processing.ipynb: jupyter notebook with scripts and outputs for processing review text
yelp-classification.ipynb: jupyter notebook with scripts and outputs for feature engineering and model selection
d3/: contains scripts for feature importance bar chart and network graph viz
presentation/: contains pdf presentation of findings & recommendations
To keep the repository small, the original data was not uploaded to this repo. Yelp data can be accessed at the link provided at the bottom of this readme.
$ git clone https://github.com/dianalam/yelp-classifier.git
Scripts were written in Python 2.7. You'll need the following modules:
matplotlib >= 1.5.1
nltk >= 3.1
numpy >= 1.10.1
pandas >= 0.17.1
python-dateutil >= 2.4.2
scipy >= 0.16.0
seaborn >= 0.6.0
scikit-learn (imported as sklearn) >= 0.17
spacy >= 0.100
statsmodels >= 0.6.1
To install modules, run:
$ pip install <module>
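To see which of the modules above are already available before installing anything, a small check along these lines may help. It is a minimal sketch: the helper name installed_versions is hypothetical, and the list uses import names, which differ from the pip names in two cases (python-dateutil is imported as dateutil, scikit-learn as sklearn). It only reports versions and does not compare them against the minimums listed above.

```python
import importlib


def installed_versions(module_names):
    """Map each import name to its __version__ string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Some modules do not expose __version__; report that explicitly.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Import names for the modules listed in this readme.
REQUIRED = [
    "matplotlib", "nltk", "numpy", "pandas", "dateutil",
    "scipy", "seaborn", "sklearn", "spacy", "statsmodels",
]

if __name__ == "__main__":
    for name, version in sorted(installed_versions(REQUIRED).items()):
        print("%-12s %s" % (name, version if version else "not installed"))
```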
To open jupyter notebooks:
$ jupyter notebook
To run visualizations:
$ python -m SimpleHTTPServer
Then open the address the server reports in your browser (port 8000 by default, i.e. http://localhost:8000).
Thanks to: