Starbucks-classification

Exploring how Starbucks app user data can help us target customers who are likely to complete offers.

The blog for this project can be found here.

Project files

requirements.txt - Contains the packages required to set up the development environment.
main.ipynb - A Jupyter notebook containing the data wrangling and classification models.
data/portfolio - Contains the characteristics of the Starbucks offers.
data/transcript - Contains the app events and transaction details.
data/profile - Contains the demographic data for the customers.

Project summary

1. Business understanding

The goal is to complete the Data Science project as part of the Udacity nanodegree.

We are aiming to answer two business questions.

Are there differences in demographics between users who successfully complete offers and those who do not?
Can we predict whether a user will successfully complete an offer?

2. Data understanding

The data has been provided by Starbucks. There are three datasets: customer demographics (profile.json), offer characteristics (portfolio.json) and all the app events related to the offers (transcript.json).

All three files are in JSON format.
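A minimal sketch of loading one of these files with pandas, assuming each file stores one JSON record per line (line-delimited JSON). If your copy is a single JSON array instead, drop `lines=True`. The file path mentioned in the docstring and the field names in the stand-in sample are illustrative assumptions.

```python
import io

import pandas as pd


def load_json_records(source) -> pd.DataFrame:
    """Read line-delimited JSON (one record per line) into a DataFrame.

    `source` can be a path such as "data/profile.json" (assumed location)
    or any file-like object.
    """
    return pd.read_json(source, orient="records", lines=True)


# Stand-in for the profile file: two made-up records; field names are assumptions.
sample = io.StringIO(
    '{"id": "a1", "gender": "F", "age": 55, "income": 112000.0}\n'
    '{"id": "b2", "gender": "M", "age": 75, "income": 100000.0}\n'
)
profile = load_json_records(sample)
print(profile.shape)  # (2, 4)
```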

The demographic data (profile) is limited, so we will need to create new features to improve our ability to classify customers. Some of its columns will need to be dummy coded. The portfolio data is straightforward, although some columns contain embedded lists that will need to be extracted. The transcript data is trickier: it needs a lot of wrangling to work out which transactions can be linked to a successful offer.
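The two reshaping steps described above can be sketched in pandas: expanding a column of lists into one indicator column per item, and dummy coding a categorical demographic column. The column names (`channels`, `gender`) and the rows are made-up stand-ins, not taken from the actual files.

```python
import pandas as pd

# Made-up rows; column names are assumed for illustration.
portfolio = pd.DataFrame({
    "id": ["o1", "o2"],
    "offer_type": ["bogo", "discount"],
    "channels": [["email", "mobile"], ["web", "email"]],
})
profile = pd.DataFrame({"id": ["a1", "b2", "c3"], "gender": ["F", "M", "O"]})


def expand_list_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Replace a column of lists with one 0/1 indicator column per distinct item."""
    indicators = (
        df[column]
        .str.join("|")             # ["email", "mobile"] -> "email|mobile"
        .str.get_dummies(sep="|")  # one column per item, sorted alphabetically
        .add_prefix(f"{column}_")
    )
    return df.drop(columns=column).join(indicators)


portfolio_wide = expand_list_column(portfolio, "channels")
# Dummy code a categorical column; dtype=int gives 0/1 instead of booleans.
profile_wide = pd.get_dummies(profile, columns=["gender"], dtype=int)

print(list(portfolio_wide.columns))
# ['id', 'offer_type', 'channels_email', 'channels_mobile', 'channels_web']
print(list(profile_wide.columns))
# ['id', 'gender_F', 'gender_M', 'gender_O']
```

With real profile data, rows with a missing gender would get 0 in every `gender_` column unless `dummy_na=True` is passed to `pd.get_dummies`.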

3. Data preparation

We undertook a lot of data wrangling. See the notebook (main.ipynb) for the specific steps applied to each dataset.
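As one example of the kind of transcript wrangling involved, the sketch below flattens a dictionary-valued `value` column and labels each received offer as successful when it was first viewed no later than it was first completed. The column names (`person`, `event`, `value`, `time`), the event strings, the `offer id`/`offer_id` key spelling and the success rule itself are assumptions for illustration; the notebook's actual logic may differ, for instance in how repeated receipts of the same offer or offer expiry are handled.

```python
import pandas as pd


def flatten_transcript(transcript: pd.DataFrame) -> pd.DataFrame:
    """Expand the dict-valued `value` column into ordinary columns."""
    # Assumed quirk: some events store "offer id", others "offer_id"; unify them.
    records = [{k.replace(" ", "_"): v for k, v in d.items()} for d in transcript["value"]]
    values = pd.DataFrame(records, index=transcript.index)
    return transcript.drop(columns="value").join(values)


def label_offer_success(flat: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical rule: success = first view happens no later than first completion."""
    first_times = (
        flat[flat["event"].isin(["offer viewed", "offer completed"])]
        .pivot_table(index=["person", "offer_id"], columns="event",
                     values="time", aggfunc="min")
        .reindex(columns=["offer viewed", "offer completed"])
        .reset_index()
    )
    received = flat.loc[flat["event"] == "offer received", ["person", "offer_id"]].drop_duplicates()
    labels = received.merge(first_times, on=["person", "offer_id"], how="left")
    # Comparisons with NaN (never viewed or never completed) are False, giving 0.
    labels["success"] = (labels["offer viewed"] <= labels["offer completed"]).astype(int)
    return labels[["person", "offer_id", "success"]]


# Made-up events: p1 views then completes o1; p2 completes o2 without viewing it.
transcript = pd.DataFrame({
    "person": ["p1", "p1", "p1", "p2", "p2", "p2"],
    "event": ["offer received", "offer viewed", "offer completed",
              "offer received", "offer completed", "transaction"],
    "value": [{"offer id": "o1"}, {"offer id": "o1"}, {"offer_id": "o1", "reward": 5},
              {"offer id": "o2"}, {"offer_id": "o2", "reward": 2}, {"amount": 12.5}],
    "time": [0, 6, 12, 0, 18, 18],
})
labels = label_offer_success(flatten_transcript(transcript))
print(labels["success"].tolist())  # [1, 0]
```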

4. Modelling

A random forest classifier and an AdaBoost classifier were chosen to classify the data. The data were split into training and testing sets and then passed into a pipeline, which included a column transformer to scale the numeric features. Both models were tuned and trained using GridSearchCV. To account for the class imbalance, I used SMOTE to oversample the minority class.
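A minimal sketch of the modelling setup described above, using scikit-learn's `Pipeline`, `ColumnTransformer` and `GridSearchCV`. The features, target rule, parameter grids and the `f1_macro` scoring choice are hypothetical stand-ins, not the notebook's actual settings. To keep the sketch dependent only on scikit-learn, the SMOTE step is left out; with the imbalanced-learn package installed, one would typically use `imblearn.pipeline.Pipeline` instead and add a `("smote", SMOTE(random_state=42))` step (from `imblearn.over_sampling`) before the classifier, so that synthetic samples are generated only when fitting on training folds and never for validation or test data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_search(classifier, param_grid, numeric_cols):
    """Scale numeric columns, pass the rest through, and grid-search the classifier."""
    preprocess = ColumnTransformer(
        [("scale", StandardScaler(), numeric_cols)],
        remainder="passthrough",  # dummy-coded columns are left unscaled
    )
    pipeline = Pipeline([("preprocess", preprocess), ("clf", classifier)])
    # "f1_macro" averages the F1 of class 0 and class 1 (an assumed choice).
    return GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=3)


# Synthetic stand-in features; names and the target rule are hypothetical.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.integers(18, 90, 300),
    "income": rng.normal(65000, 20000, 300),
    "gender_F": rng.integers(0, 2, 300),
})
y = (X["income"] > 70000).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

numeric_cols = ["age", "income"]
searches = {
    "random_forest": build_search(
        RandomForestClassifier(random_state=42),
        {"clf__n_estimators": [25, 50], "clf__max_depth": [None, 5]},
        numeric_cols,
    ),
    "adaboost": build_search(
        AdaBoostClassifier(random_state=42),
        {"clf__n_estimators": [25, 50], "clf__learning_rate": [0.5, 1.0]},
        numeric_cols,
    ),
}
for name, search in searches.items():
    search.fit(X_train, y_train)
    print(name, search.best_params_, round(search.score(X_test, y_test), 3))
```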

5. Evaluation

The F1 score was chosen to evaluate the models because it accounts for both precision and recall. We want to reward correct classification of both classes, 1 and 0. We also used the confusion matrix to interpret the performance of each model.
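To make the metric concrete, here is a small pure-Python sketch that derives the F1 score for each class from confusion-matrix counts and then averages the two. The text does not specify which average is reported; the unweighted (macro) mean used here is an assumption, and the labels are made up. The result should match `sklearn.metrics.f1_score(y_true, y_pred, average="macro")`, and `sklearn.metrics.confusion_matrix` arranges the same counts as `[[tn, fp], [fn, tp]]`.

```python
def confusion_counts(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary 0/1 labels."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def f1(tp, fp, fn):
    """F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def per_class_f1(y_true, y_pred):
    """Return (F1 for class 0, F1 for class 1, their unweighted mean)."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    f1_one = f1(tp, fp, fn)
    # Treating 0 as the positive class swaps roles: TN act as TP, FN as FP, FP as FN.
    f1_zero = f1(tn, fn, fp)
    return f1_zero, f1_one, (f1_zero + f1_one) / 2


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(confusion_counts(y_true, y_pred))  # (4, 2, 1, 3)
print([round(v, 3) for v in per_class_f1(y_true, y_pred)])  # [0.727, 0.667, 0.697]
```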

The resulting average F1 score was 0.92 for the random forest model and 0.90 for the AdaBoost model.

Random forest model confusion matrix

AdaBoost model confusion matrix

Acknowledgements

Thanks to the mentor pages within the Udacity platform, which helped me find the information needed to interpret the dataset correctly, as well as to many Stack Overflow posts and the scikit-learn documentation.

The following articles also helped me with some decision making.

https://towardsdatascience.com/random-forest-hyperparameters-and-how-to-fine-tune-them-17aee785ee0d
https://towardsdatascience.com/using-starbucks-app-user-data-to-predict-effective-offers-20b799f3a6d5

Repository: tim-blackmore/Starbucks-classification
