King County Housing Price Prediction

Background

This notebook presents the steps and reasoning we followed to predict house prices from multiple features using regression analysis. We worked with a dataset that was preprocessed for instructional purposes and derived from the data provided in a former Kaggle competition on predicting housing sale prices with regression.

If you would like to explore the original competition on Kaggle, please follow this link: https://www.kaggle.com/harlfoxem/housesalesprediction/discussion/92376

Names and descriptions of the columns in the provided King County dataset:

id - Unique ID for a house
date - Date the house was sold
price - Sale price (the prediction target)
bedrooms - Number of bedrooms
bathrooms - Number of bathrooms
sqft_living - Square footage of the home
sqft_lot - Square footage of the lot
floors - Total floors (levels) in the house
waterfront - Whether the house has a view of a waterfront
view - Number of times the house has been viewed
condition - Overall condition of the house
grade - Overall grade given to the housing unit, based on the King County grading system
sqft_above - Square footage of the house, excluding the basement
sqft_basement - Square footage of the basement
yr_built - Year the house was built
yr_renovated - Year the house was renovated
zipcode - ZIP code in which the house is located
lat - Latitude coordinate
long - Longitude coordinate
sqft_living15 - Square footage of interior living space for the nearest 15 neighbors
sqft_lot15 - Square footage of the land lots of the nearest 15 neighbors

Business Problem

Our client, representing a cohort of foreign investors, has expressed interest in entering the Seattle-area housing market. By gaining better insight into prediction models for housing prices, they hope to become major players in the market. They have partnered with us to learn how supervised machine learning can be applied to predict housing prices in King County.

We set out to answer a few questions for our client:

Do renovated properties have a higher selling price than unrenovated properties?
Does the number of times a property is viewed have any effect on selling price?
Does the grade given to the housing unit have an overall effect on the selling price?

Through statistical tests performed during our EDA process, we will provide our client with the essential information needed for their new business venture.
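As an illustration of how the first question might be tested, here is a minimal sketch of a one-sided Welch's t-test comparing renovated and unrenovated sale prices. The README does not say which tests were actually run, so the choice of test, the helper name `welch_one_sided`, the significance level, and the synthetic prices below are all assumptions made for this example. In practice the two samples would come from splitting the `price` column on whether `yr_renovated` is set.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in prices (NOT taken from the King County dataset).
rng = np.random.default_rng(0)
renovated_prices = rng.normal(loc=600_000, scale=150_000, size=200)
unrenovated_prices = rng.normal(loc=500_000, scale=150_000, size=800)


def welch_one_sided(sample_a, sample_b, alpha=0.05):
    """Test H1: mean(sample_a) > mean(sample_b) without assuming equal variances.

    Hypothetical helper for this sketch; returns (t statistic, one-sided
    p-value, whether H0 is rejected at level alpha).
    """
    t_stat, p_two_sided = stats.ttest_ind(sample_a, sample_b, equal_var=False)
    # Convert the two-sided p-value into a one-sided one for H1: a > b.
    if t_stat > 0:
        p_one_sided = p_two_sided / 2
    else:
        p_one_sided = 1 - p_two_sided / 2
    return float(t_stat), float(p_one_sided), bool(p_one_sided < alpha)


t_stat, p_value, reject_null = welch_one_sided(renovated_prices, unrenovated_prices)
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4g}, reject H0: {reject_null}")
```

Welch's variant is used here because the two groups are likely to differ in size and variance. The other two questions involve more than two groups (view counts, grades), so a different test, such as a one-way ANOVA, would probably be a better fit there.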

Directory and File Structure

├── /data (folder of all housing and modeling data)
│   ├── model.pickle
│   ├── scaler.pickle
│   ├── housing_preds_Steven_Yan.csv
│   ├── modeling.csv (CSV for import into modeling workbook)
│   └── partials/template
├── /images (folder of all visualizations created)
├── /mapping (folder of mapping files)
├── housing_eda.ipynb (EDA and Feature Engineering workbook)
├── housing_holdout.ipynb (Holdout Set workbook)
├── housing_modeling.ipynb (Modeling workbook)
└── README.md

Methods

We started with an exploratory data analysis (EDA) to gain a better understanding of the dataset. We created data visualizations that helped us learn which features had stronger correlations than others. The insight gained from our EDA guided our data cleaning and feature engineering processes.

We fit a linear regression model on the entire dataset, including the polynomial and interaction features as well as the dummy variables. We used K-Best for feature selection.
