Predicting House prices.md

import graphlab

Load house data

sales=graphlab.SFrame('home_data.gl/')
sales
id date price bedrooms bathrooms sqft_living sqft_lot floors waterfront
7129300520 2014-10-13 00:00:00+00:00 221900 3 1 1180 5650 1 0
6414100192 2014-12-09 00:00:00+00:00 538000 3 2.25 2570 7242 2 0
5631500400 2015-02-25 00:00:00+00:00 180000 2 1 770 10000 1 0
2487200875 2014-12-09 00:00:00+00:00 604000 4 3 1960 5000 1 0
1954400510 2015-02-18 00:00:00+00:00 510000 3 2 1680 8080 1 0
7237550310 2014-05-12 00:00:00+00:00 1225000 4 4.5 5420 101930 1 0
1321400060 2014-06-27 00:00:00+00:00 257500 3 2.25 1715 6819 2 0
2008000270 2015-01-15 00:00:00+00:00 291850 3 1.5 1060 9711 1 0
2414600126 2015-04-15 00:00:00+00:00 229500 3 1 1780 7470 1 0
3793500160 2015-03-12 00:00:00+00:00 323000 3 2.5 1890 6560 2 0
view condition grade sqft_above sqft_basement yr_built yr_renovated zipcode lat
0 3 7 1180 0 1955 0 98178 47.51123398
0 3 7 2170 400 1951 1991 98125 47.72102274
0 3 6 770 0 1933 0 98028 47.73792661
0 5 7 1050 910 1965 0 98136 47.52082
0 3 8 1680 0 1987 0 98074 47.61681228
0 3 11 3890 1530 2001 0 98053 47.65611835
0 3 7 1715 0 1995 0 98003 47.30972002
0 3 7 1060 0 1963 0 98198 47.40949984
0 3 7 1050 730 1960 0 98146 47.51229381
0 3 7 1890 0 2003 0 98038 47.36840673
long sqft_living15 sqft_lot15
-122.25677536 1340.0 5650.0
-122.3188624 1690.0 7639.0
-122.23319601 2720.0 8062.0
-122.39318505 1360.0 5000.0
-122.04490059 1800.0 7503.0
-122.00528655 4760.0 101930.0
-122.32704857 2238.0 6819.0
-122.31457273 1650.0 9711.0
-122.33659507 1780.0 8113.0
-122.0308176 2390.0 7570.0
[21613 rows x 21 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

Exploring data

graphlab.canvas.set_target('ipynb')
# The column is named sqft_living (all lowercase); "Sq_ft" and "Sqft_living" are not columns of sales
sales.show(view="Scatter Plot",x="sqft_living",y="price")

Create a simple regression model of sqft_living to price

train_data,test_data = sales.random_split(.8,seed=0)
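To illustrate what a seeded 80/20 split does, here is a minimal sketch in plain Python. It is not GraphLab Create's implementation: the helper `random_split` below is a hypothetical stand-in that sends each row to the first list with probability `fraction`, and the list `rows` stands in for the rows of `sales`.

```python
import random

def random_split(rows, fraction, seed=None):
    """Hypothetical helper: each row goes to the first list with probability `fraction`."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the split
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

rows = list(range(1000))  # stand-in for the rows of `sales`
train, test = random_split(rows, 0.8, seed=0)
train_again, test_again = random_split(rows, 0.8, seed=0)

# The sizes are only approximately 800 and 200, because each row is assigned independently.
print("train: {} rows, test: {} rows".format(len(train), len(test)))
# Reusing the seed reproduces the split exactly, which keeps later evaluations comparable.
print(train == train_again)  # True
```

Fixing the seed matters here because every model in this notebook is scored on the same `test_data`.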

sqft_model=graphlab.linear_regression.create(train_data, target='price',features=['sqft_living'])

# evaluate model
print test_data['price'].mean()
543054.042563
print sqft_model.evaluate(test_data)
{'max_error': 4176275.8423837754, 'rmse': 255137.5216304084}
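The two metrics reported above can be sketched in a few lines, assuming the usual definitions: `rmse` is the square root of the mean squared difference between actual and predicted prices, and `max_error` is the largest absolute difference. The functions and numbers below are illustrative only and are not taken from GraphLab Create or from the dataset.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / float(len(squared)))

def max_error(actual, predicted):
    """Largest absolute difference between actual and predicted values."""
    return max(abs(a - p) for a, p in zip(actual, predicted))

# Made-up prices for illustration
actual = [200000.0, 350000.0, 500000.0]
predicted = [210000.0, 330000.0, 540000.0]

print(rmse(actual, predicted))       # about 26457.5
print(max_error(actual, predicted))  # 40000.0
```

Both metrics are in dollars, so an RMSE of about 255,000 against a mean test price of about 543,000 indicates a fairly loose fit.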

Let's show our predictions

import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(test_data['sqft_living'],test_data['price'],'.',test_data['sqft_living'],sqft_model.predict(test_data),'-')
[<matplotlib.lines.Line2D at 0x7f8931cb9290>,
 <matplotlib.lines.Line2D at 0x7f8931cb9350>]

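[Figure: test_data price against sqft_living as points, with the sqft_model predictions drawn as a line]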

sqft_model.get('coefficients')
name index value stderr
(intercept) None -39311.9219431 5009.91644924
sqft_living None 277.860068712 2.20247859729
[2 rows x 4 columns]
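An intercept and a slope like the ones in this table can be obtained, for a single feature, from the closed-form least-squares formulas. The sketch below assumes ordinary least squares with no regularization; GraphLab Create's solver and default settings are not shown in this notebook and may differ. `fit_line` and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data lying exactly on price = 50000 + 200 * sqft
sqft = [1000.0, 1500.0, 2000.0, 2500.0]
price = [50000.0 + 200.0 * s for s in sqft]

intercept, slope = fit_line(sqft, price)
print("intercept = {:.2f}, slope = {:.2f}".format(intercept, slope))
```

Read this way, the `sqft_living` value of about 277.86 means the model adds roughly $278 to the predicted price for each additional square foot.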

Explore other features

my_features=['bedrooms','bathrooms','sqft_living','sqft_lot','floors','zipcode']
sales[my_features].show()

sales.show(view="BoxWhisker Plot",x="zipcode",y="price")

Build a regression model with my_features

my_features_model=graphlab.linear_regression.create(train_data,target='price',features=my_features)

print sqft_model.evaluate(test_data)
print my_features_model.evaluate(test_data)
{'max_error': 4176275.8423837754, 'rmse': 255137.5216304084}
{'max_error': 3479747.5878821, 'rmse': 179762.26007426204}
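To put the two RMSE values side by side, the relative reduction can be computed directly from the numbers printed above. This is plain arithmetic on the reported results; the variable names are chosen for this sketch.

```python
# RMSE values copied from the evaluate() output above
rmse_sqft = 255137.5216304084
rmse_my_features = 179762.26007426204

# Fraction by which the multi-feature model lowers the test RMSE
reduction = (rmse_sqft - rmse_my_features) / rmse_sqft
print("RMSE reduction: {:.1%}".format(reduction))  # RMSE reduction: 29.5%
```

Adding bedrooms, bathrooms, lot size, floors, and zipcode lowers the test RMSE by a little under a third compared with using sqft_living alone.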

Apply the models to predict the prices of 3 houses

house1=sales[sales['id']=='5309101200']
house1
id date price bedrooms bathrooms sqft_living sqft_lot floors waterfront
5309101200 2014-06-05 00:00:00+00:00 620000 4 2.25 2400 5350 1.5 0
view condition grade sqft_above sqft_basement yr_built yr_renovated zipcode lat
0 4 7 1460 940 1929 0 98117 47.67632376
long sqft_living15 sqft_lot15
-122.37010126 1250.0 4880.0
[? rows x 21 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
<img src="house-5309101200.jpg">
print sqft_model.predict(house1)
[627552.2429651106]
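The sqft_model prediction above can be reproduced by hand from the coefficients reported earlier, since a single-feature linear model is just intercept plus slope times sqft_living. The values below are copied from the coefficients table and from the house1 row; the variable names are chosen for this sketch.

```python
intercept = -39311.9219431  # (intercept) row of the coefficients table
slope = 277.860068712       # sqft_living row of the coefficients table
sqft_living = 2400          # sqft_living of house 5309101200

predicted_price = intercept + slope * sqft_living
print(predicted_price)  # about 627552.24, matching sqft_model.predict(house1)
```

The tiny difference in the last digits comes from the coefficients being printed with limited precision.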
print my_features_model.predict(house1)
[724547.3652961545]