import graphlab
sales=graphlab.SFrame('home_data.gl/')
[INFO] graphlab.cython.cy_server: GraphLab Create v2.0.1 started. Logging: /tmp/graphlab_server_1471890497.log
sales
id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront |
---|---|---|---|---|---|---|---|---|
7129300520 | 2014-10-13 00:00:00+00:00 | 221900 | 3 | 1 | 1180 | 5650 | 1 | 0 |
6414100192 | 2014-12-09 00:00:00+00:00 | 538000 | 3 | 2.25 | 2570 | 7242 | 2 | 0 |
5631500400 | 2015-02-25 00:00:00+00:00 | 180000 | 2 | 1 | 770 | 10000 | 1 | 0 |
2487200875 | 2014-12-09 00:00:00+00:00 | 604000 | 4 | 3 | 1960 | 5000 | 1 | 0 |
1954400510 | 2015-02-18 00:00:00+00:00 | 510000 | 3 | 2 | 1680 | 8080 | 1 | 0 |
7237550310 | 2014-05-12 00:00:00+00:00 | 1225000 | 4 | 4.5 | 5420 | 101930 | 1 | 0 |
1321400060 | 2014-06-27 00:00:00+00:00 | 257500 | 3 | 2.25 | 1715 | 6819 | 2 | 0 |
2008000270 | 2015-01-15 00:00:00+00:00 | 291850 | 3 | 1.5 | 1060 | 9711 | 1 | 0 |
2414600126 | 2015-04-15 00:00:00+00:00 | 229500 | 3 | 1 | 1780 | 7470 | 1 | 0 |
3793500160 | 2015-03-12 00:00:00+00:00 | 323000 | 3 | 2.5 | 1890 | 6560 | 2 | 0 |
view | condition | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat |
---|---|---|---|---|---|---|---|---|
0 | 3 | 7 | 1180 | 0 | 1955 | 0 | 98178 | 47.51123398 |
0 | 3 | 7 | 2170 | 400 | 1951 | 1991 | 98125 | 47.72102274 |
0 | 3 | 6 | 770 | 0 | 1933 | 0 | 98028 | 47.73792661 |
0 | 5 | 7 | 1050 | 910 | 1965 | 0 | 98136 | 47.52082 |
0 | 3 | 8 | 1680 | 0 | 1987 | 0 | 98074 | 47.61681228 |
0 | 3 | 11 | 3890 | 1530 | 2001 | 0 | 98053 | 47.65611835 |
0 | 3 | 7 | 1715 | 0 | 1995 | 0 | 98003 | 47.30972002 |
0 | 3 | 7 | 1060 | 0 | 1963 | 0 | 98198 | 47.40949984 |
0 | 3 | 7 | 1050 | 730 | 1960 | 0 | 98146 | 47.51229381 |
0 | 3 | 7 | 1890 | 0 | 2003 | 0 | 98038 | 47.36840673 |
long | sqft_living15 | sqft_lot15 |
---|---|---|
-122.25677536 | 1340.0 | 5650.0 |
-122.3188624 | 1690.0 | 7639.0 |
-122.23319601 | 2720.0 | 8062.0 |
-122.39318505 | 1360.0 | 5000.0 |
-122.04490059 | 1800.0 | 7503.0 |
-122.00528655 | 4760.0 | 101930.0 |
-122.32704857 | 2238.0 | 6819.0 |
-122.31457273 | 1650.0 | 9711.0 |
-122.33659507 | 1780.0 | 8113.0 |
-122.0308176 | 2390.0 | 7570.0 |
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
graphlab.canvas.set_target('ipynb')
# Column names are case-sensitive: the living-area column is 'sqft_living'.
sales.show(view="Scatter Plot", x="sqft_living", y="price")
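# Hold out roughly 20% of the rows for testing; the fixed seed makes the split reproducible.
train_data, test_data = sales.random_split(.8, seed=0)
# Fit a linear regression of price on living area alone.
sqft_model = graphlab.linear_regression.create(train_data, target='price', features=['sqft_living'])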
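For a single feature, an ordinary least-squares line has a closed-form solution. The sketch below fits one in plain Python on the ten rows printed by `sales` above. It is an illustration only: GraphLab Create picks its own solver and settings, and it trains on `train_data` rather than these ten rows, so the numbers are not expected to match `sqft_model`. The helper name `fit_simple_ols` is hypothetical.

```python
from __future__ import division, print_function

# (sqft_living, price) pairs copied from the ten rows printed by `sales` above.
rows = [
    (1180, 221900), (2570, 538000), (770, 180000), (1960, 604000),
    (1680, 510000), (5420, 1225000), (1715, 257500), (1060, 291850),
    (1780, 229500), (1890, 323000),
]


def fit_simple_ols(pairs):
    """Return (intercept, slope) of the least-squares line through the pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_ols(rows)
print("intercept = %.2f, slope = %.2f dollars per sqft" % (intercept, slope))
```

With an intercept in the model, the residuals of a least-squares fit sum to zero, which is a quick sanity check on any implementation like this one.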
# evaluate model
print test_data['price'].mean()
543054.042563
print sqft_model.evaluate(test_data)
{'max_error': 4176275.8423837754, 'rmse': 255137.5216304084}
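The two numbers returned by `evaluate` are assumed here to be the root-mean-squared error and the largest absolute residual on the data passed in. A minimal sketch of both quantities follows, using made-up values rather than the housing data; the function names `rmse` and `max_error` are hypothetical helpers, not GraphLab calls.

```python
from __future__ import division, print_function
import math


def rmse(actual, predicted):
    """Root-mean-squared error: square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def max_error(actual, predicted):
    """Largest absolute residual over all rows."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


# Illustrative prices only; these are not rows from the dataset.
actual = [200000.0, 350000.0, 500000.0]
predicted = [210000.0, 330000.0, 530000.0]

print("rmse =", rmse(actual, predicted))
print("max_error =", max_error(actual, predicted))
```

Both values are in the same units as the target (dollars), so the RMSE above can be read directly against the mean test price printed before it.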
import matplotlib.pyplot as plt
%matplotlib inline
# Test-set prices as points, with the predictions of sqft_model drawn as a line over them.
plt.plot(test_data['sqft_living'], test_data['price'], '.',
         test_data['sqft_living'], sqft_model.predict(test_data), '-')
[<matplotlib.lines.Line2D at 0x7f8931cb9290>,
<matplotlib.lines.Line2D at 0x7f8931cb9350>]
# Field names must be spelled exactly; list_fields() lists the fields that can be queried.
sqft_model.get('coefficients')
name | index | value | stderr |
---|---|---|---|
(intercept) | None | -39311.9219431 | 5009.91644924 |
sqft_living | None | 277.860068712 | 2.20247859729 |
my_features=['bedrooms','bathrooms','sqft_living','sqft_lot','floors','zipcode']
sales[my_features].show()
sales.show(view="BoxWhisker Plot",x="zipcode",y="price")
my_features_model=graphlab.linear_regression.create(train_data,target='price',features=my_features)
print sqft_model.evaluate(test_data)
print my_features_model.evaluate(test_data)
{'max_error': 4176275.8423837754, 'rmse': 255137.5216304084}
{'max_error': 3479747.5878821, 'rmse': 179762.26007426204}
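The two RMSE values are easier to compare as a relative change and as a fraction of the mean test price. The sketch below is plain arithmetic on the numbers printed above; the variable names are illustrative.

```python
from __future__ import division, print_function

# Values copied from the evaluate() outputs and the mean test price printed above.
rmse_sqft = 255137.5216304084
rmse_my_features = 179762.26007426204
mean_test_price = 543054.042563

# Relative reduction in RMSE from adding the extra features.
reduction = (rmse_sqft - rmse_my_features) / rmse_sqft
print("RMSE reduction: %.1f%%" % (100 * reduction))

# RMSE expressed as a fraction of the average test price.
print("sqft_model RMSE / mean price: %.2f" % (rmse_sqft / mean_test_price))
print("my_features_model RMSE / mean price: %.2f" % (rmse_my_features / mean_test_price))
```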
house1=sales[sales['id']=='5309101200']
house1
id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront |
---|---|---|---|---|---|---|---|---|
5309101200 | 2014-06-05 00:00:00+00:00 | 620000 | 4 | 2.25 | 2400 | 5350 | 1.5 | 0 |
view | condition | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat |
---|---|---|---|---|---|---|---|---|
0 | 4 | 7 | 1460 | 940 | 1929 | 0 | 98117 | 47.67632376 |
long | sqft_living15 | sqft_lot15 |
---|---|---|
-122.37010126 | 1250.0 | 4880.0 |
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
# HTML is not valid in a code cell; display the image through IPython instead.
# The file name comes from the second attempt and is assumed to sit next to the notebook.
from IPython.display import Image
Image(filename='house-5309101200.jpg')
print sqft_model.predict(house1)
[627552.2429651106]
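This prediction can be reproduced by hand from the coefficients table printed earlier: it is the intercept plus the `sqft_living` coefficient times the living area of `house1`. A small sketch, with the values copied from those outputs:

```python
from __future__ import division, print_function

intercept = -39311.9219431   # "(intercept)" row of the coefficients table
slope = 277.860068712        # "sqft_living" row of the coefficients table
sqft_living = 2400           # living area of house1

# A one-feature linear model predicts intercept + slope * feature.
manual_prediction = intercept + slope * sqft_living
print(manual_prediction)
```

The result agrees with the model output to within the rounding of the printed coefficients.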
print my_features_model.predict(house1)
[724547.3652961545]
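Both predictions can be set against the recorded sale price of `house1` (620000 in the table above). The sketch below only does that subtraction. A single house says little about overall model quality, which is what the RMSE on `test_data` measures.

```python
from __future__ import division, print_function

actual_price = 620000  # price of house1 from the table above

# Predictions copied from the two outputs above.
predictions = {
    "sqft_model": 627552.2429651106,
    "my_features_model": 724547.3652961545,
}

# Absolute error of each model on this one house.
abs_errors = {name: abs(value - actual_price) for name, value in predictions.items()}
for name in sorted(abs_errors):
    print("%s: off by %.2f" % (name, abs_errors[name]))
```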