-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathanalysis.Rmd
100 lines (78 loc) · 3.57 KB
/
analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
---
title: "Real State Exploration"
author: "Gabriel Bessa"
date: "August 17, 2018"
output:
html_document:
df_print: paged
---
# Introduction
Moving away from home for the first time is often a experience that most of people will always remember. This experience can be either really good or really bad. I know that someday it will be my turn, so I decided to prepare my self to do it, and the first step will be understanding how much would cost to rent an apartment in my hometown.
This report will explore a dataset containing rental prices and attributes for approximately 2218 apartments that are located at Belo Horizonte, Brazil. This data set was crawled from [NetImoveis](https://www.netimoveis.com/) using the [NetSpider](https://github.com/gabrielvcbessa/real-state-crawlers/blob/master/net_spider.py) crawler.
```{r echo=FALSE, message=FALSE, warning=FALSE}
df <- read.csv('net_apartments.csv')
```
# Analysis
## Plotting locations
When handling a dataset that has locations on each observation, the first thing that we might want to do is plotting each observation in a map, and this is what we are going to do.
```{r echo=FALSE, message=FALSE, warning=FALSE}
library(ggmap)
library(dplyr)
library(maps)
invisible(gmaps.key <- scan('maps.key', what = 'string'))
invisible(gmaps.style <- scan('ultra_light.style', what = 'string'))
map.apartments <- get_googlemap(
center = c(lon = mean(df$longitude), lat = mean(df$latitude)),
maptype = 'roadmap',
zoom = 3,
style = gmaps.style,
key = gmaps.key)
ggmap(map.apartments) +
geom_point(aes(x = longitude, y = latitude),
data = df, alpha = .75) +
theme(plot.margin = grid::unit(c(0, 0, 0, 0), 'mm'))
```
The plot show us that we have some misplaced points. Some of them have 0 value for the latitude and longitude, and others are placed on different brazilian capitals. We are going to impute those values and generate our map again.
```{r echo=FALSE, message=FALSE, warning=FALSE}
library(geojsonio)
library(gridExtra)
library(rgdal)
outlier_values <- boxplot.stats(df$longitude)$out
outlier_indexes <- which(df$longitude %in% outlier_values)
outliers <- df[outlier_indexes, ]
# We are going to input some values to the missing locations
df.by_neigh <- df[-outlier_indexes, ] %>%
group_by(neighborhood) %>%
summarise(latitude = mean(latitude),
longitude = mean(longitude))
outliers.imputed <-
transform(outliers,
latitude = df.by_neigh[neighborhood == neighborhood, ]$latitude,
longitude = df.by_neigh[neighborhood == neighborhood, ]$longitude)
df.imputed <- df
df.imputed[outlier_indexes, ] <- outliers.imputed
bh.json <- geojson_read('bh.geojson', what = 'sp')
bh.poly <- fortify(bh.json)
bh.map <- get_googlemap(
center = c(lon = mean(bh.poly$long), lat = mean(bh.poly$lat)),
maptype = 'roadmap',
zoom = 11,
style = gmaps.style,
key = gmaps.key)
bh.bbox <- attr(bh.map, 'bb')
ggmap(bh.map, padding = 0) +
geom_polygon(data = bh.poly,
aes(long, lat),
size = .55,
colour = '#6692CC',
fill = NA,
alpha = 0.1) +
stat_density2d(data = df.imputed,
aes(longitude, latitude,
fill = ..level..,
alpha = ..level..),
size = 1, bins = 50, geom = 'polygon') +
scale_fill_gradient('Real estate\nDensity', low = 'blue', high = 'orange') +
scale_alpha(range = c(.1, .3), guide = FALSE)
```
With this heatmap we can see that we have observations from most neighborhoods of the city, but in the south-west region we have a higher density.