-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCMPD Traffic Stops
821 lines (541 loc) · 29.3 KB
/
CMPD Traffic Stops
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
---
title: "problem_set_2"
author: "Yusuf Kemal Demir; Ph.D."
date: "10/22/2020"
output:
word_document:
toc: yes
toc_depth: '2'
pdf_document:
toc: yes
toc_depth: '2'
html_document:
theme: united
toc: yes
toc_depth: 2
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Visual Analytics
## Task: CMPD Traffic Stops
# Details
- Who did you collaborate with: Myself
- Approximately how much time did you spend on this problem set: 3-5 hours
- What, if anything, gave you the most trouble: xxxxx
## Instructions
In addition to running the code, you will need to do two things: either write in complete sentences a response or write R code to fill in a chunk.
- "xxxxx" = Write, in complete English sentences, your response. 1-2 sentences is sufficient.
- "**Plot**" = Write R code in the chunk corresponding to the instructions.
# Analyzing CMPD Traffic Stops
For this problem set, we'll examine a data set of stops by the Charlotte-Mecklenburg Police Department (CMPD).
Our focus will be to understand what factors are related to whether someone is searched or not for a traffic stop.
For our data set, you'll load the csv file we saved in the setup. This should be in your data folder.
## Install packages
Install the `tidyverse`,`scales`,`ggspatial`,`sf` packages
```{r install tidyverse, scales, ggpatial, sf, message=F, warning=F}
# install.packages('tidyverse', repos="http://cran.us.r-project.org")
# install.packages('scales')
# install.packages('ggspatial')
# install.packages('sf')
```
## Load libraries and df
Load the `tidyverse`,`scales`,`ggspatial`,`sf` libraries
```{r load_packages-data, warning=F,message=F}
library(tidyverse)
library(scales)
library(ggspatial)
library(sf)
df<-read.csv(file='data/Officer_Traffic_Stops.csv')
```
```{r load-packages-data, warning=FALSE, message=FALSE}
library(tidyverse)
library(scales)
library(ggspatial)
df <- read_csv("data/Officer_Traffic_Stops.csv")
```
## Demographics of drivers
First, look at the data using the `glimpse()` function from `dplyr`
```{r glimpse}
glimpse (df)
```
Notice the different variable types: character (chr), num (numeric), and datetime (POSIXct).
Let's consider our target variable: `Was_a_Search_Conducted`.
**Plot** a bar chart that counts the number of records by `Was_a_Search_Conducted`.
```{r digest, message=F, warning=F}
# install.packages('digest')
```
```{r library digest, message=F, warning=F}
library(digest)
```
```{r bar Was_a_Search_Conducted}
ggplot(data=df)+
geom_bar(mapping = aes(x=Was_a_Search_Conducted))
```
```{r bar Was_a_Search_Conducted, coord flip}
ggplot(data=df)+
geom_bar(mapping = aes(x=Was_a_Search_Conducted))+coord_flip()
```
How well balanced is the data set by this field?
Bar charts can be used for ordinal and nominal data sets. Geom_bar requires only one discrete variable as an aesthetic. Y variable counts by default the number of counts. However geom_col requires x and y variables inside the aesthetic of data.
It is unbalanced, and the skewness is towards to " Search Not Conducted". The total count was 68488, and "Search Not Conducted" was around 65K-70K.
Next, let's consider the age range of the driver.
**Plot** a histogram of `Driver_Age`. Determine an appropriate number of bins.
```{r histogram Driver_Age}
ggplot(data = df)+
geom_histogram(mapping=aes(x=Driver_Age))
```
Once you go above (around) 40-50 bins, you'll notice some points stick out.
What is happening?
Histogram chart is only used to plot the frequency of score occurrences in a continuous data set that has been divided into classes, called bins.Notice that, unlike a bar chart, there are no "gaps" between the bars (although some bars might be "absent" reflecting no frequencies).
Driver_Age (continuous variable) data was split into intervals, called bins. Each bin contains the number of occurrences of scores in the data set that are contained within that bin. For the above data set, the frequencies in each bin have been tabulated along with the scores that contributed to the frequency in each bin.
There is no right or wrong answer as to how wide a bin should be, but there are rules of thumb. You need to make sure that the bins are not too small or too large. Above-mentioned histogram use same data as the below-mentioned data, but have smaller bins.
We can see from the above-mentioned histogram that the default bins=30 is not very convenient to show data and the underlying pattern of the data to be easily seen. Below-mentioned histogram's bins provide more insight. For instance, it shows the frequency of some age groups (25,29, 35, 43, 48, 53,57, 59, 69) can be listed in descending order, which means this stick out points are stopped more.
```{r histogram Driver_Age, bins 45}
ggplot(data = df)+
geom_histogram(mapping = aes(x=Driver_Age),bins = 45)
```
**Plot** a density plot of `Driver_Age`. Add in a fill to be "lightblue". Determine an appropriate kernel density to use (`adjust`).
```{r density plot Driver_Age}
ggplot(data=df)+
geom_density(mapping = aes(Driver_Age), fill='lightblue')
```
``` {r density plot Driver_Age, fill lightblue, kernel adjust, coisine}
ggplot(df, aes(Driver_Age))+
geom_density(fill='lightblue', adjust=2.0, kernel='cosine')
```
```{r density plot Driver_Age, fill lightblue, kernel adjust, gaussian default}
ggplot(data=df,aes(Driver_Age))+
geom_density(fill='lightblue',adjust=2.0) #adjust= 10 very smooth, #adjust=0.1 crazy bumpy
```
**Plot** a box plot with `Was_a_Search_Conducted` on the x-axis and `Driver_Age` on the y-axis.
```{r boxplot Was_a_Search_Conducted` vs Driver_Age}
ggplot(df,aes(Was_a_Search_Conducted, Driver_Age))+
geom_boxplot()
```
```{r Was_a_Search_Conducted` vs Driver_Age, outlier.shape=NA}
ggplot(data=df)+
geom_boxplot(mapping = aes(x = Was_a_Search_Conducted, y = Driver_Age),outlier.shape = NA)+
ylim(10,78)+
scale_y_continuous(breaks=seq(10, 80, 4)) # Ticks from 0-10, every 4
```
**Plot** a violin plot.
```{r violin}
ggplot(df,aes(Was_a_Search_Conducted, Driver_Age))+
geom_violin(color='purple',fill='lightblue',linetype= 'dashed', alpha=0.2)
```
From the plots above, do you think the age of the driver is a significant factor in whether a search was conducted? Why or why not?
For search not conducted:Q1= 25, Median (Q2)=33, Q3=45, and outliers are condensed in age 75-110. The IQR (Q3-Q1)= 20
For search conducted:Q1= 21, Median (Q2)=25, Q3=35, and outliers are condensed in age 50-77.The IQR (Q3-Q1)= 14
Violin plots also confirm the distribution difference for search of the Driver_Age .
The Driver_Age seems to be a factor whether search conducted or not.
## Date of stop
Let's plot the number of stops by time.
Recalling part one, the `Month_of_Stop` variable is a character, not a date variable. The datatime's are simply when the data was collected; not when the stop occurred. Therefore, we'll need to convert the `Month_of_Stop` variable from a character to a Date format.
Let's first cleanup the date field using `tidyverse` packages like `stringr` and `lubridate`.
```{r df}
df
```
```{r date-cleanup}
library(stringr)
library(lubridate)
df <- mutate(df, Month_of_Stop = str_replace_all(Month_of_Stop, "/","-")) # replace "/" with "-"
df <- mutate(df, Month_of_Stop = paste(df$Month_of_Stop,"-01")) # add in day
df <- mutate(df, Date = ymd(Month_of_Stop)) # created a date field
df
```
**Plot** a line chart with the number of traffic stops for each month (hint: start with the `count()` function by Date then feed into ggplot. Remember the count variable is named 'n'.).
```{r stops count Date}
df_stops<-df%>%count(Date)
df_stops
```
```{r stops}
df_stops<-df%>%
count(Date)
ggplot(df_stops,aes(Date,n))+geom_line()+scale_x_date(date_breaks = '2 months',date_labels = '%Y%m')
df_stops
```
What is the trend (i.e., long term rate of change) of the number of traffic stops in Charlotte?
There is almost 'V' shape trend between the two peaks,201604 to 201705, where the deep cup was on 201609. From 201601 to 201712, there was around 250 cases of drop. Stops were started to slow down on 201604 for 2 months; and showed kinky increase for further 2 months.There is a steady increase for 5 months. The peak point reached on 201704. Stops were decreased around 1250 cases from end of second quarter till the end of 2017.
**Plot** the same plot but add in `facet_wrap()` by the `Reason_for_Stop` variable.
```{r stops-by-reason count}
df_s_reason<-df%>%count(Reason_for_Stop, Date)
df_s_reason
```
```{r stops-by-reason}
ggplot(df_s_reason,aes(Date, n))+
geom_line()+
facet_wrap(~ Reason_for_Stop)
```
What is a problem with this plot?
Facets divide a plot into subplots based on the values of one or more discrete variables.
1-First problem is, by default, the same scales are used for all panels.
Each subplot has same y-axis value range (0-1500) that does not convenient to show data and the underlying pattern in most of the panels.
2-Second issue is overlapped x-axis values (Date).
To address this problem, you will need to figure out how to adjust the scale. To do this, you need to use R's documentation to see whether there is a parameter in `facet_wrap`.
Go to your RStudio console and type `?facet_wrap`.
What parameter allows you to modify the scales of `facet_wrap`?
1- Scales can be fixed ("fixed", the default), free ("free"), or free in one dimension ("free_x", "free_y")?
By default, the same scales are used for all panels. You can allow scales to vary across the panels with the `scales` argument.
Free scales make it easier to see patterns within each panel, but harder to compare across panels.
Set scales to let axis limits vary across facets
x and y axis limits adjust to individual facets
• "free_x" - x axis limits adjust
• "free_y" - y axis limits adjust
```{r stops by reason, scales free_y, limit x and position }
ggplot(df_s_reason, aes(Date,n))+
geom_line()+
facet_wrap(~ Reason_for_Stop, scales='free_y')+
scale_x_date(limits = as.Date(c('2016-01-01','2017-12-01')))
```
```{r stops by reason, scales free, scale x break and labels}
ggplot(df_s_reason,aes(Date,n))+
geom_line()+
facet_wrap(~ Reason_for_Stop, scales='free')+
scale_x_date(date_breaks = '1 year',date_labels = '%Y')
```
2-Control the number of rows and columns with nrow and ncol
```{r stops-by-reason nrow and ncol}
ggplot(df_s_reason,aes(Date,n))+geom_line(colour='red')+
facet_wrap(~ Reason_for_Stop, nrow = 2, ncol=5, scales='free_y', strip.position = 'top')+
scale_x_date(date_breaks = '1 year',date_labels = '%Y')
```
3-Cosmetics
```{r stops-by-reason nrow and ncol, scale_x_date, scales, strip.position, labs, strip.text, aspect ratio}
ggplot(df_s_reason,aes(Date,n))+geom_line(colour='red')+
facet_wrap(~ Reason_for_Stop, nrow = 2, ncol=5, scales='free_y', strip.position = 'top')+
scale_x_date(date_breaks = '1 year', date_labels = '%Y')+
labs(title = 'CMPD stop reason time series')+
theme(strip.background = element_rect(colour = 'blue', fill='white'),strip.text = element_text(colour = "blue", face = "bold", size =6),
text = element_text(size=10), aspect.ratio = 5/3)
```
4- Vars, labeller also could work if we had multiple variables.
**Plot** the same plot but with a free y-axis scale.
```{r stops-by-reason-2}
ggplot(df_s_reason,aes(Date,n))+geom_line()+facet_wrap(~ Reason_for_Stop, scales='free_y')+scale_x_date(date_breaks = '1 year', date_labels = '%Y')
```
Which type of police stop has had the most volatility (i.e., big swings in number of stops)?
Vehicle Equipment has the most volatility since the reason of police stops changed 3.4 fold (from 150 to 525)in 2016-2017.
What is one problem with allowing the y-axis be free?
By default, the same scales are used for all panels. You can allow scales to vary across the panels with the `scales` argument.
Free scales make it easier to see patterns within each panel, but harder to compare across panels.
In order to see data pattern in y-axis for each panel, we needed free_y scales.
Small multiples tends to be less effective when each of the variables are on different scales or magnitudes.
Let's consider instead CMPD traffic stops but by CMPD division. These are more even spread by division than the type of stop.
**Plot** a line chart (optional points too) for stops by `Date` (x axis) and counts ('n', or whatever you named your count variable) (y axis). (hint: to modify how the date is shown, use the layer `scale_x_date(date_labels = "%Y") +` to show only the year. Feel free to modify by looking at `?scale_x_date`.)
```{r stops-by-division}
df_division<-df%>%count(CMPD_Division,Date)
df_division
```
```{r stops-by-division geom line point nrow ncol}
ggplot(df_division,aes(Date,n))+
geom_line(colour='red')+geom_point()+facet_wrap(~ CMPD_Division, nrow = 2, ncol=7)+
scale_x_date(date_break='1 year',date_labels = "%y")+
labs(title = 'CMPD Division time series')+
theme(strip.background = element_rect(colour = 'blue', fill='white'),strip.text = element_text(colour = "blue", face = "bold", size =6),text = element_text(size=10))
```
What are three observations you can make about the number of police stops by divison? (hint: just write about what's in the data.)
1. Not identified locations marked as the minimum of stops (stops 26, date 20170501) by CMPD Division.
2. Providence Division has the highest peak (stops 495, date 20170701), which was followed by North Tryon Division (stops 483, date 20170201).
3. The number of stops has 7.6 fold (min 64, max 485) and 3.4 fold (min 137, max 468) increase in North Tryon Division, and North Division, respectively.
Next, this doesn't help tell us where these areas are. For that, let's use a shape file to create a chloropleth of stops by division.
## Geography
For this example, we'll create a cholorpleth for the number of police stops by police division.
To do this, we need to use the `sf` package. (For help along the way, see [this tutorial](http://strimas.com/r/tidy-sf/) on `sf` package.)
```{r load-sf}
library(sf); library(viridis)
cmpd <- st_read("./data/CMPD_Police_Divisions/CMPD_Police_Divisions.shp")
```
Note that while we have five files, we only load in the shapefile (`.shp`) file. This is typical but know that to use this file you would need the other four files in the same folder as your shapefile.
**Plot** cmpd using the `geom_sf` package where you provide `fill = DNAME` as the only aesthetic. Add in a title saying "CMPD Divisions" and add the `theme_bw()` theme to make translate the file into the black and white template.
```{r geom_sf, warning=FALSE, message=FALSE}
ggplot(cmpd)+
geom_sf(aes(fill=DNAME))+
labs(title='CMPD Divisions')
```
```{r geom_sf2, warning=FALSE, message=FALSE}
ggplot(cmpd)+
geom_sf(aes(fill=DNAME))+
theme_bw()+
labs(title='CMPD Divisions')
```
One problem with this map is it's hard to read the division names. That is, it may be better to remove the legend and put the labels of each division within the plot.
To do this, we can use the related `geom_sf_label()` geom, using the name of the division as the aesthetic label.
**Plot** the same plot from above but with the name of the division as the label.
1. You'll likely need to reduce the size of the label, using the `size` paramater. You should likely set the `size` to under 2.
2. Make sure to remove the legend (it's redundant and no longer necessary).
3. Create a new variable named `Name` that removes the term " Division". This term is redundant and takes up a lot of space in the labels from `DNAME`. To do this step, use this snippet of code at the top of your pipeline:
```{r eval=F, warning=F, message=F}
cmpd<-cmpd %>%
mutate(Name = as.character(DNAME)) %>%
mutate(Name = str_replace_all(Name, " Division",""))
```
4. Save it as an object named `g`. Make sure to call it once so that the map will output.
```{r eval=F, geom-labels, warning=FALSE, message=FALSE}
g<-ggplot(cmpd)+
geom_sf(aes(fill=Name))+
theme_bw()+
labs(title='CMPD Divisions')+
geom_sf_label(aes(label=Name),size = 1.9)+
theme(legend.position = "none")+
theme(axis.title.x=element_blank())
g
```
## Advanced Plot:
Now, let's create a chloropleth. Below is the code to create an advanced plot.
In this problem, you need to explain what each step below is doing:
- `mutate()`: adds new variables and preserves existing ones.
- `inner_join()`: The mutating joins add columns from y to x, matching rows based on the keys, where inner_join is one of them. inner_join() includes all rows in x and y. In essence, it joins variables from two data frame.
- `mutate()`: creates new variable and preserves existing ones, as executed in the case of 'Name' of CMPD_Division example.
- `geom_sf()`: It is an unusual geom because it will draw different geometric objects depending on what simple features are present in the data: you can get points, lines, or polygons (as executed in Charlotte Mecklenburg division). For text and labels, you can use geom_sf_text() and geom_sf_label().
- `scale_fill_viridis()`: Uses and fills the viridis color scale. Colorful,
Perceptually uniform, Robust to colorblindness,Pretty.
- `labs()`: Change axis labels and legend titles
- `annotation_scale()`:Spatial-aware scalebar annotation. location, width_hint can be used along with it.
- `facet_wrap()`:wraps a 1d sequence of panels into 2d. This is generally a better use of screen space than facet_grid() because most displays are roughly rectangular.Facets divide a plot into subplots based on the values
of one or more discrete variables. It wraps facets into a rectangular layout
- `theme_bw()`: The classic dark-on-light ggplot2 theme. May work better for presentations displayed with a projector.
- `theme()`: (what are each of the options doing in `theme()`?) Themes are a powerful way to customize the non-data components of your plots: i.e. titles, labels, fonts, background, gridlines, and legends. Themes can be used to give plots a consistent customized look.
- `ggsave()`: is a convenient function for saving a plot. It defaults to saving the last plot that you displayed, using the size of the current graphics device. It also guesses the type of graphics device from the extension.
```{r adv-plot}
cmpd_chloropleth <- cmpd %>%
mutate(CMPD_Division = as.character(DNAME)) %>%
inner_join(count(df, CMPD_Division, Date), by = "CMPD_Division") %>%
mutate(Year = lubridate::year(Date)) %>%
ggplot() +
geom_sf(aes(fill = n)) +
scale_fill_viridis("Traffic Stops", labels = scales::comma) +
labs(title = "CMPD Traffic stops by CMPD Division",
caption = "Source: CMPD") +
annotation_scale(location = "bl", width_hint = 0.2) +
facet_wrap(~Year) +
theme_bw() +
theme(legend.position = "bottom",
plot.title = element_text(face = "bold", size = rel(1.5)),
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.x=element_blank(),
axis.ticks.y=element_blank())
cmpd_chloropleth
ggsave(cmpd_chloropleth, filename = "cmpd_chloropleth.pdf",
width = 7, height = 5, units = "in")
ggsave(cmpd_chloropleth, filename = "cmpd_chloropleth.png",
width = 7, height = 5, units = "in")
```
###Bonus (play it for fun)
## Choosing a ggplot extensions
Now, select **one** of the ggextension libraries below and install the package (through CRAN):
- [`ggridges`](https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html) / [example to recreate](https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html#varying-fill-colors-along-the-x-axis)^[Run both plots. Make sure to install the `viridis` package or else you'll get an error!]
- [`ggalt`](https://github.com/hrbrmstr/ggalt) / [example to recreate](https://github.com/hrbrmstr/ggalt#horzon-chart)^[Make sure to install `hrbrthemes`!]
- [`ggrepel`](https://github.com/slowkow/ggrepel) / [example to recreate](https://github.com/slowkow/ggrepel/blob/master/vignettes/ggrepel.md#align-labels-on-the-left-or-right-edge)
- [`ggstatsplot`](https://github.com/IndrajeetPatil/ggstatsplot) / [example to recreate](https://github.com/IndrajeetPatil/ggstatsplot#ggscatterstats)^[Run all three examples in the `ggscatterstats` section.]
**Plot** the related example
```{r ggextension-examples 1, ggridges, message=F, warning=F}
# install.packages('ggridges')
library(ggplot2)
library(ggridges)
d <- data.frame(
x = rep(1:5, 3) + c(rep(0, 5), rep(0.3, 5), rep(0.6, 5)),
y = c(rep(0, 5), rep(1, 5), rep(3, 5)),
height = c(0, 1, 3, 4, 0, 1, 2, 3, 5, 4, 0, 5, 4, 4, 1))
ggplot(d, aes(x, y, height = height, group = y, fill = factor(x+y))) +
geom_ridgeline_gradient() +
scale_fill_viridis_d(direction = -1, guide = "none")
```
```{r ggextension-examples 2, ggridges, message=F, warning=F}
ggplot(lincoln_weather, aes(x = `Mean Temperature [F]`, y = Month, fill = stat(x))) +
geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01) +
scale_fill_viridis_c(name = "Temp. [F]", option = "D") +
labs(title = 'Temperatures in Lincoln NE in 2016')
```
```{r lincoln weather, message=F, warning=F}
lincoln_weather
```
Now, with the same package you ran, make a **plot** with that package and the gapminder data. You can choose any of the data frames (i.e., years). Make sure your plot has at least six functions (e.g., ggplot() + geom_point() is two functions and `dplyr` functions count as well.)
```{r gapminder, message=F, warning=F}
# install.packages("gapminder")
library(gapminder)
head(gapminder)
tail(gapminder)
gapminder
```
```{r ggextension-custom 1, warning=F, message=F}
ggplot(gapminder, aes(x=gdpPercap,y=continent, fill = stat(x)))+
geom_density_ridges_gradient(scale = 0.9, rel_min_height = 0.02, show.legend = T) +
scale_fill_viridis_c(name = "continent", option = "C") +
labs(x = "gdpPercap",y = "continent") +
ggtitle("gdpPercap vs Continent") +
theme(plot.title = element_text(hjust = 0.5))
```
```{r ggextension-custom 2, warning=F, message=F}
ggplot(gapminder[gapminder$year==2007,], aes(gdpPercap, year))+
geom_density_ridges(rel_min_height = 0.005) +
scale_y_discrete(expand = c(0.01, 0)) +
scale_x_continuous(expand = c(0.01, 0)) +
theme_ridges()+
labs(x = "gdpPercap",y = "year 2007") +
ggtitle("gdpPercap vs 2007 year") +
theme(plot.title = element_text(hjust = 0.5),
axis.text.y=element_blank(),axis.ticks=element_blank())
```
```{r ggextension-custom 3, warning=F, message=F}
ggplot(gapminder[gapminder$year==2007,], aes(gdpPercap, year))+
geom_point(color='blue')+
labs(x = "gdpPercap",y = "continent") +
ggtitle("gdpPercap vs Continent") +
theme(plot.title = element_text(hjust = 0.5))+
facet_wrap(~continent)+
scale_y_continuous(breaks=seq(0, 2007, 2007))
```
```{r ggextension-custom 4, warning=F, message=F}
ggplot(gapminder, aes(gdpPercap, lifeExp, label=country, color=continent))+
geom_point(position = position_jitter(), size=0.5, alpha=1)+
labs(x = "gdpPercap",y = "lifeExp") +
ggtitle("gdpPercap vs lifeExp") +
theme(plot.title = element_text(hjust = 0.5))+
coord_flip()+
facet_wrap(~continent)+
scale_x_continuous(breaks=seq(0, 120000, 30000))# Ticks from 0-120000, every 30000)
```
```{r ggextension-custom 5, warning=F, message=F}
ggplot(
filter(gapminder, year==2007, gdpPercap>35000), aes(gdpPercap, lifeExp, label = country, color=continent))+
geom_point( color = "red") +
facet_wrap(~continent) +
theme_bw() +
coord_flip() +
theme(legend.position="none")+
labs(title = "2007 High Quality Life")
```
```{r ggextension-custom 6, warning=F, message=F}
ggplot(
filter(gapminder, year==2007, gdpPercap<6000), aes(gdpPercap, lifeExp, label = country, color=continent))+
geom_point( color = "red") +
facet_wrap(~continent) +
theme_bw() +
coord_flip() +
theme(legend.position="none")+
labs(title = "2007 Low Quality Life")
```
```{r ggextension-custom 7, warning=F, message=F}
Am<-subset(gapminder,continent=='Americas')
ggplot(Am, aes(lifeExp, country, fill = stat(x)))+
geom_density_ridges_gradient(scale =3, rel_min_height = 0.01, show.legend = T)+
scale_fill_viridis_c(name = "lifeExp", option = "C") +
theme(plot.title = element_text(face = "bold", size = rel(1.5)),
axis.text.y=element_text(size=8))+
labs(x = "lifeExp",y = "Americas") +
ggtitle("lifeExp @ America") +
scale_x_continuous(breaks=seq(0, 100, 20))# Ticks from 0-100, every 20
```
```{r ggextension-custom 8, warning=F, message=F}
ggplot(Am, aes(gdpPercap, country, fill = stat(x)))+
geom_density_ridges_gradient(scale =4, rel_min_height = 0.02, show.legend = T)+
scale_fill_viridis_c(name = "gdpPercap", option = "C") +
theme(plot.title = element_text(face = "bold", size = rel(1.5)),
axis.text.y=element_text(size=8))+
labs(x = "gdpPercap",y = "Americas") +
ggtitle("gdpPercap @ America") +
scale_x_continuous(breaks=seq(0, 50000, 10000))
```
```{r ggextension-custom 9, warning=F, message=F}
Om<-subset(gapminder,continent=='Oceania')
ggplot(Om, aes(lifeExp, country, fill = stat(x)))+
geom_density_ridges_gradient(scale =1, rel_min_height = 0.01, show.legend = T)+
scale_fill_viridis_c(name = "lifeExp", option = "C") +
theme(plot.title = element_text(face = "bold", size = rel(1.5)),
axis.text.y=element_text(size=8))+
labs(x = "lifeExp",y = "Oceania") +
ggtitle("lifeExp @ Oceania") +
scale_x_continuous(breaks=seq(0, 100, 20))# Ticks from 0-100, every 20
```
```{r summary gapminder}
summary(Om)
```
```{r summary Am}
summary (Am)
```
Describe what you have found using that plot (write at least 3 sentences):
Ridgelines drawn based on data density estimation;
1. Graph#1 shows that Oceania continent had the highest gdpPercapita.
2. Graph#2 shows that majority had <20000 gdpPercap in 2007.
3. Graph#3 shows that Africa countries have the lowest gdpPercap (<15000) in 2007.
4. Graph#4 shows that countries in Asia and Europe have the highest gdpPercap.
5. Graph#5 shows countries that have high lifeExp and gdpPercap (>35000) in 2007.
6- Graph#6 shows that countries that have low lifeExp and gdpPercap (<6000) mostly consolidated in Africa.
7- Graph#7 shows that USA&Canada have the highest lifeExp in America continent.
8- Graph#8 shows that USA has the highest gdpPercapita in America continent.
9- Graph#9 Australia seems to have very very slightly higher lifeExp than New Zealand in Oceania continent.
10- Overall in 2007 data, America: USA, Canada, Asia: Kuwait, Singapore, Hong Kong, Europe: Norway, Ireland, Swirzerland.. have the highest life quality in terms of high gdpPercap and lifeExp
11-Overall in 2007 data, Africa: Congo, America: Haiti USA, Canada, Asia: Afghanistan, Europe: Albania have the lowest life quality in terms of low gdpPercap and lifeExp
12- geom_density_ridges_gradient and plotly was not compatible.
## Intro to interactivity
For even more fun, **plot** an interactive HTML plot using the code for any of the plots above (fair warning, some of the ggextensions may not work well).
The easiest way to do this is to use the `plotly` package (install it with the "Packages" panel in RStudio), and then to use its `ggplotly()` function.
I've given you some commented-out code below (commented out so that R doesn't yell at you about the code not working when you knit).
Also, check out [the documentation](https://plot.ly/ggplot2/), especially [this page](https://plot.ly/ggplot2/interactive-tooltip/) about customizing the tooltips that show up when you hover over points or areas.
```{r interactive, message=FALSE, warning=F}
# devtools::install_github("ropensci/plotly")
library(plotly)
```
```{r interactive 1, message=FALSE, warning=F}
my_cool_plot <-ggplot(gapminder, aes(gdpPercap, lifeExp, label=country, color=continent))+
geom_point(position = position_jitter(), size=0.5, alpha=1)+
labs(x = "gdpPercap",y = "lifeExp") +
ggtitle("gdpPercap vs lifeExp") +
theme(plot.title = element_text(hjust = 0.5))+
coord_flip()+
facet_wrap(~continent)+
scale_x_continuous(breaks=seq(0, 120000, 30000))# Ticks from 0-120000, every 30000)
my_cool_plot
```
```{r plotly, my_cool_plot, message=F, warning=F}
ggplotly(my_cool_plot)
```
```{r interactive 2, message=FALSE, warning=F}
my_cool_plot2<-ggplot(
filter(gapminder, year==2007, gdpPercap>35000), aes(gdpPercap, lifeExp, label = country, color=continent))+
geom_point( color = "red") +
facet_wrap(~continent) +
theme_bw() +
coord_flip() +
theme(legend.position="none")+
labs(title = "2007 High Quality Life")
my_cool_plot2
```
```{r plotly, my_cool_plot 2, message=F, warning=F}
ggplotly(my_cool_plot2)
```
```{r interactive 3, message=FALSE, warning=F}
my_cool_plot3<-ggplot(
filter(gapminder, year==2007, gdpPercap<6000), aes(gdpPercap, lifeExp, label = country, color=continent))+
geom_point( color = "red") +
facet_wrap(~continent) +
theme_bw() +
coord_flip() +
theme(legend.position="none")+
labs(title = "2007 Low Quality Life")
my_cool_plot3
ggplotly(my_cool_plot3)
```
```{r interactive 4, message=FALSE, warning=F}
my_cool_plot4<-ggplot(df_division,aes(Date,n))+
geom_line(colour='red')+geom_point()+facet_wrap(~ CMPD_Division, nrow = 2, ncol=7)+
scale_x_date(date_break='1 year',date_labels = "%y")+
labs(title = 'CMPD Division time series')+
theme(strip.background = element_rect(colour = 'blue', fill='white'),strip.text = element_text(colour = "blue", face = "bold", size =6),text = element_text(size=10))
my_cool_plot4
ggplotly(my_cool_plot4)
```
```{r tinytex, message=F, warning=F}
# install.packages('tinytex')
# tinytex::install_tinytex()
```
```{r library tinytex, message=F, warning=F}
library(tinytex)
```
```{r webshot, message=F, warning=F}
# install.packages("webshot")
# webshot::install_phantomjs()
```
***