forked from hadley/mastering-shiny
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbasic-case-study.Rmd
267 lines (180 loc) · 14.5 KB
/
basic-case-study.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
# Case study: emergency room injuries {#basic-case-study}
```{r include=FALSE}
source("common.R")
options(tibble.print_min = 6, tibble.print_max = 6)
```
## Introduction
I've introduced you to a bunch of new concepts in the last three chapters. So to help them sink in, we'll now walk through a richer Shiny app that explores a fun dataset and pulls together many of the ideas that you've seen so far. We'll start by doing a little data analysis outside of Shiny, then turn it into an app, starting simply, then progressively layering on more detail.
In this chapter, we'll supplement Shiny with vroom (for fast file reading) and the tidyverse (for general data analysis).
```{r setup, message = FALSE}
library(shiny)
library(vroom)
library(tidyverse)
```
## The data
We're going to explore data from the National Electronic Injury Surveillance System (NEISS), collected by the Consumer Product Safety Commission. This is a long-term study that records all accidents seen in a representative sample of hospitals in the United States. It's an interesting dataset to explore because every one is already familiar with the domain, and each observation is accompanied by a short narrative that explains how the accident occurred. You can find out more about this dataset at <https://github.com/hadley/neiss>
In this chapter, I'm going to focus on just the data from 2017. This keeps the data small enough (~10 meg) that it's easy to store in git (along with the rest of the book), which means we don't need to think about sophisticated strategies for importing the data quickly (we'll come back to those later in the book). You can see the code I used to create the extract for this chapter at <https://github.com/hadley/mastering-shiny/blob/master/neiss/data.R>.
The main dataset we'll use is `injuries`, which contains around 250,000 observations:
```{r, message = FALSE}
injuries <- vroom::vroom("neiss/injuries.tsv.gz")
injuries
```
Each row represents a single accident with 10 variables:
* `trmt_date` is date the person was seen in the hospital (not when the
accident occured).
* `age`, `sex`, and `race` give demographic information about the person
who experienced the accident.
* `body_part` is the location of the injury on the body (like ankle or ear);
`location` is the place where the accident occurred (like home or school).
* `diag` gives the basic diagnosis of the injury (like fracture or laceration).
* `prod_code` is the primary product associated with the injury.
* `weight` is statistical weight giving the estimated number of people who
would suffer this injury if this dataset was scaled to the entire population
of the US.
* `narrative` is a brief story about how the accident occurred.
We'll pair it with two other data frames for additional context: `products` lets us look up the product name from the product code, and `population` tells us the total US population in 2017 for each combination of age and sex.
```{r, message = FALSE}
products <- vroom::vroom("neiss/products.tsv")
products
population <- vroom::vroom("neiss/population.tsv")
population
```
## Exploration
Before we create the app, let's explore the data a little. We'll start by looking at the product associated with the most injuries: 1842, "stairs or steps". First we'll pull out the injuries associated with this product:
```{r}
selected <- injuries %>% filter(prod_code == 1842)
nrow(selected)
```
Next we'll perform some basic summaries looking at the diagnosis, body part, and location where the injury occured. Note that I weight by the `weight` variable so that the counts can be interpreted as estimated total injuries across the whole US.
```{r}
selected %>% count(diag, wt = weight, sort = TRUE)
selected %>% count(body_part, wt = weight, sort = TRUE)
selected %>% count(location, wt = weight, sort = TRUE)
```
As you might expect, steps are most often associated with sprains, strains and fractures, of the ankle, occurring at home.
We can also explore the pattern across age and sex. We have enough data here that a table is not that useful, and so I make a plot, Figure \@ref(fig:stairs-raw), that makes the patterns more obvious.
```{r stairs-raw, out.width = "100%", fig.asp = 1/2, fig.cap = "Estimated number of injuries caused by stairs, broken down by age and sex"}
summary <- selected %>%
count(age, sex, wt = weight)
summary
summary %>%
ggplot(aes(age, n, colour = sex)) +
geom_line() +
labs(y = "Estimated number of injuries")
```
We see a big spike when children are learning to walk, a flattening off over middle age, and then a gradual decline after age 50. Interestingly, the number of injuries is much higher for women. (Maybe this is due to high-heeled shoes?)
One problem with interpreting this pattern is that we know that there are fewer older people than younger people, so the population available to be injured is smaller. We can control for this by comparing the number of people injured with the total population and calculating an injury rate. Here I use a rate per 10,000.
```{r}
summary <- selected %>%
count(age, sex, wt = weight) %>%
left_join(population, by = c("age", "sex")) %>%
mutate(rate = n / population * 1e4)
summary
```
Plotting the rate, Figure \@ref(fig:stairs-rate), yields a strikingly different trend after age 50: while the number of injuries decreases, the *rate* of injuries continues to increase.
```{r stairs-rate, out.width = "100%", fig.asp = 1/2, fig.cap = "Estimated rate of injuries per 10,000 people, broken down by age and sex"}
summary %>%
ggplot(aes(age, rate, colour = sex)) +
geom_line(na.rm = TRUE) +
labs(y = "Injuries per 10,000 people")
```
(Note that the rates only go up to age 80 because I couldn't find population data for ages over 80.)
Finally, we can look at some of the narratives. Browsing through these is an informal way to check our hypotheses, and generate new ideas for further exploration. Here I pull out a random sample of 10:
```{r, echo = FALSE}
# Ensure that narratives don't overflow
if (knitr::is_latex_output()) {
pad <- nchar("#> [1] ''")
selected$narrative <- str_trunc(selected$narrative, width = 81 - pad)
}
```
```{r}
selected %>%
sample_n(10) %>%
pull(narrative)
```
Having done this exploration for one product, it would be be very nice if we could easily do it for other products, without having to retype the code. So let's make a Shiny app!
## Prototype
When building a complex app, I strongly recommend starting as simple as possible, so that you can confirm the basic mechanics work before you start doing something more complicated. Here I'll start with one input (the product code), three tables, and one plot.
When making a first prototype, the challenge is the "as simple _as possible_". There's a tension between getting the basics working quickly and planning for the future of the app. Either extreme can be bad: if you design too narrowly, you'll spend a lot of time later on reworking your app; if you design too rigorously, you'll spend a bunch of time writing code that later ends up on the cutting floor. To help get the balance right, I often do a few pencil-and-paper sketches to rapidly explore the UI and reactive graph before committing to code.
Here I decided to have one row for the inputs (accepting that I'm probably going to add more inputs before this app is done), one row for all three tables (giving each table 4 columns, 1/3 of the 12 column width), and then one row for the plot:
```{r code = section_get("neiss/prototype.R", "ui")}
```
Note the use of `setNames()` in the `selectInput()` `choices`: this shows the product name in the UI and returns the product code to the server.
The server function is relatively straightforward. I first convert the `selected` and `summary` variables (defined above) to reactive expressions. This is a reasonable general pattern: you create variables in your data analysis to decompose the analysis into steps, and to avoid recomputing things multiple times, and reactive expressions play the same role in Shiny apps.
Often it's a good idea to spend a little time cleaning up your analysis code before you start your Shiny app, so you can think about these problems in regular R code, before you add the additional complexity of reactivity.
```{r code = section_get("neiss/prototype.R", "server")}
```
Note that creating the `summary` reactive isn't strictly necessary here, as it's only used by a single reactive consumer. But it's good practice to keep computing and plotting separate as it makes the flow of the app easier to understand, and will make it easier to generalise in the future.
A screenshot of the resulting app is shown in Figure \@ref(fig:prototype). You can find the source code at <https://github.com/hadley/mastering-shiny/tree/master/neiss/prototype.R> and try out a live version of the app at XYZ.
```{r, eval = FALSE, include = FALSE}
app <- testAppFromFile("neiss/prototype.R")
app_screenshot(app, "basic-case-study/prototype", width = 800, height = 600)
```
```{r prototype, out.width = NULL, echo = FALSE, out.width = "100%", fig.cap="First prototype of NEISS exploration app"}
knitr::include_graphics("screenshots/basic-case-study/prototype.png", dpi = screenshot_dpi())
```
## Polish tables
Now that we have the basic components in place and working, we can progressively improve our app. The first problem with this app is that it shows a lot of information in the tables, where we probably just want the highlights. To fix this we need to first figure out how to truncate the tables. I've chosen to do that with a combination of forcats functions: I convert the variable to a factor, order by the frequency of the levels, and then lump together all levels after the top 5.
```{r}
injuries %>%
mutate(diag = fct_lump(fct_infreq(diag), n = 5)) %>%
group_by(diag) %>%
summarise(n = as.integer(sum(weight)))
```
Because I knew how to do it, I wrote a little function to automate this for any variable. The details aren't really important here; and don't worry if this looks totally foreign: you could also solve the problem via copy and paste.
```{r, code = section_get("neiss/polish-tables.R", "count_top")}
```
I then use this in the server function:
```{r, code = section_get("neiss/polish-tables.R", "tables"), eval = FALSE}
```
I made one other change to improve the aesthetics of the app: I forced all tables to take up the maximum width (i.e. fill the column that they appear in). This makes the output more aesthetically pleasing because it reduces the amount of incidental variation.
A screenshot of the resulting app is shown in Figure \@ref(fig:polish-tables). You can find the source code at <https://github.com/hadley/mastering-shiny/tree/master/neiss/polish-tables.R> and try out a live version of the app at XYZ.
```{r, eval = FALSE, include = FALSE}
app <- testAppFromFile("neiss/polish-tables.R")
app_screenshot(app, "basic-case-study/polish-tables", width = 800, height = 600)
```
```{r polish-tables, out.width = NULL, echo = FALSE, out.width = "100%", fig.cap="The second iteration of the app improves the display by only showing the most frequent rows in the summary tables"}
knitr::include_graphics("screenshots/basic-case-study/polish-tables.png", dpi = screenshot_dpi())
```
## Rate vs count
So far, we're displaying only a single plot, but we'd like to give the user the choice between visualising the number of injuries or the population-standardised rate. First I add a control to the UI. Here I've chosen to use a `selectInput()` because it makes both states explicit, and it would be easy to add new states in the future:
```{r, code = section_get("neiss/rate-vs-count.R", "first-row"), eval = FALSE}
```
(I default to `rate` because I think it's safer; you don't need to understand the population distribution in order to correctly interpret the plot.)
Then I condition on that input when generating the plot:
```{r, code = section_get("neiss/rate-vs-count.R", "plot"), eval = FALSE}
```
A screenshot of the resulting app is shown in Figure \@ref(fig:rate-vs-count). You can find the source code at <https://github.com/hadley/mastering-shiny/tree/master/neiss/rate-vs-count.R> and try out a live version of the app at XYZ.
```{r, eval = FALSE, include = FALSE}
app <- testAppFromFile("neiss/rate-vs-count.R")
app_screenshot(app, "basic-case-study/rate-vs-count", width = 800, height = 600)
```
```{r rate-vs-count, out.width = NULL, echo = FALSE, out.width = "100%", fig.cap = "In this iteration, we give the user the ability to switch between displaying the count or the population standardised rate on the y-axis."}
knitr::include_graphics("screenshots/basic-case-study/rate-vs-count.png", dpi = screenshot_dpi())
```
## Narrative
Finally, I want to provide some way to access the narratives because they are so interesting, and they give an informal way to cross-check the hypotheses you come up with when looking at the plots. In the R code, I sample multiple narratives at once, but there's no reason to do that in an app where you can explore interactively.
There are two parts to the solution. First we add a new row to the bottom of UI. I use an action button to trigger a new story, and put the narrative in a `textOutput()`:
```{r, code = section_get("neiss/narrative.R", "narrative-ui"), eval = FALSE}
```
The result of an action button is an integer that increments each time it's clicked. Here I just use it to trigger a re-execution of the random selection:
```{r, code = section_get("neiss/narrative.R", "narrative-server"), eval = FALSE}
```
A screenshot of the resulting app is shown in Figure \@ref(fig:narrative). You can find the source code at <https://github.com/hadley/mastering-shiny/tree/master/neiss/narrative.R> and try out a live version of the app at XYZ.
```{r, eval = FALSE, include = FALSE}
app <- testAppFromFile("neiss/narrative.R")
app_screenshot(app, "basic-case-study/narrative", width = 800, height = 600)
```
```{r narrative, out.width = NULL, echo = FALSE, out.width = "100%", fig.cap = "The final iteration adds the ability to pull out a random narrative from the selected rows"}
knitr::include_graphics("screenshots/basic-case-study/narrative.png", dpi = screenshot_dpi())
```
## Exercises
1. Draw the reactive graph for each app.
1. What happens if you flip `fct_infreq()` and `fct_lump()` in the code that
reduces the summary tables?
1. Add an input control that lets the user decide how many rows to show in the
summary tables.
1. Provide a way to step through every narrative systematically with forward
and backward buttons.
Advanced: Make the list of narratives "circular" so that advancing
forward from the last narrative takes you to the first.