
Processing and accessing mobility data #2

Closed
ffinger opened this issue Apr 6, 2020 · 16 comments
Labels
high_complexity Requires multiple people and/or specialists to complete. high_priority Urgent for COVID19 analytics new_package Create a new R package

Comments

@ffinger
Collaborator

ffinger commented Apr 6, 2020

Google has started providing mobility statistics derived from the mobility data of smartphone users: https://www.google.com/covid19/mobility/.

Description

For public health officials and researchers, it will be crucial to follow the evolution of these mobility indicators over time and to integrate them into automated analyses. A way to access these indicators (and their evolution) automatically from within R code is therefore needed.

  1. Check whether Google provides an API to access the indicators automatically
    • If yes: go to 2.
    • If no: examine the PDF reports to see whether the indicators can be extracted automatically
  2. Provide an R package that makes the indicators available from within R in tabular form, either through the API or by extracting values from the PDF reports

Output

The proposed output is a data frame in long format with columns for country, date, indicator and values.
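A minimal sketch of that long format, written in Python only for illustration (the proposed package itself would be in R). It assumes the extracted values arrive as one record per country and date, keyed by indicator name; the `MobilityRow` and `to_long` names and the placeholder country code "XX" are hypothetical.

```python
from __future__ import annotations

import datetime as dt
from dataclasses import dataclass


@dataclass(frozen=True)
class MobilityRow:
    """One long-format observation: a single indicator for one country on one date."""
    country: str
    date: dt.date
    indicator: str
    value: float


def to_long(wide: dict[tuple[str, dt.date], dict[str, float]]) -> list[MobilityRow]:
    """Reshape {(country, date): {indicator: value}} into a flat list of long-format rows."""
    rows = []
    for (country, day), indicators in wide.items():
        for indicator, value in indicators.items():
            rows.append(MobilityRow(country, day, indicator, value))
    # Sort for deterministic output: by country, then date, then indicator name.
    rows.sort(key=lambda r: (r.country, r.date, r.indicator))
    return rows
```

Each row carries exactly the four proposed columns, so the result maps directly onto a data frame with columns country, date, indicator and value.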

Impact

This will allow users to follow mobility indicators automatically from within R code, enabling analyses of, for example, the impact of mobility reduction on transmissibility.

Proposed Timeline

First version of package available on Apr 10.

Focal Point

@ffinger

Links

Could be integrated into https://github.com/epiforecasts/NCoVUtils or live as a separate package.

@ffinger ffinger added low_complexity Can be completed by 1 person in a few hours. low_priority Useful for COVID19 analytics medium_priority Essential for COVID19 analytics new_package Create a new R package and removed low_priority Useful for COVID19 analytics labels Apr 6, 2020
@ffinger ffinger changed the title Help develop a tool to make google mobility indicators available Tool to access google mobility indicators from R Apr 6, 2020
@PaulC91

PaulC91 commented Apr 6, 2020

Data extraction from PDF to CSV with Python:
https://github.com/vitorbaptista/google-covid19-mobility-reports

@noamross
Collaborator

noamross commented Apr 6, 2020

These mobility data are generally far more useful as a time series, especially going backwards, so that modelers can calibrate their models to the change in mobility that occurred over the past month(s). So the task includes either

  • Digitizing the graphs from these data
  • Extracting data from a similar source with historical values. Cuebiq has published a dashboard that includes weekly mobility changes for U.S. counties but raw data is not available: https://www.cuebiq.com/visitation-insights-covid19/

@jennybc

jennybc commented Apr 6, 2020

If this becomes a Google API-wrapping project, I might be of use. Unfortunately, it currently looks like a PDF-parsing project, at least as far as data ingest is concerned.

@PaulC91

PaulC91 commented Apr 6, 2020

@jennybc yeah, I've heard there is 0% chance of Google opening an API to this, unfortunately. @noamross as far as I can tell, the headline figure accompanying each graph is the latest value on the time axis, so it would just be a case of extracting it every day to start building a time series.
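Assuming, as suggested above, that each headline figure is the latest point on the time axis, a daily scrape could accumulate a time series roughly as follows. This is a hypothetical Python sketch: `append_snapshot` and `series` are illustrative names, and the PDF extraction step that produces the headline figures is not shown.

```python
from __future__ import annotations

import datetime as dt

# History layout (an assumption): {(country, indicator): {report_date: value}}
History = dict


def append_snapshot(history: History, report_date: dt.date,
                    headlines: dict[tuple[str, str], float]) -> History:
    """Record one report's headline figures under that report's date.

    Writing the same report_date again overwrites the stored value, so
    re-running the daily scrape on an unchanged report adds no duplicates.
    """
    for key, value in headlines.items():
        history.setdefault(key, {})[report_date] = value
    return history


def series(history: History, country: str, indicator: str) -> list[tuple[dt.date, float]]:
    """Return the accumulated (date, value) points for one country/indicator, oldest first."""
    return sorted(history.get((country, indicator), {}).items())
```

Keying on the report date (rather than appending blindly) is the design choice that keeps repeated scrapes idempotent.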

@noamross
Collaborator

noamross commented Apr 6, 2020

I note there's an R workflow for parsing the PDF using pdftools, as well, but only for the headline figures: https://github.com/mattkerlogue/google-covid-mobility-scrape

@noamross noamross changed the title Tool to access google mobility indicators from R Processing and accessing mobility data Apr 6, 2020
@noamross noamross added high_complexity Requires multiple people and/or specialists to complete. high_priority Urgent for COVID19 analytics and removed low_complexity Can be completed by 1 person in a few hours. medium_priority Essential for COVID19 analytics labels Apr 6, 2020
@PaulC91

PaulC91 commented Apr 6, 2020

Full extraction, including trend lines, in R!
https://github.com/nacnudus/google-location-coronavirus

@david-jankoski

+1 for this great repo and initiative!
While searching for whether Google's API is public, I stumbled upon this:
https://github.com/pastelsky/covid-19-mobility-tracker
It might be useful, though I see @PaulC91 has a nice working version of it in R 👏 awesome!

@noamross
Collaborator

noamross commented Apr 7, 2020

If someone could extend @nacnudus's repo/script/data above to include the county-level U.S. data it would be an enormous help.

@nacnudus

nacnudus commented Apr 7, 2020

See also this one by the UK Office for National Statistics.
https://github.com/datasciencecampus/mobility-report-data-extractor

@nacnudus

nacnudus commented Apr 7, 2020

Data problems have been fixed 🤞 https://github.com/nacnudus/google-location-coronavirus

@ffinger
Collaborator Author

ffinger commented Apr 8, 2020

The solution proposed by @nacnudus seems to solve the issue.
Thanks a lot!

Remaining:

  • Extension to county-level US data, as mentioned by @noamross above
  • Check whether the solution still works on Google's next release of the mobility reports

Anything else?

@nacnudus

nacnudus commented Apr 8, 2020

US counties are now in the "region" file https://github.com/nacnudus/google-location-coronavirus/blob/master/2020-03-29-region.tsv. It hasn't been checked extensively, though, so take care.

@ffinger
Collaborator Author

ffinger commented Apr 9, 2020

Great, thanks a lot @nacnudus.

@ffinger
Collaborator Author

ffinger commented Apr 10, 2020

@nacnudus, is your solution still working with the new mobility reports published yesterday?
I haven't seen any activity on your repo.
If so, I will consider this issue solved.

@nacnudus

nacnudus commented Apr 10, 2020

@ffinger, it seems to have run fine on the new reports.

@ffinger
Collaborator Author

ffinger commented Apr 12, 2020

Brilliant, will close the issue then.

@ffinger ffinger closed this as completed Apr 12, 2020