Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

forecast ILI #23

Merged
merged 29 commits into from
Dec 23, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
8c495f7
ili forecast in scratch #7
stephenturner Dec 21, 2021
f68006c
first draft on function to forecast ILI #7
stephenturner Dec 21, 2021
df64e62
add constrained parameter, additional examples, and extracted arima p…
stephenturner Dec 21, 2021
a804efc
better plot in examples
stephenturner Dec 21, 2021
8bf0eda
add glm script from fluforce-init into scratch
stephenturner Dec 21, 2021
e8f89f7
replace stuff at top with new functions
stephenturner Dec 21, 2021
96f28ce
Merge branch 'main' into 7-forcast-ili
stephenturner Dec 21, 2021
93df399
updated docs
stephenturner Dec 21, 2021
cb401dd
using forecasts instead of hardcoded
stephenturner Dec 21, 2021
6e96e74
fix merge conflicts
stephenturner Dec 21, 2021
03d60e1
replace all the fable stuff with package functions, and replace hard-…
stephenturner Dec 21, 2021
782f98c
ili-forecast.R is now in workflow.R
stephenturner Dec 21, 2021
d66db66
allow key column to be specified with {{}}
stephenturner Dec 21, 2021
833029c
fix documentation for key variable in make_tsibble
stephenturner Dec 21, 2021
57603f2
small updates to prep for state-level keyed forecasts
stephenturner Dec 21, 2021
3610572
Merge pull request #28 from signaturescience/7-forcast-ili-states
Dec 21, 2021
304e47e
Merge branch 'main' into 7-forcast-ili
stephenturner Dec 21, 2021
ed46111
first cut at states
stephenturner Dec 21, 2021
e69998a
state-level forecasting
stephenturner Dec 21, 2021
3bd3a26
Merge pull request #31 from signaturescience/7-forcast-ili-states
Dec 21, 2021
d268541
updating workflow but fails with metric colunn error I can't debug. @…
stephenturner Dec 22, 2021
8d85d50
fixing ILI workflow script
vpnagraj Dec 22, 2021
cc3c00a
plots
stephenturner Dec 22, 2021
558e3f9
week/year -> epiweek/epiyear
stephenturner Dec 22, 2021
fa2de93
Merge pull request #32 from signaturescience/epiyearweek
Dec 22, 2021
4683e83
include 2019-2020 flu season so we get early 2020 data
stephenturner Dec 22, 2021
366b01f
sensitivity demonstration script
stephenturner Dec 22, 2021
b528501
adding option to forecast_ili for named param_space list
vpnagraj Dec 22, 2021
e11a52c
specify q
stephenturner Dec 23, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,8 @@ RoxygenNote: 7.1.2
Imports:
cdcfluview,
dplyr,
fable,
fabletools,
lubridate,
magrittr,
MMWRweek,
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
# Generated by roxygen2: do not edit by hand

export("%>%")
export(forecast_ili)
export(get_cdc_hosp)
export(get_cdc_ili)
export(get_cdc_vax)
Expand Down
9 changes: 9 additions & 0 deletions R/fiphde.R
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,8 @@ if(getRversion() >= "2.15.1") utils::globalVariables(c(".",
"sea_label",
"monday",
"yweek",
"weighted_ili",
".mean",
"0.025",
"0.975",
"<NA>",
Expand All @@ -63,5 +65,12 @@ if(getRversion() >= "2.15.1") utils::globalVariables(c(".",
"truth",
"rmse",
"value",
"unweighted_ili",
"ili",
"miss",
"total",
"pmiss",
"arima",
"x",
"."))

145 changes: 145 additions & 0 deletions R/forecast.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
#' @title Forecast ILI
#' @description Forecasts ILI up to specified weeks in the future. Used in downstream modeling.
#' @details Currently limited to one location only.
#' @param ilidat Data returned from [get_cdc_ili].
#' @param horizon Optional horizon periods through which the forecasts should be generated; default is `4`
#' @param trim_date Earliest start date you want to use for ILI data. Default `NULL` doesn't trim.
#' @param constrained Should the model be constrained to a non-seasonal model? If `TRUE` the parameter space defined in "param_space" argument will be used. See [fable::ARIMA].
#' @param param_space Named list for ARIMA parameter space constraint; only used if "constrained == `TRUE`"; default is `list(P=0,D=0,Q=0,p=1:2,d=0:2,0)`, which sets space to PDQ(0,0,0) and pdq(1:2,0:2,0).
#' @return A named list containing:
#' 1. `ilidat`: The data sent into the function filtered to the location and the `trim_date`. Select columns returned.
#' 1. `ilidat_tsibble`: The `tsibble` class object returned by running [make_tsibble] on the data above.
#' 1. `ili_fit`: The fit from [fabletools::model].
#' 1. `ili_forecast`: The forecast from [fabletools::forecast] at the specified horizon.
#' 1. `ili_future`: The `horizon`-number of weeks of ILI data forecasted into the future.
#' 1. `ili_bound`: The data in 1 bound to the data in 5.
#' 1. `arima_params`: A tibble with ARIMA model parameters for each location.
#' 1. `locstats`: A tibble with missing data information on all locations.
#' 1. `removed`: A tibble with locations removed because of high missing ILI data.
#' @examples
#' \dontrun{
#' # Get data
#' ilidat <- get_cdc_ili(region=c("national", "state"), years=2010:lubridate::year(lubridate::today()))
#'
#' # Using data only from march 2020 forward, for US only
#' ilidat_us <- ilidat %>% dplyr::filter(location=="US")
#' ilifor_us <- forecast_ili(ilidat_us, horizon=4L, trim_date="2020-03-01")
#' ilifor_us$ili_fit
#' ilifor_us$arima_params
#' ilifor_us$ili_forecast
#' head(ilifor_us$ili_bound)
#' tail(ilifor_us$ili_bound, 10)
#' # Plot
#' library(dplyr)
#' library(ggplot2)
#' theme_set(theme_classic())
#' ilifor_us$ili_bound %>%
#' mutate(date=cdcfluview::mmwr_week_to_date(epiyear, epiweek)) %>%
#' filter(date>"2021-03-01") %>%
#' ggplot(aes(date, ili)) +
#' geom_line(lwd=.3, alpha=.5) +
#' geom_point(aes(col=forecasted), size=2)
#'
#' # At the state level
#' ilifor_st <- forecast_ili(ilidat, horizon=4L, trim_date="2020-03-01")
#' ilifor_st$ili_fit
#' ilifor_st$arima_params
#' ilifor_st$ili_forecast
#' head(ilifor_us$ili_bound)
#' tail(ilifor_us$ili_bound, 10)
#' # Plot
#' library(dplyr)
#' library(ggplot2)
#' theme_set(theme_classic())
#' ilifor_st$ili_bound %>%
#' mutate(date=cdcfluview::mmwr_week_to_date(epiyear, epiweek)) %>%
#' filter(date>"2021-08-01") %>%
#' ggplot(aes(date, ili, col=forecasted)) +
#' geom_line(lwd=.3) +
#' geom_point(aes(col=forecasted), size=.7) +
#' facet_wrap(~abbreviation, scale="free_y")
#' }
#' @export
forecast_ili <- function(ilidat, horizon=4L, trim_date=NULL, constrained=TRUE, param_space = list(P=0,D=0,Q=0,p=1:2,d=0:2,q=0)) {

# If trim_date is not null, trim to selected trim_date
if (!is.null(trim_date)) {
ilidat <-
ilidat %>%
dplyr::filter(week_start > as.Date(trim_date, format = "%Y-%m-%d"))
}

# Select just the columns you care about, and call "ili" the measure you're using
ilidat <-
ilidat %>%
dplyr::select(location, epiyear, epiweek, ili=unweighted_ili)

# Get missing data rates
locstats <- ilidat %>%
dplyr::group_by(location) %>%
dplyr::summarize(miss=sum(is.na(ili)), total=dplyr::n()) %>%
dplyr::mutate(pmiss=miss/total) %>%
dplyr::arrange(dplyr::desc(pmiss)) %>%
dplyr::mutate(remove=pmiss>.1)

# Get locations that will be removed
removed <- locstats %>%
dplyr::filter(remove) %>%
dplyr::inner_join(locations, by="location")
if(nrow(removed)>0) message(sprintf("Removed %s row(s) because of missing data. See result$removed.", nrow(removed)))

# Remove those locations
ilidat <- locstats %>%
dplyr::filter(!remove) %>%
dplyr::distinct(location) %>%
dplyr::inner_join(ilidat, by="location")

## make a tsibble. do not chop the last week - because this is weekly data we won't have an incomplete final week
ilidat_tsibble <-
ilidat %>%
make_tsibble(epiyear = epiyear, epiweek = epiweek, key=location, chop=FALSE)

# Defaults to constrained, non-seasonal model.
if (constrained) {
# Nonseasonal fit: PDQ(0, 0, 0)
# Nonseasonal components unrestricted: pdq(0:5,0:5,0:5)
message("Fitting nonseasonal constrained ARIMA model...")
ili_fit <- fabletools::model(ilidat_tsibble,
arima = fable::ARIMA(ili ~ PDQ(param_space$P,param_space$D,param_space$Q) + pdq(param_space$p,param_space$d, param_space$q),
stepwise=FALSE,
approximation=FALSE))
} else {
# If unconstrained, need to set stepwise=TRUE and approxmiation=NULL to speed up.
message("Fitting unconstrained ARIMA model...")
ili_fit <- fabletools::model(ilidat_tsibble,
arima = fable::ARIMA(ili,
stepwise=TRUE,
approximation=NULL))
}

arima_params <-
ili_fit %>%
dplyr::mutate(x=purrr::map(arima, ~purrr::pluck(., "fit") %>% purrr::pluck("spec"))) %>%
tidyr::unnest_wider(col=x) %>%
dplyr::select(-arima)

# Get the forecast
ili_forecast <- fabletools::forecast(ili_fit, h=horizon)

# Get the next #horizon weeks in a tibble
ili_future <- ili_forecast %>%
tibble::as_tibble() %>%
dplyr::mutate(epiyear=lubridate::epiyear(yweek)) %>%
dplyr::mutate(epiweek=lubridate::epiweek(yweek)) %>%
dplyr::select(location, epiyear, epiweek, ili=.mean)

# bind the historical data to the new data
ili_bound <- dplyr::bind_rows(ilidat %>% dplyr::mutate(forecasted=FALSE),
ili_future %>% dplyr::mutate(forecasted=TRUE)) %>%
dplyr::arrange(location, epiyear, epiweek) %>%
dplyr::inner_join(locations, by="location")

# Create results
res <- tibble::lst(ilidat, ilidat_tsibble, ili_fit, ili_forecast, ili_future, ili_bound, arima_params, locstats, removed)
return(res)
}
16 changes: 8 additions & 8 deletions R/retrieve.R
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,7 @@ get_hdgov_hosp <- function(endpoint="https://healthdata.gov/resource/g62h-syeh.j
#' @references API documentation: <https://dev.socrata.com/foundry/data.cdc.gov/k87d-gv3u>.
#' @examples
#' \dontrun{
#' d <- get_cdc_vax_data()
#' d <- get_cdc_vax()
#' d
#' library(ggplot2)
#' d %>%
Expand Down Expand Up @@ -183,14 +183,14 @@ get_cdc_ili <- function(region=c("national", "state"), years=NULL) {
# Get only relevant columns (drop age group distributions)
# Join to internal package data to get state abbreviations and FIPS codes
d <- d %>%
dplyr::select(region_type, region, year, week, week_start, dplyr::contains("ili"), ilitotal:total_patients) %>%
dplyr::select(region_type, region, epiyear=year, epiweek=week, week_start, dplyr::contains("ili"), ilitotal:total_patients) %>%
dplyr::mutate(region=gsub("National", "US", region)) %>%
dplyr::inner_join(locations, by=c("region"="location_name")) %>%
dplyr::select(location, region_type, abbreviation, region, dplyr::everything())
message(sprintf("Latest week_start / year / epiweek available:\n%s / %d / %d",
max(d$week_start),
unique(d$year[d$week_start==max(d$week_start)]),
unique(d$week[d$week_start==max(d$week_start)])))
unique(d$epiyear[d$week_start==max(d$week_start)]),
unique(d$epiweek[d$week_start==max(d$week_start)])))
return(d)
}

Expand All @@ -211,16 +211,16 @@ get_cdc_hosp <- function(years=NULL) {
dplyr::transmute(location="US",
abbreviation="US",
region="US",
year,
week=year_wk_num,
epiyear=year,
epiweek=year_wk_num,
week_start=wk_start,
week_end=wk_end,
rate,
weeklyrate,
season=sea_label)
message(sprintf("Latest week_start / year / epiweek available:\n%s / %d / %d",
max(d$week_start),
unique(d$year[d$week_start==max(d$week_start)]),
unique(d$week[d$week_start==max(d$week_start)])))
unique(d$epiyear[d$week_start==max(d$week_start)]),
unique(d$epiweek[d$week_start==max(d$week_start)])))
return(d)
}
6 changes: 4 additions & 2 deletions R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -7,13 +7,14 @@
#' @param df A `tibble` containing columns `epiyear` and `epiweek`.
#' @param epiyear Unquoted variable name containing the MMWR epiyear.
#' @param epiweek Unquoted variable name containing the MMWR epiweek.
#' @param key Unquoted variable name containing the name of the column to be the tsibble key. See [tsibble::as_tsibble].
#' @param chop Logical indicating whether or not to remove the most current week (default `TRUE`).
#' @return A `tsibble` containing additional columns `monday` indicating the date
#' for the Monday of that epiweek, and `yweek` (a yearweek vctr class object)
#' that indexes the `tsibble` in 1 week increments.
#' @export
#' @md
make_tsibble <- function(df, epiyear, epiweek, chop=TRUE) {
make_tsibble <- function(df, epiyear, epiweek, key=location, chop=TRUE) {
out <- df %>%
# get the monday that starts the MMWRweek
dplyr::mutate(monday=MMWRweek::MMWRweek2Date(MMWRyear={{epiyear}},
Expand All @@ -23,7 +24,7 @@ make_tsibble <- function(df, epiyear, epiweek, chop=TRUE) {
# convert represent as yearweek (see package?tsibble)
dplyr::mutate(yweek=tsibble::yearweek(monday), .after="monday") %>%
# convert to tsibble
tsibble::as_tsibble(index=yweek, key=location)
tsibble::as_tsibble(index=yweek, key={{key}})
# Remove the incomplete week
if (chop) out <- utils::head(out, -1)
return(out)
Expand Down Expand Up @@ -113,3 +114,4 @@ plot_forc <- function(.forecasts, .train, .test) {
}



89 changes: 89 additions & 0 deletions man/forecast_ili.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/get_cdc_vax.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 3 additions & 1 deletion man/make_tsibble.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading