data prep to translate target forecasts to submission file format #6
NOTE: the data prep of forecast output will depend on the forecast method used... but given that we are likely starting with a time series model (and using the
Initial shot at this in 6bbd2fc. Lots of outstanding issues here.
…aths, cumdeaths, and format for submission #6
I have this working in some code at f2c3e91
fable-submission-mockup-allmetrics.csv.txt (GitHub doesn't let you upload files with a .csv extension; remove the .txt) Notes / known issues:
@vpnagraj run through this code a pipe at a time and see if you have any suggestions.
Stepped through what you have (using the *-allmetrics version of the script) and pushed up some edits: I think I have a candidate fix for the text formatting conversion of "icases" to "inc cases" ... just pass in a new argument for

Also played around with the dates a little bit. Agreed that something is way off. I reworked your code and thought I had fixed the issue (getting the epiweek date to start on Sunday instead of Monday), but now that I'm looking at this again, it looks the same as your comment above (#6 (comment)) 🤔 I'm wondering if
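A minimal base-R sketch of the metric-code-to-target-label conversion discussed above. Only `icases` comes from this thread; the codes `ideaths` and `cdeaths`, the `metric_labels` lookup, and the `format_target()` helper are hypothetical names for illustration. Labels are kept singular ("case", "death") because those are the strings the hub targets use.

```r
# Hypothetical lookup from internal metric codes to hub target labels.
# "icases" comes from this thread; "ideaths" and "cdeaths" are assumed codes.
# Labels are singular ("case", "death"), matching the hub target strings.
metric_labels <- c(
  icases  = "inc case",
  ideaths = "inc death",
  cdeaths = "cum death"
)

# Build hub-style week-ahead target strings, e.g. "1 wk ahead inc case".
format_target <- function(horizon, metric) {
  label <- unname(metric_labels[metric])
  if (anyNA(label)) {
    stop("Unknown metric code(s): ",
         paste(unique(metric[is.na(label)]), collapse = ", "))
  }
  paste(horizon, "wk ahead", label)
}

format_target(1:2, c("icases", "cdeaths"))
# returns c("1 wk ahead inc case", "2 wk ahead cum death")
```

Keeping the mapping in one named vector means an unknown code fails loudly instead of silently producing a malformed target.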
@stephenturner FYI looks like

I think it's better to handle it that way ^ ... i.e., let's drop the current week exclusion from

Done in 4aa7bdb, so that saved us one week of data. We're still bumping into the issue with horizon being

Need to keep thinking on this...
I'm still cracking at this. I think the problem comes in with MMWR weeks being converted to dates, then to yearweeks, then back to dates.

```r
library(dplyr)

# Saturday
MMWRweek::MMWRweek("2020-12-26")$MMWRweek
#> [1] 52
# Sunday
MMWRweek::MMWRweek("2020-12-27")$MMWRweek
#> [1] 53
# Monday
MMWRweek::MMWRweek("2020-12-28")$MMWRweek
#> [1] 53
# Sunday
MMWRweek::MMWRweek2Date(2020, 53, MMWRday = 1)
#> [1] "2020-12-27"
MMWRweek::MMWRweek2Date(2020, 53, MMWRday = 1) %>% tsibble::yearweek()
#> <yearweek[1]>
#> [1] "2020 W52"
#> # Week starts on: Monday
MMWRweek::MMWRweek2Date(2020, 53, MMWRday = 1) %>% tsibble::yearweek() %>% lubridate::as_date()
#> [1] "2020-12-21"
```

Still trying to craft that reprex.
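The reprex shows a Sunday (2020-12-27, the start of MMWR week 53) landing in a Monday-start week and coming back as Monday 2020-12-21. A minimal base-R sketch of going straight from an MMWR week to its dates, without the ISO yearweek round trip; the helper names are hypothetical, and it relies on the MMWR definition that week 1 is the first Sunday-Saturday week with at least four days in January (equivalently, the week containing January 4). It does not check whether a given year actually has a week 53.

```r
# Sunday that starts MMWR week 1 of `year`: the Sunday-Saturday week
# containing January 4 is the first one with at least four January days.
mmwr_week1_sunday <- function(year) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))
  jan4 - as.POSIXlt(jan4)$wday   # $wday: 0 = Sunday, ..., 6 = Saturday
}

# Saturday that ends MMWR week `week` of `year` (the usual target_end_date).
# Assumes `week` is valid for that year (1-52, or 53 in 53-week years).
mmwr_week_end <- function(year, week) {
  mmwr_week1_sunday(year) + 7L * (as.integer(week) - 1L) + 6L
}

# Floor a date to the Monday starting its Monday-start week; this is the
# step that shifts a Sunday back six days in the round trip.
monday_floor <- function(date) {
  date <- as.Date(date)
  date - (as.POSIXlt(date)$wday + 6L) %% 7L
}

mmwr_week_end(2020, 53)      # "2021-01-02"
monday_floor("2020-12-27")   # "2020-12-21", matching the reprex output
```

Keeping dates as Sunday/Saturday `Date`s and only deriving week numbers for display avoids mixing the two week conventions.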
I pushed some code in a new script at 6ff70b4. Run through that. I think the
I think the date issue is fixed now. I'm creating the tsibble with a function that adds a

From the https://github.com/reichlab/covid19-forecast-hub#ensemble-model section:
I don't think the current forecasts based on the auto ARIMA models are doing this, but we should probably add a check/correction for this case, that if the 10th quantile of any cumulative forecast is below the most recently observed data, then make it equal to the most recent observed data, at a minimum.
This check that forecasts for cumulative deaths are not below current-week values is now implemented in ae43487. But I haven't yet figured out the best place for this to reside, functionally. The
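A minimal sketch of the cumulative floor described above, in base R (not the code in ae43487). It raises every cumulative point and quantile value that falls below the most recent observation, which is the broader "point and all quantiles" variant mentioned later in the thread. The column names `target` and `value` follow the hub format; `floor_cumulative()` is a hypothetical helper, and `last_observed` is assumed to be a single number for one location.

```r
# Raise any cumulative forecast value below the latest observed cumulative
# count up to that count. Non-cumulative rows are left untouched.
floor_cumulative <- function(forecasts, last_observed) {
  is_cum <- grepl("cum", forecasts$target, fixed = TRUE)
  forecasts$value[is_cum] <- pmax(forecasts$value[is_cum], last_observed)
  forecasts
}

fc <- data.frame(
  target   = c("1 wk ahead cum death", "1 wk ahead cum death", "1 wk ahead inc death"),
  quantile = c(0.01, 0.5, 0.01),
  value    = c(980, 1040, 12),
  stringsAsFactors = FALSE
)
floor_cumulative(fc, last_observed = 1000)$value   # c(1000, 1040, 12)
```

With several locations, the same idea would be applied per location (for example after joining each location's last observed value onto the forecasts).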
I added some code in cfdd1e8 to use the script added in 63a2ff9 to validate the submission.
So, everything seems to look okay except for the targets. The code at https://github.com/reichlab/covid19-forecast-hub/blob/68df08d9e6e19d55fddab4bd5abb505202023ecb/code/validation/R-scripts/functions_plausibility.R#L169-L186 checks for 1, 2, 3, 4 wk ahead inc death and cum death, but doesn't allow for inc cases:

```r
#' Checking that all entries in `target` correspond to standards
#'
#' @param entry the data.frame
#'
#' @return invisibly returns TRUE if problems detected, FALSE otherwise
verify_targets <- function(entry){
  allowed_targets <- c(
    paste(0:130, "day ahead inc death"),
    paste(0:130, "day ahead cum death"),
    paste(0:20, "wk ahead inc death"),
    paste(0:20, "wk ahead cum death"),
    paste(0:130, "day ahead inc hosp")
  )
  targets_in_entry <- unique(entry$target)
  if(!all(targets_in_entry %in% allowed_targets)){
    warning("ERROR: Some entries in `targets` do not correspond to standards:",
            paste0(targets_in_entry[!(targets_in_entry %in% allowed_targets)], collapse = ", "))
    return(invisible(FALSE))
  }else{
    cat("VALIDATED: targets\n")
    return(invisible(TRUE))
  }
}
```

This doesn't jibe with what I thought was required here to be included in the ensemble forecast (https://github.com/reichlab/covid19-forecast-hub/tree/master/data-processed#target). Perhaps this R code is no longer maintained. According to the documentation at https://github.com/reichlab/covid19-forecast-hub/blob/master/data-processed/R_forecast_file_validation.md,
... in fact, after digging around a little bit, it seems like this is the case! That R script, https://github.com/reichlab/covid19-forecast-hub/blob/master/code/validation/R-scripts/functions_plausibility.R, was last updated in May. According to the README, https://github.com/reichlab/covid19-forecast-hub/tree/master/data-processed#removed-targets, N day ahead inc cases was removed in June.
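A minimal sketch of the allowed-target list from `verify_targets()` extended with weekly incident cases. The `0:20` horizon range for "wk ahead inc case" is an assumption copied from the death targets, and `invalid_targets()` is a hypothetical helper that returns the offending labels instead of warning.

```r
# Allowed targets as in verify_targets(), plus weekly incident cases.
# The 0:20 range for "wk ahead inc case" is assumed, not taken from the hub.
allowed_targets <- c(
  paste(0:130, "day ahead inc death"),
  paste(0:130, "day ahead cum death"),
  paste(0:20, "wk ahead inc death"),
  paste(0:20, "wk ahead cum death"),
  paste(0:130, "day ahead inc hosp"),
  paste(0:20, "wk ahead inc case")
)

# Return the targets in a submission that are not in the allowed list.
invalid_targets <- function(entry) {
  targets <- unique(entry$target)
  targets[!targets %in% allowed_targets]
}

invalid_targets(data.frame(
  target = c("1 wk ahead inc case", "1 wk ahead inc cases"),
  stringsAsFactors = FALSE
))
# returns "1 wk ahead inc cases" (plural labels are rejected)
```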
To the script in our

This let the results pass that validation check (after changing 'deaths' to 'death' and 'cases' to 'case' in 5de29cd). But another validation effort failed:
I dug into the validation scripts, and there's a place right around here (https://github.com/reichlab/covid19-forecast-hub/blob/68df08d9e6e19d55fddab4bd5abb505202023ecb/code/validation/R-scripts/functions_plausibility.R#L259-L282) where it checks for "quantile crossing". I'm not exactly sure what this is doing yet, but I think what's causing a problem here is that some targets require different quantiles than others. inc death and cum death require a larger set of quantiles, while N wk ahead inc case (the newly added target) requires only a subset of those quantiles. This is spelled out in the data submission README here. I think this causes a problem with this old legacy code because one of the operations it performs is a widening reshape, and when some targets have only a subset of the quantiles that other targets have, you end up with
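An illustration in base R of one plausible way a widening reshape breaks a quantile-crossing check when targets carry different quantile sets. This is not the hub's code; it assumes the check compares adjacent quantile columns of a wide table, and the toy data uses only three quantile levels. The second function checks monotonicity per target in long format, so each target is compared only on its own quantiles.

```r
# Toy long-format forecasts: inc death has three quantiles, inc case two.
long <- data.frame(
  target   = c(rep("1 wk ahead inc death", 3), rep("1 wk ahead inc case", 2)),
  quantile = c(0.01, 0.5, 0.99, 0.5, 0.99),
  value    = c(10, 20, 30, 100, 200),
  stringsAsFactors = FALSE
)

# Wide table: one row per target, one column per quantile level.
# Cells with no data (inc case at 0.01) are filled with NA.
wide <- tapply(long$value, list(long$target, long$quantile), identity)

# Naive row-wise check: NA propagates, so the result is NA rather than TRUE.
naive_ok <- apply(wide, 1, function(v) all(diff(v) >= 0))
naive_ok

# Long-format check: sort each target by quantile and compare neighbours.
crossing_ok <- function(df) {
  sapply(split(df, df$target), function(g) {
    g <- g[order(g$quantile), ]
    all(diff(g$value) >= 0)
  })
}
crossing_ok(long)   # TRUE for both targets
```

If the legacy code then uses that NA in an `if ()`, R stops with an error, which would look like a failed validation even though no quantiles actually cross.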
CAVEAT: This works, but given the hacks I had to put into place to get this working, I'd recommend we either:
If we can find #-2 above, it sure would be more lightweight than going the #-1 route, which requires updating the upstream of the fork, installing some Python pkgs, etc. Perhaps it isn't as burdensome as I think. I'll give it a spin on darwin if I can before our meeting today.
Follow-up: #-1 is pretty trivial. I set up a new conda environment and followed the instructions at https://github.com/reichlab/covid19-forecast-hub/wiki/Running-Checks-Locally to install requirements and validate a single forecast file. On darwin:
🎉 🥳 🌟 ✔️
@stephenturner parallel thought here... what if we put the Python validation script / pkgs in a Docker image... and wrapped a call to that Docker image in an R function (i.e., using something like

I can help with that if you want to pursue it. Shouldn't be too big of a lift. BUT we'd obviously still need to make sure that validation code stays current.
I'd almost always prefer calling an R function to issuing a Python command/script at the bash shell. Looks like the requirements are pretty minimal: https://github.com/reichlab/covid19-forecast-hub/blob/master/visualization/requirements.txt
Agreed. See #9.
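A hypothetical sketch of the Docker-wrapped validation idea from this exchange. The image name `hub-validation` and the in-container command `python3 validate.py` are placeholders, not published names; only the `docker run` flags (`--rm`, `-v` with a read-only `:ro` mount) and base R's `system2()` are real. Building the argument vector in its own function keeps it testable without Docker installed.

```r
# Build the argument vector for `docker run`, mounting the forecast file's
# directory read-only at /data inside the container.
# "hub-validation" and c("python3", "validate.py") are placeholder values.
validation_docker_args <- function(file,
                                   image = "hub-validation",
                                   command = c("python3", "validate.py")) {
  file <- normalizePath(file, mustWork = FALSE)
  c("run", "--rm",
    "-v", paste0(dirname(file), ":/data:ro"),
    image, command, file.path("/data", basename(file)))
}

# Run the container; system2() returns the exit status when output is not
# captured, so TRUE means the validator exited with status 0.
validate_in_docker <- function(file, ...) {
  args <- validation_docker_args(file, ...)
  status <- system2("docker", shQuote(args))
  invisible(status == 0)
}
```

Usage would be something like `validate_in_docker("path/to/forecast.csv")` once a real image exists. Pinning the image to a tagged version of the hub's validation code is one way to track whether it is still current.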
@stephenturner heads up: I've heavily refactored the scratch submission mockup code. Things to note:
Any thoughts on that ^?
I don't know, unless it expected the target end date for 1 week ahead to end on the following Saturday if you're dating the forecast after Monday? I feel like I've seen something to this effect in the docs. Let me dig.
Sheesh. Well, maybe that's OK? I mean, I'm working on writing the validation wrapper for the Python method now. We can stick to validating only before we are ready to submit on the Sunday or Monday. So as long as we generate the forecasts/validate on Sunday or Monday (before the deadline), it should be fine? I think?
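A base-R sketch of the week-ahead `target_end_date` rule as we understand it from the hub README: a forecast dated Sunday or Monday treats the current epiweek (ending Saturday) as 1 wk ahead, while one dated Tuesday through Saturday treats the next epiweek as 1 wk ahead. The `target_end_date()` helper is hypothetical, and the rule should be double-checked against the current README.

```r
# Saturday ending the epiweek that an h-week-ahead forecast targets.
target_end_date <- function(forecast_date, horizon) {
  forecast_date <- as.Date(forecast_date)
  wday <- as.POSIXlt(forecast_date)$wday          # 0 = Sunday ... 6 = Saturday
  this_saturday <- forecast_date + (6L - wday)    # end of the current epiweek
  shift <- ifelse(wday <= 1L, 0L, 7L)             # Tue-Sat: 1 wk ahead is next week
  this_saturday + shift + 7L * (as.integer(horizon) - 1L)
}

target_end_date(c("2020-12-27", "2020-12-28", "2020-12-29"), 1)
# "2021-01-02" "2021-01-02" "2021-01-09"
```

This matches the plan above: generating and validating on Sunday or Monday yields the same end dates either day.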
@stephenturner see https://github.com/signaturescience/focustools/blob/state-level-ts/R/submission.R#L72. I removed the
We do need to convert the state/territory name to the appropriate FIPS code: https://github.com/signaturescience/focustools/blob/state-level-ts/R/submission.R#L72. I think that will be a simple join to
Sorry to steamroll you here @stephenturner, but I'm cooking on this state-level stuff! I just pushed up an edit to that piece, and it seems to be working now. Mostly. I'm seeing the following issues in
Bound it at zero for now. We could get more sophisticated... For cum deaths, we would bound the point and all quantiles at no less than last week's observed data. For inc deaths/cases, it seems reasonable that the +1wk ahead forecast should be no less than 2x the difference between 0 and -1wk, or that +2wk ahead should be no less than 2x the difference between 0 and -2wk, and still bounded at zero. I.e., enforcing that incident cases/deaths can't drop more than twice as much as they changed over the same horizon looking backward? Where to do it? Agree it doesn't really belong in a formatting script. But the
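A base-R sketch of the incident-forecast bound floated above, under one reading of it: the h-week-ahead value may not fall below the latest observation by more than twice the absolute change over the previous h weeks, and never below zero. The interpretation, the `observed` input (weekly incident counts ordered oldest to newest, last element = week 0), and both helpers are assumptions for illustration.

```r
# Lower bound for h-week-ahead incident forecasts:
# latest - 2 * |latest - value h weeks back|, floored at zero.
incident_floor <- function(observed, horizon) {
  n <- length(observed)
  stopifnot(all(horizon >= 1), all(horizon < n))
  latest <- observed[n]
  past <- observed[n - horizon]       # value h weeks before week 0
  pmax(latest - 2 * abs(latest - past), 0)
}

# Clamp forecast values (point and quantiles) for one horizon to the floor.
apply_incident_floor <- function(values, observed, horizon) {
  pmax(values, incident_floor(observed, horizon))
}

obs <- c(120, 100, 90)                         # weeks -2, -1, 0
incident_floor(obs, 1:2)                       # c(70, 30)
apply_incident_floor(c(50, 85, 110), obs, 1)   # c(70, 85, 110)
```

Because the bound only raises values, it keeps quantiles in non-decreasing order as long as they were ordered to begin with.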
"District of Columbia" is 11, right? (focustools/data-raw/locations.csv, line 11 in 230e2bc)
Ahh, DC is both (focustools/data-raw/locations.csv, line 379 in 230e2bc). 11001 must be the county FIPS; need to make a special case to handle that somehow.
Looks like there are lots of counties with the same name in different states (Washington, Jefferson, Franklin, no surprise). DC looks like the only non-county dupe.
I was worried about, e.g., Hawaii (county) vs. Hawaii (state), but no problem there.
Heads up: I think I have a solution for this. Pushing up soon...
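One possible approach to the name-to-FIPS join and the DC duplicate, sketched in base R (not necessarily the fix pushed to state-level-ts): keep only two-digit state/territory codes before joining, so "District of Columbia" matches "11" and not the county code "11001". The column names `location` and `location_name` are assumed to match locations.csv, the two-row-per-state table is an illustrative subset, and `add_state_fips()` is a hypothetical helper.

```r
# Illustrative subset of a locations table (columns assumed).
locations <- data.frame(
  location      = c("11", "11001", "51"),
  location_name = c("District of Columbia", "District of Columbia", "Virginia"),
  stringsAsFactors = FALSE
)

# Join state/territory names to FIPS codes, ignoring county rows.
add_state_fips <- function(forecasts, locations) {
  # State/territory codes have two digits; county codes have five.
  states <- locations[nchar(locations$location) == 2, c("location", "location_name")]
  out <- merge(forecasts, states, by = "location_name", all.x = TRUE, sort = FALSE)
  if (anyNA(out$location)) {
    warning("No state FIPS for: ",
            paste(unique(out$location_name[is.na(out$location)]), collapse = ", "))
  }
  out
}

state_fc <- data.frame(location_name = c("District of Columbia", "Virginia"),
                       value = c(5, 7), stringsAsFactors = FALSE)
add_state_fips(state_fc, locations)   # DC gets "11", Virginia gets "51"
```

Filtering on code length also sidesteps the many same-named counties, since no county row survives to be matched.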
Edits pushed up to the state-level-ts branch to address the location code issues:
Actually, I'm handling this in the forecast function, so we won't have to change this.
The COVID-19 Forecast Hub has strict requirements for the forecast submission format:
https://github.com/reichlab/covid19-forecast-hub/blob/master/data-processed/README.md#forecast-file-format
Once we generate point/quantile forecasts for targets, we need to do some post-processing to wrangle the data into the required format.
This could include:
target_end_date
from week

Once we have the submission file format prepped, we can validate locally:
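A lightweight R pre-check sketch for a hub-style submission, meant to catch a few common mistakes before running the hub's own validation (it is not a substitute for it). It assumes the column set from the hub README (`forecast_date`, `target`, `target_end_date`, `location`, `type`, `quantile`, `value`), restricts targets to the weekly inc/cum case/death labels discussed in this issue, and uses a hypothetical `precheck_submission()` helper that returns a character vector of problems.

```r
required_cols <- c("forecast_date", "target", "target_end_date",
                   "location", "type", "quantile", "value")

# Return a character vector describing problems (empty if none found).
precheck_submission <- function(sub) {
  missing <- setdiff(required_cols, names(sub))
  if (length(missing) > 0) {
    return(paste("missing columns:", paste(missing, collapse = ", ")))
  }
  problems <- character(0)
  if (!all(sub$type %in% c("point", "quantile"))) {
    problems <- c(problems, "type must be 'point' or 'quantile'")
  }
  q <- sub$type == "quantile"
  if (any(is.na(sub$quantile[q]) | sub$quantile[q] < 0 | sub$quantile[q] > 1)) {
    problems <- c(problems, "quantile rows need a quantile in [0, 1]")
  }
  if (!all(grepl("^[0-9]+ wk ahead (inc|cum) (case|death)$", sub$target))) {
    problems <- c(problems, "unexpected target label")
  }
  if (any(as.POSIXlt(as.Date(sub$target_end_date))$wday != 6L)) {
    problems <- c(problems, "target_end_date should be a Saturday")
  }
  problems
}

sub <- data.frame(
  forecast_date = "2020-12-28", target = "1 wk ahead inc case",
  target_end_date = "2021-01-02", location = "51",
  type = c("point", "quantile"), quantile = c(NA, 0.5), value = c(700, 700),
  stringsAsFactors = FALSE
)
precheck_submission(sub)   # character(0): no problems found
```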