
auto ARIMA for influenza hospitalizations directly #37

Closed
vpnagraj opened this issue Dec 27, 2021 · 9 comments
vpnagraj commented Dec 27, 2021

up to now we have discussed a two-step forecasting workflow:

  1. forecast ILI with an ARIMA (or similar) framework
  2. use ILI as a predictor in hospitalization modeling via a GLM framework

what if we tried auto ARIMA on hospitalizations directly?

do this at both the state and national level
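For illustration only, the "model hospitalizations directly" idea can be sketched outside of fable. The actual work in this repo uses R's fable::ARIMA(); the hypothetical Python stand-in below fits a simple AR(1) by least squares on a made-up weekly series and iterates it four weeks ahead:

```python
# Hypothetical stand-in: the thread's actual modeling uses R's fable::ARIMA().
# Minimal AR(1) fit by least squares on a made-up weekly hospitalization
# series, iterated four weeks ahead.
import numpy as np

def fit_ar1(y):
    """Least-squares fit of y[t] = c + phi * y[t-1]."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    c, phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    return c, phi

def forecast_ar1(y, c, phi, h=4):
    """Iterate the fitted recursion h steps ahead."""
    out, last = [], y[-1]
    for _ in range(h):
        last = c + phi * last
        out.append(last)
    return np.array(out)

hosp = np.array([5, 7, 9, 12, 15, 20, 26, 33], dtype=float)  # toy counts
c, phi = fit_ar1(hosp)
fc = forecast_ar1(hosp, c, phi, h=4)
print(fc.round(1))
```

Auto ARIMA would additionally search over orders and differencing; the point here is only that the model sees hospitalizations directly, with no ILI intermediate.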

vpnagraj self-assigned this Dec 27, 2021

@stephenturner (Contributor)

I fit a nonseasonal exponential smoothing model and an ARIMA model and took an ensemble of both. Showing a 10% interval to reduce overplotting.

[image]

Limiting it to data prior to October, it's not terrible.

[image]
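The ensemble step amounts to averaging the two models' forecasts. A minimal Python sketch with toy data and simple stand-ins for the ETS and ARIMA components (not the thread's actual fable code):

```python
# Hypothetical sketch: equal-weight ensemble of two point forecasts.
# ses_forecast and drift_forecast are simple stand-ins for ETS and ARIMA.
import numpy as np

def ses_forecast(y, alpha=0.3, h=4):
    """Simple exponential smoothing: flat h-step forecast at the last level."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(h, level)

def drift_forecast(y, h=4):
    """Random walk with drift: extend the average historical step."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + slope * np.arange(1, h + 1)

y = np.array([5, 7, 9, 12, 15, 20, 26, 33], dtype=float)  # toy counts
ensemble = (ses_forecast(y) + drift_forecast(y)) / 2  # equal-weight mean
print(ensemble.round(1))
```

fabletools does this combination on the model objects themselves (so intervals combine too); the equal-weight mean of point forecasts is the simplest version of the idea.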

@vpnagraj (Collaborator, Author)

interesting. can you add your code over at https://github.com/signaturescience/fiphde/tree/hosp-arima?

i was working on this at https://github.com/signaturescience/fiphde/blob/hosp-arima/scratch/hosp-arima.R

holding out the current four weeks (which include the dramatic rise in hospitalizations) as the test set, this doesn't look nearly as good:

[image: hosp-arima]
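The holdout evaluation described above is: drop the last four weeks, refit, and score against the held-out truth. A hypothetical Python illustration with made-up data (the real workflow lives in scratch/hosp-arima.R); a sharp rise concentrated in the test window is exactly what makes the score look bad:

```python
# Hypothetical sketch of the holdout idea: train on all but the last h
# weeks, forecast those weeks, and score the point forecasts with MAE.
import numpy as np

def holdout_mae(y, h=4):
    train, test = y[:-h], y[-h:]
    slope = (train[-1] - train[0]) / (len(train) - 1)   # drift stand-in
    fc = train[-1] + slope * np.arange(1, h + 1)        # point forecasts
    return float(np.mean(np.abs(fc - test)))

# A flat series with a dramatic late rise (as in the thread's test window)
# hurts the score badly, because the training data never saw the rise.
flat_then_rise = np.array([5, 6, 5, 6, 5, 6, 10, 20, 35, 55], dtype=float)
print(holdout_mae(flat_then_rise))
```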

@stephenturner (Contributor)

Alright, you're right. Pushed some relatively well-commented code in 710147b.

Forecasting the next four weeks, ETS looks possibly too high, ARIMA too low, ensemble just right 🥣 🧸 📈

[image]

But backing up, removing four weeks of data, and forecasting those held-out weeks, it looks pretty bad:

[image]

OTOH, if you look at the 95% confidence interval on the ETS model, it almost covers the true value.

[image]

One of the things I want to look at is adding those ILI ranks and hosp ranks as predictors here. We could treat these as ex-post predictors as noted in https://otexts.com/fpp3/forecasting-regression.html. I have some ideas on how to do this. More here shortly.
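The ex-post framing assumes the covariate's future values are known (or fixed by scenario) at forecast time. A hypothetical Python sketch, with plain least-squares regression standing in for fable's regression-with-ARIMA-errors and entirely made-up `rank` covariate values:

```python
# Hypothetical sketch: fold an external severity "rank" covariate into the
# model as a regressor. fable would fit something like ARIMA(hosp ~ rank);
# a plain OLS regression stands in here. All values are made up.
import numpy as np

hosp = np.array([5, 7, 9, 12, 15, 20, 26, 33], dtype=float)
rank = np.array([1, 1, 2, 2, 3, 3, 4, 4], dtype=float)  # toy covariate

X = np.column_stack([np.ones_like(rank), rank])
beta, *_ = np.linalg.lstsq(X, hosp, rcond=None)

# Ex-post forecasting: the covariate's future values are treated as known.
future_rank = np.array([5, 5, 6, 6], dtype=float)
fc = beta[0] + beta[1] * future_rank
print(fc.round(1))
```

The regression-with-ARIMA-errors version additionally models the residuals as an ARIMA process, so the forecast inherits both the covariate signal and the autocorrelation structure.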

@stephenturner (Contributor)

a972758 shows how to bring historical severity into the ARIMA modeling and forecasting steps. The code should be fairly well commented.

Without historical hospitalization rank data, forecasting the next four weeks:

[image]

With hospitalization rank data:

[image]

Previous four weeks, without the historical predictor:

[image]

With the historical predictor, maybe slightly better?

[image]

@stephenturner (Contributor)

A few more commits in that script look at how these models performed throughout the time series we have. @vpnagraj, download this PDF and hold the pagedown/pageup keys to scroll through the weeks. Keep in mind I'm only showing the 10% interval to keep it tight; note the ETS intervals are pretty wide at 95%.

tsplots.pdf

@vpnagraj (Collaborator, Author)

this is awesome. i added some code at 3cc73f2

might seem redundant, but i actually think what i did there could be really useful. basically i recreated similar plots but using the plot_forc framework that we have been using for the glm code. i also did some data manipulation so we can calculate the WIS for each model => forecast => horizon. before you pushed this up i was working on something in the same vein over at https://github.com/signaturescience/fiphde/blob/eval-windows/scratch/eval.R

here's what the updated plots look like (with 95% PI):

tsplots_v2.pdf

bottom line: getting the forecasts into a common format / visualization paradigm sets us up for a "fair" evaluation of forecasts from multiple methods (right now our GLMs and these several time series approaches). and we don't have to eyeball it. we can basically scoot back to any (every?) previous week where we have data, run the modeling, and calculate the WIS. from there we could justify model selection by going with the framework that gives the lowest (summed? averaged? median?) WIS over all forecast time points.
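For concreteness, the WIS is computed from a forecast median and a set of central prediction intervals. A small Python sketch following the standard definition (weight 1/2 on the median's absolute error, weight alpha/2 on each (1 - alpha) interval score), with made-up numbers:

```python
# Hypothetical sketch of the weighted interval score (WIS) used for
# FluSight-style forecast evaluation.
def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval."""
    score = upper - lower                       # width penalty
    if y < lower:
        score += (2 / alpha) * (lower - y)      # undercoverage penalty
    if y > upper:
        score += (2 / alpha) * (y - upper)      # overcoverage penalty
    return score

def wis(median, intervals, y):
    """intervals: list of (alpha, lower, upper) tuples."""
    k = len(intervals)
    total = 0.5 * abs(y - median)
    for alpha, lower, upper in intervals:
        total += (alpha / 2) * interval_score(lower, upper, y, alpha)
    return total / (k + 0.5)

# One forecast with a 50% and a 90% interval, truth above the 50% interval:
print(wis(median=20, intervals=[(0.5, 15, 25), (0.1, 10, 35)], y=28))
```

Lower is better; summing or averaging this across locations, horizons, and forecast dates gives the model-selection criterion described above.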

@stephenturner (Contributor)

About to blow up your tab with this monster image. Good/bad news.

Good news: tsibble/fable "just works" as advertised. When the tsibble is keyed by location, the modeling and forecasting steps work with nearly zero adjustments to the code 🔮

Bad news: I think we kind of knew this was going to happen based on some work we did at fluforce-init. Some states have extremely sparse data. I'm thinking we ought to implement some kind of filter on the incoming data: if a location doesn't have at least X observations with at least Y counts over some period Z, we throw that location out.

[image: state-level-forecasts]
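The proposed filter could look something like this hypothetical Python sketch (the thresholds standing in for X, Y, and Z are placeholders, and all names and data are assumptions):

```python
# Hypothetical sketch of the sparse-location filter: keep a location only
# if at least `min_obs` of its last `window` weekly counts are >= `min_count`.
# Thresholds are placeholders for the X/Y/Z in the comment above.
def keep_location(counts, min_obs=8, min_count=5, window=12):
    recent = counts[-window:]
    return sum(c >= min_count for c in recent) >= min_obs

states = {
    "VA": [6, 8, 9, 11, 10, 12, 14, 13, 15, 16, 18, 20],  # made-up counts
    "WY": [0, 0, 1, 0, 0, 2, 0, 1, 0, 0, 1, 0],           # sparse reporter
}
kept = [s for s, c in states.items() if keep_location(c)]
print(kept)
```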

@stephenturner (Contributor)

I removed a few states reporting extremely low numbers and added a few transformations on the ETS model in c0b9f4a. Not sure it's really helping us much. The sqrt transformation isn't very useful, and the Box-Cox with a 0.5 lambda is doing something weird here and there.

[image: state-level-forecasts]
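For reference, the Box-Cox transformation at lambda = 0.5 (and its inverse, used to back-transform the forecasts) is simple to state; a standalone Python sketch, noting that fable applies this inside the model formula rather than as a separate step:

```python
# Hypothetical standalone sketch of the Box-Cox transform at lambda = 0.5.
# At this lambda it reduces to 2 * (sqrt(y) - 1).
def box_cox(y, lam):
    return (y ** lam - 1) / lam      # approaches log(y) as lam -> 0

def inv_box_cox(z, lam):
    return (lam * z + 1) ** (1 / lam)

y = 25.0
z = box_cox(y, 0.5)                  # 2 * (sqrt(25) - 1) = 8.0
print(z, inv_box_cox(z, 0.5))
```

One possible source of the "something weird": back-transforming symmetric intervals through the inverse makes them asymmetric on the count scale, which can look odd near zero.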

@stephenturner (Contributor)

Closed in #65.
