Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help with some pathological forecasts #201

Closed
juliohm opened this issue Oct 1, 2020 · 9 comments
Closed

Help with some pathological forecasts #201

juliohm opened this issue Oct 1, 2020 · 9 comments

Comments

@juliohm
Copy link
Contributor

juliohm commented Oct 1, 2020

I've been playing with some forecasts of temperature at a few locations in the globe, and have noticed that sometimes the forecasts go off the historical data. Do you know if there is any way to enforce the seasonality and avoid this behaviour, which has never been seen?

For example, consider the time series I saved in this gist, and the script below:

using TimeSeries
using StateSpaceModels
using Statistics
using Plots

# read weekly time series (5 years worth of data)
series = readtimearray("2t-series.csv")
t, z   = timestamp(series), values(series)

# fit state space model with period = 1 year = 52 weeks
model  = structural(z, 52)
ss     = statespace(model, verbose = 0)

# plot structural components
plot(t, [ss.smoother.alpha[:, 1] ss.smoother.alpha[:, 2] ss.smoother.alpha[:, 3]],
     layout = (3, 1), label = ["trend" "slope" "seasonal"], legend = :topleft)

# forecast the next 6 months
n    = 26
tₒ   = t[end]
δt   = t[2] - t[1]
t̂    = tₒ+δt:δt:tₒ+n*δt
μ, d = forecast(ss, n)
σ    = @. sqrt(getindex(var(d), 1))

# plot history and forecast
plot(t, z, label="history", size = (900,300), legend = :topleft)
plot!(t̂, μ, ribbon=σ, label="forecast")

The components of the model look like this:

Screen Shot 2020-10-01 at 08 53 19

Can you double check my intuition about these components? As I understand it, the trend component accounts for the mean or "general" behaviour of the process, when we exclude all the oscillations and disturbances. Given that this is a time series of temperature, I am understanding that this is the component that tells us which years where hotter and which years were colder overall. The slope component accounts for deviations from this general behaviour, which aren't explained by a fixed period of oscillation. So these would be related to external factors that one could research further. Finally, the seasonal component accounts for an oscillation of fixed period that is not very interesting per see, but is there as part of the process. Do you feel that this is a good intuition?

When I plot the forecast, I get this:

Screen Shot 2020-10-01 at 09 02 40

The forecast goes up indefinitely, even though the time series is known to oscillate within some physical bounds. Is there a way to circumvent this? It seems to agree with what the state space model learned about the components. The trend component in the previous plot seems to increase at the end of the series in way that fools the model. I've tried with other times of the year, and sometimes the forecasts also goes down indefinitely.

I will keep investigating the issue. I've seen a section in the documentation about diagnosis, and would be happy to check for convergence if possible, or have some error estimates to guide me on when we can trust the forecasts.

@raphaelsaavedra
Copy link
Member

raphaelsaavedra commented Oct 1, 2020

Your intuitions are correct for the most part, but

an oscillation of fixed period that is not very interesting per see, but is there as part of the process

is not really true. The seasonality can be a major part of the series, and in fact should be in this case.

There are two separate issues here:

  1. Were the state estimates consistent with what we'd expect from the series?
  2. Was the forecast consistent with the state estimates?

The answers are 1) no and 2) yes. The forecast is perfectly reasonable if you look at the state estimates because the estimated seasonality is extremely small. Since there is no seasonality and the trend is going upwards in the last in-sample observations, it's expected that the forecast will have a similar upwards trend.

The problem here is that the seasonality wasn't estimated as we would expect (you can see its values range from -3 to 2 as opposed to the series which is in the order of hundreds). As a result, the majority of the real, unobserved seasonal component ended up in the trend.

Some possible explanations for this issue, off the top of my head, are:

  1. s = 52 is pretty large. Large seasonality means more parameters to be estimated and larger matrices and it leads to a harder model to fit. We haven't really stressed the model with long periods for seasonality. I'd suggest a simple test where you aggregate this data into a monthly time series and use s = 12 to see if the seasonality is better characterized.

  2. If the seasonality isn't perfectly periodic, this can also lead to imprecisions. For example, suppose we take the yearly peaks of this series across the 5 years and see how much time there is between each pair, and that the result is 48, 51, 54, 49. This can lead to a more difficult characterization of the seasonality. Note that I don't think things should be 100% perfect for the model to work reasonably well -- I would attribute the problem mostly to item 1), but this is also something to keep in mind.

@juliohm
Copy link
Contributor Author

juliohm commented Oct 1, 2020

Thank you @raphaelsaavedra , that is what I wanted to double check. Yesterday, I've performed experiments with monthly series and they were more well-behaved. It is good to know that my intuition is aligned with yours. Is there a way to spot this problem automatically? Perhaps comparing the magnitude of the trend and seasonal components as you suggested could help as a rule of thumb?

@juliohm
Copy link
Contributor Author

juliohm commented Oct 1, 2020

Perhaps using some different initialisation of filter method could also help? I am assuming that the default options of the model are a reasonable choice for users interested in robustness.

@raphaelsaavedra
Copy link
Member

Perhaps comparing the magnitude of the trend and seasonal components as you suggested could help as a rule of thumb?

It's good to keep in mind that trend + seasonal + irregular = observation, so if your series is clearly seasonal, but the seasonality is orders of magnitude smaller than the observations, then it's clear that the seasonality was not captured accurately. Another red flag is that the trend has a clear seasonal aspect, which also indicates that it got infected by the unobserved seasonality.

Perhaps using some different initialisation of filter method could also help?

The square-root filter is more numerically stable, so you could try to use it (just use the kwarg filter_type=SquareRootFilter). We don't define it as the default one because it is significantly slower and it's rare that it makes a difference.

@guilhermebodin
Copy link
Member

My intuition is that this should not be modeled as a basic structural (trend + seasonality) model, the model should contain cycle components (trend + seasonality + cycle) that are not implemented on this version.

@juliohm
Copy link
Contributor Author

juliohm commented Oct 1, 2020

@guilhermebodin can you elaborate on the conceptual difference between seasonal and cycle components? I was scanning the section of the book on this matter, but didn't have enough time read it carefully.

@guilhermebodin
Copy link
Member

Durbin and Koopman suggest that when a seasonal period is too long (365 days or 52 weeks in your case) you can model these seasonalities as cycles (that only have 2 states).

@juliohm
Copy link
Contributor Author

juliohm commented Oct 1, 2020

Thanks for sharing. It seems that cycles are the way to go here indeed. :)

@guilhermebodin
Copy link
Member

Template added in #204. We still need to write an example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants