Created date
Jun 16, 2022 01:21 PM
Data Science
Applied forecasting

5.2.    Some simple forecasting methods

The following four forecasting methods are benchmarks for other forecasting methods. They are very simple and surprisingly effective.
  1. MEAN(y): Average method
  2. NAIVE(y): Naïve method
  3. SNAIVE(y ~ lag(m)): Seasonal naïve method
  4. RW(y ~ drift()): Drift method
The naïve method assumes the most recent observation is the only one that matters: all previous observations provide no information about the future.

MEAN(y): Average method
Forecast of all future values = mean of historical data {y1, …, yT}.

SNAIVE(y ~ lag(m)): Seasonal naïve method
Forecasts = last value from the same period a season ago.
You take the last m observations; for quarterly data m = 4, so the last four values are repeated as the forecasts for all future years.
NAIVE(y) : Naïve method
Forecasts = last observed value.

RW(y ~ drift()) : Drift method
Forecasts = last value plus the average change from period to period.
The average period-to-period change in the historical data is assumed to continue into the future, so each forecast is the last observed value plus that average change multiplied by the number of periods ahead.
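The four benchmark methods reduce to very simple formulas. The course uses R's fable package; the following is a minimal sketch in Python with invented function names, purely to illustrate the arithmetic:

```python
# Illustrative sketch (not the fable package API): point forecasts for the
# four benchmark methods, where y is the list of historical observations
# and h is the forecast horizon.

def mean_forecast(y, h):
    # MEAN(y): every future value is the historical average
    return [sum(y) / len(y)] * h

def naive_forecast(y, h):
    # NAIVE(y): every future value equals the last observation
    return [y[-1]] * h

def snaive_forecast(y, h, m):
    # SNAIVE(y ~ lag(m)): repeat the last full season of length m
    return [y[-m + (i % m)] for i in range(h)]

def drift_forecast(y, h):
    # RW(y ~ drift()): last value plus h times the average historical change
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + (i + 1) * slope for i in range(h)]
```

Note that the drift slope simplifies to (last − first) / (T − 1): the average of all the period-to-period changes telescopes to a line through the first and last observations.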

5.3.   Residual diagnostics

5.2.2.  Fitted values

  • Each observation in a time series can be forecast using all previous observations. These one-step forecasts are called fitted values.

5.2.3.    Forecasting residuals

  • A residual is the difference between an observed value and its fitted value: e_t = y_t − ŷ_{t|t−1}.
  • Residuals are useful for checking whether a model has adequately captured the information in the data.
  • A good forecasting method produces residuals with the following assumptions and useful properties:

Assumptions of residuals
  • The residuals are uncorrelated.
  • The residuals have mean zero.
Useful properties (for distributions & prediction intervals)
  • The residuals have constant variance.
  • The residuals are normally distributed.

There are two ways to check residuals: individually, via the ACF of the residuals, or as a group, via portmanteau tests.

5.2.4.    ACF of residuals.

[Figure: residual diagnostics for the naïve method: time plot, ACF, and histogram of the residuals]
These graphs show that the naïve method produces forecasts that appear to account for all available information.
The mean of the residuals is close to zero and there is no significant correlation in the residuals series.
The time plot of the residuals shows that the variation of the residuals stays much the same across the historical data, apart from the one outlier, and therefore the residual variance can be treated as constant.
This can also be seen on the histogram of the residuals. The histogram suggests that the residuals may not be normal — the right tail seems a little too long, even when we ignore the outlier.
Consequently, forecasts from this method will probably be quite good, but prediction intervals that are computed assuming a normal distribution may be inaccurate.
  • Assume residuals are white noise (uncorrelated, mean zero, constant variance).
    • If they aren’t, then there is information left in the residuals that should be used in computing forecasts.

5.2.5.  Portmanteau tests

  • A more formal test for autocorrelation considers a whole set of residual autocorrelations as a group.
  • It tests whether that set is significantly different from a set of zeros.
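A common portmanteau statistic is the Ljung-Box Q* = T(T + 2) Σ_{k=1}^{ℓ} r_k² / (T − k), where r_k is the lag-k residual autocorrelation; a large Q* relative to a χ² distribution with ℓ degrees of freedom indicates autocorrelation. A minimal sketch in Python (illustrative, not the feasts/fable API):

```python
# Illustrative sketch of the Ljung-Box portmanteau statistic, where `res`
# is the list of residuals and `lags` the number of autocorrelations pooled.

def acf(res, k):
    # lag-k sample autocorrelation of the residuals
    T = len(res)
    mean = sum(res) / T
    c0 = sum((r - mean) ** 2 for r in res)
    ck = sum((res[t] - mean) * (res[t + k] - mean) for t in range(T - k))
    return ck / c0

def ljung_box(res, lags):
    # Q* = T (T + 2) * sum_{k=1}^{lags} r_k^2 / (T - k)
    T = len(res)
    return T * (T + 2) * sum(acf(res, k) ** 2 / (T - k)
                             for k in range(1, lags + 1))
```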

5.4.   Distributional forecasts and prediction intervals

5.4.1.    Forecast distributions

  • A forecast is (usually) the mean of the conditional distribution of the future value given the observed data.
  • Most time series models produce normally distributed forecasts.
  • The forecast distribution describes the probability of observing any future value.

5.4.2.    Prediction intervals

  • A prediction interval gives a region within which we expect y_{T+h} to lie with a specified probability.
  • Assuming forecast errors are normally distributed, a 95% PI is ŷ_{T+h|T} ± 1.96 σ̂_h, where σ̂_h is the standard deviation of the h-step forecast distribution.
  • When h = 1, σ̂_h can be estimated from the residuals.
  • brick_fc %>% hilo(level = 95)
  • Point forecasts are often useless without a measure of uncertainty (such as prediction intervals).
  • Prediction intervals require a stochastic model (with random errors, etc.).
  • For most models, prediction intervals get wider as the forecast horizon increases.
  • Use the level argument to control coverage, and check the residual assumptions before believing the intervals.
  • Intervals are usually too narrow due to unaccounted sources of uncertainty.
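As a concrete case of the PI formula: for the naïve method, the h-step standard deviation is σ̂_h = σ̂ √h, so the interval widens with the horizon. A minimal sketch in Python (illustrative names; the course computes intervals with hilo() in R):

```python
# Illustrative sketch (not fable's hilo()): a 95% prediction interval for
# the naive method, assuming sigma is the one-step residual standard
# deviation estimated from the residuals.
import math

def naive_pi_95(y, h, sigma):
    point = y[-1]                    # naive point forecast
    sigma_h = sigma * math.sqrt(h)   # h-step forecast standard deviation
    return (point - 1.96 * sigma_h, point + 1.96 * sigma_h)
```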

5.5.    Forecasting with transformations

5.5.1.    Modelling with transformations
5.5.2.    Forecasting with transformations

5.5.3.    Bias adjustment
If the probability that y lies below some value is p, then the probability that the transformed variable lies below the transformed value is also p.
  • The amount of density mass below a point is unchanged by a monotonic transformation.
  • So probabilities, and hence the quantiles of the distribution, are preserved by the transformation.
  • The median is therefore preserved by back-transformation, but the mean is not.
The bias adjustment corrects the back-transformed forecast from the median to the mean, using a Taylor series approximation.
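For a log transformation, the bias-adjusted back-transform multiplies the naive back-transform by (1 + σ²_h/2), since if w ~ N(μ, σ²) then E[e^w] = e^{μ + σ²/2} ≈ e^μ (1 + σ²/2). A minimal sketch (illustrative, not the fable API, which applies this adjustment automatically):

```python
# Illustrative sketch: back-transforming a forecast made on the log scale.
# exp(w_hat) gives the median of the back-transformed distribution;
# multiplying by (1 + sigma2_h / 2) approximates the mean (a first-order
# Taylor-series bias adjustment).
import math

def back_transform_log(w_hat, sigma2_h, bias_adjust=True):
    point = math.exp(w_hat)          # median of the forecast distribution
    if bias_adjust:
        point *= 1 + sigma2_h / 2    # adjust median up to the mean
    return point
```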

5.6.    Forecasting and decomposition
Since we have learnt how to decompose a time series into components (y_t = S_t + A_t, where A_t = T_t + R_t is the seasonally adjusted series), we can first forecast the components and then combine them into one forecast.

  1. Fit a decomposition model: an STL decomposition followed by separate models for the seasonally adjusted series and for the seasonal component.
  2. Forecasting that model yields forecasts of the original series.
    1. Under a decomposition model, the software knows it must recombine the components at the end.
    2. It forecasts the seasonal component and the seasonally adjusted component with their respective models and adds the forecasts together.
    3. The result comes back in the usual format: a forecast distribution and its mean.
```r
## use the function decomposition_model()
dcmp <- decomposition_model(
  STL(Employed),
  NAIVE(season_adjust),
  SNAIVE(season_year)
)
us_retail_employment %>%
  model(stlf = dcmp) %>%
  forecast() %>%
  autoplot()
```

5.7.    Evaluating forecast accuracy
5.7.1.    Training and test sets
  • Same as the ones taught in ETC3250.
5.7.2.    Forecast errors
  • Forecast errors are not the same as residuals: residuals are one-step errors computed on the training set, while forecast errors are computed on the test set and may be multi-step.

5.7.3.    Measures of forecast accuracy
For the notation above:
  • y_{T+h} is the (T+h)th observation,
  • ŷ_{T+h|T} is its forecast based on data up to the end of the training set (time T),
  • e_{T+h} = y_{T+h} − ŷ_{T+h|T} is the forecast error.

Example of scale dependence:
If y is measured in dollars, then MAE and RMSE are also in dollars, but MSE is in dollars².
MAPE is a common metric in industry, but it has drawbacks: 1) y_t must be well above zero; 2) it assumes y has a meaningful absolute zero (a ratio scale).

To address these drawbacks, Rob Hyndman proposed the MASE (Mean Absolute Scaled Error), which scales the errors by the in-sample MAE of a naïve benchmark.
Mean Error (ME) and Mean Percentage Error (MPE) are measures of bias rather than accuracy, and he does not normally look at them.
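These measures follow directly from their definitions. A minimal sketch in Python (illustrative function names; the course computes these with accuracy() in R), where e is the list of test-set forecast errors:

```python
# Illustrative sketch of common forecast accuracy measures.
import math

def mae(e):
    # mean absolute error: same units as the data
    return sum(abs(x) for x in e) / len(e)

def rmse(e):
    # root mean squared error: same units as the data (MSE would be units^2)
    return math.sqrt(sum(x * x for x in e) / len(e))

def mape(e, y):
    # mean absolute percentage error: only sensible when all y are well above zero
    return 100 * sum(abs(x / yt) for x, yt in zip(e, y)) / len(e)

def mase(e, train):
    # scale by the in-sample MAE of one-step naive forecasts on the training set
    scale = sum(abs(train[t] - train[t - 1])
                for t in range(1, len(train))) / (len(train) - 1)
    return mae(e) / scale
```

A MASE below 1 means the forecasts beat the in-sample one-step naïve benchmark on average.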

5.8.    Time series cross-validation
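Time series cross-validation rolls the forecast origin forward: fit on an expanding training window, forecast the next observation, record the error, repeat. A minimal one-step sketch in Python (invented names; in R this is the stretch_tsibble() workflow):

```python
# Illustrative sketch of rolling-origin (time series) cross-validation with
# one-step forecasts. `forecast_fn` maps a training series to a one-step
# forecast; `min_train` is the smallest training window allowed.

def ts_cv_errors(y, forecast_fn, min_train=3):
    errors = []
    for t in range(min_train, len(y)):
        fc = forecast_fn(y[:t])        # forecast origin rolls forward each step
        errors.append(y[t] - fc)       # error on the next, unseen observation
    return errors
```

For example, with the naïve method (forecast = last training value) on a linearly increasing series, every one-step error equals the step size.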



What are the assumptions of residuals?
  1. They are uncorrelated. If they aren't, then there is information left in the residuals that should be used in computing forecasts.
  2. They have mean zero. If they don't, then the forecasts are biased.

For prediction intervals, the residuals should additionally:
  3. have constant variance;
  4. be normally distributed.
How do I know if the residuals are uncorrelated?
Plot the ACF of the residuals and check that no spikes lie outside the significance bounds, or apply a portmanteau test such as the Ljung-Box test.