Created date
Jun 16, 2022 01:21 PM
Data Science
Applied forecasting
TOC (By Week)
Models learnt
  1. MEAN(y): Average method
  2. NAIVE(y): Naïve method
  3. SNAIVE(y ~ lag(m)): Seasonal naïve method
  4. RW(y ~ drift()): Drift method
  5. Automatic forecasting
notion image

ARIMA → less interpretable than ETS
To do
What is heteroskedasticity?
p.57 likelihood
Innovations state space models
notion image

Time series patterns

There are 3 types of time series patterns that need to be well defined:

  • Trend: exists when there is a long-term increase or decrease in the data.
  • Seasonal: exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week).
  • Cyclic: exists when data exhibit rises and falls that are not of a fixed period (usually lasting at least 2 years).
Many people confuse seasonal and cyclic patterns. Here are the differences.
Economic data tends to show cyclic patterns: recession, boom, peak.


Seasonal subseries plots

3.5. Lagged scatterplots
Why do we need lagged values?
  • In finance, we often look at the percentage change over a certain number of days, e.g. lag t−1 = one-day change.
3.6.1 Autocorrelation
  • Covariance and correlation: measure extent of linear relationship between two variables (y and X).
  • Auto means “Self” in Latin. As implied, autocovariance and autocorrelation measure the linear relationship between lagged values of a time series y. Lagged values are derived from the “SELF” values.
    • The autocorrelation function (ACF) tells us the correlation between observations and those that came before them separated by different lags (refer to the monster generations in slides!)
      r = autocorrelation coefficient
3.6.2 Correlogram (ACF)
new_production %>% ACF(Beer) %>% autoplot()
notion image
  • r4 is higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 2 quarters apart.
  • r2 is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.

The dashed blue lines indicate whether the correlations are significantly different from zero.

  • When data have a trend, the autocorrelations for small lags tend to be large and positive.
  • When data are seasonal, the autocorrelations will be larger at the seasonal lags (i.e., at multiples of the seasonal frequency)
  • When data are trended and seasonal, you see a combination of these effects.
3.7. White noise
  • Time series that show no autocorrelation are called “white noise”.
  • We expect each autocorrelation to be close to zero.
    • notion image
  • Blue lines show the 95% critical values.
    • It is common to plot lines at ±1.96/√T on an ACF plot; these are the critical values.
  • For white noise, 95% of all autocorrelations must lie within ±1.96/√T.
  • If this is not the case (i.e., some spikes exceed the dashed blue lines), the series is probably not white noise.
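The ±1.96/√T rule can be checked numerically. A minimal base-R sketch (no forecasting packages; the series and seed are made up for illustration):

```r
# Simulate white noise and verify that roughly 95% of sample autocorrelations
# lie within the +/- 1.96/sqrt(T) critical bounds.
set.seed(42)
T <- 500
y <- rnorm(T)                                     # white noise series
r <- acf(y, lag.max = 20, plot = FALSE)$acf[-1]   # drop the trivial lag-0 value
bound <- 1.96 / sqrt(T)
prop_inside <- mean(abs(r) <= bound)
prop_inside   # typically around 0.95 for white noise
```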

    Week 3: Time series decomposition

    3.1.    Transformations and adjustments

    There are 4 kinds of adjustments:
    1. Population adjustments
    2. Inflation adjustments
    3. Calendar adjustments
    4. Mathematical transformations

    The purpose of these adjustments is to simplify the pattern in the historical data
    • by removing known sources of variation
    • by making the pattern more consistent across the entire dataset.
    Why do we need to simplify the pattern?
    • A simpler pattern leads to a more accurate forecast.

    3.2.    Mathematical transformations

    • Power transformations include square roots and cube roots.
    • Box-Cox transformations include the log as a special case (λ = 0).
    notion image

    Example of the different adjustment

    notion image

    3.2.1.    Box-Cox transformations.

    notion image
    myseries_train %>% features(Turnover, features = guerrero)

    Features of power transformations:

    • They STABILISE the variance of the series.
    • They ADJUST the series to make it more comparable.
    • Often no transformation is needed.
    • Simple transformations are easier to explain and work well enough.
    • Transformations can have a very large effect on prediction intervals (PIs).
    • If some data are zero or negative, then use λ > 0.
    • log1p() can also be useful for data with zeros.
    • Choosing logs is a simple way to force forecasts to be positive.
    • Transformations must be reversed to obtain forecasts on the original scale. (Handled automatically by fable.)
    • They ASSUME the variation is proportional to the level of the series.

    Rule of thumb for the Box-Cox transformation (lambda)

    Let x = the size of the seasonal variation; the aim is to make x about the same across the whole series.
    As the value of λ
    • decreases from 1 towards 0, the transformation becomes stronger (it dampens x more);
    • λ = 1 means there is effectively no transformation;
    • λ = 0 means the natural log, which is pretty strong.
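A hand-rolled sketch of the Box-Cox family (assuming strictly positive data; this is the textbook formula, not the fable implementation):

```r
# Box-Cox: lambda = 0 gives the natural log; other lambdas give (y^lambda - 1) / lambda.
# Smaller lambda compresses large values more strongly, damping growing seasonal variation.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(10, 100, 1000)
box_cox(y, 0)     # natural log: strong compression of large values
box_cox(y, 0.5)   # square-root-like: weaker compression
box_cox(y, 1)     # y - 1: shape unchanged, no real transformation
```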

    Handling trend
    Handling seasonality
    notion image

    Alternative transformation

    log(x + 1) (use it when there are zeros)

    notion image
    pedestrian %>% filter(Sensor == "Southern Cross Station") %>% autoplot(log1p(Count)) + labs(title = "Southern Cross Pedestrians")
    If there is high skewness and some ZEROs (so we can’t take logs), we can try the log(x + 1) transformation.

    3.3.    Time series components.

    3.3.1.    Time series patterns (Refer back the previous week)

    3.3.2.    Time series decomposition. (trend-cycle)

    When we decompose a time series into components, we usually combine the trend and cycle into a single trend-cycle component (sometimes called the trend for simplicity).
    Thus we think of a time series as comprising three components: a trend-cycle component, a seasonal component, and a remainder component (containing anything else in the time series).
    In this chapter, we are to learn some common methods for extracting these components from a time series.
    yt = data at period t
    Tt = trend-cycle component at period t
    St = seasonal component at period t
    Rt = remainder component at period t
    Additive decomposition: yt = Tt + St + Rt. Multiplicative decomposition: yt = Tt × St × Rt.

    • Additive model appropriate if the magnitude of seasonal fluctuations does not vary with level.
    • If seasonal is proportional to the level of series, then the multiplicative model is appropriate.
    • Multiplicative decomposition is more common with economic time series.
    • Alternative: use a Box-Cox transformation (make it more stable), and then use additive decomposition.
    • Logs turn a multiplicative relationship into an additive one: log yt = log Tt + log St + log Rt.

    The electrical equipment orders (top) and its three additive components.
    The grey bars to the right of each panel show the RELATIVE SCALES of the components.
    • A longer bar means the component's variation is smaller relative to the original data; a shorter bar means its scale is closer to that of the original data.
    • Bars of the same length indicate the same scale.
    The large grey bar in the bottom panel shows that the variation in the remainder component is small compared to the variation in the data, which has a bar about one quarter the size.
    • If we shrunk the bottom three panels until their bars became the same size as that in the data panel, then all the panels would be on the same scale.
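The additive identity yt = Tt + St + Rt can be verified with base R's decompose() on the built-in monthly co2 series (a stand-in here, since the electrical equipment data is not bundled with base R):

```r
# Classical additive decomposition: trend (centred moving average),
# seasonal (average deviation per month), remainder (what's left over).
dc <- decompose(co2, type = "additive")

# The three components add back up to the original data
# (NA at the series ends, where the centred moving average is undefined).
recon <- dc$trend + dc$seasonal + dc$random
max(abs(recon - co2), na.rm = TRUE)   # effectively zero
```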

    3.3.3.    Seasonal adjustment (Seasonal)

    • We use estimates of S based on past values to seasonally adjust a current value.
    • Seasonally adjusted series reflect remainders as well as trends; therefore they are not “smooth”.
    • “Downturns” or “upturns” in them can be misleading.
    What is an example of seasonal variation?
    • An increase in unemployment due to school leavers seeking work is seasonal variation, while an increase in unemployment due to an economic recession is non-seasonal.
    • Most economic analysts who study unemployment data are more interested in the non-seasonal variation. Consequently, employment data (and many other economic series) are usually published with the seasonality removed (i.e. seasonally adjusted).

    3.4.    History of time series decomposition.

    3.4.1.    X-11 decomposition
    3.4.2.    Extensions: X-12-ARIMA and X-13-ARIMA
    3.4.3.    X-13ARIMA-SEATS
    3.4.4.    STL decomposition
    seats_dcmp <- us_retail_employment %>%
      model(seats = X_13ARIMA_SEATS(Employed ~ seats())) %>%
      components()
    autoplot(seats_dcmp) + labs(title = "Decomposition of total US retail employment using SEATS")
    notion image
    The grey bars beside each panel show that component's contribution relative to the original data.
    • The seasonal pattern (3rd row) has the SMALLEST contribution, as its bar is the largest compared to the trend's and the remainder's.
    3.5.    When things go wrong

    Change of window (Example)

    notion image
    (remainder): the remainder looks pretty random; it's going all over the place.
    (seasonality): the seasonality looks smooth and seasonal. There are some fluctuations in its size (you can see it growing a little), but that's fine; it's just variability that the transformation couldn't handle.
    (trend): the trend looks like it goes through the data nicely.
    notion image
    (trend): the line is a bit smoother now. Increasing the window means averaging over more numbers, which gives a straighter, smoother line.
    notion image
    (trend): an infinite window essentially means ordinary regression rather than local regression. Because there is only one window of infinite size, using the whole length of the data, we get a single straight regression line.
    notion image
    (trend): each slope is based on only one position in time, so that value is essentially just the intercept. That makes the trend extremely flexible; it pretty much follows the data exactly.
    (remainder): looking at the remainder component, there is no error left; it is zero. Sometimes in models you do want no error, but not here: we want the randomness to sit in the remainder term. With this much flexibility, the trend is no longer smooth.
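The window discussion above can be reproduced with base R's stl() (using the built-in co2 series as a stand-in; the window sizes are arbitrary, for illustration):

```r
# A small trend window gives a flexible, wiggly trend; a large one averages
# over more points and gives a smoother, straighter trend.
fit_flexible <- stl(co2, s.window = "periodic", t.window = 13)
fit_smooth   <- stl(co2, s.window = "periodic", t.window = 101)

wiggliness <- function(fit) var(diff(fit$time.series[, "trend"]))
wiggliness(fit_flexible) > wiggliness(fit_smooth)   # TRUE: the small window is wigglier
```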

    Week 5. The forecaster's toolbox

    For this week, we discuss some useful tools for many different forecasting situations. Each of the tools below will be used repeatedly as we develop and explore a range of forecasting methods.
    • Some benchmark forecasting methods,
    • Ways of making the forecasting task simpler using transformations and adjustments
    • Methods for checking whether a forecasting method has adequately utilised the available information (Quality of the method)
    • Techniques for computing prediction intervals.

    5.1 A tidy forecasting workflow

    5.1.1.    An overview of the tidy forecasting workflow
    1. Data preparation (tidy)
    2. Data visualisation
    3. Specifying a model
    4. Model estimation
    5. Accuracy & performance evaluation
    6. Producing forecasts

    notion image
    5.1.2.    Data preparation (tidy): select the needed variables and transform the data
    notion image
    5.1.3.    Data visualisation

    5.1.4.    Model estimation.
    • A mable is a model table, each cell corresponds to a fitted model.
    • The model() function trains models to data.
    5.1.5.    Producing forecasts.
    • A fable is a forecast table with point forecasts and distributions.
    5.1.6.    Visualising forecasts.
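The whole workflow can be sketched in one pipeline. This assumes the fpp3 meta-package (tsibble, fable, feasts) is installed; tourism is a dataset it ships with, and the filter values are just an example:

```r
library(fpp3)

fit <- tourism %>%
  filter(Region == "Snowy Mountains", Purpose == "Holiday") %>%  # tidy: pick a series
  model(snaive = SNAIVE(Trips))          # estimate: returns a mable (model table)

fc <- fit %>% forecast(h = "3 years")    # produces a fable (forecast table)
fc %>% autoplot(tourism)                 # visualise forecasts against history
```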

    5.2.    Some simple forecasting methods

    The following 4 forecasting methods serve as benchmarks for other forecasting methods. They are very simple and surprisingly effective.
    1. MEAN(y): Average method
    2. NAIVE(y): Naïve method
    3. SNAIVE(y ~ lag(m)): Seasonal naïve method
    4. RW(y ~ drift()): Drift method
    The naïve method assumes the most recent observation is the only one that matters: all previous observations provide no information about the future.

    MEAN(y) : Average method
    notion image
    Forecast of all future values = mean of historical data {y1, . . . , yT}.

    SNAIVE(y ~ lag(m)): Seasonal naïve method
    notion image
    Forecasts = last value from same season.
    NAIVE(y) : Naïve method
    notion image
    Forecasts = last observed value.

    RW(y ~ drift()) : Drift method
    notion image
    Forecasts = last value plus average change.
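The four benchmarks are simple enough to compute by hand. A base-R sketch on a made-up quarterly series (m = 4), forecasting one step ahead:

```r
y <- c(10, 12, 14, 16, 11, 13, 15, 17)   # two "years" of quarterly data
T <- length(y); m <- 4; h <- 1

mean_fc   <- mean(y)                     # MEAN: average of all observations
naive_fc  <- y[T]                        # NAIVE: last observed value
snaive_fc <- y[T + h - m]                # SNAIVE: same season, previous year
drift_fc  <- y[T] + h * (y[T] - y[1]) / (T - 1)  # DRIFT: last value + average change
c(mean = mean_fc, naive = naive_fc, snaive = snaive_fc, drift = drift_fc)
# mean = 13.5, naive = 17, snaive = 11, drift = 18
```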

    5.3.   Residual diagnostics

    5.3.1.  Fitted values

    • Each observation in a time series can be forecast using all previous observations. We call these Fitted values.
    notion image

    5.3.2.    Forecasting residuals

    • Residuals are the difference between an observed value and its fitted value: et = yt − ŷt.
    • They are useful for checking whether a model has adequately captured the information in the data.
      • A good forecasting method has residuals with the following assumptions and useful properties:
    notion image
    Essential assumptions: the residuals are uncorrelated and have mean zero.
    Useful properties (for distributions & prediction intervals): the residuals have constant variance and are normally distributed.

    There are 2 ways to check residuals: one treats residuals individually (the ACF of residuals), the other treats them as a group (portmanteau tests).

    5.3.3.    ACF of residuals.

    notion image
    These graphs show that the naïve method produces forecasts that appear to account for all available information. The mean of the residuals is close to zero and there is no significant correlation in the residuals series. The time plot of the residuals shows that the variation of the residuals stays much the same across the historical data, apart from the one outlier, and therefore the residual variance can be treated as constant. This can also be seen on the histogram of the residuals. The histogram suggests that the residuals may not be normal — the right tail seems a little too long, even when we ignore the outlier. Consequently, forecasts from this method will probably be quite good, but prediction intervals that are computed assuming a normal distribution may be inaccurate.
    • Assume residuals are white noise (uncorrelated, mean zero, constant variance).
      • If they aren’t, then there is information left in the residuals that should be used in computing forecasts.

    5.3.4.  Portmanteau tests

    • A more formal test for autocorrelation that considers a whole set of autocorrelations as a group.
    • It tests whether the set as a whole is significantly different from a set of zeros.
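Base R provides a portmanteau test directly via Box.test() (the lag and series here are placeholders; in practice you would pass a model's residuals):

```r
# Ljung-Box test: H0 = the first `lag` autocorrelations are jointly zero
# (i.e., the series behaves like white noise).
set.seed(1)
res <- rnorm(200)   # stand-in for model residuals
bt <- Box.test(res, lag = 10, type = "Ljung-Box")
bt$statistic   # the Q* statistic
bt$p.value     # a large p-value means no evidence against white noise
```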

    5.4.   Distributional forecasts and prediction intervals

    5.4.1.    Forecast distributions

    • A forecast is (usually) the mean of the conditional distribution yT+h | y1, …, yT.
    • Most time series models produce normally distributed forecasts.
    • The forecast distribution describes the probability of observing any future value.

    5.4.2.    Prediction intervals

    • A prediction interval gives a region within which we expect yT+h to lie with a specified probability.
    • Assuming forecast errors are normally distributed, a 95% PI is ŷT+h|T ± 1.96 σ̂h, where σ̂h is the standard deviation of the h-step forecast distribution.
    • When h = 1, σ̂h can be estimated from the residuals.
    • brick_fc %>% hilo(level = 95)
    • Point forecasts often useless without a measure of uncertainty (such as prediction intervals).
    • Prediction intervals require a stochastic model (with random errors, etc).
    • For most models, prediction intervals get wider as the forecast horizon increases.
    • Use level argument to control coverage. Check residual assumptions before believing them.
    • Prediction intervals are usually too narrow, due to unaccounted sources of uncertainty.
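For h = 1 the normal-based interval is easy to sketch in base R (using the naïve method on the built-in co2 series, so σ̂ is just the standard deviation of the one-step residuals):

```r
y <- as.numeric(co2)
res <- diff(y)                 # naive-method residuals: y_t - y_{t-1}
sigma_hat <- sd(res)           # estimate of the one-step forecast sd
point <- y[length(y)]          # naive point forecast for the next period

pi_95 <- c(lower = point - 1.96 * sigma_hat,
           upper = point + 1.96 * sigma_hat)
pi_95   # if errors are normal, ~95% of next-step outcomes land in here
```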

    5.5.    Forecasting with transformations

    5.5.1.    Modelling with transformations
    5.5.2.    Forecasting with transformations

    5.5.3.    Bias adjustment
    If the probability below some point on the original scale is p, then the probability below the transformed point must also be p.
    • The transformation is monotonic, so the amount of probability mass (density mass) on either side of a point is preserved.
    • So probabilities are preserved (i.e., identical), at least in terms of the quantiles of the distribution.
    notion image
    • The mean is not the same, but the median is.
    Taylor Series : Lecture starts here.
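For reference, the bias-adjusted back-transformed mean for a log transformation, as derived via the Taylor expansion mentioned above (standard fpp3 result; ŵ is the forecast on the log scale and σh² its h-step forecast variance):

```latex
\hat{y}_{T+h|T} = e^{\hat{w}_{T+h|T}} \left[ 1 + \frac{\sigma_h^2}{2} \right]
```

Without the bracketed correction, back-transforming gives the median of the forecast distribution rather than the mean.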

    5.6.    Forecasting and decomposition
    Since we have learnt how to decompose a time series into components, yt = St + At, where At = Tt + Rt is the seasonally adjusted component, we can first forecast the components and then combine them into one forecast.

    1. Fit a decomposition model: an STL decomposition followed by separate models for the seasonally adjusted series and the seasonal component.
    2. When we produce forecasts from it, we get a forecast of the original series.
      1. Under decomposition_model(), fable understands that it will put the pieces back together at the end.
      2. It looks at the model for the seasonal component and the model for the seasonally adjusted component and adds their forecasts together to get forecasts of the original series.
      3. The result comes back as a forecast of the original series in the usual format: the distribution, and then the mean of the distribution.
    ## use the function decomposition_model
    dcmp <- decomposition_model(
      STL(Employed),
      NAIVE(season_adjust),
      SNAIVE(season_year)
    )
    us_retail_employment %>% model(stlf = dcmp) %>% forecast() %>% autoplot()

    5.7.    Evaluating forecast accuracy
    5.7.1.    Training and test sets
    • Same as the ones taught in ETC3250.
    5.7.2.    Forecast errors
    • Forecast errors are not the same as residuals: residuals are calculated on the training set (one step ahead), while forecast errors are calculated on the test set and can be multi-step.

    Measures of forecast accuracy

    For the notation above:
    • yT+h is the (T + h)-th observation,
    • ŷT+h|T is the forecast based on data up to the end of the training set (time T),
    • eT+h = yT+h − ŷT+h|T is the forecast error.

    Example of scale dependence:
    If the unit of e is dollars, then the units of MAE and RMSE are dollars, but the unit of MSE is dollars².
    MAPE is a common metric in industry, but it has drawbacks: 1) yt has to be strictly positive; 2) it assumes y has a meaningful absolute zero.

    So Rob Hyndman invented one called MASE (Mean Absolute Scaled Error).
    It works well because it can be used to compare forecast accuracy across series with different units.
    Mean Error (ME) and Mean Percentage Error (MPE) are measures of bias rather than accuracy, which Rob does not normally look at.
    notion image
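The measures can be computed by hand in base R on made-up numbers (quarterly data, m = 4, so the MASE scaling uses in-sample seasonal naïve errors):

```r
train  <- c(90, 100, 95, 105, 95, 105, 100, 110)  # training set (2 years, quarterly)
actual <- c(100, 110, 105, 115)                   # test set
fc     <- c(102, 108, 109, 111)                   # some forecasts
e <- actual - fc                                  # forecast errors

mae  <- mean(abs(e))                              # same units as y
rmse <- sqrt(mean(e^2))                           # same units as y
mape <- mean(abs(e / actual)) * 100               # unit-free, needs actual > 0

scale <- mean(abs(diff(train, lag = 4)))          # in-sample seasonal naive MAE
mase  <- mae / scale                              # scaled, comparable across series
c(MAE = mae, RMSE = rmse, MAPE = mape, MASE = mase)
# MAE = 3, RMSE = sqrt(10) ~ 3.16, MASE = 0.6
```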

    5.8.    Time series cross-validation
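Time series cross-validation can be sketched in base R: an expanding training window, a one-step forecast at each origin, and the errors pooled at the end (the naïve method and the built-in co2 series are used for illustration):

```r
y <- as.numeric(co2)
origins <- 100:(length(y) - 1)       # forecast origins, starting after 100 obs
fc      <- y[origins]                # naive one-step forecast = last observed value
errors  <- y[origins + 1] - fc       # one-step forecast errors
cv_rmse <- sqrt(mean(errors^2))      # accuracy averaged over all origins
cv_rmse
```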



    Subset selection
    Dynamic harmonic regression (DHR) models

    Jason Siu