Created date
Jun 16, 2022 01:21 PM
Data Science
Applied forecasting
TOC (By Week)
Models learnt
  1. MEAN(y): Average method
  2. NAIVE(y): Naïve method
  3. SNAIVE(y ~ lag(m)): Seasonal naïve method
  4. RW(y ~ drift()): Drift method
  5. Automatic forecasting
notion image

ARIMA → less interpretable than ETS
To do
What is heteroskedasticity?
p.57 likelihood
Innovations state space models
notion image

Time series patterns

There are 3 types of time series patterns that need to be well defined:

  • Trend: exists when there is a long-term increase or decrease in the data.
  • Seasonal: exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week).
  • Cyclic: exists when data exhibit rises and falls that are not of a fixed period (usually lasting at least 2 years).
Many people confuse seasonal and cyclic patterns. Here are the differences.
Economic data tends to show cyclic patterns: recession, boom, peak.


Seasonal subseries plots

3.5. Lagged scatterplots
Why do we need lagged values?
  • In finance, we often look at the percentage change over a certain number of days, e.g. lag t−1 = one-day change.
3.6.1 Autocorrelation
  • Covariance and correlation: measure extent of linear relationship between two variables (y and X).
  • Auto means “Self” in Latin. As implied, autocovariance and autocorrelation measure the linear relationship between lagged values of a time series y. Lagged values are derived from the “SELF” values.
    • The autocorrelation function (ACF) tells us the correlation between observations and those that came before them separated by different lags (refer to the monster generations in slides!)
      r = autocorrelation coefficient
3.6.2 Correlogram (ACF)
new_production %>% ACF(Beer) %>% autoplot()
notion image
  • r4 is higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 2 quarters apart.
  • r2 is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.

The dashed blue lines indicate whether the correlations are significantly different from zero.

  • When data have a trend, the autocorrelations for small lags tend to be large and positive.
  • When data are seasonal, the autocorrelations will be larger at the seasonal lags (i.e., at multiples of the seasonal frequency)
  • When data are trended and seasonal, you see a combination of these effects.
3.7. White noise
  • Time series that show no autocorrelation are called “white noise”.
  • We expect each autocorrelation to be close to zero.
    • notion image
  • Blue lines show the 95% critical values.
    • It is common to plot lines at ±1.96/√T on an ACF plot; these are the critical values.
  • For white noise, 95% of all autocorrelations must lie within ±1.96/√T.
  • If this is not the case (i.e., some spikes exceed the dashed blue lines), the series is probably not white noise.
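The ±1.96/√T rule can be checked numerically. A minimal base-R sketch (no forecasting packages; the series and seed are made up for illustration):

```r
# Simulate white noise and verify that roughly 95% of sample autocorrelations
# lie within the +/- 1.96/sqrt(T) critical bounds.
set.seed(42)
T <- 500
y <- rnorm(T)                                     # white noise series
r <- acf(y, lag.max = 20, plot = FALSE)$acf[-1]   # drop the trivial lag-0 value
bound <- 1.96 / sqrt(T)
prop_inside <- mean(abs(r) <= bound)
prop_inside   # typically around 0.95 for white noise
```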

    Week 3: Time series decomposition

    3.1.    Transformations and adjustments

    There are 4 kinds of adjustments:
    1. Population adjustments
    2. Inflation adjustments
    3. Calendar adjustments
    4. Mathematical transformations

    The purpose of these adjustments is to simplify the pattern in the historical data
    • by removing known sources of variation
    • by making the pattern more consistent across the entire dataset.
    Why do we need to simplify the pattern?
    • A simpler pattern leads to a more accurate forecast.

    3.2.    Mathematical transformations

    • Power transformations include square roots and cube roots.
    • Box-Cox transformations include the log as a special case (λ = 0).
    notion image

    Example of the different adjustment

    notion image

    3.2.1.    Box-Cox transformations.

    notion image
    myseries_train %>% features(Turnover, features = guerrero)

    Features of power transformations:

    • They STABILISE the variance of the series.
    • They ADJUST the series to make it more comparable.
    • Often no transformation is needed.
    • Simple transformations are easier to explain and work well enough.
    • Transformations can have a very large effect on prediction intervals (PIs).
    • If some data are zero or negative, then use λ > 0.
    • log1p() can also be useful for data with zeros.
    • Choosing logs is a simple way to force forecasts to be positive.
    • Transformations must be reversed to obtain forecasts on the original scale. (Handled automatically by fable.)
    • They ASSUME the variation is proportional to the level of the series.

    Rule of thumb for the Box-Cox transformation (lambda)

    Let x = the size of the seasonal variation; the aim is to make x about the same across the whole series.
    As the value of λ
    • decreases from 1 towards 0, the transformation becomes stronger (it dampens x more);
    • λ = 1 means there is effectively no transformation;
    • λ = 0 means the natural log, which is pretty strong.
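A hand-rolled sketch of the Box-Cox family (assuming strictly positive data; this is the textbook formula, not the fable implementation):

```r
# Box-Cox: lambda = 0 gives the natural log; other lambdas give (y^lambda - 1) / lambda.
# Smaller lambda compresses large values more strongly, damping growing seasonal variation.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(10, 100, 1000)
box_cox(y, 0)     # natural log: strong compression of large values
box_cox(y, 0.5)   # square-root-like: weaker compression
box_cox(y, 1)     # y - 1: shape unchanged, no real transformation
```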

    Handling trend
    Handling seasonality
    notion image

    Alternative transformation

    log(x + 1) (use it when there are zeros)

    notion image
    pedestrian %>% filter(Sensor == "Southern Cross Station") %>% autoplot(log1p(Count)) + labs(title = "Southern Cross Pedestrians")
    If there is high skewness and some ZEROs (so we can’t take logs), we can try the log(x + 1) transformation.

    3.3.    Time series components.

    3.3.1.    Time series patterns (Refer back the previous week)

    3.3.2.    Time series decomposition. (trend-cycle)

    When we decompose a time series into components, we usually combine the trend and cycle into a single trend-cycle component (sometimes called the trend for simplicity).
    Thus we think of a time series as comprising three components: a trend-cycle component, a seasonal component, and a remainder component (containing anything else in the time series).
    In this chapter, we are to learn some common methods for extracting these components from a time series.
    yt = data at period t
    Tt = trend-cycle component at period t
    St = seasonal component at period t
    Rt = remainder component at period t
    Additive decomposition: yt = Tt + St + Rt. Multiplicative decomposition: yt = Tt × St × Rt.

    • Additive model appropriate if the magnitude of seasonal fluctuations does not vary with level.
    • If seasonal is proportional to the level of series, then the multiplicative model is appropriate.
    • Multiplicative decomposition is more common with economic time series.
    • Alternative: use a Box-Cox transformation (make it more stable), and then use additive decomposition.
    • Logs turn a multiplicative relationship into an additive one: log yt = log Tt + log St + log Rt.

    The electrical equipment orders (top) and its three additive components.
    The grey bars to the right of each panel show the RELATIVE SCALES of the components.
    • A longer bar means the component's variation is smaller relative to the original data; a shorter bar means its scale is closer to that of the original data.
    • Bars of the same length indicate the same scale.
    The large grey bar in the bottom panel shows that the variation in the remainder component is small compared to the variation in the data, which has a bar about one quarter the size.
    • If we shrunk the bottom three panels until their bars became the same size as that in the data panel, then all the panels would be on the same scale.
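The additive identity yt = Tt + St + Rt can be verified with base R's decompose() on the built-in monthly co2 series (a stand-in here, since the electrical equipment data is not bundled with base R):

```r
# Classical additive decomposition: trend (centred moving average),
# seasonal (average deviation per month), remainder (what's left over).
dc <- decompose(co2, type = "additive")

# The three components add back up to the original data
# (NA at the series ends, where the centred moving average is undefined).
recon <- dc$trend + dc$seasonal + dc$random
max(abs(recon - co2), na.rm = TRUE)   # effectively zero
```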

    3.3.3.    Seasonal adjustment (Seasonal)

    • We use estimates of S based on past values to seasonally adjust a current value.
    • Seasonally adjusted series reflect remainders as well as trends; therefore they are not “smooth”.
    • “Downturns” or “upturns” in them can be misleading.
    What is an example of seasonal variation?
    • An increase in unemployment due to school leavers seeking work is seasonal variation, while an increase in unemployment due to an economic recession is non-seasonal.
    • Most economic analysts who study unemployment data are more interested in the non-seasonal variation. Consequently, employment data (and many other economic series) are usually published with the seasonality removed (i.e. seasonally adjusted).

    3.4.    History of time series decomposition.

    3.4.1.    X-11 decomposition
    3.4.2.    Extensions: X-12-ARIMA and X-13-ARIMA
    3.4.3.    X-13ARIMA-SEATS
    3.4.4.    STL decomposition
    seats_dcmp <- us_retail_employment %>%
      model(seats = X_13ARIMA_SEATS(Employed ~ seats())) %>%
      components()
    autoplot(seats_dcmp) + labs(title = "Decomposition of total US retail employment using SEATS")
    notion image
    The grey bars beside each panel show that component's contribution relative to the original data.
    • The seasonal pattern (3rd row) has the SMALLEST contribution, as its bar is the largest compared to the trend's and the remainder's.
    3.5.    When things go wrong

    Change of window (Example)

    notion image
    (remainder): the remainder looks pretty random; it's going all over the place.
    (seasonality): the seasonality looks smooth and seasonal. There are some fluctuations in its size (you can see it growing a little), but that's fine; it's just variability that the transformation couldn't handle.
    (trend): the trend looks like it goes through the data nicely.
    notion image
    (trend): the line is a bit smoother now. Increasing the window means averaging over more numbers, which gives a straighter, smoother line.
    notion image
    (trend): an infinite window essentially means ordinary regression rather than local regression. Because there is only one window of infinite size, using the whole length of the data, we get a single straight regression line.
    notion image
    (trend): each slope is based on only one position in time, so that value is essentially just the intercept. That makes the trend extremely flexible; it pretty much follows the data exactly.
    (remainder): looking at the remainder component, there is no error left; it is zero. Sometimes in models you do want no error, but not here: we want the randomness to sit in the remainder term. With this much flexibility, the trend is no longer smooth.
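The window discussion above can be reproduced with base R's stl() (using the built-in co2 series as a stand-in; the window sizes are arbitrary, for illustration):

```r
# A small trend window gives a flexible, wiggly trend; a large one averages
# over more points and gives a smoother, straighter trend.
fit_flexible <- stl(co2, s.window = "periodic", t.window = 13)
fit_smooth   <- stl(co2, s.window = "periodic", t.window = 101)

wiggliness <- function(fit) var(diff(fit$time.series[, "trend"]))
wiggliness(fit_flexible) > wiggliness(fit_smooth)   # TRUE: the small window is wigglier
```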

    Week 5. The forecaster's toolbox

    For this week, we discuss some useful tools for many different forecasting situations. Each of the tools below will be used repeatedly as we develop and explore a range of forecasting methods.
    • Some benchmark forecasting methods,
    • Ways of making the forecasting task simpler using transformations and adjustments
    • Methods for checking whether a forecasting method has adequately utilised the available information (Quality of the method)
    • Techniques for computing prediction intervals.

    5.1 A tidy forecasting workflow

    5.1.1.    An overview of the tidy forecasting workflow
    1. Data preparation (tidy)
    2. Data visualisation
    3. Specifying a model
    4. Model estimation
    5. Accuracy & performance evaluation
    6. Producing forecasts

    notion image
    5.1.2.    Data preparation (tidy): select the needed variables and transform the data
    notion image
    5.1.3.    Data visualisation

    5.1.4.    Model estimation.
    • A mable is a model table, each cell corresponds to a fitted model.
    • The model() function trains models to data.
    5.1.5.    Producing forecasts.
    • A fable is a forecast table with point forecasts and distributions.
    5.1.6.    Visualising forecasts.
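The whole workflow can be sketched in one pipeline. This assumes the fpp3 meta-package (tsibble, fable, feasts) is installed; tourism is a dataset it ships with, and the filter values are just an example:

```r
library(fpp3)

fit <- tourism %>%
  filter(Region == "Snowy Mountains", Purpose == "Holiday") %>%  # tidy: pick a series
  model(snaive = SNAIVE(Trips))          # estimate: returns a mable (model table)

fc <- fit %>% forecast(h = "3 years")    # produces a fable (forecast table)
fc %>% autoplot(tourism)                 # visualise forecasts against history
```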

    5.2.    Some simple forecasting methods

    The following 4 forecasting methods serve as benchmarks for other forecasting methods. They are very simple and surprisingly effective.
    1. MEAN(y): Average method
    2. NAIVE(y): Naïve method
    3. SNAIVE(y ~ lag(m)): Seasonal naïve method
    4. RW(y ~ drift()): Drift method
    The naïve method assumes the most recent observation is the only one that matters: all previous observations provide no information about the future.

    MEAN(y) : Average method
    notion image
    Forecast of all future values = mean of historical data {y1, . . . , yT}.

    SNAIVE(y ~ lag(m)): Seasonal naïve method
    notion image
    Forecasts = last value from same season.
    NAIVE(y) : Naïve method
    notion image
    Forecasts = last observed value.

    RW(y ~ drift()) : Drift method
    notion image
    Forecasts = last value plus average change.
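The four benchmarks are simple enough to compute by hand. A base-R sketch on a made-up quarterly series (m = 4), forecasting one step ahead:

```r
y <- c(10, 12, 14, 16, 11, 13, 15, 17)   # two "years" of quarterly data
T <- length(y); m <- 4; h <- 1

mean_fc   <- mean(y)                     # MEAN: average of all observations
naive_fc  <- y[T]                        # NAIVE: last observed value
snaive_fc <- y[T + h - m]                # SNAIVE: same season, previous year
drift_fc  <- y[T] + h * (y[T] - y[1]) / (T - 1)  # DRIFT: last value + average change
c(mean = mean_fc, naive = naive_fc, snaive = snaive_fc, drift = drift_fc)
# mean = 13.5, naive = 17, snaive = 11, drift = 18
```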

    5.3.   Residual diagnostics

    5.3.1.  Fitted values

    • Each observation in a time series can be forecast using all previous observations. We call these Fitted values.
    notion image

    5.3.2.    Forecasting residuals

    • Residuals are the difference between an observed value and its fitted value: et = yt − ŷt.
    • They are useful for checking whether a model has adequately captured the information in the data.
      • A good forecasting method has residuals with the following assumptions and useful properties:
    notion image
    Essential assumptions: the residuals are uncorrelated and have mean zero.
    Useful properties (for distributions & prediction intervals): the residuals have constant variance and are normally distributed.

    There are 2 ways to check residuals: one treats residuals individually (the ACF of residuals), the other treats them as a group (portmanteau tests).

    5.3.3.    ACF of residuals.

    notion image
    These graphs show that the naïve method produces forecasts that appear to account for all available information. The mean of the residuals is close to zero and there is no significant correlation in the residuals series. The time plot of the residuals shows that the variation of the residuals stays much the same across the historical data, apart from the one outlier, and therefore the residual variance can be treated as constant. This can also be seen on the histogram of the residuals. The histogram suggests that the residuals may not be normal — the right tail seems a little too long, even when we ignore the outlier. Consequently, forecasts from this method will probably be quite good, but prediction intervals that are computed assuming a normal distribution may be inaccurate.
    • Assume residuals are white noise (uncorrelated, mean zero, constant variance).
      • If they aren’t, then there is information left in the residuals that should be used in computing forecasts.

    5.3.4.  Portmanteau tests

    • A more formal test for autocorrelation that considers a whole set of autocorrelations as a group.
    • It tests whether the set as a whole is significantly different from a set of zeros.
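Base R provides a portmanteau test directly via Box.test() (the lag and series here are placeholders; in practice you would pass a model's residuals):

```r
# Ljung-Box test: H0 = the first `lag` autocorrelations are jointly zero
# (i.e., the series behaves like white noise).
set.seed(1)
res <- rnorm(200)   # stand-in for model residuals
bt <- Box.test(res, lag = 10, type = "Ljung-Box")
bt$statistic   # the Q* statistic
bt$p.value     # a large p-value means no evidence against white noise
```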

    5.4.   Distributional forecasts and prediction intervals

    5.4.1.    Forecast distributions

    • A forecast is (usually) the mean of the conditional distribution yT+h | y1, …, yT.
    • Most time series models produce normally distributed forecasts.
    • The forecast distribution describes the probability of observing any future value.

    5.4.2.    Prediction intervals

    • A prediction interval gives a region within which we expect yT+h to lie with a specified probability.
    • Assuming forecast errors are normally distributed, a 95% PI is ŷT+h|T ± 1.96 σ̂h, where σ̂h is the standard deviation of the h-step forecast distribution.
    • When h = 1, σ̂h can be estimated from the residuals.
    • brick_fc %>% hilo(level = 95)
    • Point forecasts often useless without a measure of uncertainty (such as prediction intervals).
    • Prediction intervals require a stochastic model (with random errors, etc).
    • For most models, prediction intervals get wider as the forecast horizon increases.
    • Use level argument to control coverage. Check residual assumptions before believing them.
    • Prediction intervals are usually too narrow, due to unaccounted sources of uncertainty.
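For h = 1 the normal-based interval is easy to sketch in base R (using the naïve method on the built-in co2 series, so σ̂ is just the standard deviation of the one-step residuals):

```r
y <- as.numeric(co2)
res <- diff(y)                 # naive-method residuals: y_t - y_{t-1}
sigma_hat <- sd(res)           # estimate of the one-step forecast sd
point <- y[length(y)]          # naive point forecast for the next period

pi_95 <- c(lower = point - 1.96 * sigma_hat,
           upper = point + 1.96 * sigma_hat)
pi_95   # if errors are normal, ~95% of next-step outcomes land in here
```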

    5.5.    Forecasting with transformations

    5.5.1.    Modelling with transformations
    5.5.2.    Forecasting with transformations

    5.5.3.    Bias adjustment
    If the probability below some point on the original scale is p, then the probability below the transformed point must also be p.
    • The transformation is monotonic, so the amount of probability mass (density mass) on either side of a point is preserved.
    • So probabilities are preserved (i.e., identical), at least in terms of the quantiles of the distribution.
    notion image
    • The mean is not the same, but the median is.
    Taylor Series : Lecture starts here.
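For reference, the bias-adjusted back-transformed mean for a log transformation, as derived via the Taylor expansion mentioned above (standard fpp3 result; ŵ is the forecast on the log scale and σh² its h-step forecast variance):

```latex
\hat{y}_{T+h|T} = e^{\hat{w}_{T+h|T}} \left[ 1 + \frac{\sigma_h^2}{2} \right]
```

Without the bracketed correction, back-transforming gives the median of the forecast distribution rather than the mean.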

    5.6.    Forecasting and decomposition
    Since we have learnt how to decompose a time series into components, yt = St + At, where At = Tt + Rt is the seasonally adjusted component, we can first forecast the components and then combine them into one forecast.

    1. Fit a decomposition model: an STL decomposition followed by separate models for the seasonally adjusted series and the seasonal component.
    2. When we produce forecasts from it, we get a forecast of the original series.
      1. Under decomposition_model(), fable understands that it will put the pieces back together at the end.
      2. It looks at the model for the seasonal component and the model for the seasonally adjusted component and adds their forecasts together to get forecasts of the original series.
      3. The result comes back as a forecast of the original series in the usual format: the distribution, and then the mean of the distribution.
    ## use the function decomposition_model
    dcmp <- decomposition_model(
      STL(Employed),
      NAIVE(season_adjust),
      SNAIVE(season_year)
    )
    us_retail_employment %>% model(stlf = dcmp) %>% forecast() %>% autoplot()

    5.7.    Evaluating forecast accuracy
    5.7.1.    Training and test sets
    • Same as the ones taught in ETC3250.
    5.7.2.    Forecast errors
    • Forecast errors are not the same as residuals: residuals are calculated on the training set (one step ahead), while forecast errors are calculated on the test set and can be multi-step.

    Measures of forecast accuracy

    For the notation above:
    • yT+h is the (T + h)-th observation,
    • ŷT+h|T is the forecast based on data up to the end of the training set (time T),
    • eT+h = yT+h − ŷT+h|T is the forecast error.

    Example of scale dependence:
    If the unit of e is dollars, then the units of MAE and RMSE are dollars, but the unit of MSE is dollars².
    MAPE is a common metric in industry, but it has drawbacks: 1) yt has to be strictly positive; 2) it assumes y has a meaningful absolute zero.

    So Rob Hyndman invented one called MASE (Mean Absolute Scaled Error).
    It works well because it can be used to compare forecast accuracy across series with different units.
    Mean Error (ME) and Mean Percentage Error (MPE) are measures of bias rather than accuracy, which Rob does not normally look at.
    notion image
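The measures can be computed by hand in base R on made-up numbers (quarterly data, m = 4, so the MASE scaling uses in-sample seasonal naïve errors):

```r
train  <- c(90, 100, 95, 105, 95, 105, 100, 110)  # training set (2 years, quarterly)
actual <- c(100, 110, 105, 115)                   # test set
fc     <- c(102, 108, 109, 111)                   # some forecasts
e <- actual - fc                                  # forecast errors

mae  <- mean(abs(e))                              # same units as y
rmse <- sqrt(mean(e^2))                           # same units as y
mape <- mean(abs(e / actual)) * 100               # unit-free, needs actual > 0

scale <- mean(abs(diff(train, lag = 4)))          # in-sample seasonal naive MAE
mase  <- mae / scale                              # scaled, comparable across series
c(MAE = mae, RMSE = rmse, MAPE = mape, MASE = mase)
# MAE = 3, RMSE = sqrt(10) ~ 3.16, MASE = 0.6
```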

    5.8.    Time series cross-validation
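Time series cross-validation can be sketched in base R: an expanding training window, a one-step forecast at each origin, and the errors pooled at the end (the naïve method and the built-in co2 series are used for illustration):

```r
y <- as.numeric(co2)
origins <- 100:(length(y) - 1)       # forecast origins, starting after 100 obs
fc      <- y[origins]                # naive one-step forecast = last observed value
errors  <- y[origins + 1] - fc       # one-step forecast errors
cv_rmse <- sqrt(mean(errors^2))      # accuracy averaged over all origins
cv_rmse
```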



    Subset selection
    Dynamic harmonic regression (DHR) models

    Jason Siu