Created date
Jun 16, 2022 01:21 PM
Data Science
Applied forecasting
This week covers exponential smoothing, one of the most successful families of methods for generating reliable forecasts across a wide range of time series. The core idea of exponential smoothing (ETS) is to give the most weight to the most recent observations and let the weights decay exponentially as observations get older. That is, the most recent observations are assumed to contain the most information.
For example, yesterday's stock price contains more information than the stock price from two days ago.
We will look at three types of forecasting models, depending on the series:
  • a series with no trend or seasonality → (Simple Exponential Smoothing)
  • a series with a trend but no seasonality → (Holt's linear trend method, damped trend methods)
  • a series with both trend and seasonality → (Holt-Winters' seasonal method, Holt-Winters' multiplicative method, Holt-Winters' additive method)

Simple forecasting smoothing

Before introducing simple exponential smoothing, you need to know what the average and naive methods are:

AVG : All observations are equally weighted

With the average method, the forecast for every future period is the average of all the values in the time series.
That is, we do not distinguish between the last observation, the one before it, or any earlier observation; they all get equal weight.

NAIVE : The last observation contains all the information; previous observations provide no information, so all weight is given to the last observation.

Our forecast for the future is whatever we observed in the last period; any observation before that is treated as containing no information.
That is, only the last observation contains information, and we use it alone to forecast.
Not being too extreme, simple exponential smoothing lies between the average and naive methods: the most recent data gets more weight, but older data still counts.
It is suitable for forecasting a series with no trend or seasonality.
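The two extremes can be compared on a small series. This is a minimal base R sketch; the vector `y` is made-up illustrative data, not taken from the lecture.

```r
# Illustrative series (made-up values, not from the lecture)
y <- c(12, 15, 14, 16, 19)

# Average method: every observation gets the same weight 1/T
average_forecast <- mean(y)

# Naive method: all weight on the last observation
naive_forecast <- y[length(y)]

print(c(average = average_forecast, naive = naive_forecast))
```

A simple exponential smoothing forecast for this series would fall somewhere between these two numbers, depending on alpha.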

Simple Exponential Smoothing (SES)

Used when the series has no clear trend or seasonality.

What are smoothing parameters?

Smoothing parameters control the rate of change of the components: alpha for the level, beta for the trend, and gamma for the seasonality.

Rule of thumb for those parameters?

alpha, gamma, beta
7.2: Simple exponential smoothing youtube
T+1 is the time point one step ahead of the last observation T.
What is the intuition behind alpha?
Alpha is a parameter that controls how much weight we assign to each observation.
  • The value lies between 0 and 1.
  • If alpha is closer to 1 (large), more weight is assigned to the most recent observations and the weights decay very rapidly.
  • Conversely, if alpha is closer to 0 (small), less weight is assigned to the most recent observations and the weights decay slowly over time.
When alpha = 1, it becomes the naive method, that is, the last observation contains all the information.
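The decay can be made concrete by printing the weights directly. In the weighted-average form of SES, the observation k steps back gets weight alpha * (1 - alpha)^k. The helper name `ses_weights` below is hypothetical, introduced only for this sketch.

```r
# Weight given to the observation k steps back: alpha * (1 - alpha)^k
# (ses_weights is a hypothetical helper name for illustration)
ses_weights <- function(alpha, n_lags = 5) {
  alpha * (1 - alpha)^(0:(n_lags - 1))
}

weights_small_alpha <- ses_weights(0.2)  # slow decay
weights_large_alpha <- ses_weights(0.8)  # fast decay
weights_naive       <- ses_weights(1)    # all weight on the last observation

print(round(rbind(weights_small_alpha, weights_large_alpha, weights_naive), 4))
```

With alpha = 1 the first weight is 1 and all others are 0, which is exactly the naive method.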
So here comes the question: how do we estimate the value of alpha? (See 7.3: Simple exponential smoothing in component form - YouTube.) Essentially, we choose the alpha for which the one-step-ahead fitted values have the smallest sum of squared errors (SSE).

It turns out Component Form answers this question.
Component form representations of exponential smoothing methods comprise a forecast equation and a smoothing equation.
  • ℓ_t is the level (or the smoothed value) of the series at time t.
  • Setting h = 1 gives the fitted values, while setting t = T gives the true forecasts beyond the training data.
  • The level depends on the previous level, i.e., on whatever the level was at time t - 1, and on the value of alpha.
  • We can compute all forecasts once we have ℓ_0 and alpha, so we need to estimate these two first.
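Written out, the component form of SES usually takes the following textbook shape. The symbols (\ell_t for the level, \hat{y}_{t+h|t} for the h-step-ahead forecast) follow common convention and are an assumption about the notation used in the lecture.

```latex
\begin{align*}
\text{Forecast equation:}  \quad & \hat{y}_{t+h|t} = \ell_t \\
\text{Smoothing equation:} \quad & \ell_t = \alpha\, y_t + (1 - \alpha)\, \ell_{t-1}
\end{align*}
```

Because the forecast equation contains no h on the right-hand side, SES forecasts are flat: every future period gets the last estimated level.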


What is ?

determines how wiggly the line is. The higher , the wigglier
When = 0, it is a linear regression
When = 0, it is wigglier.
When = .99, it seems like the trend is overfitting.

Holt's linear trend methods

An extension of SES that allows for a local trend in the data (a trend that can change over small stretches of time). It does not model seasonality.
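A minimal base R sketch of the usual textbook recursions for Holt's method: the level is updated with alpha, the trend with beta, and the h-step forecast is the last level plus h times the last trend. The function name `holt_forecast`, the parameter values, and the data are all assumptions for illustration; in practice the parameters are estimated rather than fixed by hand.

```r
# Holt's linear trend method with fixed (not estimated) parameters.
# holt_forecast is a hypothetical helper name for illustration.
holt_forecast <- function(y, alpha, beta, l0, b0, h) {
  level <- l0
  trend <- b0
  for (obs in y) {
    prev_level <- level
    # Level: blend the new observation with the previous one-step forecast
    level <- alpha * obs + (1 - alpha) * (prev_level + trend)
    # Trend: blend the latest change in level with the previous trend
    trend <- beta * (level - prev_level) + (1 - beta) * trend
  }
  # h-step-ahead forecasts extend the last level along the last trend
  level + seq_len(h) * trend
}

# A perfectly linear made-up series: 2, 4, ..., 20
y <- seq(2, 20, by = 2)
forecasts <- holt_forecast(y, alpha = 0.8, beta = 0.2, l0 = 0, b0 = 2, h = 3)
print(forecasts)
```

On a perfectly linear series with matching initial states, the forecasts simply continue the line, which is the behaviour the method is designed to capture.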

Damped trend methods (conservative)

Holt's linear trend method can over-forecast, because it assumes the trend continues at the same rate indefinitely. So sometimes it makes more sense to dampen the forecast: the trend keeps moving in the same direction, but it flattens as we move further out in time.
That is, the forecast is not as aggressive as the recent data suggests: over a long horizon, the slope of the trend is smaller than the slope we observe in the near-term forecasts.
To do that, we introduce another parameter, phi (the damping parameter), and the three equations now contain five parameters in total: alpha, beta, phi, and the initial level and trend.
What will the model do in the long run?
  • In the short run, the forecasts are trending, but in the long run, the forecasts level off to a constant.
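The levelling-off can be checked numerically. In the usual damped trend formulation the h-step forecast is the last level plus (phi + phi^2 + ... + phi^h) times the last trend, and that sum converges to phi / (1 - phi). The function name `damped_forecast` and the values of `level`, `trend`, and `phi` are assumptions chosen for illustration.

```r
# Damped trend forecasts from a given final level and trend.
# damped_forecast is a hypothetical helper name for illustration.
damped_forecast <- function(level, trend, phi, h) {
  level + cumsum(phi^seq_len(h)) * trend
}

level <- 20   # assumed last estimated level
trend <- 2    # assumed last estimated trend
phi   <- 0.9  # assumed damping parameter

short_run <- damped_forecast(level, trend, phi, h = 5)
far_ahead <- damped_forecast(level, trend, phi, h = 200)[200]

# Long-run limit: level + phi / (1 - phi) * trend
long_run_limit <- level + phi / (1 - phi) * trend

print(short_run)
print(c(far_ahead = far_ahead, limit = long_run_limit))
```

The short-run forecasts still increase, but by a smaller amount each step, and far-ahead forecasts sit at the constant limit.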

What do you mean by constant in a trend?

If the trend is increasing, the forecasts will keep on increasing with that trend.

Holt-Winters’ seasonal method

This method extended Holt’s method to capture seasonality.

Holt-Winters’ seasonal additive method

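For reference, the additive method is usually written in component form as follows. The notation is an assumption following common textbook convention: m is the seasonal period, s_t the seasonal component, and k the integer part of (h - 1)/m, so that the forecast always uses seasonal indices from the last observed year.

```latex
\begin{align*}
\hat{y}_{t+h|t} &= \ell_t + h\, b_t + s_{t+h-m(k+1)} \\
\ell_t &= \alpha\, (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta\, (\ell_t - \ell_{t-1}) + (1 - \beta)\, b_{t-1} \\
s_t &= \gamma\, (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma)\, s_{t-m}
\end{align*}
```

The seasonal component is added to the level and trend, so this form suits series whose seasonal swings stay roughly the same size over time.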

Holt-Winters’ seasonal multiplicative method

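For reference, the multiplicative method is usually written in component form as follows. The notation is an assumption following common textbook convention: m is the seasonal period, s_t the seasonal component, and k the integer part of (h - 1)/m.

```latex
\begin{align*}
\hat{y}_{t+h|t} &= (\ell_t + h\, b_t)\, s_{t+h-m(k+1)} \\
\ell_t &= \alpha\, \frac{y_t}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta\, (\ell_t - \ell_{t-1}) + (1 - \beta)\, b_{t-1} \\
s_t &= \gamma\, \frac{y_t}{\ell_{t-1} + b_{t-1}} + (1 - \gamma)\, s_{t-m}
\end{align*}
```

Here the seasonal component scales the level and trend, so this form suits series whose seasonal swings grow with the level of the series.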


ETS has two meanings: 1) ExponenTial Smoothing; 2) Error, Trend, Seasonality (i.e., the states of the model).
The three components are Error, Trend, and Seasonality.
Multiplicative and additive time series

Formula of Additive error models
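As one concrete case, the simplest additive error model, ETS(A,N,N), is usually written as below. The notation is an assumption following common textbook convention, with \varepsilon_t denoting the one-step-ahead error.

```latex
\begin{align*}
y_t &= \ell_{t-1} + \varepsilon_t \\
\ell_t &= \ell_{t-1} + \alpha\, \varepsilon_t
\end{align*}
```

The error enters by addition: the observation is the previous level plus the error, and the level moves by a fraction alpha of that error.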
Formula of Multiplicative error models
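As one concrete case, the simplest multiplicative error model, ETS(M,N,N), is usually written as below. The notation is an assumption following common textbook convention, with \varepsilon_t denoting the relative one-step-ahead error.

```latex
\begin{align*}
y_t &= \ell_{t-1}\, (1 + \varepsilon_t) \\
\ell_t &= \ell_{t-1}\, (1 + \alpha\, \varepsilon_t)
\end{align*}
```

The error enters as a proportion of the level, so larger levels come with proportionally larger errors.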


Innovations state space models


ETS - Coding in R

No trend and seasonality
components(fit) %>% autoplot()
ETS(y ~ error("A") + trend("N") + season("N"))
By default, optimal values for α and ℓ0 are used. α can be chosen manually in trend():
trend("N", alpha = 0.5)
trend("N", alpha_range = c(0.2, 0.8))

library(fpp3)  # loads fable, tsibble, tsibbledata, dplyr, ggplot2

algeria_economy <- global_economy %>%
  filter(Country == "Algeria")
fit <- algeria_economy %>%
  model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")))
report(fit)


Plot forecast
fit %>%
  forecast(h = 5) %>%
  autoplot(algeria_economy) +
  labs(y = "% of GDP", title = "Exports: Algeria")

Modeling with trend
components(fit) %>% autoplot()
ETS(y ~ error("A") + trend("A") + season("N"))
By default, optimal values for α and ℓ0 are used. α can be chosen manually in trend():
trend("N", alpha = 0.5)
trend("N", alpha_range = c(0.2, 0.8))
aus_economy <- global_economy %>%
  filter(Code == "AUS") %>%
  mutate(Pop = Population / 1e6)
fit <- aus_economy %>%
  model(AAN = ETS(Pop ~ error("A") + trend("A") + season("N")))
report(fit)

fit %>%
  forecast(h = 10) %>%
  autoplot(aus_economy) +
  labs(y = "Millions", title = "Population: Australia")

## Damped trend
aus_economy %>%
  model(holt = ETS(Pop ~ error("A") + trend("Ad") + season("N"))) %>%
  forecast(h = 20) %>%
  autoplot(aus_economy)
All in one
fit <- aus_economy %>%
  filter(Year <= 2010) %>%
  model(
    ses = ETS(Pop ~ error("A") + trend("N") + season("N")),
    holt = ETS(Pop ~ error("A") + trend("A") + season("N")),
    damped = ETS(Pop ~ error("A") + trend("Ad") + season("N"))
  )
tidy(fit)
accuracy(fit)

Modeling with seasonality
Holt-Winters additive method with additive errors.
aus_holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  summarise(Trips = sum(Trips))
fit <- aus_holidays %>%
  model(
    additive = ETS(Trips ~ error("A") + trend("A") + season("A")),
    multiplicative = ETS(Trips ~ error("M") + trend("A") + season("M"))
  )
fc <- fit %>% forecast()
fc %>%
  autoplot(aus_holidays, level = NULL) +
  labs(y = "Thousands", title = "Overnight trips")
Holt-Winters damped method
sth_cross_ped <- pedestrian %>%
  filter(
    Date >= "2016-07-01",
    Sensor == "Southern Cross Station"
  ) %>%
  index_by(Date) %>%
  summarise(Count = sum(Count) / 1000)

sth_cross_ped %>%
  filter(Date <= "2016-07-31") %>%
  model(
    hw = ETS(Count ~ error("M") + trend("Ad") + season("M"))
  ) %>%
  forecast(h = "2 weeks") %>%
  autoplot(sth_cross_ped %>% filter(Date <= "2016-08-14")) +
  labs(
    title = "Daily traffic: Southern Cross",
    y = "Pedestrians ('000)"
  )

Automatic forecasting

fit <- global_economy %>%
  mutate(Pop = Population / 1e6) %>%
  model(ets = ETS(Pop))
fit %>% forecast(h = 5)

R interpretation

alpha :
  • The value reported here is the optimal (estimated) value; you would need to compare it against other values to confirm this.
  • The smoothing parameter is 0.322, which is fairly large, so the level (intercept) adjusts quickly to changes in the data. That is appropriate given the amount of movement we saw in the data.
l_0 (L nought) : the initial level we talked about before
  • It is neither the first value nor the mean.
  • It is computed by minimizing the sum of squared errors.
  • It is the general location of the data at the start of the series, which is 100647 here.
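The estimation idea can be sketched in base R: write the SSE of the one-step-ahead errors as a function of alpha and the initial level, then hand it to a numerical optimizer. This is an illustration of the principle only, not how fable's ETS() is implemented internally; the helper name `ses_sse`, the starting values, and the data are assumptions.

```r
# Sum of squared one-step-ahead errors for SES, given alpha and the initial level.
# ses_sse is a hypothetical helper name for illustration.
ses_sse <- function(params, y) {
  alpha <- params[1]
  level <- params[2]
  sse <- 0
  for (obs in y) {
    error <- obs - level            # the one-step forecast is the previous level
    sse <- sse + error^2
    level <- level + alpha * error  # same as alpha * obs + (1 - alpha) * level
  }
  sse
}

# Illustrative series (made-up values, not from the lecture)
y <- c(30, 32, 29, 35, 38, 36, 40, 39, 43, 45)

# Jointly estimate alpha (bounded to [0, 1]) and the initial level,
# starting from alpha = 0.5 and l0 = first observation
start <- c(0.5, y[1])
opt <- optim(
  par = start, fn = ses_sse, y = y,
  method = "L-BFGS-B",
  lower = c(0, -Inf), upper = c(1, Inf)
)

alpha_hat <- opt$par[1]
l0_hat <- opt$par[2]
print(c(alpha = alpha_hat, l0 = l0_hat, sse = opt$value))
```

Because the initial level is a free parameter of the optimization, the estimate generally ends up being neither the first value nor the mean of the series.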




