
5.2 Classical Decomposition

Components of Time Series

Many time series contain trends, seasonal effects, and/or cycles. Time series decomposition, or simply decomposition, refers to methods that isolate each component's contribution to an overall time series. Since cycles lack a specific length, it is difficult to draw a clear distinction between an upward trend followed by a downward trend and a cycle. Consequently, we will generally fold cycles into the trend for the purposes of decomposition.

Before diving into the details, let’s get a feeling for what a time series decomposition might look like.

Figure 1: US employment rate from 1948 through 2025 from the Federal Reserve Bank of St. Louis with STL decomposition consisting of original series (top), trends and cycles (second from top), seasonal contribution (third from top), and residual variation not explained (bottom).

We’ll delve into the algorithms used to create figures like Figure 1 below and in subsequent chapters. For now, let’s take a moment to understand the components depicted in Figure 1.

  1. $T_t$: The trend component, consisting of both trends and cycles in the data.

  2. $S_t$: The seasonal component, consisting of regular periodicity at a specified seasonal length. Note that most decomposition methods only allow a single seasonality.

  3. $R_t$: The residual (sometimes also referred to as the remainder), consisting of any variation not accounted for by $T_t$ or $S_t$. The residual can be thought of as roughly equivalent to noise $w_t$, though I'd caution against taking this analogy too far. $R_t$ is only strictly equivalent to noise if $T_t$ and $S_t$ are a perfect model for the underlying process.

Additive and Multiplicative Models

Most commonly, we assume an additive decomposition in the form of

$$x_t = S_t + T_t + R_t. \tag{1}$$

In some cases, we may also assume a multiplicative decomposition

$$x_t = S_t \times T_t \times R_t, \tag{2}$$

which is more applicable, for example, in forecasting quarterly sales of a product experiencing compound growth in demand such as Mac sales in the previous chapter. Note that we can always recast Eq. (2) into the form of Eq. (1) by using a log transform

$$\log{(x_t)} = \log{(S_t)} + \log{(T_t)} + \log{(R_t)}.$$
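To make the equivalence concrete, here is a quick numerical check on a synthetic multiplicative series (all component values below are invented purely for illustration):

```python
import numpy as np

# Hypothetical multiplicative series: trend x seasonal x residual
rng = np.random.default_rng(0)
t = np.arange(48)
trend = 100 * 1.02**t                          # compound growth
seasonal = np.tile([0.9, 1.0, 1.2, 0.9], 12)   # quarterly pattern near 1
resid = rng.lognormal(mean=0.0, sigma=0.01, size=48)
x = trend * seasonal * resid

# Taking logs turns the product into a sum: an additive model for log(x)
log_x = np.log(x)
assert np.allclose(log_x, np.log(trend) + np.log(seasonal) + np.log(resid))
```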

While other models are possible, for example

$$x_t = S_t + T_t \times R_t,$$

such models are almost never used in practice, if for no other reason than that the difficulty of interpreting the output can easily defeat the original purpose of time series decomposition.

Classical Decomposition Algorithm

Defining Seasonal Length

Before using the algorithm we must first define our seasonal period $m$. Common values of $m$ include $m=7$ for daily data with weekly seasonality, $m=12$ for monthly data with yearly seasonality, and $m=24$ for hourly data with daily seasonality. Note that we cannot define multiple seasonal effects in a given decomposition.

Recall that a moving average with the length of a season (or any multiple thereof) will remove seasonal effects, allowing us to focus on longer-term cycles and trends. For seasons with an odd number of time steps, such as $m=7$ for weekly seasonality, we employ a simple centered moving average of length $m$. Thus, for data starting on a Sunday, the first Wednesday will be replaced with the average of the first Sunday through Saturday, the first Thursday will be replaced by the average of the first Monday through the second Sunday, and so on. Note that we do lose the first and last $\frac{m-1}{2}$ observations for odd $m$ or, as we will see shortly, the first and last $\frac{m}{2}$ for even $m$. This is not an issue for long time series in which the number of observations $n \gg m$, but does pose a challenge for shorter time series.

$7$-Day Moving Average for Weeks $1$–$n$

| Original Day of Week (Week Number) | Value in Smoothed Series |
| --- | --- |
| Sunday (1) | |
| Monday (1) | |
| Tuesday (1) | |
| Wednesday (1) | $\frac{1}{7}\big(\text{Sunday (1)}+\text{Monday (1)}+\text{Tuesday (1)}+\text{Wednesday (1)}+\text{Thursday (1)}+\text{Friday (1)}+\text{Saturday (1)}\big)$ |
| Thursday (1) | $\frac{1}{7}\big(\text{Monday (1)}+\text{Tuesday (1)}+\text{Wednesday (1)}+\text{Thursday (1)}+\text{Friday (1)}+\text{Saturday (1)}+\text{Sunday (2)}\big)$ |
| Friday (1) | $\frac{1}{7}\big(\text{Tuesday (1)}+\text{Wednesday (1)}+\text{Thursday (1)}+\text{Friday (1)}+\text{Saturday (1)}+\text{Sunday (2)}+\text{Monday (2)}\big)$ |
| $\vdots$ | $\vdots$ |
| Monday ($n$) | $\frac{1}{7}\big(\text{Friday } (n-1)+\text{Saturday } (n-1)+\text{Sunday } (n)+\text{Monday } (n)+\text{Tuesday } (n)+\text{Wednesday } (n)+\text{Thursday } (n)\big)$ |
| Tuesday ($n$) | $\frac{1}{7}\big(\text{Saturday } (n-1)+\text{Sunday } (n)+\text{Monday } (n)+\text{Tuesday } (n)+\text{Wednesday } (n)+\text{Thursday } (n)+\text{Friday } (n)\big)$ |
| Wednesday ($n$) | $\frac{1}{7}\big(\text{Sunday } (n)+\text{Monday } (n)+\text{Tuesday } (n)+\text{Wednesday } (n)+\text{Thursday } (n)+\text{Friday } (n)+\text{Saturday } (n)\big)$ |
| Thursday ($n$) | |
| Friday ($n$) | |
| Saturday ($n$) | |
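The centered moving average in the table above can be sketched with pandas; the daily series here is synthetic (a linear trend plus a weekly pattern), and `rolling(center=True)` handles the centering:

```python
import numpy as np
import pandas as pd

# Synthetic daily series: slow trend plus a repeating weekly pattern
idx = pd.date_range("2024-01-07", periods=56, freq="D")  # starts on a Sunday
weekly = np.tile([5, -1, -2, 0, -1, -2, 1], 8)           # pattern sums to zero
x = pd.Series(0.1 * np.arange(56) + weekly, index=idx)

# Centered 7-day moving average: each value becomes the mean of the
# surrounding week, averaging out the weekly pattern
trend_hat = x.rolling(window=7, center=True).mean()

# The first and last (m-1)/2 = 3 observations are lost to the window
print(trend_hat.isna().sum())  # 6 missing values total
```

Because any seven consecutive days cover one full cycle of the (zero-sum) weekly pattern, the smoothed series here recovers the linear trend exactly wherever it is defined.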

For even values of $m$, such as $m=4$ for quarterly data, it is impossible to center a moving average of length $m$. As a compromise, we use a $2\times m$ moving average: a moving average of moving averages. For example, a $2\times 4$ moving average is defined as

$$\begin{split} \hat{T}_t &= \frac{1}{2} \Big(\frac{1}{4} [x_{t-2}+x_{t-1}+x_t+x_{t+1}] + \frac{1}{4} [x_{t-1}+x_{t}+x_{t+1}+x_{t+2}]\Big)\\ &=\frac{1}{8}x_{t-2} + \frac{1}{4}x_{t-1} + \frac{1}{4}x_{t} + \frac{1}{4}x_{t+1} + \frac{1}{8}x_{t+2} \end{split}$$

A $2\times m$ moving average allows us to center each observation by using an odd-length window of $m+1$ points, weighting the first and last observations by $\frac{1}{2m}$ and the others by $\frac{1}{m}$.

Quarterly Moving Average for Years $1$–$n$

| Original Quarter (Year Number) | Value in Smoothed Series |
| --- | --- |
| First (1) | |
| Second (1) | |
| Third (1) | $\frac{1}{8}\text{First (1)} + \frac{1}{4}\text{Second (1)} + \frac{1}{4}\text{Third (1)} + \frac{1}{4}\text{Fourth (1)} + \frac{1}{8}\text{First (2)}$ |
| Fourth (1) | $\frac{1}{8}\text{Second (1)} + \frac{1}{4}\text{Third (1)} + \frac{1}{4}\text{Fourth (1)} + \frac{1}{4}\text{First (2)} + \frac{1}{8}\text{Second (2)}$ |
| $\vdots$ | $\vdots$ |
| First ($n$) | $\frac{1}{8}\text{Third } (n-1) + \frac{1}{4}\text{Fourth } (n-1) + \frac{1}{4}\text{First } (n) + \frac{1}{4}\text{Second } (n) + \frac{1}{8}\text{Third } (n)$ |
| Second ($n$) | $\frac{1}{8}\text{Fourth } (n-1) + \frac{1}{4}\text{First } (n) + \frac{1}{4}\text{Second } (n) + \frac{1}{4}\text{Third } (n) + \frac{1}{8}\text{Fourth } (n)$ |
| Third ($n$) | |
| Fourth ($n$) | |
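A $2\times 4$ moving average can likewise be sketched as a moving average of moving averages; the quarterly data here are synthetic, and `shift(-2)` recenters pandas' right-aligned windows on each observation:

```python
import numpy as np
import pandas as pd

# Synthetic quarterly series: linear trend plus a repeating quarterly pattern
quarterly = np.tile([4.0, -2.0, -3.0, 1.0], 6)   # pattern sums to zero
x = pd.Series(10 + 0.5 * np.arange(24) + quarterly)

# 2x4 moving average: a 2-period average of 4-period averages.
# rolling(4).mean() is right-aligned, so a second rolling(2).mean()
# followed by shift(-2) centers the combined 5-point window on t.
ma4 = x.rolling(window=4).mean()
t_hat = ma4.rolling(window=2).mean().shift(-2)

# The first and last m/2 = 2 observations are lost
```

Expanding the chained windows reproduces the weights above: $\frac{1}{8}$ on $x_{t-2}$ and $x_{t+2}$, and $\frac{1}{4}$ on the middle three terms, so each of the four quarters receives a total weight of $\frac{1}{4}$ and the seasonal pattern cancels.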

The smoothed series is used as our estimate of the series trend $T_t$, denoted $\hat{T}_t$ to emphasize that it is an estimate of the true $T_t$. From here on, classical decomposition subdivides into additive and multiplicative methods.

Additive Method

Having obtained our estimate of the trend $\hat{T}_t$, we are now ready to estimate the seasonality. We first detrend the series by subtracting $\hat{T}_t$

$$x_{t,\text{detrend}} = x_t - \hat{T}_t.$$

We then obtain our estimate of the seasonal component $\hat{S}_t$ by taking the average of each season's detrended values (e.g., the average detrended Thursday or the average detrended May). The individual seasonal components are adjusted for an overall baseline of zero over one full season, i.e.

$$\sum_{t=1}^{m}\hat{S}_t \approx 0.$$

For example, our baseline temperature might be $60°\mathrm{F}$, with the summer being $30°$ higher and the winter $30°$ lower.

Having obtained our estimates $\hat{T}_t$ and $\hat{S}_t$, the estimated residual $\hat{R}_t$ is simply what's left after subtracting the estimated trend and seasonality

$$\hat{R}_t = x_t - \hat{T}_t - \hat{S}_t.$$
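Putting the additive steps together, here is a minimal end-to-end sketch on synthetic quarterly data. This is a hand-rolled illustration of the recipe, not statsmodels' `seasonal_decompose`, though that function follows the same steps:

```python
import numpy as np
import pandas as pd

m = 4  # seasonal period (quarterly)
seasonal_true = np.array([3.0, -1.0, -2.0, 0.0])          # sums to zero
x = pd.Series(0.5 * np.arange(40) + np.tile(seasonal_true, 10))

# Step 1: estimate the trend with a centered 2x4 moving average
trend_hat = x.rolling(4).mean().rolling(2).mean().shift(-2)

# Step 2: detrend by subtraction (additive model)
detrended = x - trend_hat

# Step 3: average each season's detrended values...
seasonal_means = detrended.groupby(detrended.index % m).mean()
# ...and adjust so the seasonal components sum to zero over one period
seasonal_means -= seasonal_means.mean()
seasonal_hat = pd.Series(np.tile(seasonal_means.to_numpy(), 10))

# Step 4: the residual is whatever remains
resid_hat = x - trend_hat - seasonal_hat
```

On this noise-free toy series the recipe recovers the seasonal pattern exactly and leaves a residual of zero wherever the trend estimate is defined.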

Multiplicative Method

In certain scenarios, in particular with time series exhibiting exponential growth, a multiplicative decomposition may be more appropriate. While we can always apply additive decomposition to the logarithm of the original series, classical decomposition is capable of directly estimating the multiplicative decomposition (Eq. (2)).

As with the additive case, we begin by obtaining our estimated trend $\hat{T}_t$ via a moving average. We detrend the series via division

$$x_{t,\text{detrend}} = \frac{x_t}{\hat{T}_t}.$$

We obtain our estimated seasonal component $\hat{S}_t$ by averaging each detrended season as before. For multiplicative decomposition the individual seasonal components are adjusted for a baseline of one, i.e.

$$\prod_{t=1}^{m}\hat{S}_t \approx 1.$$

For example, the first quarter might have a seasonal component value of $125\%$ while the third quarter might have a component value of $80\%$.

The residual is simply what remains after dividing out the estimated trend and seasonal effects

$$\hat{R}_t = \frac{x_t}{\hat{T}_t\hat{S}_t}.$$
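The multiplicative variant differs only in using division and a baseline of one. Again a hand-rolled sketch on synthetic data with compound growth (all component values invented for illustration):

```python
import numpy as np
import pandas as pd

m = 4
seasonal_true = np.array([1.2, 0.9, 1.0, 1.0 / (1.2 * 0.9)])  # product = 1
trend_true = 100 * 1.01 ** np.arange(40)                      # compound growth
x = pd.Series(trend_true * np.tile(seasonal_true, 10))

# Estimate the trend with a centered 2x4 moving average, then detrend by DIVISION
trend_hat = x.rolling(4).mean().rolling(2).mean().shift(-2)
detrended = x / trend_hat

# Average each season's detrended values and rescale so their product is one
s = detrended.groupby(detrended.index % m).mean()
s /= s.prod() ** (1 / m)          # divide by the geometric mean
seasonal_hat = pd.Series(np.tile(s.to_numpy(), 10))

# Residual: divide out trend and seasonality
resid_hat = x / (trend_hat * seasonal_hat)
```

Because the arithmetic moving average only approximately removes a multiplicative seasonal pattern from an exponential trend, the recovered components are close to, but not exactly, the true values.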

Plotting Decomposition Components

Above, we accessed the components of the classical decomposition and plotted them individually. `statsmodels` also has a nice feature to automatically plot the full decomposition, invoked (using the variable names from above) as follows:

```python
classic_decomp.plot()
```

This should provide you with a plot like

Figure 2: US employment rate from 1948 through 2025 from the Federal Reserve Bank of St. Louis with additive classical decomposition consisting of original series (top), trends and cycles (second from top), seasonal contribution (third from top), and residual variation not explained (bottom).

Assessing Decomposition Quality

A natural question to ask is how to assess the quality of a time series decomposition. Assuming that Eq. (1) or Eq. (2) is a reasonable model of the underlying data, a better decomposition can be expected to result in an estimated residual $\hat{R}_t$ far smaller than the trend and seasonal components. While visual inspection of a plot such as Figure 2 or Figure 1 (paying careful attention to the scale of the y-axes!) is a valuable starting point, in certain scenarios we may wish to have a more quantitative metric.

Hyndman et al. (2026) recommend the following formula for estimating the strength of the trend

$$F_T = \max\Bigg(0,\ 1-\frac{\mathbb{V}(\hat{R}_t)}{\mathbb{V}(\hat{T}_t + \hat{R}_t)}\Bigg),$$

the logic being that for a time series with a strong trend component, the variance of the residual component alone should be much smaller than the variance of the combined trend and residual components. Thus, for data with a strong trend that has been well isolated by $\hat{T}_t$, $F_T$ will be close to 1. $F_T$ will be close to 0 for data with a minimal trend and/or a poorly isolated $\hat{T}_t$[1].

Computing $F_T$ from the unemployment data above using the formula

```python
# np.nanvar ignores the NaN endpoints the moving average leaves in the components
max(0, 1 - np.nanvar(classic_decomp.resid) / np.nanvar(classic_decomp.trend + classic_decomp.resid))
```

gives a value of 0.928, indicating a strong trend component that has been well isolated by $\hat{T}_t$.

The strength of the seasonal component $F_S$ is computed in the same manner as $F_T$ using the equation

$$F_S = \max\Bigg(0,\ 1-\frac{\mathbb{V}(\hat{R}_t)}{\mathbb{V}(\hat{S}_t + \hat{R}_t)}\Bigg).$$

Calculating $F_S$ using the code

```python
# np.nanvar ignores the NaN endpoints the moving average leaves in the components
max(0, 1 - np.nanvar(classic_decomp.resid) / np.nanvar(classic_decomp.seasonal + classic_decomp.resid))
```

gives a value of 0.379. While not terrible, compared to $F_T$ this value indicates either (1) a weaker seasonal component (unlikely based on visual examination of the original data), or (2) that our decomposition has not done as good a job isolating the seasonal component, resulting in $\hat{S}_t$ making a weaker contribution to the estimated decomposition shown in Figure 2.

Drawbacks to Classical Decomposition

While not a terrible method, classical decomposition does have a number of drawbacks that make algorithms such as STL preferable. The major issues (highlighted in the derivations and exercises above) are as follows:

  1. Use of a moving average removes the first and last $\frac{m}{2}$ ($m$ even) or $\frac{m-1}{2}$ ($m$ odd) observations from the trend (and consequently also the seasonal and residual components). For short time series this will result in sacrificing a fair amount of potential insight into the time series' behavior.

  2. Averaging each season across the data assumes there is exactly one seasonal value per season that is constant across the entire time series, rather than allowing the seasonal contribution itself to be a function of time.

  3. Classical decomposition is not robust to brief but extreme fluctuations such as the COVID pandemic and its effect on unemployment rates.

For these reasons, most statistics texts recommend against using the classical method for anything beyond a baseline to compare to more advanced methods. In the coming sections we will build on our analysis of classical decomposition to understand STL decomposition and how it addresses the issues above.

Footnotes
  1. The quantity $\frac{\mathbb{V}(\hat{R}_t)}{\mathbb{V}(\hat{T}_t + \hat{R}_t)}$ could conceivably be greater than 1 if $\hat{T}_t$ and $\hat{R}_t$ have a strong negative covariance. Taking the maximum of 0 and the computed value ensures that $F_T\in[0,1]$.

References
  1. Hyndman, R. J., Athanasopoulos, G., Garza, A., Challu, C., Mergenthaler, M., & Olivares, K. G. (2026). Forecasting: Principles and Practice, the Pythonic Way. OTexts.