Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

4.5 Increasing Variance and the Log Transform

Most of the time series we’ve seen so far have had constant variance across time[1]. In some scenarios, particularly those involving exponential growth, variance may increase concomitantly with the value. For example, as overall air travel increases, the absolute difference between busier and less busy seasons will also increase (even if the relative difference is constant). Such time series are non-stationary for at least two (and often three) reasons:

  1. The mean is non-constant over time.

  2. The variance γ(0)\gamma(0) is non-constant over time.

  3. Some examples such as air travel may also be non-stationary due to seasonal effects.

Apple Mac Sales

Quarterly sales figures for Apple Mac computers in millions from GitHub Apple Data Repository.

Figure 1:Quarterly sales figures for Apple Mac computers in millions from GitHub Apple Data Repository.

Figure 1 appears to be non-stationary for all three reasons listed above. The mean shows a steady upward trend and the volumes appear to exhibit a seasonal effect with higher sales in the autumn and winter and lower sales in the spring and summer. As the size of the seasonal effect appears to depend on the overall trend, the variance also increases with time.

Let’s first address the increasing variance by replacing the raw values with their logarithms:

Logarithm of quarterly sales figures for Apple Mac computers in millions from GitHub Apple Data Repository.

Figure 2:Logarithm of quarterly sales figures for Apple Mac computers in millions from GitHub Apple Data Repository.

Figure 2 is beginning to look better—the variance appears to be roughly constant across time. We can address both the trend and seasonality by taking the first and fourth differences of the log values, i.e.

zt=4(log(xt))z_t \overset{\triangle}{=} \nabla_4\,\nabla\big(\log{(x_t)}\big)
First and fourth differences of logarithm of quarterly sales figures for Apple Mac computers in millions from GitHub Apple Data Repository.

Figure 3:First and fourth differences of logarithm of quarterly sales figures for Apple Mac computers in millions from GitHub Apple Data Repository.

Figure 3 certainly looks stationary upon visual inspection. After we’ve covered autoregressive processes we will revisit this example with statistical tests designed to help determine stationarity.

Log Difference and Returns

In Eq. (1) we used the first (and fourth) difference of the logarithm of the sales figures. This concept closely relates to the return, a unitless quantity defined as the difference in value between successive time steps divided by the previous time step, i.e.

rt=xtxt1xt1.r_t \overset{\triangle}{=} \frac{x_t-x_{t-1}}{x_{t-1}}.

To see this relation, let us rewrite Eq. (5) as

xtxt1=1+rt.\frac{x_t}{x_{t-1}} = 1+r_t.

We now have

log(xt)=log(xt)log(xt1)=log(xtxt1)=log(1+rt)rt\begin{split} \nabla \log{(x_t)} &= \log{(x_t)}-\log{(x_{t-1})}\\ &= \log{\Big(\frac{x_t}{x_{t-1}}\Big)}\\ &= \log(1+r_t)\\ &\approx r_t \end{split}

where we have used the Taylor expansion

log(1+rt)=rtrt22+\log{(1+r_t)} = r_t -\frac{r_t^2}{2}+\dots

Under the assumption of small rtr_t, quadratic and higher order terms will be negligible. Figure 4 demonstrates that the two functions are essentially identical for returns with an absolute value less than about 20%20\%. For this reason, log(xt)\nabla\log{(x_t)} is often used interchangeably with the return defined in Eq. (5).

Linear slope y=r_t (gray) and logarithmic curve y=\log{(1+r_t)} (blue) for r_t\in[-0.5, 0.5].

Figure 4:Linear slope y=rty=r_t (gray) and logarithmic curve y=log(1+rt)y=\log{(1+r_t)} (blue) for rt[0.5,0.5]r_t\in[-0.5, 0.5].

Footnotes
  1. A random walk is an important exception. Random walks have a linearly increasing variance with respect to time.