
3.2 Autocovariance

Autocovariance Definition

As discussed in the first chapter, serial correlation is both one of the most challenging and most valuable aspects of time series. Autocovariance allows us to exploit serial correlation to quantify how a value at one time influences values at other times.

To give a concrete example, if we wish to know how strongly today's highest temperature influences tomorrow's, we may take the covariance of the series $(T_0, T_1, \ldots, T_{n-1})$ and $(T_1, T_2, \ldots, T_n)$, i.e.

$$
\text{Cov}\Big(\sum_{i=0}^{n-1} T_i, \sum_{j=1}^{n} T_j \Big).
$$

A high positive value for Eq. (1) tells us that higher (lower) temperatures today are indicative of higher (lower) temperatures tomorrow. While implausible for weather, in general many time series exhibit negative correlation, in which case a higher (lower) value at one time step is indicative of a lower (higher) value in the next.

We can naturally extend Eq. (1) to larger time lags to ask how long an anomalous temperature continues to skew the daily high. We will see in subsequent chapters that it is very common for time series to obey an exponentially decaying autocovariance.
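As a quick numerical sketch of this idea, the snippet below builds a hypothetical daily-high series (an AR(1)-style process chosen purely for illustration; the seed, mean of 20, persistence of 0.7, and noise scale of 2 are all arbitrary assumptions, not values from the text) and estimates the covariance between the series and its one-day-lagged copy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily-high temperature series: an AR(1)-style process,
# so consecutive days are positively correlated (illustrative only).
n = 1000
T = np.empty(n)
T[0] = 20.0
for t in range(1, n):
    T[t] = 20.0 + 0.7 * (T[t - 1] - 20.0) + rng.normal(0.0, 2.0)

# Sample covariance between (T_0, ..., T_{n-2}) and (T_1, ..., T_{n-1}),
# an estimate of how today's high relates to tomorrow's.
lag1_cov = np.cov(T[:-1], T[1:])[0, 1]
print(lag1_cov)
```

A positive value indicates that warm days tend to be followed by warm days, matching the interpretation of Eq. (1).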

Notation

Following the notation used in sources such as Shumway &amp; Stoffer (2025) and Brockwell &amp; Davis (2016), we will use the notation $\gamma_{x}(s, t)$ to denote the autocovariance at times $s$ and $t$ for time series $x$:

$$
\begin{split}
\gamma_{x}(s, t) &\stackrel{\triangle}{=} \mathbb{E}[(x_{s}-\mu_{s})(x_{t}-\mu_{t})]\\
&= \mathbb{E}[x_{s}x_{t}]-\mu_{s}\mu_{t}
\end{split}
$$

Note that $\gamma_x(s, t)=\gamma_x(t, s)$.

In general, we will drop the $x$ subscript when it is obvious which time series we are referring to and write only $\gamma(s, t)$. For an arbitrarily large (or infinite) observation window, $\mu_{s}=\mu_{t}$, as both series share almost all the same observations, leading to the simplification

$$
\gamma(s,t)=\mathbb{E}[x_{s}x_{t}]-\mu^2
$$

where we have dropped the subscript on $\mu$.
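The simplified formula above translates directly into a sample estimator. The sketch below defines a hypothetical helper `sample_autocov` (the name and the seed are assumptions for illustration) and checks it against white noise, where the autocovariance should be the variance at lag 0 and vanish elsewhere:

```python
import numpy as np

def sample_autocov(x, h):
    """Estimate gamma(s, t) for |s - t| = h via E[x_s x_t] - mu^2,
    assuming the mean does not depend on time (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    return np.mean(x[: len(x) - h] * x[h:]) - mu**2

rng = np.random.default_rng(1)
w = rng.normal(0.0, 1.0, size=100_000)  # white noise with sigma_w^2 = 1

print(sample_autocov(w, 0))  # close to 1 (the variance)
print(sample_autocov(w, 1))  # close to 0 (no serial correlation)
```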

Mean and Autocovariance of Moving Average

Recall from the first chapter that a white noise process is defined as a mean zero process with finite variance

$$
w_t \sim wn(0, \sigma_w^2).
$$

A summation of multiple values of $w_t$ will still have a mean of zero:

$$
\mathbb{E}[\ldots + w_{t-1}+ w_{t}+w_{t+1}+\ldots] = \ldots+\mathbb{E}[w_{t-1}]+\mathbb{E}[w_{t}] + \mathbb{E}[w_{t+1}]+\ldots=0.
$$

Consider the moving average

$$
v_t = \frac{1}{3}(w_{t-1}+w_t+w_{t+1}).
$$

$v_t$ has a mean of zero, but the autocovariance will depend on the difference between $s$ and $t$:

$$
\gamma_v(s, t)=\begin{cases}
\frac{3}{9}\sigma_w^2, & s=t\\
\frac{2}{9}\sigma_w^2, & |s-t|=1\\
\frac{1}{9}\sigma_w^2, & |s-t|=2\\
0, & \text{otherwise.}
\end{cases}
$$

Note that even though the values of the initial time series are iid, averaging will still introduce serial correlation.
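The case analysis above can be verified by simulation. This sketch (seed and sample size are arbitrary assumptions) builds the three-point moving average from white noise and compares the empirical autocovariances against $\tfrac{3}{9}$, $\tfrac{2}{9}$, $\tfrac{1}{9}$, and $0$ times $\sigma_w^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.normal(0.0, 1.0, size=200_000)  # white noise, sigma_w^2 = 1

# Three-point moving average v_t = (w_{t-1} + w_t + w_{t+1}) / 3.
v = (w[:-2] + w[1:-1] + w[2:]) / 3.0

def acov(x, h):
    # Empirical autocovariance at separation h.
    mu = x.mean()
    return np.mean((x[: len(x) - h] - mu) * (x[h:] - mu))

# Empirical values, to compare with 3/9, 2/9, 1/9, and 0 (times sigma_w^2).
for h in range(4):
    print(h, acov(v, h))
```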

Mean and Autocovariance of Random Walk

Mean for Zero Drift

Recall that a random walk (without drift) is defined as:

$$
x_t = x_{t-1} + w_t
$$

Iterating backwards, we have

$$
\begin{split}
x_t &= x_{t-1} + w_t\\
&= (x_{t-2} + w_{t-1}) + w_t\\
&= x_{t-2} + w_{t-1} + w_t\\
&= x_{t-3}+w_{t-2} + w_{t-1} + w_t\\
&\;\;\vdots\\
&= \sum_{h=0}^{\infty} w_{t-h}
\end{split}
$$

Of course, in real life we cannot have an infinite number of observations. Calling the first observation $w_0$, this gives

$$
\begin{split}
x_t &= w_0+w_1+\ldots+w_t\\
&= \sum_{h=0}^{t} w_{t-h}
\end{split}
$$

Whether we use Eq. (12) or (13), the expectation will always be 0, i.e.

$$
\begin{split}
\mathbb{E}[x_t] &= \mathbb{E}\Big[\sum w_{t-h}\Big]\\
&= \sum \mathbb{E}[w_{t-h}]\\
&= 0.
\end{split}
$$
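This zero-mean property is easy to see empirically. The sketch below (path count, length, and seed are arbitrary assumptions) simulates many independent random walks and checks that the average across walks stays near zero at every time step:

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 5000, 200

# Each row of x is one random walk: a cumulative sum of white-noise steps.
w = rng.normal(0.0, 1.0, size=(n_paths, n_steps))
x = w.cumsum(axis=1)

# Averaging across the independent walks, the mean at every time step
# stays near zero, matching E[x_t] = 0.
print(np.abs(x.mean(axis=0)).max())
```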

Mean with Drift

A random walk with drift is defined as

$$
x_t = \delta + x_{t-1} + w_t, \qquad \delta \neq 0.
$$

Following the same reasoning used above in Eqs. (12) and (13), we can iterate backwards to obtain

$$
\begin{split}
x_t &= \delta+x_{t-1} + w_t\\
&= \delta+(\delta+x_{t-2} + w_{t-1}) + w_t\\
&= 2\delta+x_{t-2} + w_{t-1} + w_t\\
&= 3\delta + x_{t-3}+w_{t-2} + w_{t-1} + w_t\\
&\;\;\vdots\\
&= t\delta + \sum_{h=0}^{t} w_{t-h}
\end{split}
$$

where we have only used the finite case to avoid $t\delta$ going to $\pm\infty$.

The expectation value for Eq. (16) is

$$
\begin{split}
\mathbb{E}[x_t] &= \mathbb{E}\Big[t\delta+\sum_{h=0}^t w_{t-h}\Big]\\
&= \mathbb{E}[t\delta]+\sum_{h=0}^t \mathbb{E}[w_{t-h}]\\
&= t\delta.
\end{split}
$$
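A simulation confirms that the drift dominates the mean. In this sketch the drift $\delta = 0.5$ and the simulation sizes are arbitrary assumptions; the ensemble mean at step $t$ should track $t\delta$:

```python
import numpy as np

rng = np.random.default_rng(4)
delta = 0.5            # drift (arbitrary choice for illustration)
n_paths, n_steps = 5000, 200

# Random walk with drift: each step adds delta plus white noise.
steps = delta + rng.normal(0.0, 1.0, size=(n_paths, n_steps))
x = steps.cumsum(axis=1)

# The ensemble mean at step t tracks the deterministic drift t * delta.
t = np.arange(1, n_steps + 1)
print(np.abs(x.mean(axis=0) - t * delta).max())
```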

Random Walk Variance

Recall that while adding constants to random variables adjusts the means, it does not affect the variance or covariance, i.e.

$$
\mathbb{V}(a+X) = \mathbb{V}(X),
$$

and

$$
\text{Cov}(a+X, b+Y) = \text{Cov}(X, Y),
$$

thus the autocovariance of a random walk is independent of drift terms. To simplify notation we will therefore only explicitly address a random walk without drift.

The autocovariance at times $s$ and $t$ is

$$
\text{Cov}\Big(\sum_{i=0}^s w_i, \sum_{j=0}^t w_j\Big) = \text{Cov}(w_0, w_0) +\text{Cov}(w_0, w_1) +\ldots+\text{Cov}(w_s, w_t).
$$

By the assumption of independent $w_t$'s, the only non-zero terms will be those with $i=j$ (i.e. the variances). Assuming without loss of generality that $t\leq s$, we have

$$
\begin{split}
\text{Cov}\Big(\sum_{i=0}^s w_i, \sum_{j=0}^t w_j\Big) &= \sum_{i=0}^t \mathbb{V}(w_i), \qquad t\leq s\\
&= t\sigma_w^2
\end{split}
$$

From Eq. (23) we see that the variance of a random walk, $\gamma(t,t)$, increases linearly with respect to $t$, i.e.

$$
\gamma(t,t) = t\sigma_w^2
$$

Equivalently, the standard deviation of a random walk increases as $\sqrt{t}$:

$$
\text{s.d.} = \sqrt{t}\,\sigma_w
$$
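The $\sqrt{t}$ growth can be checked directly. This sketch (seed and simulation sizes are arbitrary assumptions) simulates many driftless random walks with $\sigma_w^2 = 1$ and compares the cross-sectional standard deviation at each step against $\sqrt{t}$:

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, n_steps = 10_000, 400

# Many independent driftless random walks with sigma_w^2 = 1.
w = rng.normal(0.0, 1.0, size=(n_paths, n_steps))
x = w.cumsum(axis=1)

# Cross-sectional standard deviation at each time step.
t = np.arange(1, n_steps + 1)
sd = x.std(axis=0)

# The empirical spread tracks sqrt(t) * sigma_w.
print(sd[:5], np.sqrt(t[:5]))
```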

Figure 1 displays how random walks spread out with respect to time. Note how the spread of the walks follows a square-root rather than a linear curve, which provides a concrete example of why standard deviation is often favored over variance in analysis.

Figure 1: 1,000 simulated random walks with $\sigma_w^2=1$ and $\delta=0$.

References
  1. Shumway, R. H., &amp; Stoffer, D. S. (2025). *Time Series Analysis and Its Applications: With R Examples*. Springer Texts in Statistics. Springer Nature Switzerland. doi:10.1007/978-3-031-70584-7
  2. Brockwell, P. J., &amp; Davis, R. A. (2016). *Introduction to Time Series and Forecasting*. Springer Texts in Statistics. Springer International Publishing. doi:10.1007/978-3-319-29854-2