
3.6 Estimation of Autocovariance, Autocorrelation, and Cross-Correlation

Thus far, we have dealt with the theoretical values for various forms of variance, covariance, and correlation based on the assumption that we know the underlying process from which our observations have been drawn. In real life, we will usually be given the time series and then attempt to infer aspects such as the underlying process from statistical estimators. In this section we will develop techniques analogous to how sample covariance and sample correlation are used to estimate the true covariance and correlation in data science and statistics.

Estimate Definitions

Sample Autocovariance

The standard formula for sample autocovariance, used in sources such as Shumway & Stoffer (2025), is

$$\hat{\gamma}(h) \stackrel{\triangle}{=} \frac{1}{n}\sum_{t=0}^{n-h-1} (x_{t+h}-\bar{x})(x_t-\bar{x})$$

where $\bar{x}$ is the sample mean and the “hat” notation indicates that $\hat{\gamma}$ is an estimate of the true population value. Note that we assume the same mean for $x_t$ and $x_{t+h}$.
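As a concrete sketch of Eq. (1), the estimator can be written directly in NumPy (the function name and test data here are our own, not from the referenced texts):

```python
import numpy as np

def sample_autocov(x, h):
    # gamma-hat(h) = (1/n) * sum_{t=0}^{n-h-1} (x[t+h] - xbar)(x[t] - xbar)
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()                      # demean once with the single sample mean
    return np.sum(d[h:] * d[:n - h]) / n  # note: divide by n, not n - h

rng = np.random.default_rng(42)
x = rng.standard_normal(500)
print(sample_autocov(x, 0))  # lag 0 recovers the (1/n)-divisor sample variance
```

At $h=0$ this reduces to the ordinary sample variance with the $\frac{1}{n}$ divisor, which is why $\hat{\gamma}(0)$ can serve as the normalizer in the autocorrelation defined next.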

Sample Autocorrelation

Autocorrelation is estimated by defining the sample autocorrelation

$$\hat{\rho}(h) \stackrel{\triangle}{=} \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}.$$
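Because $\hat{\rho}(h)$ is a ratio of two estimates that share the same $\frac{1}{n}$ factor, that factor cancels; a minimal sketch (again with our own function name):

```python
import numpy as np

def sample_acf(x, h):
    # rho-hat(h) = gamma-hat(h) / gamma-hat(0); the common 1/n factor cancels
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    return np.sum(d[h:] * d[:n - h]) / np.sum(d * d)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(sample_acf(x, 0))  # 1.0 -- any series is perfectly correlated with itself at lag 0
print(sample_acf(x, 1))  # 0.4 for this small example
```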

Sample Cross-Covariance

The sample cross-covariance is defined as

$$\hat{\gamma}_{x,y}(h) \stackrel{\triangle}{=} \frac{1}{n}\sum_{t=0}^{n-h-1} (x_{t+h}-\bar{x})(y_t-\bar{y})$$

and the sample cross-correlation as

$$\hat{\rho}_{x,y}(h) \stackrel{\triangle}{=} \frac{\hat{\gamma}_{x,y}(h)}{\sqrt{\hat{\gamma}_x(0)\hat{\gamma}_y(0)}}.$$
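The cross-correlation picks out the lag at which two series line up. The sketch below (function name and shifted-series construction are our own illustration) builds $y$ as $x$ advanced one step, so the peak appears at $h = 1$:

```python
import numpy as np

def sample_ccf(x, y, h):
    # rho-hat_{x,y}(h) = gamma-hat_{x,y}(h) / sqrt(gamma-hat_x(0) * gamma-hat_y(0))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx, dy = x - x.mean(), y - y.mean()
    gamma_xy = np.sum(dx[h:] * dy[:n - h]) / n        # 1/n divisor, as in the text
    return gamma_xy / np.sqrt(np.var(x) * np.var(y))  # np.var also uses 1/n

rng = np.random.default_rng(1)
x = rng.standard_normal(300)
y = np.roll(x, -1)               # y[t] = x[t+1], so x_{t+h} aligns with y_t at h = 1
print(sample_ccf(x, y, 1))       # close to 1
```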

Properties of Estimators

Positive Semidefiniteness

You may have noticed something strange about Eq. (1). For $n$ observations, we only have $n-h$ pairs to sum over; for example, when $n=100$ and $h=2$ there are 98 pairs consisting of $(x_0,x_2), (x_1,x_3),\ldots,(x_{97}, x_{99})$. Why, then, do we divide Eq. (1) by $n$ rather than $n-h$?[1]

The answer relates to the necessity that the autocovariance matrix (and, by extension, the autocorrelation matrix) be positive semidefinite in order to avoid the possibility of generating negative variances. To see that the form used in Eq. (1) guarantees this property, let us sketch the proof from Brockwell & Davis (2016)[2]. Let us define the matrix $\mathbf{X}$ using the demeaned time series such that

$$\Big(\frac{1}{\sqrt{n}}\mathbf{X}\Big)^2 = \boldsymbol{\Gamma},$$

where the elements of $\mathbf{X}$ are the vectors $[\ldots, x_0-\bar{x}, x_1-\bar{x}, x_2-\bar{x}, \ldots]$ appropriately padded with zeros. Put slightly differently, we have now factored $\boldsymbol{\Gamma}$ such that

$$\boldsymbol{\Gamma} = \frac{1}{n}\mathbf{X}^T\mathbf{X}.$$

Since we have expressed $\boldsymbol{\Gamma}$ as the square of $\mathbf{X}$, for any vector $\mathbf{v}$ we have

$$\begin{split} \mathbf{v}^T\boldsymbol{\Gamma}\mathbf{v} &= \frac{1}{n} \mathbf{v}^T \mathbf{X}^T \mathbf{X}\mathbf{v}\\ &= \frac{1}{n}\big(\mathbf{X}\mathbf{v}\big)^T \mathbf{X}\mathbf{v}\\ &= \frac{1}{n} \|\mathbf{X}\mathbf{v}\|^2\\ &\geq 0. \end{split}$$

The central point to understand is that we could only factor $\boldsymbol{\Gamma}$ because we could pull out $\frac{1}{n}$ from every entry. If we separately weighted each $\gamma(h)$ in $\boldsymbol{\Gamma}$ by $n-h$, we would no longer be able to express the autocovariance matrix $\boldsymbol{\Gamma}$ as the product of two other matrices. Consequently, we would have no guarantee that $\boldsymbol{\Gamma}$ was positive semidefinite, and could end up with negative variances.
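This property is easy to check numerically. The sketch below (helper name and test data are our own) fills a Toeplitz matrix with $\hat{\gamma}(|i-j|)$ values computed with the $\frac{1}{n}$ divisor and confirms its eigenvalues are nonnegative up to floating-point noise:

```python
import numpy as np

def acov_matrix(x, max_lag):
    # Build the (max_lag+1) x (max_lag+1) Toeplitz matrix with entries
    # gamma-hat(|i - j|), using the 1/n divisor from Eq. (1).
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    g = np.array([np.sum(d[h:] * d[:n - h]) / n for h in range(max_lag + 1)])
    lags = np.abs(np.subtract.outer(np.arange(max_lag + 1), np.arange(max_lag + 1)))
    return g[lags]

rng = np.random.default_rng(7)
x = rng.standard_normal(50)
Gamma = acov_matrix(x, 10)
eigvals = np.linalg.eigvalsh(Gamma)     # Gamma is symmetric, so eigvalsh applies
print(eigvals.min())                    # nonnegative up to numerical round-off
```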

Bias and Consistency

Note that neither dividing by $n$ nor by $n-h$ creates an unbiased estimator, i.e., in neither case is it true that

$$\mathbb{E}[\hat{\gamma}(h)] = \gamma(h)$$

though when dividing by $n-h$ the bias will be smaller.

The estimates are both, however, consistent in that as $n\rightarrow\infty$

$$\hat{\gamma}(h) \stackrel{P}{\rightarrow} \gamma(h).$$

As observed in Wasserman (2004), most modern statistics texts consider consistency to be sufficient and are less concerned with estimators being unbiased.
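Consistency can be illustrated with a quick simulation (our own construction, not from the cited texts): for Gaussian white noise the true $\gamma(1)$ is 0, and the estimate's spread around 0 shrinks as $n$ grows:

```python
import numpy as np

# For Gaussian white noise the true gamma(1) is 0; the 1/n estimate should
# concentrate around 0 as n grows, illustrating consistency.
rng = np.random.default_rng(3)
for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal(n)
    d = x - x.mean()
    gamma1 = np.sum(d[1:] * d[:-1]) / n   # gamma-hat(1) per Eq. (1)
    print(n, gamma1)
```

The printed estimates tighten around 0 roughly at the $1/\sqrt{n}$ rate, consistent with the standard-error discussion in the next subsection.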

Standard Error of Autocorrelation and Cross-Correlation

A natural question to ask is how we might determine if an autocorrelation value is statistically significant. We previously touched on the fact that programs such as statsmodels estimate statistical significance automatically. But we have not yet explained how significance is estimated.


Figure 2: Sample autocorrelation plot from statsmodels with blue shading indicating area of statistical insignificance.

It turns out that in order to definitively calculate statistical significance, we would need to know the underlying process that generated our data—exactly the question we are trying to answer in the first place. Instead, let’s ask a slightly different question: What values should we expect for the autocorrelation if our time series is pure white noise?

Under the assumption of white noise[3], we can expect the sample autocorrelation values to be normally distributed with mean 0 and variance $\frac{1}{n}$

$$\hat{\rho}(h) \sim \mathcal{N}\Big(0,\frac{1}{n}\Big)$$

where $n$ is the total number of observations (Box et al. (2008)). Equivalently, the standard error of $\hat{\rho}(h)$ is given by

$$\text{se}(\hat{\rho}(h))=\frac{1}{\sqrt{n}}.$$

For a white noise process, we expect around $95\%$ of all sample autocorrelation values to fall within $\pm\frac{2}{\sqrt{n}}$. By default, statsmodels uses Eq. (13) at a $95\%$ confidence level to estimate statistical significance when creating figures such as Figure 2[4]. It can be shown that this formula also holds for the sample cross-correlation $\hat{\rho}_{x,y}(h)$ (Shumway & Stoffer (2025)).
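The $\pm\frac{2}{\sqrt{n}}$ benchmark can be checked by Monte Carlo (a sketch of our own, not the statsmodels internals): simulate many white noise series and count how often $\hat{\rho}(1)$ lands inside the band.

```python
import numpy as np

# Monte Carlo check of the white-noise benchmark: roughly 95% of sample
# autocorrelations at a fixed lag should fall within +/- 2/sqrt(n).
rng = np.random.default_rng(0)
n, trials, lag = 200, 2000, 1
inside = 0
for _ in range(trials):
    x = rng.standard_normal(n)
    d = x - x.mean()
    rho = np.sum(d[lag:] * d[:-lag]) / np.sum(d * d)
    inside += bool(abs(rho) < 2 / np.sqrt(n))
print(inside / trials)   # should land close to 0.95
```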

Footnotes
  1. While Eq. (1) is the default method for calculating sample autocovariance in statsmodels, the autocovariance and autocorrelation functions (found in statsmodels.tsa.stattools.acovf and statsmodels.tsa.stattools.acf, respectively) have an argument adjusted that can be set to True in order to divide by $n-h$ instead. By default, both functions set adjusted=False (i.e., dividing by $n$). The cross-covariance and cross-correlation functions in statsmodels default to adjusted=True, as they do not share the same requirement to be positive semidefinite. Our definition in Eq. (6) follows sources such as Box et al. (2008) and Shumway & Stoffer (2025).

  2. The full proof is unnecessary for our purposes, but can be found in Sec. 2.4.2 of the referenced text for those who are interested.

  3. Strictly, this formula is only valid for noise with a finite fourth moment, such as Gaussian white noise (Shumway & Stoffer (2025)). As the formula is in any event an approximation for any non-white-noise time series, we will not be too concerned about this point.

  4. statsmodels can be forced to always use this method by setting bartlett_confint=False.

References
  1. Shumway, R. H., & Stoffer, D. S. (2025). Time Series Analysis and Its Applications: With R Examples. In Springer Texts in Statistics. Springer Nature Switzerland. 10.1007/978-3-031-70584-7
  2. Brockwell, P. J., & Davis, R. A. (2016). Introduction to Time Series and Forecasting. In Springer Texts in Statistics. Springer International Publishing. 10.1007/978-3-319-29854-2
  3. Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. In Springer Texts in Statistics. Springer New York. 10.1007/978-0-387-21736-9
  4. Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (2008). Time Series Analysis. In Wiley Series in Probability and Statistics. Wiley. 10.1002/9781118619193