
2.4 Properties of Variance and Covariance

We will make heavy use of both variance and covariance (in particular the autocovariance) throughout the book. This chapter presents a refresher on these topics, laying the foundation for the forms used in time series analysis discussed in the next chapter.

Operators

There are several operators we will encounter in this book. Values such as mean, variance, and covariance can all be cast in the operator formalism. In later chapters we will introduce the backshift and Fourier operators. So what is an operator?

An operator $\mathbb{O}$ is defined as a rule that maps a member of a set to another member. Thus, an operator could be as simple as a function, such as "multiply by 5" (i.e. $f(x)=5\,x$)[1]. However, the most common operators we will use map one function to another function. Two such operators are differentiation $\frac{d}{dx}$ and integration $\int dx$.

Linear operators are of particular interest. An operator is linear if it fulfills the following two conditions:

  1. $\mathbb{O}\, a F(x) = a\,\mathbb{O}\, F(x)$ for any constant $a$

  2. $\mathbb{O}\,(F(x) + G(y)) = \mathbb{O}\,F(x) + \mathbb{O}\,G(y)$

We may write the two conditions more succinctly as

$$\mathbb{O}\,(aF(x) + bG(y)) = a\,\mathbb{O}\,F(x) + b\,\mathbb{O}\,G(y)$$

for any constants $a$ and $b$.

Going forward, we will assume that all operators discussed are linear unless otherwise noted.
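As a quick numerical sketch (not from the text), we can check the linearity of the differentiation operator on sampled functions; the choices of $\sin$, $\cos$, and the constants $a=3$, $b=-2$ are arbitrary:

```python
import numpy as np

# Sample two functions F and G on a grid.
x = np.linspace(0.0, 2 * np.pi, 10_001)
F, G = np.sin(x), np.cos(x)
a, b = 3.0, -2.0

# Apply the (finite-difference) differentiation operator to a linear combination...
lhs = np.gradient(a * F + b * G, x)
# ...and compare with the same linear combination of the individual derivatives.
rhs = a * np.gradient(F, x) + b * np.gradient(G, x)
print(np.allclose(lhs, rhs))  # True: differentiation is linear
```

Finite differencing is itself a linear map, so the two sides agree exactly up to floating-point roundoff.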

Expectation

The most important operator in statistics and data science is the expectation operator $\mathbb{E}[F(x)]$, usually first encountered in the context of the arithmetic mean.

Expectation is defined as:

$$\mathbb{E}[F(x)] \stackrel{\triangle}{=} \begin{cases} \sum_{x} F(x)P(x) & \text{discrete } x \\ \int F(x)P(x)\, dx & \text{continuous } x \end{cases}$$

where $P(x)$ is either the probability mass function of the random variable $x$ such that

$$\mathbb{P}(a < X < b) = \sum_{a < x_i < b} P(x_i)$$

(discrete $x$) or the probability density function such that

$$\mathbb{P}(a < X < b) = \int_a^b P(x)\, dx$$

(continuous $x$).

In general, we will not explicitly reference both the discrete and continuous cases in this book. Instead, we will use the notation $\mathbb{E}$ or one of the two forms in Eq. (3) with the understanding that a reference to either one implicitly refers to both unless specified otherwise.
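Both branches of the definition can be sketched numerically; here a fair die stands in for the discrete case and a standard normal (integrated by a simple Riemann sum) for the continuous case — both distributions are illustrative choices, not from the text:

```python
import numpy as np

# Discrete case: E[F(x)] for F(x) = x**2 on a fair six-sided die.
x = np.arange(1, 7)
p = np.full(6, 1 / 6)              # P(x) = 1/6 for every face
e_x2 = np.sum(x**2 * p)            # sum_x F(x) P(x) = 91/6

# Continuous case: E[x] for a standard normal, via a simple Riemann sum.
grid, dx = np.linspace(-8, 8, 160_001, retstep=True)
pdf = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
e_x = np.sum(grid * pdf) * dx      # integral of x P(x) dx; ~0 by symmetry

print(e_x2, round(e_x, 6))
```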

Linearity of Expectation

An important property of expectation is that it is a linear operator. We can demonstrate this fact by proving that $\mathbb{E}[a\,F(x) + b\,G(y)] = a\,\mathbb{E}[F(x)] + b\,\mathbb{E}[G(y)]$ as follows:

$$\begin{split} \mathbb{E}[a\,F(x) + b\,G(y)] &= \int \int (a\,F(x) + b\,G(y))\, P(x, y)\, dx\, dy \\ &= \int \int a\,F(x)\, P(x, y)\, dx\, dy + \int \int b\,G(y)\, P(x, y)\, dx\, dy \\ &= a\int \int F(x)\, P(x, y)\, dx\, dy + b\int \int G(y)\, P(x, y)\, dx\, dy \\ &= a \int F(x)\, P(x)\, dx + b\int G(y)\, P(y)\, dy \\ &= a\,\mathbb{E}[F(x)] + b\,\mathbb{E}[G(y)] \end{split}$$

where in the fourth line we have used the definition of marginal probabilities: $P(x) = \int P(x, y)\, dy$ and $P(y) = \int P(x, y)\, dx$.
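Note that the proof uses the joint distribution $P(x, y)$, so linearity holds even when the variables are dependent. A minimal simulation sketch (the distributions and constants are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(2.0, 1.0, n)
y = x + rng.normal(-1.0, 0.5, n)   # y depends on x, so P(x, y) does not factor
a, b = 3.0, -2.0

lhs = np.mean(a * x + b * y)            # E[aX + bY]
rhs = a * np.mean(x) + b * np.mean(y)   # aE[X] + bE[Y]
print(abs(lhs - rhs))  # zero up to floating-point roundoff
```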

Moments

We have not yet specified the identity of $F(x)$. By far, the most commonly used functions are powers of $x$, i.e.

$$F(x) = x^n, \qquad n = 0, 1, 2, \ldots$$

The expectation value of Eq. (9) is the $n$th moment of $P(x)$:

$$\mathbb{E}[x^n] = \int x^n P(x)\, dx,$$

where we are usually interested in the cases $n = 1, \ldots, 4$[2]:

  1. The first moment is the arithmetic mean, denoted as $\mu_x$.

  2. The second moment relates to the variance, or how widely spread the distribution is.

  3. The third moment relates to the skew, or how symmetric the function is.

  4. The fourth moment relates to the kurtosis, or the “fatness” of the tails.
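The raw moments above can be estimated directly from samples. As a sketch, for a standard normal the exact raw moments $\mathbb{E}[x^n]$ for $n = 1, \ldots, 4$ are $0, 1, 0, 3$ (the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(2_000_000)

# Estimate the raw moments E[x^n] for n = 1..4.
moments = [np.mean(x**n) for n in (1, 2, 3, 4)]
# For a standard normal the exact values are 0, 1, 0, 3.
print([round(m, 2) for m in moments])
```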

Finite Moments

An important theorem states that if the $k$th moment $\mathbb{E}[x^{k}]$ is finite, then all moments $j < k$ must also be finite. As a corollary, if $\mathbb{E}[x^{k}]$ is infinite, then all moments $m > k$ must also be infinite.

Proof: Let $\mathbb{E}[x^{k}]$ be finite and $j < k$ ($\forall$ is read as "for all"):

$$\begin{split} \mathbb{E}[x^{j}] &= \int_{-\infty}^{\infty} x^{j} P(x)\, dx \\ &= \int_{-\infty}^{-1} x^{j} P(x)\, dx + \int_{-1}^{1} x^{j} P(x)\, dx + \int_{1}^{\infty} x^{j} P(x)\, dx \\ &\quad \text{note that } |x^{j}| \leq |x^{k}|\ \forall\ |x| \geq 1 \text{ and } |x^{j}| \leq 1\ \forall\ |x| \leq 1 \\ &\leq \Big|\int_{-\infty}^{-1} x^{k} P(x)\, dx\Big| + \int_{-1}^{1} 1\, P(x)\, dx + \int_{1}^{\infty} x^{k} P(x)\, dx \\ &\leq \Big|\int_{-\infty}^{-1} x^{k} P(x)\, dx\Big| + 1 + \int_{1}^{\infty} x^{k} P(x)\, dx \\ &< \infty \end{split}$$

where we have used the fact that $\int_{-1}^{1} 1\, P(x)\, dx \leq 1$, with equality only occurring if the entire probability mass is contained in the interval $[-1, 1]$.

Variance

The variance of a random variable $X$ is defined as

$$\mathbb{V}(X) \stackrel{\triangle}{=} \mathbb{E}[(X - \mu_{x})^{2}],$$

and is often denoted as $\sigma_x^2$. The variance gives us a measure of how widely the distribution is spread about the mean. In practice, we more commonly make use of the standard deviation $\sigma_x$, which is simply the square root of the variance.

As written, the variance is slightly different from our definition of the second moment.

By exploiting the linearity of expectation we can express Eq. (17) using the first and second moments exclusively:

$$\begin{split} \mathbb{E}[(X-\mu_{x})^{2}] &= \mathbb{E}[X^{2} - 2X\mu_{x} + \mu_{x}^{2}] \\ &= \mathbb{E}[X^{2}] - \mathbb{E}[2X\mu_{x}] + \mathbb{E}[\mu_{x}^{2}] \\ &= \mathbb{E}[X^{2}] - 2\mu_{x}\mathbb{E}[X] + \mu_{x}^{2}\,\mathbb{E}[1] \\ &= \mathbb{E}[X^{2}] - 2\mu_{x}^{2} + \mu_{x}^{2} \\ &= \mathbb{E}[X^{2}] - \mu_{x}^{2} \\ &= \mathbb{E}[X^{2}] - (\mathbb{E}[X])^{2} \end{split}$$
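The identity above can be sketched numerically; the two expressions agree as an exact algebraic fact, even for sample estimates (the exponential distribution here is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000_000)

lhs = np.mean((x - x.mean())**2)     # E[(X - mu_x)^2]
rhs = np.mean(x**2) - x.mean()**2    # E[X^2] - (E[X])^2
print(abs(lhs - rhs))  # zero up to floating-point roundoff
```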

Covariance

The covariance of two random variables $X, Y$ is defined as

$$\text{Cov}(X, Y) \stackrel{\triangle}{=} \mathbb{E}[(X - \mu_{x})(Y - \mu_{y})].$$

Covariance is also written as $\sigma_{x, y}$. Note that $\text{Cov}(X, X) = \mathbb{V}(X)$.

Unlike variance, which is never negative, covariance can be negative, zero, or positive. Following the same logic used in Eq. (18), we can also express the covariance as

$$\text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$$

Covariance gives us a measure of how much random variables change in tandem with one another.
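Both forms of the covariance can be checked on simulated data; constructing $Y = 0.6X + \varepsilon$ (an arbitrary illustrative choice) gives a true covariance of 0.6:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500_000
x = rng.standard_normal(n)
y = 0.6 * x + rng.standard_normal(n)   # constructed so Cov(X, Y) ≈ 0.6

cov_def = np.mean((x - x.mean()) * (y - y.mean()))   # E[(X-mu_x)(Y-mu_y)]
cov_alt = np.mean(x * y) - x.mean() * y.mean()       # E[XY] - E[X]E[Y]
print(round(cov_def, 2), round(cov_alt, 2))
```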

Variance of Sums of Random Variables

One of the most fundamental aspects of time series analysis is understanding the variance of the sum of random variables. Let us begin with the variance of a sum of two random variables, $X$ and $Y$.

Variance of Multiple Random Variables

$$\begin{split} \mathbb{V}(X+Y) &= \mathbb{E}[(X + Y - \mu_{x} - \mu_{y})^{2}] \\ &= \mathbb{E}[X^{2}] + \mathbb{E}[Y^{2}] + 2\mathbb{E}[XY] - 2\mathbb{E}[X\mu_{x}] \\ &\quad - 2\mathbb{E}[X\mu_{y}] - 2\mathbb{E}[Y\mu_{x}] - 2\mathbb{E}[Y\mu_{y}] + 2\mathbb{E}[\mu_{x}\mu_{y}] \\ &\quad + \mathbb{E}[\mu_{x}^{2}] + \mathbb{E}[\mu_{y}^{2}] \\ &= \mathbb{E}[X^{2}] + \mathbb{E}[Y^{2}] + 2\mathbb{E}[XY] - \mu_{x}^{2} - \mu_{y}^{2} - 2\mu_{x}\mu_{y} \\ &= (\mathbb{E}[X^{2}] - \mu_{x}^{2}) + (\mathbb{E}[Y^{2}] - \mu_{y}^{2}) + 2(\mathbb{E}[XY] - \mu_{x}\mu_{y}) \\ &= \mathbb{V}(X) + \mathbb{V}(Y) + 2\,\text{Cov}(X, Y) \end{split}$$

The last line of Eq. (21) can be rewritten as

$$\text{Cov}(X, X) + \text{Cov}(Y, Y) + \text{Cov}(X, Y) + \text{Cov}(Y, X),$$

or, renaming the variables as $X_{0}$ and $X_{1}$,

$$\sum_{i, j=0}^{1} \text{Cov}(X_{i}, X_{j}).$$

The above suggests (though does not prove) that the variance of a sum of variables is the sum of all covariance combinations. For random variables $X_{0}, X_{1}, X_{2}, \ldots, X_{n-1}$:

$$\begin{split} \mathbb{V}\biggl(\sum_{i=0}^{n-1} X_{i}\biggr) &= \sum_{i, j=0}^{n-1} \text{Cov}(X_{i}, X_{j}) \\ &= \sum_{i=0}^{n-1} \mathbb{V}(X_{i}) + 2\sum_{i=0,\, j>i} \text{Cov}(X_{i}, X_{j}) \end{split}$$

where the last term sums $i$ to $n-2$ and $j$ to $n-1$. Eq. (24) can be proven in the same manner as Eq. (21), though the algebra gets rather intricate. We present a more direct proof in the following problem.
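The double-sum identity can be sketched with simulated correlated variables: the sample variance of the sum equals the sum of every entry of the sample covariance matrix, exactly (the covariance matrix used here is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
# Three correlated random variables X_0, X_1, X_2 (one row of samples each).
data = rng.multivariate_normal(
    mean=[0.0, 0.0, 0.0],
    cov=[[1.0, 0.5, 0.2],
         [0.5, 2.0, -0.3],
         [0.2, -0.3, 1.5]],
    size=400_000,
).T

var_of_sum = data.sum(axis=0).var(ddof=1)   # V(X_0 + X_1 + X_2)
sum_of_covs = np.cov(data).sum()            # sum over all Cov(X_i, X_j)
print(round(var_of_sum, 3), round(sum_of_covs, 3))  # identical as a sample identity
```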

Variance of Independent Variables

From Eq. (24) we can see that if all variables have zero covariance (most commonly due to independence), the variance of a sum of variables is the sum of the variances of each variable

$$\mathbb{V}\biggl(\sum_{i=0}^{n-1} X_{i}\biggr) = \sum_{i=0}^{n-1} \mathbb{V}(X_{i}) \qquad \text{for zero covariance}.$$

While it is very tempting to simply assume that Eq. (25) holds, in real life we must justify its use from theoretical analysis, empirical evidence, or both.

It should also be recalled that knowing that random variables have zero covariance does not inherently prove independence. As a simple counterexample, consider a random variable $X$ with zero mean and zero third moment, and let $Y = X^2$. An example might be $X \sim N(0, 1)$ and $Y = X^2 \sim \chi^2_1$. Clearly, $X$ and $Y$ are highly dependent; for example, knowing that $Y > 4$ tells us $|X| > 2$. Nevertheless, they still have zero covariance:

$$\text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y] = \mathbb{E}[XY] - 0 \cdot \mathbb{E}[Y] = \mathbb{E}[X^3] = 0$$
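The counterexample is easy to simulate (the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000_000)
y = x**2                         # Y is fully determined by X, hence dependent

cov = np.mean(x * y) - x.mean() * y.mean()   # sample version of E[X^3]
print(round(cov, 2))  # ≈ 0 despite the total dependence
```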

Variance-Covariance Inequality

Warning: The math in this section can get rather heavy. Feel free to skip this section if you’re having difficulty. While the material below does add to the overall understanding of future material, it is not absolutely necessary.

Arithmetic Inequality

From Eq. (21) we see that

$$\mathbb{V}(X + Y) = \mathbb{V}(X) + \mathbb{V}(Y) + 2\,\text{Cov}(X, Y).$$

By substituting in $-Y$, we arrive at

$$\mathbb{V}(X - Y) = \mathbb{V}(X) + \mathbb{V}(Y) - 2\,\text{Cov}(X, Y).$$

As variances are by definition non-negative, combining the above two equations yields the inequality $2\,|\text{Cov}(X, Y)| \leq \mathbb{V}(X) + \mathbb{V}(Y)$. We can express this in terms of the arithmetic mean of the variances:

$$|\text{Cov}(X, Y)| \leq \frac{1}{2}\big(\mathbb{V}(X) + \mathbb{V}(Y)\big)$$

While Eq. (28) is true, we will soon see that we can obtain an even tighter bound.

Cauchy-Schwarz Inequality

In order to better understand the relation between variance and covariance, we must first introduce the Cauchy-Schwarz inequality, a valuable inequality from linear algebra. In words, it states that the square of the inner product of two vectors is at most the product of the squared norms of the two vectors:

$$(\mathbf{u} \cdot \mathbf{v})^{2} \leq \|\mathbf{u}\|^{2}\,\|\mathbf{v}\|^{2}$$

or, recognizing that vector norms are always non-negative

$$|\mathbf{u} \cdot \mathbf{v}| \leq \|\mathbf{u}\|\,\|\mathbf{v}\|$$

Eqs. (29) and (30) will be equalities if and only if $\mathbf{u}$ and $\mathbf{v}$ can be expressed as scalar multiples of one another (i.e. lie on the same line).

If $\mathbf{u}$ and/or $\mathbf{v}$ is the zero vector, the inequality is trivially true; let us prove the inequality when neither is:

$$\text{Let } \mathbf{w} \stackrel{\triangle}{=} \frac{\mathbf{u}}{\|\mathbf{u}\|} \pm \frac{\mathbf{v}}{\|\mathbf{v}\|}$$

$$\begin{split} \mathbf{w} \cdot \mathbf{w} &= \Big(\frac{\mathbf{u}}{\|\mathbf{u}\|} \pm \frac{\mathbf{v}}{\|\mathbf{v}\|}\Big) \cdot \Big(\frac{\mathbf{u}}{\|\mathbf{u}\|} \pm \frac{\mathbf{v}}{\|\mathbf{v}\|}\Big) \\ &= \frac{\mathbf{u} \cdot \mathbf{u}}{\|\mathbf{u}\|^{2}} \pm 2\,\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} + \frac{\mathbf{v} \cdot \mathbf{v}}{\|\mathbf{v}\|^{2}} \\ &= 1 \pm 2\,\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} + 1 \\ &= 2 \pm 2\,\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \\ &\text{as with any inner product } \mathbf{w} \cdot \mathbf{w} \geq 0 \\ 0 &\leq 2 \pm 2\,\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \\ 1 &\geq \frac{|\mathbf{u} \cdot \mathbf{v}|}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \\ \|\mathbf{u}\|\,\|\mathbf{v}\| &\geq |\mathbf{u} \cdot \mathbf{v}| \end{split}$$
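A quick numerical sketch of the vector inequality, including the equality condition for scalar multiples (the vectors here are random illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.normal(size=10)
v = rng.normal(size=10)

# |u . v| <= ||u|| ||v||
lhs = abs(u @ v)
rhs = np.linalg.norm(u) * np.linalg.norm(v)
print(lhs <= rhs)  # True

# Equality holds when the vectors are scalar multiples of one another.
w = -2.5 * u
print(np.isclose(abs(u @ w), np.linalg.norm(u) * np.linalg.norm(w)))  # True
```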

Functions as Infinite Dimensional Vectors

The Cauchy-Schwarz inequality may be extended to integrals by viewing functions as infinite-dimensional vectors living in a Hilbert space. Let us imagine we have two continuous functions, $f(x)$ and $g(x)$, that are square-integrable on the interval $[a, b]$. Let us create $n$-dimensional vectors by sampling the value of each function at $n$ evenly spaced points along the interval, producing the vectors

$$(f(x_{1}), f(x_{2}), \ldots, f(x_{n}) = f(b)),$$

and

$$(g(x_{1}), g(x_{2}), \ldots, g(x_{n}) = g(b))$$

where $f(x_0) = f(a)$ and $g(x_0) = g(a)$ are omitted since we are using a right Riemann sum.


Figure 1: Right Riemann approximation to $f(x)$ and $g(x)$ for $n = 10$.

By the Cauchy-Schwarz inequality:

$$\Big(\sum_{i=1}^{n} f(x_{i})\,g(x_{i})\Big)^{2} \leq \Big(\sum_{i=1}^{n} f(x_{i})^{2}\Big)\Big(\sum_{i=1}^{n} g(x_{i})^{2}\Big)$$

As the term $\big[\frac{b-a}{n}\big]^{2}$ is a positive constant for any fixed $a$, $b$, and $n$, we may freely multiply both sides by it:

$$\begin{split} \Big[\frac{b-a}{n}\Big]^{2}\Big(\sum_{i=1}^{n} f(x_{i})\,g(x_{i})\Big)^{2} &\leq \Big[\frac{b-a}{n}\Big]^{2}\Big(\sum_{i=1}^{n} f(x_{i})^{2}\Big)\Big(\sum_{i=1}^{n} g(x_{i})^{2}\Big) \\ \Big(\sum_{i=1}^{n} f(x_{i})\,g(x_{i})\Big[\frac{b-a}{n}\Big]\Big)^{2} &\leq \Big(\sum_{i=1}^{n} f(x_{i})^{2}\Big[\frac{b-a}{n}\Big]\Big)\Big(\sum_{i=1}^{n} g(x_{i})^{2}\Big[\frac{b-a}{n}\Big]\Big). \end{split}$$

Defining $\Delta x$ as $\frac{b-a}{n}$:

$$\Big(\sum_{i=1}^{n} f(x_{i})\,g(x_{i})\,\Delta x\Big)^{2} \leq \Big(\sum_{i=1}^{n} f(x_{i})^{2}\,\Delta x\Big)\Big(\sum_{i=1}^{n} g(x_{i})^{2}\,\Delta x\Big).$$

Finally, by taking the limit $n \to \infty$ (so that $\Delta x \to 0$), we arrive at the Cauchy-Schwarz inequality in integral form:

$$\Big(\int_{a}^{b} f(x)\,g(x)\, dx\Big)^{2} \leq \int_{a}^{b} f(x)^{2}\, dx \int_{a}^{b} g(x)^{2}\, dx$$
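The integral form can be sketched with exactly the right Riemann sums used above; $f(x) = x$ and $g(x) = e^x$ on $[0, 1]$ are hypothetical choices, not from the text:

```python
import numpy as np

# Check (integral of f g)^2 <= (integral of f^2)(integral of g^2) on [0, 1]
# using a right Riemann sum, mirroring the construction above.
n = 100_000
a, b = 0.0, 1.0
dx = (b - a) / n
x = a + dx * np.arange(1, n + 1)   # right endpoints x_1 ... x_n
f, g = x, np.exp(x)

lhs = (np.sum(f * g) * dx) ** 2
rhs = (np.sum(f**2) * dx) * (np.sum(g**2) * dx)
print(lhs <= rhs)  # True
```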

Geometric Inequality

We can use the Cauchy-Schwarz inequality to derive a tighter upper bound on the covariance of two variables than that found in Eq. (28). We will only explicitly prove the bound for the discrete case of sample covariance, but by Eq. (34) the bound will also hold for the continuous case. Let

$$\mathbf{x'} \stackrel{\triangle}{=} \frac{1}{\sqrt{n-1}}\,(\mathbf{x} - \mu_{x})$$

for random variable realizations $\mathbf{x} = (x_{1}, x_{2}, \ldots, x_{n})$, and let $\mathbf{y'}$ be defined analogously. By the definitions of (sample) variance and covariance,

$$\mathbb{V}(X) = \|\mathbf{x'}\|^{2} = \mathbf{x'} \cdot \mathbf{x'},$$

$$\mathbb{V}(Y) = \|\mathbf{y'}\|^{2} = \mathbf{y'} \cdot \mathbf{y'},$$

and

$$(\text{Cov}(X, Y))^{2} = (\mathbf{x'} \cdot \mathbf{y'})^{2}.$$

Substituting these definitions into Eq. (29), we conclude that

$$(\mathbf{x'} \cdot \mathbf{y'})^{2} \leq \|\mathbf{x'}\|^{2}\,\|\mathbf{y'}\|^{2},$$

or

$$(\text{Cov}(X, Y))^{2} \leq \mathbb{V}(X)\,\mathbb{V}(Y)$$

Equivalently, by taking the square root of both sides and recognizing that covariance can be negative, we arrive at

$$|\text{Cov}(X, Y)| \leq \sqrt{\mathbb{V}(X)\,\mathbb{V}(Y)},$$

or more concisely

$$|\sigma_{x, y}| \leq \sigma_{x}\,\sigma_{y}.$$

We thus arrive at the tighter bound that the absolute value of the covariance must always be less than or equal to the geometric mean of the variances.
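A simulation sketch comparing the two bounds; the mismatched scales of $X$ and $Y$ are an arbitrary choice that makes the geometric-mean bound visibly tighter than the arithmetic-mean bound of Eq. (28):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000
x = rng.normal(scale=1.0, size=n)
y = 0.8 * x + rng.normal(scale=3.0, size=n)   # deliberately mismatched scales

cov = np.cov(x, y)[0, 1]
vx, vy = x.var(ddof=1), y.var(ddof=1)

geo = np.sqrt(vx * vy)        # geometric-mean bound
ari = 0.5 * (vx + vy)         # arithmetic-mean bound (Eq. 28)
print(abs(cov) <= geo <= ari)  # True: the geometric bound is the tighter one
```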

There is actually an even stricter requirement for covariance, namely that the covariance matrix be positive semidefinite. We will defer discussion of positive semidefiniteness until we encounter it in the specific time series application of covariance.

Correlation

Drawbacks to Covariance

While generally useful, the covariance does have potential drawbacks. In particular, $\text{Cov}(X, Y)$ has units of $X \times Y$, causing two major difficulties:

  1. The exact value obtained will depend on the units used; for example, the covariance of height and weight will be different in imperial vs. metric units.

  2. It can be very difficult to assess whether the value of the covariance is practically significant, e.g. is a covariance of 3 kg cm high or low?

Correlation Definition

To overcome these difficulties with covariance, it is often convenient to instead normalize covariance to a unitless quantity bounded by $[-1, 1]$:

$$\text{Cor}(X, Y) \stackrel{\triangle}{=} \frac{\sigma_{x, y}}{\sigma_{x}\,\sigma_{y}}$$

This quantity is referred to as the correlation. Note that the Cauchy-Schwarz inequality as applied in Eq. (42) guarantees that Eq. (43) will always lie in $[-1, 1]$. Of course, even when using correlation, determining whether a value such as 0.7 should be considered high will depend on the context and situation.
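The unit-invariance that motivates correlation is easy to sketch; the height/weight variables and all numbers here are hypothetical illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000
height_cm = rng.normal(170.0, 10.0, n)
weight_kg = 0.9 * (height_cm - 100.0) + rng.normal(0.0, 8.0, n)

r = np.corrcoef(height_cm, weight_kg)[0, 1]
print(-1.0 <= r <= 1.0)   # True: correlation always lies in [-1, 1]

# Changing units (cm -> inches) rescales the covariance but not the correlation.
r_inches = np.corrcoef(height_cm / 2.54, weight_kg)[0, 1]
print(np.isclose(r, r_inches))  # True
```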

Footnotes
  1. In some disciplines such as quantum mechanics, operators are often denoted by a “hat” as $\hat{\mathbb{O}}$, not to be confused with the hat used to denote a statistical estimator.

  2. Some texts define moments as $\int (x - c)^n P(x)\, dx$ for some constant $c$. We will assume $c = 0$ unless otherwise specified.