We will make heavy use of both variance and covariance (in particular the autocovariance) throughout the book. This chapter presents a refresher on these topics, laying the foundation for the forms used in time series analysis discussed in the next chapter.
There are several operators we will encounter in this book. Values such as mean, variance, and covariance can all be cast in the operator formalism. In later chapters we will introduce the backshift and Fourier operators. So what is an operator?
An operator $O$ is defined as a rule that maps a member of a set to another member. Thus, an operator could just be a function such as $f$, defined as multiplying by 5 (i.e., $f(x) = 5x$)[1]. However, the most common operators we will use map one function to another function. Two such operators are differentiation $\frac{d}{dx}$ and integration $\int dx$.
Linear operators are of particular interest. An operator is a linear operator if it fulfills the following two conditions:
$$O\,aF(x) = a\,OF(x) \quad \text{for any constant } a$$
$$O\big(F(x) + G(y)\big) = OF(x) + OG(y)$$
We may write the two conditions more succinctly as
The most important operator in statistics and data science is the expectation operator E[F(x)], usually first encountered in the context of the arithmetic mean.
In general, we will not explicitly reference both the discrete and continuous cases in this book. Instead, we will use the notation E or one of the two methods in Eq. (3) with an understanding that a reference to either one implicitly refers to both unless specified otherwise.
An important property of expectation is that it is a linear operator. We can demonstrate this fact by proving that $E[aF(x) + bG(y)] = aE[F(x)] + bE[G(y)]$ as follows:
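Linearity is easy to check numerically. The sketch below uses numpy with arbitrarily chosen $F(x) = x^2$, $G(y) = y$, and constants $a$, $b$; these choices (and the seed) are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)
a, b = 3.0, -2.0

# Sample means approximate expectations, and the mean is itself a linear
# operator, so the identity holds for sample data as well.
lhs = np.mean(a * x**2 + b * y)           # E[aF(x) + bG(y)] with F(x)=x^2, G(y)=y
rhs = a * np.mean(x**2) + b * np.mean(y)  # aE[F(x)] + bE[G(y)]
print(np.isclose(lhs, rhs))  # True
```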
An important theorem states that if the $k$th moment $E[x^k]$ is finite, then all moments $E[x^j]$ with $j < k$ must also be finite. As a corollary, if $E[x^k]$ is infinite, all moments $E[x^m]$ with $m > k$ must also be infinite.
Proof: Let $E[x^k]$ be finite and $j < k$ ($\forall$ is read as "for all"):
$$
\begin{aligned}
E[x^j] &= \int_{-\infty}^{\infty} x^j P(x)\,dx \\
&= \int_{-\infty}^{-1} x^j P(x)\,dx + \int_{-1}^{1} x^j P(x)\,dx + \int_{1}^{\infty} x^j P(x)\,dx \\
&\quad \text{note that } |x^j| \le |x^k| \;\forall\; |x| \ge 1 \text{ and } |x^j| \le 1 \;\forall\; |x| \le 1 \\
&\le \left|\int_{-\infty}^{-1} x^k P(x)\,dx\right| + \int_{-1}^{1} 1 \cdot P(x)\,dx + \int_{1}^{\infty} x^k P(x)\,dx \\
&\le \left|\int_{-\infty}^{-1} x^k P(x)\,dx\right| + 1 + \int_{1}^{\infty} x^k P(x)\,dx \\
&< \infty
\end{aligned}
$$
where we have used the fact that $\int_{-1}^{1} 1 \cdot P(x)\,dx \le 1$, with equality only occurring if the entire probability mass is contained in the interval $[-1, 1]$.
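The pointwise bound at the heart of the proof can be checked on any sample: $|x|^j \le 1$ wherever $|x| \le 1$ and $|x|^j \le |x|^k$ wherever $|x| \ge 1$, so $|x|^j \le 1 + |x|^k$ everywhere, and hence the same bound holds for sample means. A minimal sketch, with an arbitrarily chosen heavy-ish-tailed sample and moments $j=2$, $k=4$:

```python
import numpy as np

# Pointwise, |x|^j <= 1 + |x|^k for all x (split into the |x| <= 1 and
# |x| >= 1 cases), so the sample-mean analogue holds for any data set.
rng = np.random.default_rng(1)
x = rng.standard_t(df=5, size=100_000)  # arbitrary heavy-ish-tailed sample

j, k = 2, 4
lhs = np.mean(np.abs(x)**j)
rhs = 1 + np.mean(np.abs(x)**k)
print(lhs <= rhs)  # True
```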
and is often denoted as $\sigma_x^2$. The variance gives us a measure of how widely the distribution is spread about the mean. In practice, we more commonly make use of the standard deviation $\sigma_x$, which is simply the square root of the variance.
As written, the variance is slightly different from our definition of the second moment.
By exploiting the linearity of expectation, we can express Eq. (17) using the first and second moments exclusively:
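The definitional form and the moments form are algebraically identical, so they agree on any sample. A quick numerical sketch (numpy, with an arbitrary normal sample assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=200_000)

# Direct definition vs. the first-and-second-moments form; the two are
# algebraically identical, so they agree to floating-point precision.
v_direct = np.mean((x - np.mean(x))**2)      # E[(X - mu)^2]
v_moments = np.mean(x**2) - np.mean(x)**2    # E[X^2] - E[X]^2
print(np.isclose(v_direct, v_moments))  # True
```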
Covariance is also written as $\sigma_{x,y}$. Note that $\mathrm{Cov}(X, X) = V(X)$.
Unlike variance, which is never negative, covariance can be negative, zero, or positive. Following the same logic used in Eq. (18), we can also express the covariance as
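Both the definitional and the moments form of covariance, and the possibility of a negative value, can be illustrated numerically. The construction of $Y$ below (a negative multiple of $X$ plus noise) is an arbitrary choice for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = -0.5 * x + rng.normal(size=100_000)  # constructed to covary negatively

cov_direct = np.mean((x - x.mean()) * (y - y.mean()))  # E[(X - mu_x)(Y - mu_y)]
cov_moments = np.mean(x * y) - x.mean() * y.mean()     # E[XY] - E[X]E[Y]
print(np.isclose(cov_direct, cov_moments))  # True
print(cov_direct < 0)  # True: negative by construction
```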
One of the most fundamental aspects of time series analysis is understanding the variance of the sum of random variables. Let us begin with the variance of a sum of two random variables, X and Y.
The above suggests (though does not prove) that the variance of a sum of variables is the sum of all covariance combinations. For random variables $X_0, X_1, X_2, \ldots, X_{n-1}$:
where the last term sums $i$ to $n-2$ and $j$ to $n-1$. Eq. (24) can be proven in the same manner as Eq. (21), though the algebra gets rather intricate. We present a more direct proof in the following problem.
From Eq. (24) we can see that if all variables have zero covariance (most commonly due to independence), the variance of a sum of variables is the sum of the variances of each variable
While it is very tempting to simply assume that Eq. (25) holds, in real life we must justify its use through theoretical analysis, empirical evidence, or both.
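Both cases are easy to see on simulated data. In the sketch below (numpy; the scales, seed, and the construction of the correlated variable are illustrative assumptions), the variance of a sum of independent variables is close to the sum of the variances, while for correlated variables the $2\,\mathrm{Cov}$ term cannot be dropped:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
x = rng.normal(scale=1.0, size=n)
y = rng.normal(scale=2.0, size=n)       # independent of x
z = x + rng.normal(scale=0.1, size=n)   # strongly correlated with x

# Independent case: the cross term (sample covariance) is near zero,
# so Var(X + Y) is approximately Var(X) + Var(Y).
print(np.isclose(np.var(x + y), np.var(x) + np.var(y), rtol=0.05))  # True

# Correlated case: the 2*Cov(X, Z) term is essential. With a consistent
# (ddof=0) covariance, the identity holds exactly for any sample.
cov_xz = np.mean((x - x.mean()) * (z - z.mean()))
print(np.isclose(np.var(x + z), np.var(x) + np.var(z) + 2 * cov_xz))  # True
```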
It should also be recalled that knowing that random variables have zero covariance does not inherently prove independence. As a simple counterexample, consider a random variable $X$ with zero mean and zero third moment, and let $Y = X^2$. An example might be $X \sim N(0, 1)$ and $Y = X^2 \sim \chi_1^2$. Clearly, $X$ and $Y$ are highly dependent; for example, knowing that $Y > 4$ tells us $|X| > 2$. Nevertheless, they still have zero covariance:
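This counterexample can be confirmed by simulation (numpy; the sample size and seed are arbitrary):

```python
import numpy as np

# The text's counterexample: X ~ N(0, 1) and Y = X^2 are strongly
# dependent, yet Cov(X, Y) = E[X^3] - E[X] E[X^2] = 0.
rng = np.random.default_rng(5)
x = rng.normal(size=1_000_000)
y = x**2

cov = np.mean(x * y) - x.mean() * y.mean()
print(abs(cov) < 0.05)  # True: the sample covariance is near zero
```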
Warning: The math in this section can get rather heavy. Feel free to skip this section if you’re having difficulty. While the material below does add to the overall understanding of future material, it is not absolutely necessary.
As variances are by definition non-negative, by combining the above two equations we arrive at the inequality $2\,|\mathrm{Cov}(X, Y)| \le V(X) + V(Y)$. We can express this in terms of the arithmetic mean of the variances:
In order to better understand the relation between variance and covariance, we must first introduce the Cauchy-Schwarz inequality, a valuable inequality from linear algebra. In words, it states that the square of the inner product of two vectors must always be less than or equal to the product of the two vectors’ squared norms.
Eqs. (29) and (30) will be equalities if and only if $\mathbf{u}$ and $\mathbf{v}$ can be expressed as scalar multiples of one another (i.e., lie on the same line).
If $\mathbf{u}$ and/or $\mathbf{v}$ is the zero vector, the inequality is trivially true; let us prove the inequality when neither is:
Let $\mathbf{w} \triangleq \dfrac{\mathbf{u}}{\|\mathbf{u}\|} \pm \dfrac{\mathbf{v}}{\|\mathbf{v}\|}$
$$
\begin{aligned}
\mathbf{w} \cdot \mathbf{w} &= \left(\frac{\mathbf{u}}{\|\mathbf{u}\|} \pm \frac{\mathbf{v}}{\|\mathbf{v}\|}\right) \cdot \left(\frac{\mathbf{u}}{\|\mathbf{u}\|} \pm \frac{\mathbf{v}}{\|\mathbf{v}\|}\right) \\
&= \frac{\mathbf{u} \cdot \mathbf{u}}{\|\mathbf{u}\|^2} \pm 2\,\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} + \frac{\mathbf{v} \cdot \mathbf{v}}{\|\mathbf{v}\|^2} \\
&= 1 \pm 2\,\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} + 1 \\
&= 2 \pm 2\,\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}
\end{aligned}
$$

Since, as with any inner product, $\mathbf{w} \cdot \mathbf{w} \ge 0$:

$$
0 \le 2 \pm 2\,\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}
\quad \Longrightarrow \quad
\|\mathbf{u}\|\,\|\mathbf{v}\| \ge |\mathbf{u} \cdot \mathbf{v}|
$$
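Both the inequality and its equality condition can be checked numerically (numpy; the vectors and the scalar multiple 3 are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
u = rng.normal(size=10)
v = rng.normal(size=10)

# The inequality itself: |u . v| <= ||u|| ||v||
print(abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v))  # True

# Equality holds when the vectors are scalar multiples of one another.
w = 3.0 * u
print(np.isclose(abs(u @ w), np.linalg.norm(u) * np.linalg.norm(w)))  # True
```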
The Cauchy-Schwarz inequality may be extended to integrals by viewing functions as infinite dimensional vectors living in Hilbert space. Let us imagine we have two continuous functions, $f(x)$ and $g(x)$, that are square-integrable on the interval $[a, b]$. Let us create $n$-dimensional vectors by sampling the value of the function at $n$ evenly spaced points along the interval, producing the vectors
We can use the Cauchy-Schwarz inequality to derive a tighter upper bound on the covariance of two variables than that found in Eq. (28). We will only explicitly prove the bound for the discrete case of sample covariance, but by Eq. (34) the bound will also hold for the continuous case. Let
We thus arrive at the tighter bound that the absolute value of the covariance must always be less than or equal to the geometric mean of the variances.
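This bound holds for any data set, not just in expectation, provided the sample variance and covariance are computed consistently (here with `ddof=0`). A numerical sketch with an arbitrarily constructed correlated pair:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=50_000)
y = 0.8 * x + rng.normal(size=50_000)  # arbitrary correlated pair

# Cauchy-Schwarz on the centered samples guarantees
# |Cov(X, Y)| <= sqrt(V(X) * V(Y)) for any data set.
cov = np.mean((x - x.mean()) * (y - y.mean()))
bound = np.sqrt(np.var(x) * np.var(y))
print(abs(cov) <= bound)  # True
```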
There is actually an even stricter requirement for covariance, namely that the covariance matrix be positive semidefinite. We will defer discussion of positive semidefiniteness until we encounter it in the specific time series application of covariance.
This quantity is referred to as the correlation. Note that the Cauchy-Schwarz inequality as applied in Eq. (42) guarantees that Eq. (43) will always lie in $[-1, 1]$. Of course, even when using correlation, determining whether a value such as 0.7 should be considered a high correlation will depend on the context and situation.
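A short numerical sketch of the bound (numpy; the linear-plus-noise construction of $Y$ and the seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=50_000)
y = 0.7 * x + rng.normal(size=50_000)  # arbitrary linear-plus-noise pair

# Correlation is the covariance rescaled by both standard deviations,
# so Cauchy-Schwarz confines it to [-1, 1].
corr = np.corrcoef(x, y)[0, 1]
print(-1.0 <= corr <= 1.0)  # True
```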
In some disciplines, such as quantum mechanics, operators are often denoted by a "hat," as in $\hat{O}$; this should not be confused with the hat used to denote a statistical estimator.