Mathematics is known for its resolute commitment to precision in definitions and statements. However, when words are pulled from the English language and given rigid mathematical definitions, their connotations and colloquial uses outside of mathematics remain. This can lead to distinct mathematical terms being used interchangeably, even though their mathematical definitions are not equivalent. This occurs frequently in probability and statistics, particularly with the notions of *uncorrelated* and *independent*. This post will focus on the exact meaning of each of these words, and how they are related but not equivalent.

## Independence

First, we will give the formal definition of independence:

**Definition** (*Independence of Random Variables*).

Two random variables X and Y are *independent* if the joint probability distribution P(X, Y) can be written as the product of the two individual distributions. That is,

P(X, Y) = P(X)P(Y)

Essentially, this means that the joint probability of the random variables X and Y together is separable into the product of their individual probabilities. Here are some equivalent definitions:

- P(X \cap Y) = P(X)P(Y)
- P(X|Y) = P(X) and P(Y|X) = P(Y)

This first alternative definition states that the probability of any outcome of X and any outcome of Y occurring simultaneously is the product of those individual probabilities.

For example, suppose the probability that you will put ham *and* cheese on your sandwich is P(H \cap C) = 1/3. The probability that ham is on your sandwich (with or without any other toppings) is P(H) = 1/2 and the probability that cheese is on your sandwich (again, with or without ham or other goodies) is P(C) = 1/2. If ham and cheese were independent sandwich fixings, then P(H\cap C) = P(H)P(C), but

P(H\cap C)= 1/3 \neq 1/4 = P(H)P(C)
Thus, ham and cheese are not independent sandwich fixings. This leads us into the next equivalent definition:
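The product check above is easy to verify numerically. Here is a minimal sketch in Python, using the probabilities given in the example (the variable names are my own):

```python
# Probabilities from the sandwich example in the text.
p_ham = 1 / 2            # P(H): ham on the sandwich
p_cheese = 1 / 2         # P(C): cheese on the sandwich
p_ham_and_cheese = 1 / 3  # P(H ∩ C): both together

# Independence would require P(H ∩ C) == P(H) * P(C).
independent = abs(p_ham_and_cheese - p_ham * p_cheese) < 1e-12
print(p_ham * p_cheese)  # 0.25, which differs from 1/3
print(independent)       # False
```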

*Two random variables are independent if* P(X|Y) = P(X) *and* P(Y|X) = P(Y).

The vertical bar denotes a **conditional probability**. P(X|Y) reads “probability of X given Y“, and is the probability that X will have any outcome x given that we know the outcome of the random variable Y.

The second definition means that two random variables are independent if the outcome of one has no effect on the other. That is, putting cheese on my sandwich doesn’t affect the likelihood that I will then add ham, *and* if I started with ham, that doesn’t affect the likelihood I will add cheese second. The example above already showed that ham and cheese are not independent, but we’ll verify it again with this equivalent definition.

By the definition of conditional probability,

P(H | C) = \frac{P(H\cap C)}{P(C)} = \frac{1/3}{1/2} = \frac{2}{3} \neq \frac{1}{2} = P(H)
The probability that ham will be on my sandwich given that cheese is on my sandwich is 2/3. This means the presence of cheese increases the likelihood that ham will be there too, which also tells me ham and cheese are not independent.

P(C | H) = \frac{P(H\cap C)}{P(H)} = \frac{1/3}{1/2} = \frac{2}{3} \neq \frac{1}{2} = P(C)
In addition, I’m more likely to add cheese to the sandwich if ham is already there. In both of these, the presence of one affects the probability of the other, so they are not independent.
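The two conditional probabilities can be checked with the same numbers. A minimal sketch, again with my own variable names:

```python
# P(H), P(C), and P(H ∩ C) from the sandwich example.
p_h, p_c, p_hc = 1 / 2, 1 / 2, 1 / 3

# Conditional probabilities: P(A | B) = P(A ∩ B) / P(B).
p_h_given_c = p_hc / p_c  # P(H | C) = (1/3) / (1/2) = 2/3
p_c_given_h = p_hc / p_h  # P(C | H) = (1/3) / (1/2) = 2/3

# Both exceed the unconditional probabilities of 1/2,
# so each topping makes the other more likely.
print(p_h_given_c, p_c_given_h)
```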

Coin flips are independent. The probability of me flipping a quarter and getting heads doesn’t affect the probability of you then getting tails (or heads) when you pick up that quarter and flip it after me. Independence is a common assumption in statistics because many standard results rely on it, though real data is rarely truly independent.

Next we’ll define what it means to be uncorrelated, and discuss some of the subtleties and equivalent interpretations.

## Uncorrelated

When people use the word “uncorrelated”, they are typically referring to the Pearson correlation coefficient (or product-moment coefficient) having a value of 0. The Pearson correlation coefficient of random variables X and Y is given by

\rho = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)}\sqrt{\text{Var}(Y)}}
where \text{Var}(X) is the variance of the random variable and \text{Cov}(X,Y) is the covariance between X and Y. A correlation of 0, i.e. X and Y being *uncorrelated*, is equivalent to \text{Cov}(X,Y) = 0, so it suffices to look at just the numerator.

The **covariance** between two random variables X and Y measures their joint variability, and has the formula

\text{Cov}(X,Y) = E[XY]-E[X]E[Y]
E[\cdot] is the expectation operator and gives the expected value (or mean) of the object inside. E[X] is the mean value of the random variable X, and E[XY] is the mean value of the product of the two random variables X and Y.

For an example, suppose X and Y can take on the joint values (expressed as ordered pairs) (0,0), (1,0), and (1,1) with equal probability. Then for any of the three possible points (x,y), P((X,Y) = (x,y)) = 1/3. We will find the covariance between these two random variables.

The first step is to calculate the mean of each individual random variable. X only takes on two values, 0 and 1, with probability 1/3 and 2/3 respectively. (Remember that two of the points have X = 1, with each of those probabilities as 1/3.) Then

E[X] = 0\cdot 1/3 + 1\cdot 2/3 = 2/3
Similarly, E[Y] = 0\cdot 2/3 + 1\cdot 1/3 = 1/3. Now, we must calculate the expected value of the *product* of X and Y. That product can take on values 0 or 1 (multiply the elements of each ordered pair together) with respective probabilities 2/3 and 1/3. These probabilities are obtained the same way as for the individual expectations. Thus,

E[XY] = 0\cdot 2/3 + 1\cdot 1/3 = 1/3
Finally, we put it all together:

\text{Cov}(X,Y) = E[XY]-E[X]E[Y] = \frac{1}{3}-\frac{2}{3}\cdot\frac{1}{3} = \frac{1}{3}-\frac{2}{9} = \frac{1}{9}
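The whole calculation can be sketched in a few lines of Python, enumerating the three equally likely points (the variable names here are my own):

```python
# The three equally likely joint values of (X, Y) from the example.
points = [(0, 0), (1, 0), (1, 1)]
p = 1 / 3  # each point has probability 1/3

e_x = sum(x * p for x, _ in points)       # E[X] = 2/3
e_y = sum(y * p for _, y in points)       # E[Y] = 1/3
e_xy = sum(x * y * p for x, y in points)  # E[XY] = 1/3

cov = e_xy - e_x * e_y  # 1/3 - (2/3)(1/3) = 1/9
print(cov)
```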
Covariance (and correlation, the normalized form of covariance) measures the *linear* relationship between two random variables. This restriction to linear relationships is important, as we will see. Next, we’ll look at how independence and correlation are related.

## Independence \Rightarrow Uncorrelated

Here’s where the confusion starts. First, it is absolutely true that if two random variables are independent, then they are uncorrelated. It’s important to prove this, so I will do it. I will prove it for discrete random variables to avoid calculus, but the result holds for all random variables, both continuous and discrete.

**Theorem.** If two random variables X and Y are independent, then they are uncorrelated.

*Proof.* Uncorrelated means that the correlation is 0 or, equivalently, that the covariance is 0. Therefore, we want to show that if two random variables are independent, then the covariance between them is 0.

Now, recall the formula for covariance:

\text{Cov}(X,Y) = E[XY]-E[X]E[Y]
The covariance is 0 exactly when E[XY] = E[X]E[Y]. Therefore, it suffices to show that E[XY] = E[X]E[Y]: uncorrelated is equivalent to a covariance of 0, which is equivalent to E[XY] = E[X]E[Y], so establishing this last equality establishes that X and Y are uncorrelated.

OK, we are given that X and Y are independent, which by definition means that P(X,Y) = P(X)P(Y). This is our starting point, and we know what we want to show, so let’s calculate E[XY]:

E[XY] = \sum_{x}\sum_{y}x\cdot y\cdot P(X=x, Y=y)
This is what E[XY] is. We have to sum the product of the two random variables times the probability that X=x and Y=y over all possible values of X and all possible values of Y. Now, we use the definition of independence and substitute P(X=x)P(Y=y) for P(X=x, Y=y):

\begin{aligned}E[XY] &= \sum_{x}\sum_{y}x\cdot y\cdot P(X=x, Y=y)\\&=\sum_{x}\sum_{y}x\cdot y\cdot P(X=x)P(Y=y)\end{aligned}
If I sum over Y first, then everything related to X is constant with respect to Y. Then I can factor out everything related to X from the sum over y:

\begin{aligned}E[XY] &= \sum_{x}\sum_{y}x\cdot y\cdot P(X=x, Y=y)\\&=\sum_{x}\sum_{y}x\cdot y\cdot P(X=x)P(Y=y)\\&= \sum_{x}x\cdot P(X=x)\sum_{y}y\cdot P(Y=y)\end{aligned}
Then if I actually carry out that inner sum over y, it becomes a completed object with no dependence on the outer sum over x. That means I can put parentheses around it and pull it out of the sum over x:

\begin{aligned}E[XY] &= \sum_{x}\sum_{y}x\cdot y\cdot P(X=x, Y=y)\\&=\sum_{x}\sum_{y}x\cdot y\cdot P(X=x)P(Y=y)\\&= \sum_{x}x\cdot P(X=x)\left(\sum_{y}y\cdot P(Y=y)\right)\\&= \left(\sum_{y}y\cdot P(Y=y)\right)\left(\sum_{x}x\cdot P(X=x)\right)\end{aligned}
Looking at the objects in each group of parentheses, we see that each matches the definition of expectation. That is, E[X] = \sum_{x}x\cdot P(X=x), and similarly for Y. Therefore, we have shown that E[XY] = E[X]E[Y], and have proven that independence always implies uncorrelatedness.
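The double-sum factorization above can be confirmed numerically. Here is a sketch that builds an independent joint distribution as a product of two arbitrary, made-up marginals and checks that E[XY] = E[X]E[Y]:

```python
# Hypothetical marginal distributions (values -> probabilities).
px = {0: 0.2, 1: 0.5, 2: 0.3}  # marginal of X
py = {-1: 0.4, 3: 0.6}         # marginal of Y

# Under independence, P(X=x, Y=y) = P(X=x) * P(Y=y),
# so E[XY] is the double sum from the proof.
e_xy = sum(x * y * px[x] * py[y] for x in px for y in py)
e_x = sum(x * p for x, p in px.items())
e_y = sum(y * p for y, p in py.items())

print(abs(e_xy - e_x * e_y) < 1e-12)  # True
```

Swapping in any other pair of marginals gives the same result, since the factorization in the proof never used the specific probabilities.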

Now, to use these two words (independent and uncorrelated) interchangeably, we would need the *converse* of the statement we just proved to be true: that

**Uncorrelated implies independence**

If we find even *one* counterexample (an example where the two variables have 0 correlation but do not fit the definition of independence), then the converse is *false* and we cannot use those terms interchangeably.

## No luck, I have a counterexample

Let’s take X and Y to exist as an ordered pair at the points (-1,1), (0,0), and (1,1) with probabilities 1/4, 1/2, and 1/4. Then E[X] = -1\cdot 1/4 + 0 \cdot 1/2 + 1\cdot 1/4 = 0, while E[Y] = 1\cdot 1/4 + 0 \cdot 1/2 + 1\cdot 1/4 = 1/2, and

E[XY] = (-1)(1)\cdot 1/4 + (0)(0) \cdot 1/2 + (1)(1)\cdot 1/4 = 0 = E[X]E[Y]
and thus X and Y are uncorrelated.

Now let’s look at the marginal distributions of X and Y. X can take on the values -1, 0, and 1 with probabilities 1/4, 1/2, and 1/4, while Y takes the values 0 and 1, each with probability 1/2. Looping through the possibilities, we have to check if P(X=x, Y=y) = P(X=x)P(Y=y). For instance,
P(X=-1, Y=1) = 1/4 \neq 1/8 = P(X=-1)P(Y=1)

Looping through the other points gives further mismatches (e.g. P(X=0, Y=0) = 1/2 \neq 1/4), so X and Y do not meet the definition of independence.

Therefore, we just found an example where two uncorrelated random variables are *not* independent, and thus the converse statement does not hold.
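The counterexample can be verified end to end with the same enumeration approach used earlier (the variable names are my own):

```python
# Joint distribution of the counterexample: (x, y) -> probability.
dist = {(-1, 1): 0.25, (0, 0): 0.5, (1, 1): 0.25}

# Uncorrelated: Cov(X, Y) = E[XY] - E[X]E[Y] should be 0.
e_x = sum(x * p for (x, _), p in dist.items())
e_y = sum(y * p for (_, y), p in dist.items())
e_xy = sum(x * y * p for (x, y), p in dist.items())
cov = e_xy - e_x * e_y
print(cov)  # 0.0

# Not independent: check the marginals against the joint.
px = {-1: 0.25, 0: 0.5, 1: 0.25}  # marginal of X
py = {0: 0.5, 1: 0.5}             # marginal of Y
print(dist[(-1, 1)], px[-1] * py[1])  # 0.25 vs 0.125
```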

## Conclusion

Correlation is a measure of *linear* association. We saw in our counterexample that there was no *linear* relationship between the two random variables. That doesn’t mean the variables have no effect on each other (again, as we saw in our counterexample). The common mistake is to forget that correlation is a restrictive measure of relationships, since it only covers linear types. Independence is a measure of “probabilistic effect”, which encompasses far more than simply linear association.

The words *uncorrelated* and *independent* may be used interchangeably in English, but they are not synonyms in mathematics. Independent random variables are uncorrelated, but uncorrelated random variables are not always independent. In mathematical terms, independence is a stronger, more restrictive property than uncorrelatedness.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.