Author: Rachel Traylor

The Gravity of Mathematics: Summary of Tech Field Day at SDC 2017


Fair warning – this will likely be one of the least technical posts I write. On September 14, I gave a presentation at Tech Field Day that wasn’t actually storage related, but rather a call to rekindle the relationship between pure math and industry. Here I’ll post the slides from that talk and summarize some of the discussion that carried on around this topic.

Read More

The Central Limit Theorem isn’t a Statistical Silver Bullet


Chances are, if you took anything away from that high school or college statistics class you were dragged into, you remember some vague notion about the Central Limit Theorem. It’s likely the most famous theorem in statistics, and the most widely used. Most introductory statistics textbooks state the theorem in broad terms: as the sample size increases, the sampling distribution of the sum of the sample elements becomes approximately normal, regardless of the underlying distribution. Many tools of statistical inference in a broad variety of fields, such as the classical z-test, rely on this theorem, and many conclusions in science, economics, public policy, and social studies have been drawn from tests that use it as justification. We’re going to dive into this theorem a bit more formally and discuss some counterexamples. Not every sequence of random variables satisfies the conditions of the theorem, and the assumptions are a bit stricter than practice often acknowledges.

Read More

Cauchy Sequences: the Importance of Getting Close


I am an analyst at heart, despite my recent set of algebra posts. Augustin-Louis Cauchy is arguably one of the most influential mathematicians in history: he pioneered rigor in the study of calculus and almost singlehandedly invented complex analysis and real analysis, while also making contributions to number theory, algebra, and physics.

One of the fundamental areas he studied was sequences and their notion of convergence. Suppose I give you a sequence of numbers and ask what happens to it if I keep appending terms forever. Would the path created by the sequence lead somewhere?

Read More

Taking Things for Granted: Elementary Properties of Groups


We take a lot of things for granted: electricity, gas at the pump, and mediocre coffee at the office. Many concepts in basic algebra are also taken for granted, such as cancellation of terms and commutativity. This post will revisit some basic algebra (think solving for x), but with some of those things we take for granted removed.

Read More

Theory of Coding, Episode 2: Maximum-Likelihood Decoding


The introduction to coding theory from the previous post will now allow us to explore some more interesting topics in coding theory, courtesy of Pinter’s A Book of Abstract Algebra. We’ll introduce the notion of a code, information symbols, and parity-check equations. Most communication channels are noisy to some extent, which means that a transmitted codeword may arrive with one or more errors. The recipient of a codeword needs to be able to determine first whether an error may have occurred, and then correct it. This will lead us to a method called maximum-likelihood decoding.

Read More

Group Theory, XOR, and Binary Codes: Introducing Coding Theory


Binary codes and bit-wise operations are fundamental in computer science. Whatever device you are running today works because of binary codes and the bit-wise operations AND, OR, NOT, and XOR. (A fun exercise: prove to yourself that each of these meets the definition of an operation. If you need a refresher, check out this post.) We can also look at binary codes and the bit-wise operations in a more algebraic way.

Let’s continue exploring abstract algebra via Pinter’s delightful algebra book. The previous post discussed operations on sets, and explored concatenation as an example. Here, we will build on this and look at a set G combined with an operation \ast that satisfies three axioms. This operation/set combination is a special algebraic object called a group, and has been studied extensively. We’ll look at a specific group to illustrate the concept: binary codes with the XOR operation. (Note: we also looked at modulo arithmetic on finite groups of integers here as another example of a group.)

Some quick terminology and setup

We currently (until the advent of quantum or DNA computers, or something else entirely) transmit information by coding it into binary strings – strings of 1s and 0s. We call these binary words. 100111 is a binary word, as are 001, 1, and 10110. A general binary word can have any length we want.

When we send these binary words via some communication channel, we always run the risk of errors in transmission. This means we need to devise some way to detect and correct transmission errors in binary codes. One way to do this is the XOR operation.

Let \mathbf{a} = a_{1}a_{2}\ldots a_{n} and \mathbf{b} = b_{1}b_{2}\ldots b_{n}  be binary words of length n, where a_{i},b_{i} \in \{0,1\} for i=1,...,n. We define the XOR operation, which we will denote by the symbol \oplus on each bit. For two bits a_{i}, b_{i} in a word,

a_{i}\oplus b_{i} = (a_{i} + b_{i}) \bmod 2

(Now would be a good time to review your modulo arithmetic if you’ve forgotten.)

Then for two words \mathbf{a} and \mathbf{b}, the XOR operation is done bit-wise (component by component). That is,

\mathbf{a} \oplus \mathbf{b} = (a_{1}\oplus b_{1})(a_{2}\oplus b_{2})\ldots (a_{n}\oplus b_{n})

As a quick example,

110010 \oplus 110001 = 000011

Notice that the result of the XOR operation shows us the positions in which \mathbf{a} and \mathbf{b} differ. Another way to look at it: if \mathbf{a} was transmitted and \mathbf{b} was received, there was an error in the last two positions. We call the result of XORing two binary words the error pattern.
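To make the error-pattern idea concrete, here is a minimal sketch in Python, assuming binary words are stored as strings of equal length (the function name xor_words is just for illustration):

def xor_words(a, b):
    # bit-wise XOR: add each pair of bits and reduce mod 2
    assert len(a) == len(b), "words must have the same length"
    return "".join(str((int(x) + int(y)) % 2) for x, y in zip(a, b))

print(xor_words("110010", "110001"))  # '000011' -- the last two positions differ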

Showing Binary Words with the XOR operation is a group

Now we’ll look at the set of all binary words of length n, called \mathbb{B}^{n}, coupled with the operation XOR. We want to show that this set along with the XOR operation forms an algebraic structure called a group. A group is one of the most basic algebraic structures we can study. We will be exploring all sorts of things we can do and build with groups in future posts, so first we need to define a group.

What’s a group?

From Pinter (1982)

A group is a set G coupled with an operation \ast, denoted \langle G, \ast \rangle that satisfies the following three axioms:

(G1: Associativity of the operation) The operation \ast is associative. (See the previous post for a review of associativity.)

(G2: Existence of an identity element) There is an identity element inside the set G that we will call e such that for every element g \in G, e\ast g = g\ast e =g

(G3: Existence of an inverse for every element) For every element g \in G, there is a corresponding element g^{-1} \in G such that g\ast g^{-1} = g^{-1}\ast g = e

 

All three properties were discussed in the previous post. It’s important to note that a group is a set and an operation. If we change one, then we either have a different group, or we lose the group classification. Real numbers under addition are a group, as are nonzero real numbers under multiplication, but those are two different groups. (This will be important when we get to rings and fields.) Integers under addition are also a group, but a different one than real numbers under addition.

Let’s prove \langle\mathbb{B}^{n}, \oplus \rangle is a group.

Showing a set and operation is a group is pretty algorithmic, in a sense. We just have to show that all three axioms are satisfied.

(G1): Associativity

This one will be a bit tedious. We have to show associativity for words of any length n. But fear not. Since XOR of words is done bit wise, we can exploit that and first show associativity for binary words of length 1, then “scale it up”, if you will.

In this case, for words of length 1, we just have to brute-force it. We have to show that for any a,b,c \in \mathbb{B}^{1}, that

(a\oplus b) \oplus c = a \oplus (b \oplus c)

So,

\begin{aligned} 1\oplus (1\oplus 1) = 1 \oplus 0 = 1 &\text{ and } (1 \oplus 1) \oplus 1 = 0 \oplus 1 = 1\\ 1\oplus (1 \oplus 0) = 1\oplus 1 = 0 &\text{ and } (1\oplus 1) \oplus 0 = 0 \oplus 0 = 0\\ &\vdots\end{aligned}

Continue in this fashion until you have tried all combinations. (It does work out.)
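If you would rather let a machine grind through all eight cases, here is a quick brute-force check (a sketch, with single-bit XOR written as addition mod 2):

from itertools import product

def xor(x, y):
    # single-bit XOR is addition modulo 2
    return (x + y) % 2

# check (a XOR b) XOR c == a XOR (b XOR c) for every combination of bits
assert all(xor(xor(a, b), c) == xor(a, xor(b, c)) for a, b, c in product([0, 1], repeat=3))
print("XOR is associative on words of length 1")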

Now that we have that this is true for words of length 1, we just use the definition of XOR operation on words of length 1 to “scale up” to words of length n. Since the operation is done component-wise, and it is associative on each component, it is associative on the whole word. We’ll show this formally now:

Let \mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathbb{B}^{n}. So \mathbf{a} = a_{1}a_{2}\ldots a_{n}, \mathbf{b} = b_{1}b_{2}\ldots b_{n}, and \mathbf{c} = c_{1}c_{2}\ldots c_{n}. Then

\begin{aligned}\mathbf{a}\oplus (\mathbf{b} \oplus \mathbf{c}) &= a_{1}a_{2}\ldots a_{n}\oplus [(b_{1}\oplus c_{1})(b_{2}\oplus c_{2})\ldots (b_{n}\oplus c_{n})]\\ &= (a_{1} \oplus (b_{1} \oplus c_{1}))(a_{2} \oplus (b_{2} \oplus c_{2}))\ldots (a_{n} \oplus (b_{n} \oplus c_{n}))\\&= ((a_{1} \oplus b_{1})\oplus c_{1})((a_{2}\oplus b_{2})\oplus c_{2})\ldots ((a_{n}\oplus b_{n})\oplus c_{n})\\&= (\mathbf{a} \oplus \mathbf{b})\oplus \mathbf{c}\end{aligned}

That third equality holds because we already showed that XOR was bit-wise associative. The last equality just recalls what it means to XOR two binary words. With that, we have shown associativity of the XOR operation.

(G2): Existence of an identity element

When we want to show that a group has an identity element, we must actually find a candidate and show it meets the criterion. Here is (frustratingly, sometimes) where intuition and experience tend to play the biggest role. But, as my mother always told me when I was frustrated at my middle school math homework: “Make it look like something you’ve seen before”. XOR is bit-wise addition, just with a twist (add, then mod out by 2). So let’s start by considering the identity element for addition: 0. We’re looking at binary words of length n, so a good candidate for our identity element e would be a string of n 0s. But does it fit?

For any word \mathbf{a} = a_{1}a_{2}\ldots a_{n},

\begin{aligned}a_{1}a_{2}\ldots a_{n} \oplus 000\ldots 0 &= (a_{1}\oplus 0)(a_{2} \oplus 0)\ldots (a_{n} \oplus 0)\\&= a_{1}a_{2}\ldots a_{n}.\end{aligned}

You can also check that 000\ldots 0 \oplus \mathbf{a} = \mathbf{a}, and thus our candidate is a match!

(G3): Existence of an inverse element for every element in the set

This one is a little bit trickier. We need to take any generic binary word and show that there is another binary word such that, when we XOR them together, we get the sequence of all 0s. (Computer science friends are ahead on this one.)

Think back to how we looked at the XOR operation as a form of error checking. If we XORed two words together, there was a 1 in every position in which they differ, and a 0 in every position in which they were identical.

Therefore, we come to the interesting conclusion that every element is its own inverse! If you XOR an element with itself, you will get a sequence of 0s. With our error checking interpretation, this makes perfect sense. We know the communication line transmits perfectly if we can XOR the sent and received word and get all 0s every time.
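As a quick sanity check, reusing the xor_words sketch from earlier: XOR any word with itself and you get the all-zeros word, which is exactly the identity we found above.

word = "10110"
print(xor_words(word, word))  # '00000' -- every word is its own inverse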

Conclusion: every element is its own inverse, and every element in the group is, well, in the group, so we have satisfied the third axiom.

Therefore, we have a group, fellow math travelers!

Bonus round: showing that \langle \mathbb{B}^{n}, \oplus \rangle is an abelian group

Notice that we are missing one property that all of you likely take for granted in regular arithmetic: being able to add and multiply in any order you like. That is, you take for granted that 2+3 = 3+2. This property is called commutativity, and groups with this bonus property are called abelian groups. Not all groups are abelian. Square matrices under matrix multiplication come to mind. However, we will show that \langle \mathbb{B}^{n}, \oplus \rangle is an abelian group.

This actually can be done quite simply by exploiting two things we already know: modulo arithmetic and bit-wise operations. We know that XOR is basically bit wise addition modulo 2. Addition is absolutely commutative, and modding out by 2 doesn’t change that, since we add in the regular ol’ way, then divide by 2 and take the remainder.

That means that XORing binary words of length 1 is commutative. Now, since XORing is done bit-wise, which we noted while proving associativity, we can use that same reasoning again (in fact, it looks almost exactly the same) to conclude that XOR is indeed a commutative operation, and thus we have an abelian group.

Conclusion

We explored the definition of a group using a common operation and set from computer science. We needed this foundation to be able to study more. The next post in the coding theory and algebra series will take an in depth look at maximum likelihood decoding. This decoding process is a way to attempt to decode and correct errors in transmission when the communication channel is noisy (and most are, somewhat).

Abstract algebra is a powerful branch of mathematics that touches many more things than most of us realize. Now that we know binary words under the XOR operation is a group, we can start extending this and studying it as a structure, or skeleton, rather than focusing on the individual elements. The elements and operation are really just the drywall or stucco on a house. The algebraic structure is the skeleton of beams inside the house. We learn a lot more by studying the structure than the covering, which is why it is important to enter that abstract realm. We will uncover similarities in strange places, and applications we didn’t know were possible.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Concatenation as an Operation


Mathematics is like any activity, sport, or skill: it must be honed and practiced. With that in mind, I have been bolstering my abilities in algebra with a fantastic book, A Book of Abstract Algebra by Charles C. Pinter.

As I go through the chapters, I will be posting and discussing selected relevant exercises that have applications. This first post is a short one on operations and concatenation. I discussed operators here, but will revisit the topic.

 

What’s an operation?

First, we define an operation. From Pinter:

 

Definition (operation): An operation on a given set A, often denoted \ast, is a rule which assigns to each pair of elements (a,b) in A exactly one element, denoted a\ast b, in A.

 

Pinter stresses several things about this definition that should not just be read and glossed over. First, a\ast b has to be defined for every single pair of elements in the set A. Some obvious examples include multiplication on integers, or addition on real numbers.

Some rules try to sneak by but are not operations, such as division on the real numbers. Why does this fail? Remember that 0 is a real number. I can divide 0 by anything I like, but can I divide anything by 0? No. A real number divided by 0 is not defined.

Secondly, the rule sends a pair of elements to exactly one element. Mathematically, this means a rule must be well defined to be an operation. It can’t take the same pair of elements and get two possible answers.

Lastly, a rule must be closed to be an operation. Whatever (a,b) gets mapped to must also be in the set A. Here again, addition of real numbers is closed. Adding two real numbers yields a real number. Division also becomes our counterexample here, but let’s look at division on the space of just integers. Dividing 2 by 3 is certainly defined, but the result is not an integer; it’s a rational number.

 

On to Concatenation

Operations are not just defined on numbers, and they don’t necessarily have to just be your standard arithmetic examples (addition, subtraction, multiplication). We can define operations on any space we want, provided we satisfy the definition. Here we will talk about concatenation as an operation on the space of sequences of symbols. If you’re a programmer, you probably assume we’re looking at binary sequences, but we concatenate English words too. (We just call these compound words, like “lifetime” or “backbone”.)

Let’s call A the alphabet. If we are living in binary, then A = \{0,1\}. If we are speaking of English letters, the alphabet is A = \{a,b,c,\ldots,z\}. Now we can call A^{\ast} the set of all sequences of symbols in the alphabet A.

Now we will define an operation on A^{*}: concatenation. Many of you already know what this is, so we’ll formally define it here:

If \mathbf{a} and \mathbf{b} are two sequences in A^{*}, where \mathbf{a} = a_{1}a_{2}\ldots a_{n} and \mathbf{b} = b_{1}b_{2}\ldots b_{m} (and each a_{i}, b_{i} is drawn from the alphabet), then the concatenation of \mathbf{a} and \mathbf{b} is given by

\mathbf{ab} = a_{1}a_{2}\ldots a_{n}b_{1}b_{2}\ldots b_{m}

That is, just stick \mathbf{b} to the end of \mathbf{a}. Quick example: if we live in binary, then for \mathbf{a} = 1010 and \mathbf{b} = 001,

\mathbf{ab} = 1010001
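In Python (a sketch that treats binary words as ordinary strings), concatenation is just string addition, and the result clearly stays in A^{*}:

a, b = "1010", "001"
print(a + b)  # '1010001'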

Let’s also note that there is an empty sequence or NULL sequence \lambda that consists of nothing at all. It’s pretty easy to check that concatenation meets the definition of an operation. Pinter asks us to do three simple things here:

1. Prove that the operation defined above is associative.

Associativity is the property of an operation that allows us to group however we like. Addition is associative:

(1+2) + 3 = 1+ (2+3)

In general, to show associativity of an operation, we need to show that for \mathbf{a,b,c} \in A^{*},

(\mathbf{ab})\mathbf{c} = \mathbf{a}(\mathbf{bc})

First, we will define three generic sequences in our A^{*}. Let

\begin{aligned}\mathbf{a} &= a_{1}a_{2}\ldots a_{m}\\\mathbf{b} &=b_{1}b_{2}\ldots b_{n}\\\mathbf{c} &= c_{1}c_{2}\ldots c_{p}\end{aligned}

Notice here that I did not specify an alphabet, and that the length of \mathbf{a,b} and \mathbf{c} are different. We need to keep this as general as possible to prove the statement. Every restriction we place (same length, specific alphabet, etc) weakens the argument.

Now we will show associativity formally:

\begin{aligned}(\mathbf{ab})\mathbf{c}&=(a_{1}a_{2}\ldots a_{m}b_{1}b_{2}\ldots b_{n})c_{1}c_{2}\ldots c_{p}\\&= a_{1}a_{2}\ldots a_{m}b_{1}b_{2}\ldots b_{n}c_{1}c_{2}\ldots c_{p}\end{aligned}

Here we can see that we can put parentheses any way we want to, and it doesn’t affect the end word. That is,

\begin{aligned}(\mathbf{ab})\mathbf{c}&=(a_{1}a_{2}\ldots a_{m}b_{1}b_{2}\ldots b_{n})c_{1}c_{2}\ldots c_{p}\\&= a_{1}a_{2}\ldots a_{m}b_{1}b_{2}\ldots b_{n}c_{1}c_{2}\ldots c_{p}\\&=a_{1}a_{2}\ldots a_{m}(b_{1}b_{2}\ldots b_{n}c_{1}c_{2}\ldots c_{p})\\&=\mathbf{a}(\mathbf{bc})\end{aligned}

So we’ve concluded that the grouping in which you perform the operation on several elements doesn’t matter, and thus we have concluded that concatenation is associative.

2. Explain why the operation is not commutative.

A commutative operation is an operation where the order of the elements in the operation doesn’t matter. That is, for an operation \ast, a\ast b = b\ast a. Examples of commutative operations are addition and multiplication on real numbers (2+5 = 5+2, for example). An example of a noncommutative operation is matrix multiplication. In general, the order in which you multiply matrices matters. As an explicit illustration,

\begin{bmatrix}1&2\\1&1\end{bmatrix}\begin{bmatrix}1& 1\\ 1 & 2\end{bmatrix} =\begin{bmatrix} 3 & 5 \\ 2 & 3\end{bmatrix}\neq \begin{bmatrix}2 & 3\\3 & 4\end{bmatrix}=\begin{bmatrix}1& 1\\ 1 & 2\end{bmatrix}\begin{bmatrix}1&2\\1&1\end{bmatrix}

We can see now why concatenation is not commutative. If you append \mathbf{b} to \mathbf{a}, you will definitely not get the same result as appending \mathbf{a} to \mathbf{b}. “Townhome” and “hometown” are certainly two different words, but the two pieces, “home” and “town”, are the same. Switching the order of concatenation gave a different result. Since I have produced a single counterexample, concatenation cannot be commutative in general.

3. Prove there is an identity element for this operation.

An identity element is an element that lives in A^{*} that, when combined via the operation in question with any other element in the set, in any order, returns that other element. Formally, \mathbf{e} \in A^{*} is an identity element for the operation \ast if and only if \mathbf{ea} = \mathbf{ae} = \mathbf{a} for every element \mathbf{a} \in A^{*}.

Some examples:

  • the identity element for regular addition on the real numbers is 0. 0 + x = x + 0 = x for any real number x
  • the identity element for multiplication on the real numbers is 1.

For concatenation, we seek an element that, when concatenated with any other element, in any order, returns that element. Proving existence has several strategies. The simplest one (in theory) is to find a candidate element and show it meets the criteria. Here, since concatenation essentially “compounds” two things together, the only way to keep an element unchanged is to concatenate nothing to it. The NULL element \lambda (which is certainly a word, just a NULL one; computer scientists are very familiar with this concept) fits the bill here. Try it out: if you take nothing and concatenate a word to it, you just get that word. Conversely, concatenating nothing to a given word doesn’t change the word. So the NULL element is our identity element.
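Here is a small check of all three exercises at once, a sketch that again treats words as Python strings and takes the empty string "" as the NULL word \lambda:

a, b, c = "home", "town", "house"
assert (a + b) + c == a + (b + c)   # associativity: grouping doesn't matter
assert a + b != b + a               # not commutative: 'hometown' vs 'townhome'
assert "" + a == a + "" == a        # the empty word acts as the identity
print("all three properties check out")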

 

Conclusion

Operations and rules aren’t equivalent. A rule is anything we make up for some purpose, but an operation has to meet specific criteria to earn the name. This post was meant to show that operations are not restricted to the spaces of numbers we deal with every day; we can look at spaces of objects, matrices, functions, words, sequences…anything we like. We can also define new operations on those spaces, as long as we satisfy the definition. This is the power of abstract algebra: we can take structure we thought belonged only to a very restrictive space (like numbers, or even matrices) and look at other things, like concatenation, in a different light.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Like Clockwork: Modulo Addition and Finite Groups of Integers


Modulo arithmetic always scared me in college. (Honestly, it’s still not something I can do as easily as integration or spotting pdfs hidden inside icky integrals.) Then when you add abstract algebra (or modern algebra, as some call it) and group theory on top of it, the whole topic can become intimidating. Many books on algebra don’t slow down enough to really examine the examples they give, so I’m going to dive in and explore some of these. Algebra is a powerful branch of mathematics, with ripple effects throughout mathematics, engineering, experimental design, and computer science. It’s worth studying and understanding, and the topics that scare us tend to lose their fangs after we get to know them a bit.

Addition on the clock

Most of us do modulo arithmetic more often than we think, especially if we look at an analog clock to tell time. For example, if I asked you to convert 14:00 from military time into “regular” time, you’d respond “2 pm” pretty quickly. But how did you do it?

I picked 2 o’clock because I think it should become a standard in America to break at 2 for tea.

 

You got that answer because you went all the way around the clock and then had 2 remaining. This is modulo addition — in this case \mod 12.

Modulo addition is regular addition with an extra couple of steps: dividing by your “base” n (on a clock, n = 12) and then grabbing the remainder. We write mathematically that for integers a, b, and n,

a \equiv b \bmod n \Leftrightarrow b-a= k\cdot n \text{ for an integer } k

You read this as  a is equivalent to b modulo n if b-a is a multiple (integer multiple) of n. Another way to write it is to say that

a \equiv b \bmod n \Leftrightarrow b = k\cdot n + a \text{ for an integer } k

In other words, if b can be expressed as a multiple of n plus some extra amount a, then a is equivalent to b modulo n.

Back to our clock example, obviously 2 \neq 14, but 2\equiv 14 \bmod 12, because

14 = 1\cdot 12 + 2

Here, our b, 14 can be expressed as an integer multiple of 12 (k=1), plus some extra a=2, so we say that 2\equiv 14 \bmod 12.
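Python’s % operator does exactly this bookkeeping; a minimal sketch of the clock example:

print(14 % 12)        # 2 -- so 14:00 is 2 o'clock
print(divmod(14, 12)) # (1, 2): once around the clock (k = 1), with 2 left over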

That n matters. If I change the n, the statement doesn’t hold true anymore. That is, 2 is not equivalent to 14 \bmod 5, because the remainder when you divide 14 by 5 is 4. Next, let’s study the set of possible remainders when we divide an integer by a given n.

The Finite Groups \mathbb{Z}_{n}

Since you’re super clever, I bet you’re already one step ahead. If we look at dividing any number by 12, the remainder can only be one of the numbers 0 – 11. Why? Well, if the remainder were 12, then the number would just be a multiple of 12. The remainder of any integer, no matter how big it is, when dividing by n has to be somewhere in the set \{0,1,2,\ldots, n-1\}.

Let’s look at a few with n=12. As an exercise, verify each of these on your own:

1\equiv 13 \bmod 12 \text{ because } 13 = 1\cdot 12 + 1

0\equiv 24 \bmod 12 \text{ because } 24 = 2\cdot 12 + 0

11\equiv 35 \bmod 12 \text{ because } 35 = 2\cdot 12 + 11

0\equiv 48 \bmod 12 \text{ because } 48 = 4\cdot 12 + 0

3\equiv 123 \bmod 12 \text{ because } 123 = 10\cdot 12 + 3

That k is telling us how many times we’ve gone around the clock. So we’re always stuck on the clock, no matter how many times we go around, which means that the remainder can’t be any larger than 11, or we’ve started another cycle around the clock. The set \{0,1,2,\ldots, n-1\} forms an algebraic structure called a group when we pair the set with the operation modulo addition.

So what’s a group?

A group is a set paired with an operation, and the pair has to satisfy three specific properties. The operation doesn’t always have to be addition, or modulo addition, and the set doesn’t have to be integers. Operations can be addition, multiplication, function composition, or some weird one you define. But to pair an operation with a set, we have to check to make sure some rules are followed:

  • First, your operation has to fit the formal definition of an operation. An operation is defined mathematically as a rule that takes any two elements in a set A and returns a unique element that is still in A. This means that you can’t create a rule that lets you escape the set. If my set is integers, and my rule is division, then it’s not an operation on that set because the result of 3/4 isn’t an integer. I escaped the set. The uniqueness requirement is what it means to be well-defined. If my rule is that I take two integers a and b and return the number whose square is the product ab, then 2\cdot 8 = 16 could return either 4 or -4. That’s not well-defined. The rule doesn’t guarantee that a unique element is returned.

Assuming our operation really is an operation (and modulo addition on the integers is; if you don’t believe me, try to prove it for yourself), we can move on. We have the set (the integers modulo n) and the operation (modulo addition). Now, to meet the definition of a group, the set and operation must satisfy these three axioms:

  1. The operation is associative. That is, the grouping shouldn’t matter when I execute the operation on 3 elements: [(a+b) + c] \bmod n \equiv [a + (b+c)]\bmod n.
  2. There is an identity element in my set. In other words, there has to be some element in my set such that performing the operation (in this case, addition modulo n) with any other element in the set returns that other element, and the order doesn’t matter. If you take some element and operate on it using the identity, or if you take the identity element and operate on it using that other element, you would always get that other element.
  3. Every element in the set has to have an inverse. This means that for every element in my group, there has to be another element such that performing the operation on the element and its proposed inverse in any order returns the identity element we talked about in (2).

 

We have a special way of writing the set of the integers modulo n: \mathbb{Z}_{n} = \{0,1,\ldots,n-1\}. We’ll do a specific example and verify that \mathbb{Z}_{3} under modulo addition is a group.

\mathbb{Z}_{3} and operation tables

We can examine groups via operation tables, which are a nice visual. Think of your multiplication tables when you were young. The elements are placed on the rows and columns, and the entries are the result of applying the operation to the two elements. The row entry goes first, then the column entry. For \mathbb{Z}_{3}, we know the elements are \{0,1,2\}, so the operation table looks like this:

\begin{array}{l|rrr}\color{purple}{+}&\color{purple}{0} &\color{purple}{1} & \color{purple}{2}\\\color{purple}{0} & 0 & 1 & 2\\\color{purple}{1} & 1 & 2 & 0\\\color{purple}{2} & 2 & 0 & 1\end{array}

So, reading the table, the first entry is row 1 + column 1 (\bmod 3\!), which is

(0 + 0) \bmod 3 \equiv 0 \bmod 3

Another one is the row 2 + column 3:

(1 + 2) \bmod 3 = 3 \bmod 3 \equiv 0\bmod 3

Why? Well, after we add, we have to divide by 3 and take the remainder. As an exercise, make sure you can generate this table yourself. It gives good practice, and if you can comfortably generate this, then you’re doing great!
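If you want to check your work, here is a short sketch that generates the operation table for \mathbb{Z}_{3} (or any n) by brute force:

n = 3
for a in range(n):
    # one row of the table: (a + b) mod n for each column b
    print([(a + b) % n for b in range(n)])
# prints [0, 1, 2], [1, 2, 0], [2, 0, 1] -- matching the table above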

Now, let’s verify that \mathbb{Z}_{3} under modulo addition is a group

We know that modulo addition is an operation, so let’s see if combining it with \mathbb{Z}_{3} fits the definition of a group. This means we have to verify the three axioms, starting with the first:

 

Axiom 1: Associativity:

We need to show the following to satisfy the first axiom:

[(a+b) + c] \bmod 3 \equiv [a + (b+c)]\bmod 3.

First, let’s take any three unspecified elements of \mathbb{Z}_{3}. Call them a, b, and c. We don’t specify these because we have to keep things general in order to show that it doesn’t matter which three elements we pick. We are going to start with [(a+b) + c]\bmod 3 and manipulate it to show that it’s equivalent to [a+(b+c)]\bmod 3. To differentiate between regular old addition and modulo addition (modulo 3 in this case), I am going to use the notation +_{3} when I mean addition modulo 3.

First, by the definition of modulo arithmetic,

[(a+b)+c]\bmod 3 = (a+b)\bmod 3 +_{3} c\bmod 3

The right-hand side of the above is the same thing as using regular addition to add a, b, and c together first, then taking that result modulo 3. So we write that

\begin{aligned}[(a+b)+c]\bmod 3 &= (a+b)\bmod 3 +_{3} c\bmod 3\\&= (a+b+c)\bmod 3\end{aligned}

Look inside the parentheses. Now we’re back in the nice, safe world of regular addition, and we know we can group and add any ol’ way we want to inside those parentheses. So let’s do that. We know that

(a+b+c) = (a+b) + c = a + (b+c).

Grab that last equality. Then we can write

\begin{aligned}[(a+b)+c]\bmod 3 &= (a+b)\bmod 3 +_{3} c\bmod 3\\&= (a+b+c)\bmod 3\\&=[a+(b+c)]\bmod 3\end{aligned}

Hmm, now it looks like we can “descend” the same ladder we walked up to start with. If [(a+b)+c]\bmod 3 = (a+b)\bmod 3 +_{3} c\bmod 3, then it also holds that [a+(b+c)]\bmod 3 = a\bmod 3 +_{3} (b+c)\bmod 3.

\begin{aligned}[(a+b)+c]\bmod 3 &= (a+b)\bmod 3 +_{3} c\bmod 3\\&= (a+b+c)\bmod 3\\&=[a+(b+c)]\bmod 3\\&= a \bmod 3 +_{3} (b+c) \bmod 3\end{aligned}

Last step. We are now basically done. Use the definition of modulo arithmetic one more time to finish.

\begin{aligned}[(a+b)+c]\bmod 3 &= (a+b)\bmod 3 +_{3} c\bmod 3\\&= (a+b+c)\bmod 3\\&=[a+(b+c)]\bmod 3\\&= a \bmod 3 +_{3} (b+c) \bmod 3\\&\equiv [a+(b+c)]\bmod 3\end{aligned}

 

Axiom 2: Existence of an identity 

This one is nice and easy here. We already did the work by calculating out that operation table. Which element, when added to any other element, returns the other element? Does the order matter? Use the operation table to verify that 0 is the identity element of \mathbb{Z}_{3}.

 

Axiom 3: Every element has to have an inverse

We did the setup for this one too. Look at each element (there are only 3). Each element needs to have a corresponding pair that you can modulo add to it to retrieve the identity element 0 \bmod 3. Clearly, 0 is its own inverse: (0+0) \bmod 3 \equiv 0 \bmod 3. Also,

(1+2)\bmod 3 \equiv (2+1) \bmod 3 \equiv 0 \bmod 3.

So 1 and 2 are inverses of each other. Therefore, every element has an inverse, and we’ve proven that \mathbb{Z}_{3} is a group under the operation modulo addition.

 

I read all the way down to here, so what should I take from this?

All the things we just did were to learn a couple things:

  1. There is more than one kind of addition. Modulo addition is very commonly used, with the typical example being clock reading. We also use it when we program computers, because machines operate in binary, which is actually \mathbb{Z}_{2}. It also shows up in many programming applications, like calendars (which also use modulo addition) and the storage mapping function. (For a fantastic explanation of the “why I should care”, check out I Programmer’s article on SMF and the article on the mod function.)
  2. Group theory can simplify our lives by giving us fewer things to study. If I want to study the effect dividing by 3 has on numbers, I can just study \mathbb{Z}_{3}, because dividing by 3 can only have, well, 3 possible options: a remainder of 0, 1, or 2. Now I don’t have to study all the integers in this situation to get the information I need.

This article is just the start of various explorations of group theory. Here I gave an introduction to a specific type of group, the integers modulo n, because it will turn out that the structure and behavior of these groups are actually the same as those of other groups that look much harder to study, like symmetries of regular shapes. For example, \mathbb{Z}_{3} has the same mathematical structure as the rotations of an equilateral triangle. This has vast uses in chemistry and physics, studying the shapes of molecules.

Abstract algebra is a vastly useful subject, despite its reputation for living “in the clouds.” Since I myself need to deepen my understanding of this remarkable branch of mathematics, join me on the journey and stay tuned for more posts exploring notions in group theory. Perhaps as I become better friends with algebra, it will appear more useful and beautiful to you as well.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Uncorrelated and Independent: Related but not Equivalent


Mathematics is known for its resolute commitment to precision in definitions and statements. However, when words are pulled from the English language and given rigid mathematical definitions, the connotations and colloquial use outside of mathematics still remain. This can lead to immutable mathematical terms being used interchangeably, even though the mathematical definitions are not equivalent. This occurs frequently in probability and statistics, particularly with the notion of uncorrelated and independent. We will focus this post on the exact meaning of both of these words, and how they are related but not equivalent.

Independence

First, we will give the formal definition of independence:

 


Definition (Independence of Random Variables).

Two random variables X and Y are independent if the joint probability distribution P(X, Y) can be written as the product of the two individual distributions. That is,

P(X, Y) = P(X)P(Y)

 

Essentially this means that the joint probability of the random variables X and Y together is actually separable into the product of their individual probabilities. Here are some other equivalent definitions:

  • P(X \cap Y) = P(X)P(Y)
  • P(X|Y) = P(X) and P(Y|X) = P(Y)

This first alternative definition states that the probability of any outcome of X and any outcome of Y occurring simultaneously is the product of those individual probabilities.

For example, suppose the probability that you will put ham and cheese on your sandwich is P(H \cap C) = 1/3. The probability that ham is on your sandwich (with or without any other toppings) is P(H) = 1/2 and the probability that cheese is on your sandwich (again, with or without ham or other goodies) is P(C) = 1/2. If ham and cheese were independent sandwich fixings, then P(H\cap C) = P(H)P(C), but

P(H\cap C)= 1/3 \neq 1/4 = P(H)P(C)

Thus, ham and cheese are not independent sandwich fixings. This leads us into the next equivalent definition:

Two random variables are independent if  P(X|Y) = P(X) and P(Y|X) = P(Y).

The vertical bar denotes a conditional probability. P(X|Y) reads “the probability of X given Y”, and is the probability that X will take any outcome x given that the random variable Y has occurred.

The second definition means that two random variables are independent if the outcome of one has no effect on the other. That is, putting cheese on my sandwich doesn’t affect the likelihood that I will then add ham, and  if I started with ham, then it doesn’t affect the likelihood I will add cheese second. The example above already showed that ham and cheese were not independent, but we’ll repeat it again with this other equivalent definition.

By the definition of conditional probability,

P(H | C) = \frac{P(H\cap C)}{P(C)} = \frac{1/3}{1/2} = \frac{2}{3} \neq \frac{1}{2} = P(H)

The probability that ham will be on my sandwich given that cheese is on my sandwich is 2/3. This means the presence of cheese increases the likelihood that ham will be there too, which also tells me ham and cheese are not independent.

P(C | H) = \frac{P(H\cap C)}{P(H)} = \frac{2}{3} \neq \frac{1}{2} = P(C)

In addition, I’m more likely to add cheese to the sandwich if ham is already there. In both of these, the presence of one affects the probability of the other, so they are not independent.
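The sandwich numbers are easy to verify by hand, but here is a sketch that checks both the product rule and the conditional probabilities with exact fractions:

from fractions import Fraction

p_hc, p_h, p_c = Fraction(1, 3), Fraction(1, 2), Fraction(1, 2)
print(p_hc == p_h * p_c)  # False: 1/3 != 1/4, so not independent
print(p_hc / p_c)         # 2/3 = P(H|C), which is not P(H) = 1/2
print(p_hc / p_h)         # 2/3 = P(C|H), which is not P(C) = 1/2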

Coin flips are independent. The probability of me flipping a quarter and getting a head doesn’t affect the probability of you then getting a tail (or head) when you pick up that quarter and flip it after me. Independence is a common assumption in statistics because so much of the theory relies on it, though real data is rarely truly independent. For the development of the theory of dependent random variables, see here, here, and here.

Next we’ll define what it means to be uncorrelated, and discuss some of the subtleties and equivalent interpretations.

Uncorrelated

When people use the word “uncorrelated”, they are typically referring to the Pearson correlation coefficient (or product-moment coefficient) having a value of 0. The Pearson correlation coefficient of random variables X and Y is given by

\rho = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)}\sqrt{\text{Var}(Y)}}

Where \text{Var}(X) is the variance of the random variable, and \text{Cov}(X,Y) is the covariance between X and Y. A correlation of 0, or X and Y being uncorrelated implies \text{Cov}(X,Y) = 0, and thus it suffices to just look at the numerator.

The covariance between two random variables X and Y measures the joint variability, and has the formula

\text{Cov}(X,Y) = E[XY]-E[X]E[Y]

Here E[\cdot] is the expectation operator and gives the expected value (or mean) of the object inside. E[X] is the mean value of the random variable X. E[XY] is the mean value of the product of the two random variables X and Y.

For an example, suppose X and Y can take on the joint values (expressed as ordered pairs) (0,0), (1,0), and (1,1) with equal probability. Then for any of the three possible points (x,y), P((X,Y) = (x,y)) = 1/3. We will find the covariance between these two random variables.

The first step is to calculate the mean of each individual random variable. X only takes on two values, 0 and 1, with probability 1/3 and 2/3 respectively. (Remember that two of the points have X = 1, with each of those probabilities as 1/3.) Then

E[X] = 0\cdot 1/3 + 1\cdot 2/3 = 2/3

Similarly, E[Y] = 0\cdot 2/3 + 1\cdot 1/3 = 1/3. Now, we must calculate the expected value of the product of X and Y. That product can take on values 0 or 1 (multiply the elements of each ordered pair together) with respective probabilities 2/3 and 1/3. These probabilities are obtained the same way as for the individual expectations. Thus,

E[XY] = 0\cdot 2/3 + 1\cdot 1/3 = 1/3

Finally, we put it all together:

\text{Cov}(X,Y) = E[XY]-E[X]E[Y] = \frac{1}{3}-\frac{2}{3}\cdot\frac{1}{3} = \frac{1}{3}-\frac{2}{9} = \frac{1}{9}
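The same computation as a sketch in Python, with the joint distribution written as a list of (point, probability) pairs:

from fractions import Fraction

points = [((0, 0), Fraction(1, 3)), ((1, 0), Fraction(1, 3)), ((1, 1), Fraction(1, 3))]
ex  = sum(p * x for (x, y), p in points)      # E[X]  = 2/3
ey  = sum(p * y for (x, y), p in points)      # E[Y]  = 1/3
exy = sum(p * x * y for (x, y), p in points)  # E[XY] = 1/3
print(exy - ex * ey)                          # Cov(X, Y) = 1/9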

Covariance (and correlation, the normalized form of covariance) measure the linear relationship between two random variables. This is important. Next, we’ll look at how independence and correlation are related.

Independence \Rightarrow Uncorrelated

Here’s the start of where the confusion lies. First, it is absolutely true that if two random variables are independent, then they are uncorrelated. It’s important to prove this, so I will do it. I will prove this for discrete random variables to avoid calculus, but this holds for all random variables, both continuous and discrete.


Theorem. If two random variables X and Y are independent, then they are uncorrelated.


 

Proof. Uncorrelated means that their correlation is 0, or, equivalently, that the covariance between them is 0. Therefore, we want to show that for two given (but unknown) random variables that are independent, then the covariance between them is 0.

Now, recall the formula for covariance:

\text{Cov}(X,Y) = E[XY]-E[X]E[Y]

If the covariance is 0, then E[XY] = E[X]E[Y]. Therefore, we say mathematically that it is sufficient to show that E[XY] = E[X]E[Y]. This is because uncorrelated is equivalent to a 0 covariance, which is equivalent to E[XY] = E[X]E[Y], and thus showing this last equality is equivalent to showing that X and Y are uncorrelated.

OK, we are given that X and Y are independent, which by definition means that P(X,Y) = P(X)P(Y). This is our starting point, and we know what we want to show, so let’s calculate E[XY]:

E[XY] = \sum_{x}\sum_{y}x\cdot y\cdot P(X=x, Y=y)

This is what E[XY] is. We have to sum the product of the two random variables times the probability that X=x and Y=y over all possible values of X and all possible values of Y. Now, we use the definition of independence and substitute P(X=x)P(Y=y) for P(X=x, Y=y):

\begin{aligned}E[XY] &= \sum_{x}\sum_{y}x\cdot y\cdot P(X=x, Y=y)\\&=\sum_{x}\sum_{y}x\cdot y\cdot P(X=x)P(Y=y)\end{aligned}

If I sum over Y first, then everything related to X is constant with respect to Y. Then I can factor out everything related to X from the sum over y:

\begin{aligned}E[XY] &= \sum_{x}\sum_{y}x\cdot y\cdot P(X=x, Y=y)\\&=\sum_{x}\sum_{y}x\cdot y\cdot P(X=x)P(Y=y)\\&= \sum_{x}x\cdot P(X=x)\sum_{y}y\cdot P(Y=y)\end{aligned}

Then if I actually carry out that inner sum over y, it becomes a completed object with no relationship to the outer sum going over x. That means I can put parentheses around it and pull it out of the sum over x:

\begin{aligned}E[XY] &= \sum_{x}\sum_{y}x\cdot y\cdot P(X=x, Y=y)\\&=\sum_{x}\sum_{y}x\cdot y\cdot P(X=x)P(Y=y)\\&= \sum_{x}x\cdot P(X=x)\left(\sum_{y}y\cdot P(Y=y)\right)\\&= \left(\sum_{y}y\cdot P(Y=y)\right)\left(\sum_{x}x\cdot P(X=x)\right)\end{aligned}

Looking at the objects in each group of parentheses, we see that it matches the definition of expectation for X and Y. That is E[X] = \sum_{x}x\cdot P(X=x), and similar for Y. Therefore, we have shown that E[XY] = E[X]E[Y], and have proven that independence always implies uncorrelated.

Now, to use these two words (independent and uncorrelated) interchangeably, we would have to know that the converse of the statement we just proved is also true: that

Uncorrelated implies independence

If we find even one counterexample (an example where the two variables have 0 correlation but do not fit the definition of independence), then the converse is false and we cannot use those terms interchangeably.

No luck, I have a counterexample

 

Let’s take X and Y to exist as an ordered pair at the points (-1,1), (0,0), and (1,1) with probabilities 1/4, 1/2, and 1/4. Then E[X] = -1\cdot 1/4 + 0 \cdot 1/2 + 1\cdot 1/4 = 0, while E[Y] = 0\cdot 1/2 + 1\cdot 1/2 = 1/2, so E[X]E[Y] = 0. Moreover,

E[XY] = -1\cdot 1/4 + 0 \cdot 1/2 + 1\cdot 1/4 = 0 = E[X]E[Y]

and thus X and Y are uncorrelated.

Now let’s look at the marginal distributions of X and Y. X can take on values -1, 0, and 1, with probabilities 1/4, 1/2, and 1/4, while Y takes on the values 0 and 1, each with probability 1/2. Then, looping through the possibilities, we have to check whether P(X=x, Y=y) = P(X=x)P(Y=y). Already at the first point,

P(X=-1, Y=1) = 1/4 \neq 1/8 = P(X=-1)P(Y=1)

We could check the other points as well, but one failing point is enough: X and Y do not meet the definition of independent.

Therefore, we just found an example where two uncorrelated random variables are not independent, and thus the converse statement does not hold.
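A sketch verifying the counterexample numerically: the covariance comes out to 0, while the product test for independence fails at the very first point we check.

from fractions import Fraction

pmf = {(-1, 1): Fraction(1, 4), (0, 0): Fraction(1, 2), (1, 1): Fraction(1, 4)}
ex  = sum(p * x for (x, y), p in pmf.items())
ey  = sum(p * y for (x, y), p in pmf.items())
exy = sum(p * x * y for (x, y), p in pmf.items())
print(exy - ex * ey)  # 0 -- uncorrelated

px = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}  # marginal of X
py = {0: Fraction(1, 2), 1: Fraction(1, 2)}                      # marginal of Y
print(pmf[(-1, 1)] == px[-1] * py[1])  # False: 1/4 != 1/8, not independent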

 

Conclusion

Correlation is a linear association measure. We saw in our counterexample that there was no linear relationship between the two random variables. That doesn’t mean the variables don’t have any effect on each other (again, as we saw in our counterexample). The common mistake is to forget that correlation is a restrictive measure of relationships, since it only covers linear types. Independence is a measure of “probabilistic effect,” which encompasses far more than simply linear association.

The words uncorrelated and independent may be used interchangeably in English, but they are not synonyms in mathematics. Independent random variables are uncorrelated, but uncorrelated random variables are not always independent. In mathematical terms, we conclude that independence is a more restrictive property than uncorrelated-ness.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.