

## Applications of Reflections: Taking a Group to its “Abelian” Form

In continuing the exploration of explicit applications and examples of category-theoretic concepts, we highlight the versatility of reflections and reflective subcategories. This concept can be used to perform all kinds of desired actions on a category to yield a subcategory that is “nicer” in some way. This article explores how we can use reflections to make groups abelian.

Please see the previous article by Traylor and Fadeev for the definition of a category and other properties.

Definition (Subcategory):

A category $\mathbf{A}$ is a subcategory of a category $\mathbf{B}$ if
(1) $\mathrm{Ob}(\mathbf{A}) \subseteq \mathrm{Ob}(\mathbf{B})$
(2) for each $A, A' \in \mathrm{Ob}(\mathbf{A})$, $\mathrm{hom}_{\mathbf{A}}(A,A') \subseteq \mathrm{hom}_{\mathbf{B}}(A,A')$
(3) for each $A \in \mathrm{Ob}(\mathbf{A})$, the $\mathbf{B}-$identity on $A$ is the same as the $\mathbf{A}-$identity on $A$.
(4) composition in $\mathbf{A}$ is the restriction of composition in $\mathbf{B}$ to the morphisms of $\mathbf{A}$

Adámek et al., Abstract and Concrete Categories (1990)

Point by point, we’ll pick the definition apart. The first part is pretty clear. The collection of objects in a subcategory is contained in the collection of objects in its “parent”. The second criterion says that the set of morphisms from one object $A$ to another object $A'$ inside the littler category $\mathbf{A}$ should be a subset of all the morphisms from the same $A$ to the same $A'$, but inside $\mathbf{B}$. That is, there may be morphisms from $A \to A'$ in $\mathbf{B}$ that won’t live in the subcategory $\mathbf{A}$.

The third criterion just states that the identity morphisms on objects should match in both categories. The final criterion tells us that composition inside the subcategory $\mathbf{A}$ only “works” on the morphisms inside $\mathbf{A}$, but is otherwise the same composition as in $\mathbf{B}$. We just only perform compositions on morphisms in $\mathbf{A}$ when we’re in $\mathbf{A}$.

We now define a reflection.

Definition ($\mathbf{A}-$reflection)

Let $\mathbf{A}$ be a subcategory of $\mathbf{B}$, and let $B \in \mathrm{Ob}(\mathbf{B})$.

(1) An $\mathbf{A}-$reflection for $B$ is a morphism $B \xrightarrow{r} A$ from $B$ to $A \in \mathrm{Ob}(\mathbf{A})$ such that for any morphism $B \xrightarrow{f} A'$ from $B$ to $A' \in \mathrm{Ob}(\mathbf{A})$ there exists a unique $f': A \to A'$ such that $f = f' \circ r$.
(2) $\mathbf{A}$ is a reflective subcategory of $\mathbf{B}$ if each $\mathbf{B}-$object has an $\mathbf{A}-$reflection.

Currently, this definition seems a bit abstract. The following sections will illustrate concrete examples of reflections to understand this definition better.

## Taking a Group to Its Abelian Form

### The Category $\mathbf{Grp}$

For this example, we’ll be working with the category of groups, $\mathbf{Grp}$. The objects in this category are groups, and the morphisms are group homomorphisms. Some elements of this category are:

• $(\mathbb{R}, +) \xrightarrow{\phi} (\mathbb{R}^{+}, \cdot)$, $\phi(x) = e^{x}$. The real numbers under addition and the positive real numbers under multiplication are both groups, so they’re objects in this category. $\phi$ here is a group homomorphism, so it’s a morphism in $\mathbf{Grp}$.
• The dihedral group $D_{6}$ and the permutation group $S_{3}$ are also objects in this category. Recall that the dihedral group $D_{6}$ is commonly visualized using the symmetries of an equilateral triangle, and is commonly given using the following presentation: $$D_{6} = \langle r,s : r^{3} = s^{2} = 1, sr = r^{-1}s\rangle$$ Here, we see that $D_{6}$ is generated by two elements $r$ and $s$ ($r$ is the rotation by $2\pi/3$, and $s$ is the reflection across one of the symmetry lines). $S_{3}$ is the group of permutations on 3 elements: all permutations of the integers $\{1,2,3\}$. Both of these are groups, and both have 6 elements. If we define $\phi: \begin{cases} r \to (123) \\ s \to (12)\end{cases}$ then $\phi$ extends to a group homomorphism that takes $D_{6}$ to $S_{3}$, and is thus a morphism in $\mathbf{Grp}$.
• As one last example, let $\mathrm{GL}_{2}(\mathbb{R})$ be the general linear group of degree $2$. This is the group of invertible $2\times 2$ matrices with real entries. Then we can create a group homomorphism $\phi: D_{6} \to \mathrm{GL}_{2}(\mathbb{R})$ by letting, with $\theta = 2\pi/3$, $$\phi(r) = \begin{bmatrix}\cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta)\end{bmatrix} \qquad \phi(s) = \begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix}$$ so this $\phi$ is also a morphism in $\mathbf{Grp}$.
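To make the last example concrete, here is a quick numerical check, sketched in Python under the assumption $\theta = 2\pi/3$, that the matrices $\phi(r)$ and $\phi(s)$ actually satisfy the defining relations of $D_{6}$ (so the assignment extends to a homomorphism on all of $D_{6}$):

```python
import math

# 2x2 matrices as nested lists, multiplied by hand
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def approx_eq(X, Y, tol=1e-9):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))

theta = 2 * math.pi / 3                      # rotation angle for r
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]    # phi(r)
S = [[0.0, 1.0], [1.0, 0.0]]                 # phi(s)
I = [[1.0, 0.0], [0.0, 1.0]]

assert approx_eq(matmul(R, matmul(R, R)), I)             # r^3 = 1
assert approx_eq(matmul(S, S), I)                        # s^2 = 1
assert approx_eq(matmul(S, R), matmul(matmul(R, R), S))  # sr = r^{-1}s, r^{-1} = r^2
print("phi(r), phi(s) satisfy the D6 presentation")
```

Since every element of $D_{6}$ is a word in $r$ and $s$, checking the three relations is enough to confirm $\phi$ is well-defined.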

### The Category $\mathbf{Ab}$

A subcategory of $\mathbf{Grp}$ is $\mathbf{Ab}$, the category of abelian groups. Morphisms are again group homomorphisms, but this time we are only looking at morphisms between abelian groups. Some examples:

(1) $(\mathbb{R}, +) \xrightarrow{\phi} (\mathbb{R}^{+}, \cdot)$, $\phi(x) = e^{x}$. The real numbers under addition, and the positive real numbers under multiplication are abelian groups, so they’re in the subcategory $\mathbf{Ab}$.

(2) $(\mathbb{Z}_{2} \times \mathbb{Z}_{2}, +)$ and $(\mathbb{Z}_{4}, +)$ are also abelian groups. (Note that these two groups are not isomorphic, so no bijective homomorphism between them exists.) We can define a group homomorphism $\psi: \mathbb{Z}_{2} \times \mathbb{Z}_{2} \to \mathbb{Z}_{4}$ by $\psi(a,b) = 2b \bmod 4$, i.e. $$\psi: \begin{cases} (0,0) \to 0 \\ (1,0) \to 0\\ (0,1) \to 2 \\ (1,1) \to 2\end{cases}$$ so $\psi$ is in the collection of morphisms in $\mathbf{Ab}$. Of course, it’s also in the morphisms of $\mathbf{Grp}$ as well.

(3) $D_{6}$ is not commutative, so it’s not an object in $\mathbf{Ab}$.

(4) $\mathrm{GL}_{2}(\mathbb{R})$ under matrix multiplication is not abelian, because matrix multiplication is not a commutative operation. So neither this group nor the group homomorphism between $D_{6}$ and $\mathrm{GL}_{2}(\mathbb{R})$ are in $\mathbf{Ab}.$
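Homomorphism properties like the one in example (2) can be verified exhaustively for small groups. A short Python sketch checking that $(a,b) \mapsto 2b \bmod 4$ defines a homomorphism $\mathbb{Z}_{2}\times\mathbb{Z}_{2} \to \mathbb{Z}_{4}$:

```python
from itertools import product

def psi(x):
    a, b = x
    return (2 * b) % 4        # candidate homomorphism into Z4

def add_z2z2(x, y):           # componentwise addition mod 2
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)

elements = list(product(range(2), repeat=2))
# homomorphism property: psi(x + y) = psi(x) + psi(y) in Z4
ok = all(psi(add_z2z2(x, y)) == (psi(x) + psi(y)) % 4
         for x in elements for y in elements)
print("psi is a homomorphism:", ok)   # True
```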

### Creating the Commutator Subgroup

We now discuss the commutator element. The commutator of two elements $g,h$ in a group, denoted $[g,h]$ is given by $[g,h] = g^{-1}h^{-1}gh$.

Remark: The order matters here; in general, $[g,h] \neq [h,g]$.

Let’s work with a nice easy group of matrices. Let $$G =\left\{I_{2},\begin{bmatrix}0&1\\1&0\end{bmatrix},\begin{bmatrix}0&1\\-1&-1\end{bmatrix},\begin{bmatrix}-1&-1\\0&1\end{bmatrix},\begin{bmatrix}-1&-1\\1&0\end{bmatrix},\begin{bmatrix}1 & 0\\-1&-1\end{bmatrix}\right\}$$ under matrix multiplication. We’ll name these matrices $\{I,A,B,C,D,K\}$ respectively. This group is not commutative, as shown in the group table below:

| $\cdot$ | I | A | B | C | D | K |
|---|---|---|---|---|---|---|
| I | I | A | B | C | D | K |
| A | A | I | C | B | K | D |
| B | B | K | D | A | I | C |
| C | C | D | K | I | A | B |
| D | D | C | I | K | B | A |
| K | K | B | A | D | C | I |
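The table above can be reproduced directly from the six matrices; here is a Python sketch that also confirms the group is not commutative:

```python
from itertools import product

# the six matrices of G as tuples of rows
G = {
    "I": ((1, 0), (0, 1)),
    "A": ((0, 1), (1, 0)),
    "B": ((0, 1), (-1, -1)),
    "C": ((-1, -1), (0, 1)),
    "D": ((-1, -1), (1, 0)),
    "K": ((1, 0), (-1, -1)),
}

def matmul(X, Y):
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

names = {m: n for n, m in G.items()}   # matrix -> label
table = {(x, y): names[matmul(G[x], G[y])] for x, y in product(G, G)}

print("A*B =", table[("A", "B")])   # C
print("B*A =", table[("B", "A")])   # K
print("commutative:", all(table[(x, y)] == table[(y, x)]
                          for x in G for y in G))   # False
```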

Next, we’re going to form the commutator subgroup, which is the subgroup of $G$ generated by all commutators of $G$ (here, the set of commutators is itself a subgroup). Let’s call this subgroup $\tilde{G}$. The quotient $G/\tilde{G}$ is always an abelian group, so “quotienting out by” $\tilde{G}$ and working with the cosets gives us an “abelian version” of our previously non-abelian group.

We’ll calculate a few commutator elements to demonstrate how, but will not run through every combination. $$\begin{aligned}[I,X]&=[X,I]=I\\ [A,B]&=A^{-1}B^{-1}AB=ADAB=D\\ [B,A]&=B^{-1}A^{-1}BA=DABA=B\\ [C,D]&=C^{-1}D^{-1}CD=CBCD=B\\ &\vdots\end{aligned}$$

Continuing with all combinations, we find that there are only three distinct commutator elements: $\{I, B, D\}$. Thus $\tilde{G} = \{I,B,D\}$. Now, $G/\tilde{G}$ gives us the left cosets of $\tilde{G}$: $$\begin{aligned}A\tilde{G}&= C\tilde{G}=K\tilde{G}=\{A,C,K\}\\ B\tilde{G}&= D\tilde{G}=I\tilde{G}=\{I,B,D\}\end{aligned}$$ Thus, the quotient group is $G/\tilde{G} = \{A\tilde{G}, \tilde{G}\}$. This two-element group is a bit dull, admittedly, but it certainly is abelian, with the identity element as $\tilde{G}$.
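These calculations can be done by brute force as well; the following Python sketch computes every commutator and then the left cosets:

```python
from itertools import product

G = {
    "I": ((1, 0), (0, 1)),
    "A": ((0, 1), (1, 0)),
    "B": ((0, 1), (-1, -1)),
    "C": ((-1, -1), (0, 1)),
    "D": ((-1, -1), (1, 0)),
    "K": ((1, 0), (-1, -1)),
}

def matmul(X, Y):
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

names = {m: n for n, m in G.items()}

def inv(x):   # inverse found by exhaustive search over the finite group
    return next(y for y in G if names[matmul(G[x], G[y])] == "I")

def commutator(g, h):   # [g,h] = g^{-1} h^{-1} g h
    m = matmul(matmul(G[inv(g)], G[inv(h)]), matmul(G[g], G[h]))
    return names[m]

G_tilde = sorted({commutator(g, h) for g, h in product(G, G)})
print("commutators:", G_tilde)   # ['B', 'D', 'I']

def coset(g):   # left coset g * G_tilde
    return frozenset(names[matmul(G[g], G[t])] for t in G_tilde)

cosets = {coset(g) for g in G}
print("number of cosets:", len(cosets))   # 2
```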

### Back to Reflections

Something else we can do with this little two-element group is map it to $(\mathbb{Z}_{2}, +)$ via the homomorphism $\phi: G/\tilde{G} \to \mathbb{Z}_{2}$, where $\phi(\tilde{G}) = 0$ and $\phi(A\tilde{G}) = 1$.

What does any of this have to do with reflections? Recall that $G \in \mathrm{Ob}(\mathbf{Grp})$, but not in $\mathrm{Ob}(\mathbf{Ab})$. $G/\tilde{G}$ is in $\mathrm{Ob}(\mathbf{Ab})$, and so is $\mathbb{Z}_{2}$.

A reflection for $G$ into $\mathbf{Ab}$ is a morphism $r$ such that for any morphism $\psi:G \to A'$, $A' \in \mathrm{Ob}(\mathbf{Ab})$, we can find a unique morphism $\phi$ completely contained in $\mathbf{Ab}$ such that $\psi = \phi \circ r$.

Let $A'= \mathbb{Z}_{2}$, and define $\psi: G \to \mathbb{Z}_{2}$ by $$\psi: \begin{cases}A \to 1 \\ C \to 1 \\ K \to 1 \\ B \to 0\\D \to 0 \\I \to 0\end{cases}$$ This is certainly a homomorphism ($\psi(XY) = \psi(X) + \psi(Y)$, with addition mod $2$).

What if we could “bounce” this group $G$ off another group $H$ in $\mathbf{Ab}$, then move from $H$ to $\mathbb{Z}_{2}$? That might reveal a little more about $\psi$ than just the seemingly contrived definition we put forth.

Let $r: G \to G/\tilde{G}$ be the canonical map sending each element of $G$ to the appropriate coset. That is, $r(A) = A\tilde{G}$, $r(B) = \tilde{G}$, etc. Then the image of $r$ is $G/\tilde{G}$, and $\phi$ as defined above is the unique morphism that will take $G/\tilde{G}$ to $\mathbb{Z}_{2}$ such that $\psi = \phi \circ r$.

To summarize: the reflection $r$ takes $G$ to some object $A$ in $\mathbf{Ab}$. Grab a morphism $\psi$ from $G$ to anywhere else in $\mathbf{Ab}$ (we picked $\mathbb{Z}_{2}$). Then we must be able to find a unique $\phi$ such that $\psi$ decomposes into the composition of $r$ with that $\phi$. This same $r$ (and thus the same $A$) should be able to perform this action for any $A'$ and any $\psi$. In our case, the reflection is the canonical map from $G$ to its quotient by the commutator subgroup.
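Because $r$ is surjective, $\phi$ is forced: a tiny Python search over all four maps $G/\tilde{G} \to \mathbb{Z}_{2}$ (cosets encoded as $0 = \tilde{G}$ and $1 = A\tilde{G}$) shows that exactly one satisfies $\psi = \phi \circ r$:

```python
from itertools import product

# psi: G -> Z2 as defined in the text; r: G -> G/G~ with cosets encoded 0/1
psi = {"I": 0, "B": 0, "D": 0, "A": 1, "C": 1, "K": 1}
r   = {"I": 0, "B": 0, "D": 0, "A": 1, "C": 1, "K": 1}

# search every map phi: {0,1} -> {0,1} for psi = phi o r
candidates = [dict(zip((0, 1), vals)) for vals in product((0, 1), repeat=2)]
valid = [phi for phi in candidates
         if all(psi[g] == phi[r[g]] for g in psi)]

print("maps phi with psi = phi o r:", len(valid))   # 1
print("the unique phi:", valid[0])                  # {0: 0, 1: 1}
```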

## Conclusion

Reflections can perform many different actions to take objects in one category to objects in a subcategory. We focused on reflections making things “abelian” in a nice way, which helped reveal some structures that would have otherwise been hidden.

## Equivalence v. Isomorphisms in Category Theory

## Introduction

Editor’s Note: The article is co-written by Rachel Traylor (The Math Citadel/Marquette University) and Valentin Fadeev (The Open University, UK). Substantial additional review, contributions, and discussions were provided by Matt Kukla and Jason Hathcock. A pdf is available for download at the end of this post.

The biggest challenge we have found in studying category theory is the lack of “concrete” examples to illustrate concepts that aren’t themselves highly abstract. This may be due to our more “vulgar” nature as primarily applied mathematicians (from the viewpoint of many academic mathematicians). However, we firmly believe that if category theory wishes to deliver on the promises it makes to help bind many branches of mathematics (and even the sciences) together, it needs a little boring vulgarity sometimes when illustrating concepts as opposed to a never-ending march towards greater abstraction and generality.

Perhaps we are better poised than many to do the dirty work of illustrating some of these concepts in a more boring way, on topics so familiar many mathematicians ignore them. For this particular article, we will elaborate on the difference between isomorphic categories and equivalent categories. The difference between the two has a degree of subtlety, but exploring a tangible example involving matrices and finite-dimensional vector spaces really clarified these concepts for us.

## Necessary Definitions

We will not assume a familiarity with all the terms we will be using, so some definitions for reference are provided here. It is perfectly acceptable to skim these until they are invoked later, then return for further examination. For reference, please see Abstract and Concrete Categories by Adámek, Herrlich, and Strecker (1990).

Definition (Category)

A category is a quadruple $\mathbf{A} = (\mathscr{O}, \text{hom}, \text{id}, \circ)$ that consists of

1. A class $\mathscr{O}$ whose members are called $\mathbf{A}-$objects.
2. For each pair $(A,B)$ of $\mathbf{A}-$objects, a set $\hom(A,B)$ whose elements are called morphisms from $A$ to $B$. Denoted $A \xrightarrow{f} B$ or $f: A \to B$. The collection of all morphisms in $\mathbf{A}$ is denoted $\text{Mor}(\mathbf{A})$.
3. For each $\mathbf{A}-$object $A$, there is an identity morphism $A \xrightarrow{id_{A}} A$.
4. A composition law that associates with each $\mathbf{A}-$morphism $A \xrightarrow{f} B$ and $B \xrightarrow{g} C$ an $\mathbf{A}-$morphism $A \xrightarrow{g \circ f} C$ we call the composite of $f$ and $g$. This composite map $g\circ f$ is subject to the following conditions:
• Composition is associative. That is, for $A \xrightarrow{f} B$, $B \xrightarrow{g} C$, and $C \xrightarrow{h} D$

$$h \circ (g\circ f) = (h \circ g) \circ f.$$
• For any morphism $A \xrightarrow{f} B$, $id_{B} \circ f = f$ and $f \circ id_{A} = f$.
• The sets $\hom(A,B)$ are pairwise-disjoint, meaning that a morphism $f$ cannot belong to $\hom(A,B)$ and $\hom(A',B')$ at the same time. It can only be in one set of morphisms. This condition guarantees each morphism has a unique domain (where we’re mapping from) and codomain (where we map to).
Abstract and Concrete Categories, Adámek, Herrlich, Strecker (1990)

For those who have never seen any category theory at all, it is worth a pause to list a couple examples of categories. We shall only list the ones that are of interest to this article. There are many more examples, and many more “types” of categories beyond these.

• $\mathbf{Set}$ is the category of all sets. $\mathscr{O}$ is the class of all sets. (Here a particular set is an object. $\mathbb{N}$ is an object in this category, as is the set of all permutations of the numbers $\{1,2,3\}$. We don’t care about individual elements of these sets; we move from one whole set to another whole set.) $\hom(A,B)$ is the set of all functions from set $A$ to set $B$. $id_{A}$ is just the identity function on a set $A$, and $\circ$ is our standard composition of functions.
• $\mathbf{Vect}$ is the category whose objects are all real vector spaces. Morphisms in this category are linear transformations from one vector space to another, so $\hom(A,B)$ is the set of all linear transformations from vector space $A$ to vector space $B$. The identity morphisms are identity transformations, and composition in this category is the standard composition of linear maps.
• $\mathbf{Mat}$ is an interesting category worth spending some time on. Objects here are not matrices, as one might guess. Objects in $\mathbf{Mat}$ are natural numbers, and morphisms are $m \times n$ real-valued matrices. (We can generalize this concept to $\mathbf{Mat}_{R}$, where the entries of the matrices are elements of a particular specified ring $R$, but let’s just stick with real-valued matrices here.) Put differently, a morphism takes a natural number $m$ “to” a natural number $n$ via a matrix of dimensions $m \times n$ with real-valued entries, and $\hom(m,n)$ is the set of all $m\times n$ real-valued matrices; each such matrix is its own morphism from $m$ to $n$. In this category, the identity morphism $id_{m}$ is the $m \times m$ identity matrix. Composition in this category is defined by $A \circ B = BA$, where $BA$ is just the usual matrix multiplication. It’s worth noting here that a category doesn’t require that all morphisms be composable, and $\mathbf{Mat}$ provides a nice example of this. We can only multiply matrices when the dimensions are appropriate to do so, which means only certain morphisms are composable. This doesn’t violate the definition of a category; composition must be associative only when composition can be done.
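A few lines of Python illustrate the composition rule: an $m \times n$ matrix plays the role of a morphism $m \to n$, and $A \circ B = BA$ is defined only when the dimensions line up. (The matrices below are arbitrary examples.)

```python
# a morphism m -> n in Mat is an m x n real matrix
def shape(M):
    return (len(M), len(M[0]))

def matmul(X, Y):
    rx, cx = shape(X)
    ry, cy = shape(Y)
    assert cx == ry, "morphisms not composable"
    return [[sum(X[i][k] * Y[k][j] for k in range(cx)) for j in range(cy)]
            for i in range(rx)]

B = [[1, 2, 3], [4, 5, 6]]      # a morphism 2 -> 3
A = [[1, 0], [0, 1], [1, 1]]    # a morphism 3 -> 2
comp = matmul(B, A)             # A o B : 2 -> 2, computed as the product BA
print(shape(comp))              # (2, 2)
```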

Morphisms allow us to move between objects in a category. If we want to move from one category to another, we’ll need the concept of a functor.

Definition (Functor)

If $\mathbf{A}$ and $\mathbf{B}$ are categories, then a functor $F$ from $\mathbf{A}$ to $\mathbf{B}$ is a mapping that assigns to each $\mathbf{A}-$object $A$ a $\mathbf{B}-$object $FA$. It also assigns to each $\mathbf{A}-$morphism $A \xrightarrow{f} A'$ a $\mathbf{B}$-morphism $FA \xrightarrow{Ff} FA'$ such that the following conditions hold:

1. $F(f\circ g) = Ff \circ Fg$ (Composition is preserved whenever $f\circ g$ is defined.)
2. For every $\mathbf{A}-$object $A$, $F(id_{A}) = id_{FA}$. This makes sure the functor maps the identity morphism on $A$ to the identity morphism on the object that $A$ maps to in the category $\mathbf{B}$.

Notationally, we’ll see functors written $F: \mathbf{A} \to \mathbf{B}$, or $F(A\xrightarrow{f}B)$, which shows more clearly that functors move objects and morphisms.

Functors themselves can have different properties that are a bit similar to concepts from the study of regular functions (e.g. injectivity, surjectivity). We can also compose functors, and the composite functor is indeed a functor.

Definition (Isomorphism as a functor)

A functor $F : \mathbf{A} \to \mathbf{B}$ is called an isomorphism if there is a functor $G: \mathbf{B} \to \mathbf{A}$ such that $G \circ F = id_{\mathbf{A}}$ and $F \circ G = id_{\mathbf{B}}$.

Note $id_{\cdot}$ here is the identity functor for a category, not the identity morphism on an object. The identity functor will just take all objects to themselves, and all morphisms to themselves, but the identity functor acts on a category, whereas the identity morphism acts on an object within a category.

Definition (Faithful Functor)

A functor is faithful if all the hom-set restrictions $F: \hom_{\mathbf{A}}(A,A') \to \hom_{\mathbf{B}}(FA, FA')$ are injective.

Here, this means that if a morphism $f \in \text{hom}_{\mathbf{A}}(A,A')$ maps to $Ff \in \hom_{\mathbf{B}}(FA, FA')$, another morphism $g \in \hom_{\mathbf{A}}(A,A')$ maps to $Fg \in \hom_{\mathbf{B}}(FA, FA')$, and $Ff = Fg$ (the two “target” morphisms are the same), then $f = g$. This needs to hold for all sets of morphisms between any two objects.

Definition (Full Functor)

A functor $F$ is full if all the hom-set restrictions $F: \hom_{\mathbf{A}}(A,A') \to \hom_{\mathbf{B}}(FA, FA')$ are surjective.

In other words, for a functor to be full, every morphism $g$ in $\hom_{\mathbf{B}}(FA, FA')$ needs a morphism $f$ in $\hom_{\mathbf{A}}(A,A')$ such that $Ff = g$. That is, $g$ should have a “preimage” under the functor.

Definition (Isomorphism Dense)

A functor is isomorphism-dense if for any $\mathbf{B}-$object $B$ there exists some $\mathbf{A}-$object $A$ such that $FA \cong B$.

Two objects $FA$ and $B$ in a category $\mathbf{B}$ are isomorphic if there exists a morphism between the two that is an isomorphism. It is worth clearly noting the definition of an isomorphism as it pertains to morphisms in a category:

Definition (Isomorphism for Morphisms)

A morphism $f: A \to B$ in a category $\mathbf{A}$ is an isomorphism if there exists a morphism $g: B \to A$ in $\text{Mor}(\mathbf{A})$ such that $f\circ g = id_{B}$ and $g\circ f = id_{A}$ where $id_{A}$ and $id_{B}$ are the identity morphisms for the objects $A$ and $B$.

Finally, we can combine a few of these to get the notion of an equivalence:

Definition (Equivalence)

A functor is an equivalence if it is full, faithful, and isomorphism-dense.

If we re-write the definition of an isomorphism (functor), then an isomorphism is a functor that is full, faithful, and bijective on objects, whereas an equivalence is a functor that is full, faithful, and isomorphism-dense. The only difference between these two is the notion of being bijective on objects v. being isomorphism-dense. An isomorphism is more restrictive than an equivalence in the sense that all isomorphisms are equivalences, but we can exhibit equivalences that are not isomorphisms. When we talk about categories being either equivalent or isomorphic, these two words are not interchangeable. In the next sections, we’ll discuss this difference, and illustrate with an explicit example.

## Linear Algebra and Matrices

At this point, we’ll revisit briefly some standard results from linear algebra, focusing on real vector spaces. In most undergraduate linear algebra courses, the notion that a linear transformation can be represented in matrix form is simply stated (sometimes proven, but not often). In particular, those linear algebra courses designed for engineers may only focus on linear systems of equations (which are linear transformations) and omit much talk of linear transformations at all. We then consider it reasonable to revisit some of these ideas. For more details, please consult your favorite linear algebra text.

### Linear Spaces (Vector Spaces)

Remark: The terms linear space and vector space are interchangeable. Different texts will use different terms.

Definition (Linear Space)

A linear space over a field $\mathbb{F}$ is a set of elements (vectors) $V$ with two operations $+, \cdot$. The operation $+$, called vector addition, takes two elements $v, w \in V$ and returns another element $v+w \in V$. The operation $\cdot$, called scalar multiplication, takes a scalar $\alpha$ from the field $\mathbb{F}$ and an element $v \in V$ and returns a “scaled” element $\alpha v \in V$. We require closure under linear combinations. For any $\alpha_{1},\alpha_{2},\ldots,\alpha_{n} \in \mathbb{F}$ and $\mathbf{v}_{1}, \mathbf{v}_{2},\ldots,\mathbf{v}_{n} \in V$,
$$\sum_{i=1}^{n}\alpha_{i}\mathbf{v}_{i} \in V.$$ This is a consolidation of the ten axioms a linear space must satisfy. The expanded axioms are relegated to the appendix.

We’re going to focus on real vector spaces rather than this abstract definition. Real vector spaces have $\mathbb{R}$ as our field (so we multiply vectors by real numbers).

The vector spaces most often encountered in engineering problems are $\mathbb{R}^{n}$–the standard $n-$dimensional real vector spaces on real-valued vectors. (Also known as a standard Euclidean space.)

Now suppose we wish to move from one vector space $V$ to another $W$. There are many ways we can define such a move, but the ones we’re interested in are linear transformations.

Definition (Linear Transformation)

If $V,W$ are linear spaces, then a function $T: V \to W$ is a linear transformation if

1. $T(x+y) = T(x) + T(y)$ for all $x,y \in V$. If we add two vectors, and then transform the result into the space $W$, it should yield the same vector in $W$ as if we transformed each of $x$ and $y$ into the space $W$, then added them together.
2. $T(\alpha x) = \alpha T(x)$ for all $\alpha \in \mathbb{F}$, $x \in V$. (We can “factor out” scalars.)

We can take linear transformations from $\mathbb{R}^{m} \to \mathbb{R}^{n}$ by transforming coordinates as in the following example:

$T:\mathbb{R}^{2} \to \mathbb{R}^{2}$ is given by $T(x, y) = (x+2y,\; 3y)$.

From linear algebra, we know we may rewrite the transformation in vector form, since we live in “vector space” (in 2 dimensions here):

$$T\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}x+2y\\3y\end{bmatrix}$$

Typically the next step in linear algebra is to say that we may represent the transformation $T$ in matrix form such that multiplication of our “source vector” by the matrix yields the result of the linear transformation. For our example, the matrix $A$ corresponding to the linear transformation $T$ is

$$A = \begin{bmatrix}1 & 2 \\ 0 & 3\end{bmatrix},$$ so that for $\mathbf{x} = \begin{bmatrix}x\\y\end{bmatrix}$ $$A\mathbf{x} = \begin{bmatrix}x+2y \\ 3y\end{bmatrix}$$
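A quick Python sketch confirms that multiplying by $A$ agrees with applying $T$, on a few arbitrarily chosen sample vectors:

```python
def T(v):                 # the linear transformation from the example
    x, y = v
    return (x + 2 * y, 3 * y)

A = ((1, 2), (0, 3))      # its matrix representation

def apply(M, v):          # matrix-vector product M v
    return tuple(sum(M[i][j] * v[j] for j in range(2)) for i in range(2))

for v in [(1, 0), (0, 1), (2, -3), (5, 7)]:
    assert T(v) == apply(A, v)
print("A x = T(x) on all sample vectors")
```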

At this point, this equivalence between linear transformations and matrix representation (with operation matrix/vector multiplication) is simply stated and expected to be accepted. However, we’re going to use the notion of equivalences in category theory to establish this formally.

## Moving from $\mathbf{Mat}$ to $\mathbf{Vect}$

As mentioned before, we’re going to focus only on finite dimensional vector spaces ($\mathbb{R}^{n}$ for various values of $n$, or other spaces of finite dimension) and real-valued matrices to keep things simple. These concepts generalize much further, though. Since we’re operating on real-valued matrices, we’ll be using $\mathbf{Mat}_{\mathbb{R}}$.

To show the equivalence between $\mathbf{Vect}$ and $\mathbf{Mat}$, we must establish an equivalence functor between the two that allows us to map $\mathbf{Mat}$ to $\mathbf{Vect}$. We define the functor as follows:

$F:\mathbf{Mat}\to\mathbf{Vect}$ assigns to each object $n$ in $\mathbf{Mat}$ the vector space $\mathbb{R}^{n}$. Thus, we simply assign a natural number $n$ in $\mathbf{Mat}$ to the corresponding $n$-dimensional real vector space in $\mathbf{Vect}$, which is an object in $\mathbf{Vect}$.

Functors also map morphisms in the source category ($\mathbf{Mat}$ in our case) to morphisms in the destination category ($\mathbf{Vect}$ here). Morphisms in $\mathbf{Mat}$ are $m \times n$ matrices with real values. Morphisms in $\mathbf{Vect}$ are linear transformations. Thus, it’s natural to have our functor take an $m \times n$ matrix $A$ in $\mathbf{Mat}$ and assign to it the linear map $T$ in $\text{Mor}(\textbf{Vect})$ from $\mathbb{R}^{m}$ to $\mathbb{R}^{n}$ that takes an $m-$dimensional vector $\mathbf{x}=\begin{bmatrix}x_{1}\\\vdots \\x_{m}\end{bmatrix} \in \mathbb{R}^{m}$ and maps it to the $n-$dimensional vector $\mathbf{y} \in \mathbb{R}^{n}$ given by $\mathbf{y}^{T} = \mathbf{x}^{T}A$.

We need to check that this functor satisfies the definition of both a functor and an equivalence. That $F$ is a functor follows from matrix multiplication on one hand, and from composition of linear maps on the other. To ensure it satisfies the definition of an equivalence, we must check to see that the functor is full, faithful, and isomorphism-dense.

To show the functor is faithful, take two matrices $A$ and $A'$ in $\hom(m,n)$ that map to the same linear transformation, i.e. $F(A) = F(A')$. In other words, the two transformations do the exact same thing to input vectors: multiplication by $A$ and multiplication by $A'$ yield the exact same vector $\mathbf{y}$ for every input $\mathbf{x}$. Applying both maps to the standard basis vectors then shows $A = A'$, so the functor is indeed faithful. To show the functor is full, we need to show that for any linear transformation $T \in \text{Mor}(\mathbf{Vect})$, there is a matrix $A \in \text{Mor}(\mathbf{Mat})$ such that $F(A) = T$. A linear transformation by definition (see above) sends a vector in $\mathbb{R}^{m}$ to a vector in $\mathbb{R}^{n}$ whose components are linear combinations of the components of the input. Refer back to the example above, where we mapped from $\mathbb{R}^{2}$ to $\mathbb{R}^{2}$.

$T\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}x+2y\\3y\end{bmatrix}$

Since multiplication of matrices results in components which are linear combinations of the components of the source vector (by definition of matrix multiplication), there is certainly a matrix $A \in \text{Mor}(\mathbf{Mat})$ that will map to $T \in \text{Mor}(\mathbf{Vect})$ for any linear transformation $T$. Thus, $F$ is certainly a full functor.
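This argument can be made concrete: the columns of the required matrix are exactly the images of the standard basis vectors under $T$. A Python sketch recovering the matrix of the example transformation above:

```python
def T(v):                 # the example linear map R^2 -> R^2 from the text
    x, y = v
    return (x + 2 * y, 3 * y)

# column j of the matrix is T(e_j), so every linear map has a matrix "preimage"
basis = [(1, 0), (0, 1)]
cols = [T(e) for e in basis]
A = tuple(tuple(cols[j][i] for j in range(2)) for i in range(2))
print(A)   # ((1, 2), (0, 3))
```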

It remains to show that $F$ is isomorphism-dense. To do so, we’re going to attempt to create its “inverse” functor $G : \mathbf{Vect} \to \mathbf{Mat}$. It is at this point we’ll really understand the difference between being isomorphism-dense and being a true isomorphism.

## Creating the Inverse Functor

We’re going to create a functor $G$ that maps $\mathbf{Vect}$ to $\mathbf{Mat}$. For this, we’ll define it as the “natural” inverse. $G$ will take vector spaces in $\mathbf{Vect}$ to their dimension $n \in \mathbb{N}$, which is an object in $\mathbf{Mat}$. We’ll also take $T \in \text{Mor}(\mathbf{Vect})$ to its corresponding matrix $A \in \text{Mor}(\mathbf{Mat})$, simply “reversing” the direction.

We can show, in a manner identical to what we did for $F$, that the functor $G$ is full. Isomorphisms in $\mathbf{Vect}$ are invertible linear maps between vector spaces. Since the objects of $\mathbf{Mat}$ are natural numbers, we cannot have isomorphisms between different objects. That is, $m$ cannot be isomorphic to $n$ inside $\mathbf{Mat}$ unless $m=n$.

We now illustrate the key difference between functors that are isomorphisms and functors that are merely isomorphism-dense. Take the space of polynomials of degree less than $m$ with real coefficients. This is indeed a real vector space, and belongs to $\mathbf{Vect}$. This polynomial space is isomorphic to $\mathbb{R}^{m}$ inside of $\mathbf{Vect}$, meaning that there is an invertible linear map $K$ from the polynomial space to $\mathbb{R}^{m}$. Recall that functors not only map objects to objects, but also map morphisms to morphisms. Since all the functor $F$ originating in $\mathbf{Mat}$ cares about is the dimension of a space, the isomorphism $K$ between the polynomial space and $\mathbb{R}^{m}$ is “invisible” in the category $\mathbf{Mat}$: the polynomial space is not of the form $\mathbb{R}^{n}$, so it is not the image of any $\mathbf{Mat}-$object, and $\mathbf{Mat}$ sees the polynomial space and $\mathbb{R}^{m}$ just as “some” $m-$dimensional spaces. Thus the morphism $K \in \text{Mor}(\textbf{Vect})$ has no morphism in $\text{Mor}(\textbf{Mat})$ that will map to it.

The functor $G$, for its part, identifies the two spaces: a linear map $T: \mathbb{R}^{m} \to \mathbb{R}^{n}$ and the corresponding map $L$ from the polynomial space to $\mathbb{R}^{n}$ will map to the same matrix $A_{m\times n}$ in $\text{Mor}(\textbf{Mat})$.

Therefore, the two categories $\mathbf{Vect}$ and $\mathbf{Mat}$ are equivalent, but not isomorphic, because $F$ is only isomorphism-dense, not bijective on objects.

## Conclusion

We exhibited an explicit example involving the category $\mathbf{Mat}$ of matrices over $\mathbb{R}$ and the category $\mathbf{Vect}$ of finite dimensional real vector spaces to illustrate the difference between functors that are isomorphisms and functors that are equivalences. The key difference is the notion of isomorphism-dense, which means that an object in the target category of a functor may not have a “preimage” in the source space, but is isomorphic inside the target category to an object which does have a “preimage” in the source space.

## Appendix

For a linear space $V$ over a field $\mathbb{F}$, the following axioms regarding the two operations must be satisfied:

1. For $x,y \in V$, $x+y \in V$. That is, if we perform the addition operation on two elements of $V$, the result can’t be outside the set $V$.
2. For each $\alpha \in \mathbb{F}$ and $v \in V$, $\alpha v \in V$. Multiplying an element by a scalar shouldn’t “knock” us outside the set $V$.
3. For $x,y \in V$, $x+y = y+x$. (Commutativity)
4. For $x,y,z \in V$, $x+(y+z) = (x+y)+z$ (Associativity)
5. There must be some element in $V$, denoted $0_{V}$, that is an *additive identity*. That is, for any $x \in V$, $x + 0_{V} = 0_{V} + x = x$.
6. For every $x \in V$, the element $(-1)x$ has the property that $x + (-1)x = 0_{V}$. (Existence of “negatives”, or additive inverses)
7. For $\alpha, \beta \in \mathbb{F}$, $v \in V$, $\alpha(\beta v) = (\alpha\beta)v$. (Associativity of multiplication)
8. For all $x,y \in V$, and for all $\alpha \in \mathbb{F}$, $\alpha(x+y) = \alpha x + \alpha y$ (Distributivity of vector addition)
9. For all $\alpha, \beta \in \mathbb{F}$, $v \in V$, $(\alpha + \beta)v = \alpha v + \beta v$. (Distributivity of scalar addition)
10. For every $v \in V$, we have that $1v = v$, where $1$ is the multiplicative identity in $\mathbb{F}$.

## Acknowledgements

We’d like to gratefully acknowledge Matt Kukla and Jason Hathcock for their insights, review, and meaningful discussion.


## Using Boolean Algebra to Find all Maximal Independent Sets in a Graph

Graph theory may be one of the most widely applicable topics I’ve seen in mathematics. It’s used in chemistry, coding theory, operations research, electrical and network engineering, and so many other places. The subject is generally credited as having begun with the famous Seven Bridges of Königsberg problem posed by Leonhard Euler in 1736. Frank Harary should also be credited for his massive work in bringing applications of graph theory to the sciences and engineering, particularly with his famous textbook written in 1969.

My own research forced me to stumble into this area once my research partner, Jason Hathcock, suggested we explore the idea of viewing dependency relations in the sequences of variables we were studying as digraphs. Since then, I’ve been buried in graph theory texts, finding a wealth of fascinating topics to explore.

Of particular interest in this article is finding all maximal independent sets in a graph using Boolean algebra.

## What’s a maximally independent set?

Firstly, what’s an independent set?

Definition (Independent Set): A set of vertices of a graph is independent if no two vertices in the set are adjacent.

If we take a look at the digraph above (from our paper on vertical dependence) and look at the underlying graph, the vertices $\{1,6,11\}$ form an independent set, as an example. There are lots more, and of varying sizes. Of particular interest here are maximal independent sets.

Definition (Maximal Independent Set): An independent set to which no other vertex in the graph can be added while retaining the independence property.

An example from the graph above is $\{2,3,4,5,13\}$. If we added any other vertex to that set, it would be adjacent to some vertex already in there.

A few notes:

(1) There are many maximal independent sets in a graph, and they may not all have the same cardinality.

(2) Maximal and maximum are not the same thing. An independent set may be a maximal independent set without being the largest independent set in the graph. The largest cardinality among all the maximal independent sets is called the independence number of the graph and is denoted $\beta(G)$.

## Why do we care about maximal independent sets?

Of the many applications that arise, one in particular is in coding theory. We want to find the largest error correcting codes we can, particularly in internet transmissions that can lose packets. A paper discussing this can be found here. (Paywall warning). We’ve discussed some basics of coding theory on this site as well. Finding error correcting codes with desirable properties is equivalent to solving the problem of finding maximal independent sets. The purpose of this article isn’t to discuss the applications here, but I’ve learned long ago that no one will keep reading unless I mention at least one application.

## Finding a maximal independent set

Finding a maximal independent set is relatively simple. Start with any vertex $v \in V(G)$. Add another vertex $u$ that is not adjacent to $v$. Continue adding vertices that are not adjacent to any already in the set. For a finite graph, this process will terminate, and the result will be a maximal independent set.
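The greedy procedure just described can be sketched in a few lines of Python. The adjacency-list encoding and the function name `greedy_mis` are our own choices for illustration; the edge list is the six-vertex example graph (edges AB, AD, AE, BC, CE, CF, DE, EF) used later in this article:

```python
# Greedy construction of a maximal independent set (a sketch).
# graph: dict mapping each vertex to the set of its neighbors.
def greedy_mis(graph, start):
    chosen = {start}
    for v in graph:
        # Add v only if it is adjacent to nothing already chosen.
        if v not in chosen and not (graph[v] & chosen):
            chosen.add(v)
    return chosen

# Six-vertex example graph with edges AB, AD, AE, BC, CE, CF, DE, EF.
edges = [("A","B"), ("A","D"), ("A","E"), ("B","C"),
         ("C","E"), ("C","F"), ("D","E"), ("E","F")]
graph = {v: set() for e in edges for v in e}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

print(sorted(greedy_mis(graph, "A")))  # ['A', 'C']
```

Note that the answer depends on the starting vertex and the order in which vertices are scanned; starting from B instead yields a different (larger) maximal independent set.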

Will it be one of largest cardinality? Not necessarily.

For example, using one more of our dependency graphs generated by $\alpha(n) = \sqrt{n}$, we can take the order to be 24 as shown, and build a maximal independent set starting with vertex 3. Note that none of vertices 9-15 or 1 can be in the set, since they’re all adjacent to vertex 3. Vertex 2 is not adjacent to vertex 3, so we add it into our set: $V = \{2,3\}$. Now, the next vertex we add can’t be adjacent to either 2 or 3, so that rules out 1, 9-15, and 4-8. Grab vertex 16. Now $V = \{2,3,16\}$. Notice that none of the remaining vertices are adjacent to any of the previous vertices. Continuing this process, we’ll get that $V = \{2,3,16,17,18,19,20,21,22,23,24\}$. Notice that if we add any other vertices to this set, they’ll be adjacent to something already in it.

## Finding all Maximal Independent Sets

We’re rarely interested in just finding one maximal independent set. We’d prefer to find them all, and doing it by inspection is not very palatable. The heart of this article is an admittedly suboptimal but still interesting way to find all maximal independent sets of reasonably small graphs.

We’ll illustrate the method on the 6-node graph above.

### Getting started

First, we’ll assign a Boolean variable to each vertex according to its inclusion in a maximal independent set. For example $A = 1$ implies $A$ is in the maximal independent set. Recall from Boolean algebra that

$$x+y = \left\{\begin{array}{lr}1, & x = 1 \text{ or } y = 1 \text{ or } (x=y=1)\\0,&x=0 \text{ and } y=0\end{array}\right.$$

Remark: $x+y$ is just another way of writing a union. This isn’t addition mod 2 here.

$$xy=\left\{\begin{array}{lr}1, & x = 1 =y\\0,&\text{ otherwise}\end{array}\right.$$

What we’ve done here is set up inclusion into our maximal independent sets in a Boolean fashion. So $x+y = 1$ corresponds to the inclusion of either vertex $x$ OR vertex $y$ OR both vertices $x$ and $y$. Similarly, $xy = 1$ corresponds to the inclusion of both vertices $x$ and $y$.

Now, we can express an edge of a graph as a Boolean product $xy$, where $x$ and $y$ are the vertices at either end of the edge.

Finally, set up the sum of all edges and call it $\phi$:

$$\phi = \sum_{(x,y) \in E(G)} xy$$

For our graph above,

$$\phi = AB + AD + AE + BC + CE + CF + DE + EF$$

### Why did we do this?

For a vertex to be in an independent set, it can’t be adjacent to any other vertices in the set. Put another way, for each edge, we can only have at most one of the vertices that make it up. If we include $A$ in the independent set $V$, then $B$ cannot be in there.

Returning to our $\phi$, note that its value under Boolean algebra can only be 0 or 1. If $\phi = 1$, then at least one edge has both of its vertices “on”. This means only combinations of $A, B, C, D, E, F$ that yield $\phi = 0$ can correspond to an independent set; the maximal ones will be the ones we extract once we simplify.
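To see this concretely, we can brute-force all $2^6$ inclusion assignments for the six-vertex example and collect those with $\phi = 0$; a quick sketch, with variable names of our own choosing:

```python
from itertools import product

# Edges of the six-vertex example graph.
edges = [("A","B"), ("A","D"), ("A","E"), ("B","C"),
         ("C","E"), ("C","F"), ("D","E"), ("E","F")]
vertices = ["A", "B", "C", "D", "E", "F"]

independent_sets = []
for bits in product([0, 1], repeat=len(vertices)):
    val = dict(zip(vertices, bits))
    # phi = Boolean sum of edge products: 1 iff some edge is fully included.
    phi = max(val[u] * val[v] for u, v in edges)
    if phi == 0:
        independent_sets.append({v for v in vertices if val[v]})

# Each phi = 0 assignment contains no complete edge, i.e. is an independent set.
assert all(not s >= {u, v} for s in independent_sets for u, v in edges)
print(len(independent_sets))  # 15 assignments (including the empty set)
```

Only some of these 15 independent sets are maximal; the complement trick below is what isolates the maximal ones.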

### Solving the problem

Our goal now is to find all combinations of our Boolean vertex variables that yield $\phi = 0$. As it turns out, solving this directly is pretty annoying. If we want $\phi = 0$, that’s logically equivalent to seeking $\phi^{c} = 1$, where $\phi^{c}$ is the Boolean complement (or negation) of $\phi$.

Recall from Boolean algebra the following (De Morgan’s laws):

$$\begin{aligned}(xy)^{c}&=x^{c}+y^{c}\\(x+y)^{c} &= x^{c}y^{c}\end{aligned}$$

So, if we take $\phi^{c}$ for our graph above,

$$\begin{aligned}\phi^{c}&=(A^{c}+B^{c})(A^{c}+D^{c})(A^{c}+E^{c})(B^{c}+C^{c})(C^{c}+E^{c})\\&\quad(C^{c}+F^{c})(D^{c}+E^{c})(E^{c}+F^{c})\end{aligned}$$

What does the negation here actually mean? By taking the complement, instead of finding vertices to include, now we’re finding vertices to exclude. When we multiply this expression out, we’ll get a sum of terms, where each term is a product of complements of our original Boolean variables. To get $\phi^{c} = 1$, all we need is one of those terms to be 1. To get a term to be 1, all members of the product must themselves be 1, meaning each term gives us a set of variables to exclude. Excluding these variables gives us one maximal independent set for each term, so this gives us all the maximal independent sets.

The nice thing about dealing with Boolean arithmetic is that we can program a computer to do this for us. Any time we can invoke a relationship with Boolean algebra, we can enlist a friendly helpful computer.

### Finishing the example

We’ll do it by hand here, because I’m old-school like that. For larger graphs, obviously one would want to enlist some computational help, or just be very patient. We’ll recall a few other rules of Boolean algebra before we finish (idempotence and absorption):

$$\begin{aligned}xx &=x\\x+x &=x\\x +xy&=x\end{aligned}$$

After an insane amount of tedious Boolean algebra,

$$\phi^{c} = A^{c}C^{c}E^{c}+A^{c}B^{c}E^{c}F^{c}+A^{c}C^{c}D^{c}F^{c}+B^{c}C^{c}D^{c}E^{c}+B^{c}D^{c}E^{c}F^{c}$$

Recall that each term now tells us which set of vertices to exclude from a maximal independent set; we negated the question logically. That means we have 5 maximal independent sets:

$$\{B,D,F\}, \{C,D\}, \{B,E\}, \{A,F\}, \{A,C\}$$

We can actually say what the independence number is as well, since we just have to find the maximum cardinality among the sets listed. For this graph, $\beta(G) = 3$.
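The tedious hand algebra can be double-checked by brute force: enumerate the independent sets directly and keep only those to which no outside vertex can be added. A sketch, with helper names of our own:

```python
from itertools import combinations

# Six-vertex example graph.
edges = [("A","B"), ("A","D"), ("A","E"), ("B","C"),
         ("C","E"), ("C","F"), ("D","E"), ("E","F")]
vertices = ["A", "B", "C", "D", "E", "F"]
adj = {v: set() for v in vertices}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def is_independent(s):
    # No vertex of s has a neighbor inside s.
    return all(not (adj[u] & s) for u in s)

maximal = []
for r in range(1, len(vertices) + 1):
    for combo in combinations(vertices, r):
        s = set(combo)
        # Maximal: independent, and every outside vertex has a neighbor in s.
        if is_independent(s) and all(adj[w] & s for w in set(vertices) - s):
            maximal.append(s)

print([sorted(s) for s in maximal])
print(max(len(s) for s in maximal))  # beta(G) = 3
```

This recovers exactly the five sets above, and confirms $\beta(G) = 3$.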

## Conclusion

I happened to find this interesting, and ended up obsessed with it for a day, much to the chagrin of my daily planner, which expected me to be working on writing my research monograph. I tried several different ways of solving this beyond the one given. I tried using the direct equation $\phi$, and tried using regular arithmetic on just $\{0,1\}$, setting up a sort-of structure function similar to the reliability block diagrams detailed here.

I always hesitate to blame the method rather than my own arithmetic errors, but I didn’t have much luck with the structure-function method, though I may try again to see if it’s an equivalent method. I believe it should be.

Looking at $\phi^{c}$ makes more sense after playing with this problem for some hours. The sum/union is quite nice, because it neatly separates out the various sets to exclude. It’s a better exploitation of Boolean algebra than trying to work with $\phi$ but aiming for a sum of 0. I still think it should be possible to work with it directly, even if not advisable. If I decide to torture myself with it further, and end up with something to write about, perhaps I’ll append here.

I always end up ending my articles with some takeaway. I don’t have much of one here, except that it was a curiosity worth sharing. Perhaps a decent takeaway is to reveal a bit of the tedium and dead-ends mathematicians can run into when exploring something. That’s just part of research and understanding. It’s entirely possible to spend hours, days, or weeks on something, and all you conclude is that the original method you saw is definitely superior to the one you were trying to develop.

Cayley’s Theorem: Powerful Permutations

## Cayley’s Theorem: Powerful Permutations

We’ve discussed before how powerful isomorphisms can be, when we find them. Finding isomorphisms “from scratch” can be quite a challenge. Thankfully, Arthur Cayley proved one of the classic theorems of modern algebra that can help make our lives a bit easier. We’ll explore this theorem and note the power of groups of permutations.

Isomorphisms: Making Mathematics More Convenient

## Isomorphisms: Making Mathematics More Convenient

Much of pure mathematics exists to simplify our world, even if it means entering an abstract realm (or creating one) to do it. The isomorphism is one of the most powerful tools for discovering structural similarities (or that two groups are identical structurally) between two groups that on the surface look completely unrelated. In this post, we’ll look at what an isomorphism is.

All the Same Opposites

## All the Same Opposites

###### Editor’s note: see this appendix for supporting proofs.

Fields are among the most convenient algebraic structures, preserving much of the arithmetic we know and love from familiar fields like the rationals $\mathbb{Q}$ and the real numbers $\mathbb{R}$.

Now, a set need not possess infinitely many elements to constitute a field (under the right binary operations). Indeed, Dr. Rachel Traylor recently invited readers to finite field theory by way of GF(4), the field with four elements. In this post, I propose to discuss the simplest of all possible fields, namely $\text{GF}(2)$.

## What is $\text{GF}(2)$?

As Dr. Traylor explains in her post on GF(4), a field is built via a nonempty set $\mathbb{F}$ and two binary operations, typically denoted $+$ and $\cdot$ when no further specification is needed. Speaking abstractly, $\text{GF}(2)$ is the case when $\mathbb{F}$ is taken to be any set of two elements, say $\{a,b\}$, satisfying the operation tables below.

| $+$ | $a$ | $b$ |
|---|---|---|
| $a$ | $a$ | $b$ |
| $b$ | $b$ | $a$ |

| $\cdot$ | $a$ | $b$ |
|---|---|---|
| $a$ | $a$ | $a$ |
| $b$ | $a$ | $b$ |

## So, what makes $\text{GF}(2)$ simplest?

Briefly put, no field has fewer elements than $\text{GF}(2)$. Why is this so? The operations $+$ and $\cdot$ each require an identity element, and these identities must be distinct. As a result, any field must contain at least two elements. And, as it happens, those two are enough to define a fully-fledged field.

As it is the most basic of fields, it might be expected that $\text{GF}(2)$ is only trivially interesting, or only appears in coverage of the subject on the theoretical front. (No judgment here; sometimes the simplest examples of structures are illustrative of very little. See the trivial topology or trivial algebra of sets on a set $X$.) However, $\text{GF}(2)$ is the mathematical representation of many dualities we encounter on a daily basis. We will take a look at some of the prominent cases.

## Even & Odd

Let’s denote by ${\bf e}$ and ${\bf o}$ arbitrarily chosen even and odd integers, respectively. Truly, in this way, we are defining two equivalence classes on the set of integers, $\mathbb{Z}$, by way of congruence modulo 2.

As a reminder, we say an integer $m$ is even to mean there exists an integer $k$ such that $m=2k$, and $m$ is odd to mean there exists an integer $j$ such that $m=2j+1$. Collect these in a set, $\{{\bf e},{\bf o}\}$, and consider this set under ordinary addition $+$ and multiplication $\cdot$.

These results are summarized in tables.  For instance, in the $+$ table below, we can think of ${\bf e}+{\bf o}$ as

$$(2k)+(2j+1)=2(k+j)+1\equiv{\bf o},$$

since $k+j$ is also an integer.

| $+$ | ${\bf e}$ | ${\bf o}$ |
|---|---|---|
| ${\bf e}$ | ${\bf e}$ | ${\bf o}$ |
| ${\bf o}$ | ${\bf o}$ | ${\bf e}$ |

| $\cdot$ | ${\bf e}$ | ${\bf o}$ |
|---|---|---|
| ${\bf e}$ | ${\bf e}$ | ${\bf e}$ |
| ${\bf o}$ | ${\bf e}$ | ${\bf o}$ |

From these tables, ${\bf e}$ serves as additive identity for the field, and ${\bf o}$ the multiplicative identity.

## Binary

Even more readily encountered than even and odd is binary arithmetic. We have a series of posts here on the theory of coding, and all of it rests on $\{0,1\}$ when taken with the operations of $+_2$ (addition modulo 2) and $\cdot$ .

| $+_2$ | $0$ | $1$ |
|---|---|---|
| $0$ | $0$ | $1$ |
| $1$ | $1$ | $0$ |

| $\cdot$ | $0$ | $1$ |
|---|---|---|
| $0$ | $0$ | $0$ |
| $1$ | $0$ | $1$ |

The similarities to the tables for $(\{{\bf e},{\bf o}\},+,\cdot)$ are evident.  With binary, $0$ is the additive (modulo 2) identity, and $1$ is the multiplicative identity.  This seems to follow naturally.  After all, $0$ is an even integer, and $1$ is an odd integer, so these elements are easily believed to work alike.  With that in mind, I move to an example with a less immediate connection.

## Truth & Falsehood

Now I want to consider the set $\{{\bf T},{\bf F}\}$ of truth values, true and false, respectively.

It is worthwhile to stop and think about which operations make sense for this set. Two are needed to construct a field. Just as even and odd integers may be combined by adding and multiplying, mathematical statements may be combined via disjunction (“or,” $\vee$), conjunction (“and,” $\wedge$), and implication (“if, then,” $\Rightarrow$). For this case, I am interested in conjunction and the special “exclusive or,” also called XOR, denoted by $\oplus$.

| $\oplus$ | ${\bf F}$ | ${\bf T}$ |
|---|---|---|
| ${\bf F}$ | ${\bf F}$ | ${\bf T}$ |
| ${\bf T}$ | ${\bf T}$ | ${\bf F}$ |

| $\wedge$ | ${\bf F}$ | ${\bf T}$ |
|---|---|---|
| ${\bf F}$ | ${\bf F}$ | ${\bf F}$ |
| ${\bf T}$ | ${\bf F}$ | ${\bf T}$ |

## Opposite… in Precisely the Same Way

The only thing setting these examples apart is an exchange of symbols. Truly,

$a$, ${\bf e}$, $0$, and ${\bf F}$,

are interchangeable, just as are

$b$, ${\bf o}$, $1$, and ${\bf T}$.

What matters is that these individual elements behave in exactly the same way with respect to their operations.  In the language of algebra, it is said

$(\{a,b\},+,\cdot)$

$(\{{\bf e},{\bf o}\},+,\cdot)$

$(\{0,1\},+_2,\cdot)$, and

$(\{{\bf F},{\bf T}\},\oplus,\wedge)$
are isomorphic, that is, structurally equivalent.

Definition (Isomorphism of fields). We say two fields $(\mathbb{F},+,\times)$ and $(\mathbb{H},\boxplus,\boxtimes)$ are isomorphic to mean there exists a function $\varphi\mathrel{:}\mathbb{F}\to\mathbb{H}$ such that $\varphi$ is one-to-one, onto, and, for all $x,y\in\mathbb{F}$,

$$\varphi(x+y)=\varphi(x)\boxplus\varphi(y);\quad\varphi(x\times y)=\varphi(x)\boxtimes\varphi(y).$$

In the case of $\text{GF}(2)$, proving isomorphism amounts to simply defining the function that swaps the symbols as needed. For example, to show $(\{{\bf e},{\bf o}\},+,\cdot)$ and $(\{0,1\},+_2,\cdot)$ are isomorphic, define $\varphi\mathrel{:}\{{\bf e},{\bf o}\}\to\{0,1\}$ by putting $\varphi({\bf e})=0$ and $\varphi({\bf o})=1$.
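Since there are only four pairs to check, this $\varphi$ can be verified exhaustively by machine. A sketch in Python; the dictionary encoding of the parity tables (and the strings "e" and "o") are our own illustrative choices:

```python
# Verify that phi(e) = 0, phi(o) = 1 preserves both field operations,
# i.e. is an isomorphism between ({e,o}, +, *) and ({0,1}, +_2, *).
parity_add = {("e","e"): "e", ("e","o"): "o", ("o","e"): "o", ("o","o"): "e"}
parity_mul = {("e","e"): "e", ("e","o"): "e", ("o","e"): "e", ("o","o"): "o"}
phi = {"e": 0, "o": 1}

for x in "eo":
    for y in "eo":
        # phi(x + y) = phi(x) +_2 phi(y) and phi(x * y) = phi(x) * phi(y) mod 2
        assert phi[parity_add[(x, y)]] == (phi[x] + phi[y]) % 2
        assert phi[parity_mul[(x, y)]] == (phi[x] * phi[y]) % 2
print("phi preserves both operations")
```

The bijectivity of $\varphi$ is immediate, since it merely swaps two symbols.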

## Concluding Remarks

Mathematics in many respects is the art of extension and generalization. The goal is frequently to take an existing object and assume an aerial viewpoint to learn its structure and what makes it work. (This is often carried out by seeing how few pieces of the whole are really needed to accomplish those key characteristics.)

With the right perspective, even these sets of opposites can bear likeness.  I take comfort in that.

Mailbox Answers: Calculating New Parity After an Overwrite

## Mailbox Answers: Calculating New Parity After an Overwrite

I recently did some work for Mr. Howard Marks, an independent analyst and founder of Deep Storage on the subject of data protection and data loss. He e-mailed me with a question regarding calculating the new parity for a stripe of data on a storage system.

Let us consider the case of a $10+1$ RAID 5 set with a strip size of 64KB. When an application performs a 4KB write the system must:

1. Read the 64KB strip that contains the 4KB to be overwritten into a memory buffer
2. Modify the memory buffer with the data to be written
3. Read however much other data as would be needed to recalculate the parity strip for this stripe
4. Write the new data and new parity strip (Min 1 4KB write, 1 64KB write)

When we casually talk about this condition, we say the RAID controller would need to read all 10 data strips in the stripe so it can XOR all ten together as part of step 4. I, however, have been thinking about XOR and think that rather than requiring N+1 total I/Os, I can get it down to three.

If $P$, the parity strip, already contains

$D_1 \oplus D_2 \oplus D_3 \oplus D_4 \oplus D_5 \oplus D_6 \oplus D_7 \oplus D_8 \oplus D_9 \oplus D_{10}$

and we’re overwriting part of $D_4$ couldn’t I [do the following]:

1. Read the existing $D_4$ into a variable $D'_4$.
2. Modify the data into $D_4$.
3. Calculate the changes as $D_4\oplus D_{4}'$ into variable $C$
4. Read the parity strip $P$
5. Calculate new parity strip as $P=P \oplus C$

In short, the answer is yes. We’re going to prove that here, because I think this is a great exercise to really show off the power of XOR. We’ve explored the operation here and began our discussion of coding theory by looking at maximum likelihood decoding. Let’s briefly review the XOR (or mod 2) operation:

## XOR is just modulo 2 addition

Let’s call generic binary words $D_{j}$. That is, each $D_{j}$ is simply a string of 1s and 0s of whatever length $l$ we feel like setting. So a binary word $D_{j} = d_{1}^{(j)}d_{2}^{(j)}\ldots d_{l}^{(j)}$ consists of binary bits given by the lowercase $d_{i}^{(j)}$. The XOR operation works bit-by-bit, and will be denoted by $\oplus$:

$$\begin{aligned}D_{j} \oplus D_{k} &= (d_{1}^{(j)}d_{2}^{(j)}\ldots d_{l}^{(j)})\oplus (d_{1}^{(k)}d_{2}^{(k)}\ldots d_{l}^{(k)})\\&= (d_{1}^{(j)} \oplus d_{1}^{(k)})(d_{2}^{(j)}\oplus d_{2}^{(k)})\ldots (d_{l}^{(j)}\oplus d_{l}^{(k)})\end{aligned}$$

For a quick numerical example, suppose $D_{j} = 1011$ and $D_{k} = 0010$. Then

$$D_{j} \oplus D_{k} = 1011 \oplus 0010 = (1\oplus 0)(0\oplus 0)(1\oplus 1)(1\oplus 0)$$

Remember, too, that XOR is addition modulo 2, so we add the bits together, then divide by 2 and take the remainder. So, in particular, $1\oplus 1 = 0$ because 1+1 leaves a remainder of 0 when divided by 2. So,

$$D_{j} \oplus D_{k} = 1001$$
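This bitwise computation is straightforward to mirror in code. A sketch, keeping words as strings of bits (the function name `xor_words` is our own):

```python
def xor_words(dj, dk):
    # Bitwise XOR of two equal-length binary words given as bit strings.
    assert len(dj) == len(dk)
    return "".join(str((int(a) + int(b)) % 2) for a, b in zip(dj, dk))

print(xor_words("1011", "0010"))  # 1001
```

Each output bit is the sum of the corresponding input bits, reduced modulo 2, exactly as in the hand computation above.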

## Back to the question

Mr. Marks’ question can be stated mathematically in the following way (and I’m going to generalize it to any finite amount of words of any length XORed together, because that’s what mathematicians do):

Suppose $P = D_{1} \oplus D_{2} \oplus \ldots \oplus D_{j} \oplus \ldots \oplus D_{K}$ for some $K$, and one word (say $D_{j}$) is modified to give $D_{j}'$. Let $C$ be the binary word that represents the changes between $D_{j}$ and $D_{j}'$. That is,

$C = D_{j} \oplus D_{j}'$

(Note: XOR as an operation to identify differences in binary words is one of the more elegant features. If all the bits in two words are the same, then bitwise XORing would always give the binary word of all 0s. Only when two bits are different is their XOR result a 1.) Call $P'$ the new XOR sum with $D_{j}'$ substituted for $D_{j}$. So

$P' := D_{1}\oplus D_{2}\oplus \ldots \oplus D_{j}'\oplus \ldots \oplus D_{K}.$

Then does $P'= P \oplus C$?

### Numerical example

Whenever I’m seeking to prove a statement, I always “play” with an example. Now, simply finding an example that fits the statement doesn’t constitute proof. But playing with explicit numbers can often yield a way to prove the statement in general. Plus, we can deepen our understanding by really “seeing the theorem in action,” as opposed to just manipulating symbols via logic.

Let’s just test this with a sum of 3 words to make our lives easier. Let $D_{1} = 110$, $D_{2} = 010$, and $D_{3} = 101$. Then

$$P = D_{1} \oplus D_{2} \oplus D_{3} = 110 \oplus 010 \oplus 101 = 001$$

Now suppose we change $D_{2}$ to $D_{2}' = 101$. First, the new sum $P'$ is given by

$P' = 110 \oplus 101 \oplus 101 = 110$

Now, the change in $D_{2}$ and $D_{2}'$ is given by

$$C = 010 \oplus 101 = 111$$

Notice that all three positions changed; each position that is different has a 1. Let’s see if $P \oplus C = P'$:

$P \oplus C = 001 \oplus 111 = 110$

Success! Now, this doesn’t mean we’re done. One example doesn’t constitute proof. We have to show this is true for any finite number of binary words of any length.

### Time to prove this statement is true

So, let $D_{1},...,D_{K}$ be binary words of generic length $l$. Choose one word $D_{j}$ and modify it to form the new word $D_{j}'$. Let $C = D_{j} \oplus D_{j}'$ denote the change vector. Then

$$P^{\prime}=D_{1}\oplus D_{2}\oplus\ldots\oplus D^{\prime}_{j}\oplus\ldots\oplus D_{K}$$

Now, let’s note that $C = D_{j}\oplus D^{\prime}_{j}$ tells us which positions changed between the two. Another way to look at it is that $C$ is the word you need to XOR with $D_{j}$ to get to the new $D^{\prime}_{j}$. That is, $D^{\prime}_{j} = D_{j} \oplus C$. Now, let’s plug the new expression for $D_{j}'$ into $P'$:

$$\begin{aligned}P^{\prime}&=D_{1}\oplus D_{2}\oplus\ldots\oplus D^{\prime}_{j}\oplus\ldots\oplus D_{K}\\&=D_{1}\oplus D_{2}\oplus\ldots\oplus (D_{j} \oplus C)\oplus\ldots\oplus D_{K}\end{aligned}$$

Now, we know from this post that XOR is a commutative operation. Coupled with the associative property, we can actually rearrange the order of the XORing to put $C$ last.

(You’ve done this with regular arithmetic all the time. $5 + 1 + 6 + 3$ can be rearranged and grouped into $(6+3+1) + 5 = 10 + 5 = 15$. Commutativity and associativity combined allow this for any operation possessing both properties.)

So,

$$\begin{aligned}P^{\prime}&=D_{1}\oplus D_{2}\oplus\ldots\oplus D^{\prime}_{j}\oplus\ldots\oplus D_{K}\\&=D_{1}\oplus D_{2}\oplus\ldots\oplus (D_{j} \oplus C)\oplus\ldots\oplus D_{K}\\&=(D_{1}\oplus D_{2}\oplus\ldots\oplus D_{j}\oplus\ldots\oplus D_{K})\oplus C\end{aligned}$$

But wait, that last thing in parenthesis is exactly $P$. Therefore,

$$\begin{aligned}P^{\prime}&=D_{1}\oplus D_{2}\oplus\ldots\oplus D^{\prime}_{j}\oplus\ldots\oplus D_{K}\\&=D_{1}\oplus D_{2}\oplus\ldots\oplus (D_{j} \oplus C)\oplus\ldots\oplus D_{K}\\&=(D_{1}\oplus D_{2}\oplus\ldots\oplus D_{j}\oplus\ldots\oplus D_{K})\oplus C\\&= P \oplus C\end{aligned}$$

Since we showed this for any generic number of binary words XORed together, and allowed the binary words to be any length, we’ve proven the statement.
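Having proven the identity, we can also spot-check it on random words, using Python integers as bit vectors (where `^` is bitwise XOR; this little test harness is our own sketch):

```python
import random
from functools import reduce

random.seed(0)
for _ in range(100):
    K = random.randint(2, 10)
    words = [random.getrandbits(16) for _ in range(K)]
    P = reduce(lambda a, b: a ^ b, words)  # original parity

    j = random.randrange(K)
    new_dj = random.getrandbits(16)
    C = words[j] ^ new_dj                  # change vector C = D_j XOR D'_j

    words[j] = new_dj
    P_new = reduce(lambda a, b: a ^ b, words)  # parity recomputed from scratch
    assert P_new == P ^ C                  # the identity proved above
print("verified on 100 random cases")
```

The shortcut needs only the old parity, the old word, and the new word, no matter how large $K$ is.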

## Bonus: Multiple modifications

What if we modified more than one word in our original sum $P$? It’s a pretty simple extension to run through the proof again with multiple modified words and show that if we have multiple change vectors $C$, one for each modified word, we can perform the same substitution and show that the new $P^{\prime}$ is simply the old $P$ XORed with all of the change vectors. Alternatively, you could XOR all the change vectors into one big change vector first, then XOR that with your original $P$ to compute the new $P^{\prime}$. If you want to verify it for yourself formally, simply follow the same steps we did above for one modified word. You’ll just be performing the same type of substitution multiple times to account for each modification.
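A quick sketch of the two-modification case (the particular 4-bit values here are made up purely for illustration):

```python
from functools import reduce

# Four 4-bit words and their parity.
words = [0b1101, 0b0110, 0b1011, 0b0001]
P = reduce(lambda a, b: a ^ b, words)

# Overwrite words 1 and 3, forming one change vector per modification.
new1, new3 = 0b0000, 0b1111
C1 = words[1] ^ new1
C3 = words[3] ^ new3

# Recompute parity from scratch and compare with the shortcut.
P_new = words[0] ^ new1 ^ words[2] ^ new3
assert P_new == P ^ C1 ^ C3  # old parity XOR all change vectors
print(format(P_new, "04b"))  # 1001
```

Equivalently, `C1 ^ C3` is the one combined change vector described above.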

## Conclusion

Mr. Marks brought this up because he was seeking a way to compute the new parity strip in a more efficient way (with fewer arithmetic steps) than simply following the definition. You can absolutely “brute force” your way to calculating the new parity strip. But companies and startups are always concerned about “scalability”. Sure, you won’t notice the time difference with 10 extra things added together. But what about 10 million? 1 billion? More than that? None of those numbers are infeasible for the amount of calculations we perform on data now. In those cases, the brute-force method of simply using the definition starts to cause performance problems. It was worth taking the time to “be clever” and search for a nice, elegant way that cuts down the number of operations necessary to calculate a new parity strip. It took a little upfront work, but the result speaks loudly for itself.

Welcome to GF(4)

## Welcome to GF(4)

Everyone has solved some version of a linear system in either high school or college mathematics. If you’ve been keeping up with some of my other posts on algebra, you know that I’m about to either take something familiar away, or twist it into a different form. This time is no different; we’re going to change the field we operate over, and solve a basic linear system in a Galois field called GF(4).

A Partition by any Other Name

## A Partition by any Other Name

I promise I’m actually a probability theorist, despite many of my posts being algebraic in nature. Algebra, as we’ve seen in several other posts, elegantly generalizes many things in basic arithmetic, leading to highly lucrative applications in coding theory and data protection.  Some definitions in mathematics may not have obvious “practical use”, but turn out to yield theorems and results so powerful we can use them to send image data cleanly from space.

Reduce the Problem: Permutations and Modulo Arithmetic

## Reduce the Problem: Permutations and Modulo Arithmetic

We’ve all seen permutations before. If you have ten distinct items, and rearrange them on a shelf, you’ve just performed a permutation. A permutation is actually a function that is performing the arrangement on a set of labeled objects. For simplicity, we can just number the objects and work with permuting the numbers.