Browsed by
Author: Rachel Traylor

Applications of Reflections: Taking a Group to its “Abelian” Form

Applications of Reflections: Taking a Group to its “Abelian” Form

In continuing the exploration of explicit applications and examples of category-theoretic concepts, we highlight the versatility of reflections and reflective subcategories. This concept can be used to perform all kinds of desired actions on a category to yield a subcategory that “nicer” in some way. This article explores how we can use reflections to make groups abelian.

Please see the previous article by Traylor and Fadeev for the definition of a category and other properties.

Definition (Subcategory):

A category \mathbf{A} is a subcategory of a category \mathbf{B} if 
(1) \mathrm{Ob}(\mathbf{A}) \subseteq \mathrm{Ob}(\mathbf{B})
(2) for each A, A' \in \mathrm{Ob}(\mathbf{A}), \mathrm{hom}_{\mathbf{A}}(A,A') \subseteq \mathrm{hom}_{\mathbf{B}}(A,A')
(3) for each A \in \mathrm{Ob}(\mathbf{A}), the \mathbf{B}-identity on A is the same as the \mathbf{A}-identity on A.
(4) composition in \mathbf{A} is the restriction of composition in \mathbf{B} to the morphisms of \mathbf{A}

Adámek et al, Abstract and Concrete Categories(1990)

Point by point, we’ll pick the definition apart. The first part is pretty clear. The collection of objects in a subcategory is contained in the collection of objects in its “parent”. The second criterion says that the set of morphisms from one object A to another object A' inside the littler category \mathbf{A} should be a subset of all the morphisms from the same A to the same A', but inside \mathbf{B}. That is, there are morphisms from A \to A' in \mathbf{B} that won’t live in the subcategory \mathbf{A}.

The third criterion just states that the identity morphisms on objects should match in both categories. The final criterion tells us that composition inside the subcategory \mathbf{A} only “works” on the morphisms inside \mathbf{A}, but is otherwise the same composition as in \mathbf{B}. We just only perform compositions on morphisms in \mathbf{A} when we’re in \mathbf{A}.

We now define a reflection.

Definition (A- reflection)

Let \mathbf{A} be a subcategory of \mathbf{B}, and let B \in \mathrm{Ob}(\mathbf{B}).

(1) An \mathbf{A}- reflection for B is a morphism B \xrightarrow{r} A from B to A \in \mathrm{Ob}(\mathbf{A}) such that for any morphism B \xrightarrow{f} A' from B to A' \in \mathrm{Ob}(\mathbf{A}) there exists a unique f': A \to A' such that f = f' \circ r.
(2) \mathbf{A} is a reflective subcategory for \mathbf{B} if each \mathbf{B}- object has an \mathbf{A}-reflection. 

Currently, this definition seems a bit abstract. The following sections will illustrate concrete examples of reflections to understand this definition better.

Taking a Group to Its Abelian Form

The Category \mathbf{Grp}

For this example, we’ll be working with the category of groups, \mathbf{Grp}. The objects in this category are groups, and the morphisms are group homomorphisms. Some elements of this category are:

  • (\mathbb{R}, +) \xrightarrow{\phi} (\mathbb{R}^{+}, \cdot), \phi(x) = e^{x}. The real numbers under addition and the positive real numbers under multiplication are both groups, so they’re objects in this category. \phi here is a group homomorphism1, so it’s a morphism in \mathbf{Grp}.
  • The dihedral group D_{6}, and the permutation group S_{3} are also objects in this class. Recall that the dihedral group D_{6} is commonly visualized using the symmetries of an equilateral triangle2, but is commonly given using the following presentation: D_{6} = \langle r,s : r^{3} = s^{2} = 1, sr = r^{-1}s\rangle Here, we see that D_{6} is generated by two elements r and s (r is the rotation by 2\pi/3, and s is one of the reflection lines). S_{3} is the set of permutations on 3 elements– all permutations of the integers \{1,2,3\}. Both of these are groups, and both have 6 elements. If we define \phi: \begin{cases} r \to (123) \\ s \to (12)\end{cases} Then \phi is a group homomorphism that takes D_{6} to S_{3}, and is also thus a morphism in \mathbf{Grp}
  • As one last example, let \mathrm{GL}_{2}(\mathbb{R}) be the general linear group of degree 2. This is the group of invertible 2\times 2 matrices with real entries. Then we can create a group homomorphism \phi: D_{6} \to \mathrm{GL}_{2}(\mathbb{R}) by letting \phi(r) = \begin{bmatrix}\cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta)\end{bmatrix} \qquad \phi(s) = \begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix} so this \phi is also a morphism in \mathbf{Grp}.

The Category \mathbf{Ab}

A subcategory of \mathbf{Grp} is \mathbf{Ab}, the category of abelian groups. Morphisms are again group homomorphisms, but this time we are only looking at morphisms between abelian groups. Some examples:

(1) (\mathbb{R}, +) \xrightarrow{\phi} (\mathbb{R}^{+}, \cdot), \phi(x) = e^{x}. The real numbers under addition, and the positive real numbers under multiplication are abelian groups, so they’re in the subcategory \mathbf{Ab}.

(2) (\mathbb{Z}_{2} \times \mathbb{Z}_{2}, +) and (\mathbb{Z}_{4}, +) are also abelian groups. We can define a group homomorphism \psi: \mathbb{Z}_{2} \times \mathbb{Z}_{2} \to \mathbb{Z}_{4} by \psi: \begin{cases} (0,0) \to 0 \\ (1,0) \to 1\\ (0,1) \to 2 \\ (1,1) \to 3\end{cases}, so \psi is in the collection of morphisms in \mathbf{Ab}. Of course, it’s also in the morphisms of \mathbf{Grp} as well.

(3) D_{6} is not commutative, so it’s not an object in \mathbf{Ab}

(4)\mathrm{GL}_{2} under matrix multiplication is not abelian, because matrix multiplication is not a commutative operation. So neither this group nor the group homomorphism between D_{6} and \mathrm{GL}_{2} are in \mathbf{Ab}.

Creating the Commutator Subgroup

We now discuss the commutator element. The commutator of two elements g,h in a group, denoted [g,h] is given by [g,h] = g^{-1}h^{-1}gh.

Remark: The order matters here.

Let’s work with a nice easy group of matrices. Let G =\left\{I_{2},\begin{bmatrix}0&1\\1&0\end{bmatrix},\begin{bmatrix}0&1\\-1&-1\end{bmatrix},\begin{bmatrix}-1&-1\\0&1\end{bmatrix},\begin{bmatrix}-1&-1\\1&0\end{bmatrix},\begin{bmatrix}1 & 0\\-1&-1\end{bmatrix}\right\} under matrix multiplication. We’ll name these matrices \{I,A,B,C,D,K\} respectively. This group is not commutative, as shown in the group table below:


 Next, we’re going to form the commutator subgroup, which is the subgroup of G consisting of all commutators of G. Let’s call this subgroup \tilde{G}. The quotient G/\tilde{G} is always an abelian group, so “quotienting out by” \tilde{G} and working with the cosets gives us an “abelian version” of our previously non-abelian group. 

We’ll calculate a few commutator elements to demonstrate how, but will not run through every combination. \begin{aligned}[I,X]&=[X,I]=I\\ [A,B]&=A^{-1}B^{-1}AB=ADAB=D\\ [B,A]&=B^{-1}A^{-1}BA=DABA=B\\ [C,D]&=C^{-1}D^{-1}CD=CBCD=B\\ \vdots\end{aligned}

Continuing with all combinations, we find that there are only three commutator elements: \{I, B, D\}. Thus3 \tilde{G} = \{I,B,D\}. Now, G/\tilde{G} gives us the left cosets of \tilde{G}: \begin{aligned}A\tilde{G}&= C\tilde{G}=K\tilde{G}=\{A,C,K\}\\ B\tilde{G}&= D\tilde{G}=I\tilde{G}=\{I,B,D\}\end{aligned}Thus, the commutator subgroup is G/\tilde{G} = \{A\tilde{G}, \tilde{G}\}. This two-element group is a bit dull, admittedly, but it certainly is abelian, with the identity element as \tilde{G}.

Back to Reflections

Something else we can do with this little two-element group is map it to (\mathbb{Z}_{2}, +) via the homomorphism \phi: G/\tilde{G} \to \mathbb{Z}_{2}, where \phi(\tilde{G}) = 0, and \phi(A\tilde{G}) = 1

What does any of this have to do with reflections? Recall that G \in \mathrm{Ob}(\mathbf{Grp}), but not in \mathrm{Ob}(\mathbf{Ab}). G/\tilde{G} is in \mathrm{Ob}(\mathbf{Ab}), and so is \mathbb{Z}_{2}

A reflection for G into \mathbf{Ab} is a morphism r such that for any morphism \psi:G \to A', A' \in \mathrm{Ob}(\mathbf{Ab}), we can find a morphism \phi completely contained in \mathbf{Ab} such that \psi = \phi \circ r

Let A'= \mathbb{Z}_{2}, and define \psi: G \to \mathbb{Z}_{2} by \psi: \begin{cases}A \to 1 \\ C \to 1 \\ K \to 1 \\ B \to 0\\D \to 0 \\I \to 0\end{cases} This is certainly a homomorphism (\psi(XY) = \psi(X)\psi(Y)). 

What if we could “bounce” this group G off another group H in \mathbf{Ab}, then move from H to \mathbb{Z}_{2}? That might reveal a little more about \psi than just the seemingly contrived definition we put forth. 

Let r: G \to G/\tilde{G} be the canonical map sending each element of G to the appropriate coset. That is r(A) = A\tilde{G}, r(B) = \tilde{G}, etc. Then the image of r is G/\tilde{G}, and \phi as defined above is the unique morphism that will take G/\tilde{G} to \mathbb{Z}_{2} such that \psi = \phi \circ r

One reflection, r, taking G to some object A in \mathbf{Ab}. Grab a morphism \psi from G to somewhere else in \mathbf{Ab} (we picked \mathbb{Z_{2}}). Then we have to be able to find a unique \phi such that \psi decomposes into the composition of r with that \phi. This same r (and thus the same A) should be able to perform this action for any A', any \psi. In our case, the reflection is the canonical map from G to its commutator subgroup. 


Reflections can perform many different actions to take objects in one category to objects in a subcategory. We focused on reflections making things “abelian” in a nice way, which helped reveal some structures that would have otherwise been hidden.


On the optimal storage of bulk cargoes: an exercise in calculus.

On the optimal storage of bulk cargoes: an exercise in calculus.

Editor’s Note: This article is authored by Valentin Fadeev, a physics PhD candidate at The Open University, UK. You can reach him via Twitter at @valfadeev.

Despite their undeniable and enormous role in applications, the methods of calculus are often seen by students present and past as a form of mathematical art removed from everyday life. Evaluating integrals may seem as a sport for the initiated. Extremal problems are mostly to demonstrated on the stripped-down textbook examples. Coming across a use case in an unusual and easily understandable context is rare. We present one such use case here and hope the reader finds it instructive.

We begin with some background from maritime logistics. The term “bulk cargo” is used to denote the type of cargo, such as coal, ores, fertilizers, grain and the like, carried by merchant vessels “in bulk”, that is put directly in the holds and not stuffed in boxes or any other containers. In some cases, while waiting to be picked up by vessels or other modes of transportation, such cargoes will be stored in stacks spilled by belt conveyors or other handling mechanisms.

We consider here a simplified problem of calculating the optimal dimensions of one such stack given its total volume. For the criteria of optimality we pick the ground area occupied by the stack and the work against gravity required to form it.

Figure 1

We shall approximate the stack with a geometrical body in the shape of an obelisk of height H, having a rectangle L_2\times B at its base and 4 facets inclined towards the centre at equal angles, thus forming the upper horizontal edge of length L_1 (see Figure 1).

The volume of the stack can be calculated as follows:\begin{aligned}V& =\int_0^HS(x)\,\mathrm{d}x , & (1)\end{aligned} where the x-axis is directed downwards and S(x) is the area of the horizontal cross-section as a function of x, given explicitly as \begin{aligned}S(x)&=a(x)\cdot b(x)\\&=\left(L_{1}+\frac{L_{2}-L_{1}}{H}x\right)\cdot\frac{Bx}{H}\\&=B\left(L_{1}\frac{x}{H}+\left(L_{2}-L_{1}\right)\left(\frac{x}{H}\right)^{2}\right).&(2)\end{aligned}
Substituting (2) in (1) and integrating we find \begin{aligned}V&=BH\int_0^1\left(L_1\frac{x}{H}+(L_2-L_1)\left(\frac{x}{H}\right)^2\right)\,\mathrm{d}\left(\frac{x}{H}\right)\\&=\frac{HB}{6}(2L_2+L_1) &(3)\end{aligned}

Now we shall determine the amount of work (energy spent) performed against the forces of gravity that is required to form the stack. Assume that density of the cargo is \gamma\; \mathrm{kg}/\mathrm{m}^3. The work \mathrm{d}A required to lift a layer of thickness \mathrm{d}x to height H-x is given as follows:
dA=\gamma gdV\cdot(H-x)=\gamma gB\left(L_1\frac{x(H-x)}{H}+\frac{(L_2-L_1)x^2(H-x)}{H^2}\right)\,\mathrm{d}x,
where g\approx 9.81 \,\mathrm{kg} \cdot \mathrm{m} / \mathrm{s}^2 is the acceleration due to gravity. Thus, the total work is \begin{aligned}A&=\gamma gB\int_0^H\left(L_1\frac{x(H-x)}{H}+\frac{(L_2-L_1)x^2(H-x)}{H^2}\right)\,\mathrm{d}x\\&=\frac{\gamma gBH^2}{12}(L_1+L_2). & (4)\end{aligned}

Now we can ask the following question: if the volume V given, what should be base length L_{2} and width B of so that the stack occupies the smallest possible area? First, we simplify our parametrization of the problem by introducing the angle of the natural slope \chi. This is a term from the soil mechanics which denotes the angle that an unattached slope of the material forms with the horizontal surface at mechanical equillibrium. It depends on the properties of the material such as density and viscosity, and can be looked up in special tables.

In our model \begin{aligned}\frac{2H}{B}=\tan{\chi}.\end{aligned}. We can, therefore, express everything in terms of L\equiv L_2, B and \chi \begin{aligned}H=\frac{1}{2}B\tan{\chi}, &\qquad L_1=L_2-\frac{2H}{\tan{\chi}}=L-B& (5).\end{aligned}
Then the volume (3) can be rewritten as follows \begin{aligned}V&=\frac{B^2\tan{\chi}}{12}(3L-B).&(6)\end{aligned} Solving (6) for L we obtain \begin{aligned}L&=\frac{4V}{B^2\tan{\chi}}+\frac{B}{3}.&(7)\end{aligned} The base area S can then be expressed as the function of B S(B)=LB=\frac{4V}{B\tan{\chi}}+\frac{B^2}{3}, whereby taking the derivative with respect to B and solving S'(B) = 0 we find the stationary point B_0 \begin{aligned}B_0&=\sqrt[3]{\frac{6V}{\tan{\chi}}},& (8)\end{aligned} which happens to be a point of minimum for S. Using (8) L_0 is found as follows \begin{aligned}L_0&=\left(\frac{4}{\sqrt[3]{36}}+\frac{2}{3}\sqrt[3]{36}\right)\sqrt[3]{\frac{V}{\tan{\chi}}},&(9)\end{aligned} so that the minimal area is \begin{aligned}S_0=S(B_0)&=4\left(\frac{1}{\sqrt[3]{6}}+1\right)\left(\frac{V}{\tan{\chi}}\right)^{2/3}\approx 6.2\left(\frac{V}{\tan{\chi}}\right)^{2/3}.&(10)\end{aligned}

Next we shall determine the values of L, B which minimize the amount of work required to form the stack. Rewriting (4) using (5) and (7) we obtain A=\frac{\gamma g \tan^2{\chi}}{144}\left(\frac{24VB}{\tan{\chi}}-B^4\right)
Differentiating with respect to B we find the stationary point to be given precisely by (8) again! Thus we find that, given the volume and the natural slope (8) and (9) actually produce the optimal solution, both in terms of the ground area, and the energy cost. Finally, we can use the above results to determine the largest admissible volume of a single stack given the constraint on the ground pressure, q,\, \mathrm{kg}/\mathrm{m}^2. Given V, S and \gamma the safety condition is expressed by the inequality \frac{V\gamma}{S} < q
Solving for V and using (10) we derive \begin{aligned}V&<\frac{qS}{\gamma}\approx\frac{q}{\gamma}6.2\left(\frac{V}{\tan{\chi}}\right)^{2/3},\\V&<\frac{(4.4q)^3}{\gamma^3\tan{\chi}}\approx 238.48\frac{q^3}{\gamma^3\tan{\chi}}.\end{aligned}

The model can be modified to account for the fact that in practice the edges of the stack can develop surfaces that look like half-cones. In this case the reader will find, following the steps similar to the above that the total area and the work required to form the stack are given by \begin{aligned}S(B) = \frac{4V}{B\tan{\chi}} + \frac{\pi}{12}B^2, &\qquad A(B) = \frac{\gamma g \tan^2{\chi}}{144}\left(\frac{24VB}{\tan{\chi}}-\frac{3\pi}{4}B^4\right),\end{aligned}
respectively. The values B_{S} and B_{A} minimizing the above expressions will not be the same anymore
\begin{aligned}B_{S}=\sqrt[3]{3}B_{A}, &\qquad B_{A} = 2\sqrt[3]{\frac{V}{\pi\tan{\chi}}}.\end{aligned}
We have thus determined the maximum allowed volume, given the safety constraint and the properties of the material, assuming the optimal dimensions of the stack.
In conclusion, we have considered a simple but non-trivial application of the very basic tools of calculus to a problem that has important engineering applications. However, the main point is that calculus is ubiquitous and sometimes can deliver useful and elegant results in places one would least expect it to.

Using Dirac Delta Formalism to Calculate Shear Force and Bending Moment

Using Dirac Delta Formalism to Calculate Shear Force and Bending Moment

Editor’s note: This article was written by Valentin Fadeev, a physics PhD candidate in the UK. He also co-authored this article on equivalences and isomorphisms in category theory. You can find him on Twitter via the handle @ValFadeev.

The delta function was introduced by P.A.M. Dirac, one of the founders of quantum electrodynamics. The delta function belongs to the abstract concepts of function theory. In a rigorous sense it is a functional that picks a value of a given function at a given point. However, it also arises as the result of the differentiation of discontinuous functions. One of the most appealing properties of the delta function is the fact that it allows us to work with discretely and continuously distributed quantities in the same way, replacing discrete sums with integrals. Delta and its derivative meet the physical intuition in cases where we deal with quantities of large magnitude whose impact is limited to a small region of space or a very short amount of time (impulses). The inspiration for this particular article came during a course on mechanics in university.

In structural mechanics, it is often desired to understand the relationship between bending moment and shear force for objects, particularly beams, under applied loads. An explanation of the typical method is given here.

In the text I was using at the time, the proof of the relation between bending moment and shear force did not make explicit use of delta function. The final result was valid, but there was a serious fault in the proof. Point forces were considered as mere constant terms in the sum and differentiated away, despite the fact their contribution was accounted for only starting form certain points (suggesting delta-terms). I gave this problem a mathematically rigorous treatment which led to some nice closed formulae applicable to a very general distribution of load.

Relationship Between Shear Force and Bending Moment

A beam with various forces

Consider a rigid horizontal beam of length L, and introduce a horizontal axis x directed along it. Assume the following types of forces are applied:

  • point forces F^{p}_{i} at x_i, i=1,2,…,n, 0<x_i<L
  • distributed forces with linear density given by continuous functions \phi^{d}_{j}(x) on intervals (a_j,b_j),(j=1,2,…,m), 0<a_{j-1}<b_{j-1}<a_j<b_j<L
  • moments (pairs of forces) M_k with axes of rotation at x_k, (k=1,2,…,p), 0<x_k<L.

Note that we ignore our own mass of the beam for the sake of clarity, although it can be easily accounted for if necessary.

Point Forces

Although the point forces F^{p}_{i} are applied at certain points by definition, in reality the force is always applied to a certain finite area. In this case we can consider the density of the force being very large within this area and dropping to 0 outside it. Hence, it is convenient to define the distribution density as follows: \phi^{p}_{i}(x)=F^{p}_{i}\delta(x-x_i). Shear force F at point x is defined as the sum of all forces applied before that point (we are moving from the left end of the beam to the right one): F=\sum_{x_i<x}F_{i}
F^{p}(x)=\sum_{i}\int_0^x \phi^{p}_{i}(z)\mathrm{d}z=\sum_{i}F^{p}_{i}e(x-x_i), where e(x)=\begin{cases}1,\qquad x>0\\0, \qquad x\neq 0\end{cases} is the step function.

Distributed Forces

Now we shall find the contribution from the distributed forces. \phi^{d}_j(x) may be formally defined on the whole axis, so we must cut off the unnecessary intervals outside (a_j,b_j). Consider the following expressions: \phi^{d}_{j}(x,a_j,b_j)=\phi^{d}_j(x)[e(x-a_j)-e(x-b_j)]. Indeed it is easy to ascertain that the right side is equal to \phi^{d}_j(x) within (a_j,b_j) and vanishes everywhere else. Calculating shear force due to distributed forces demonstrates some useful properties of the delta function: \begin{aligned}F^d&=\sum_{j}\int_{0}^{x}\phi_{j}^{d}(z)[e(z-a_j)-e(z-b_j)]\mathrm{d}z\\&=\sum_{j}\left[\int_0^x\phi^{d}_j(z)e(z-a_j)\mathrm{d}z-\int_0^x \phi^{d}_j(z)e(z-b_j)\mathrm{d}z\right]\\&=\sum_{j}\left[\int_{a_j}^x\phi^{d}_{j}(z)e(z-a_j)\mathrm{d}z-\int{b_j}^{x}\phi^{d}_j(z)e(z-b_j)\mathrm{d}z\right]\\&=\sum_{j}\left[\left.\left(e(z-a_j)\int_{a_j}^x\phi^{d}_j(z)\mathrm{d}z\right)\right|_{a_j}^x-\int_{a_j}^x\left(\int_{a_j}^x\phi^{d}_j(z)\mathrm{d}z\right)\delta(z-a_j)\mathrm{d}z\right.\\&\left.-\left.\left(e(z-b_j)\int_{b_j}^x\phi^{d}_j(z)\mathrm{d}z\right)\right|{b_j}^x+\int_{b_j}^x\left(\int_{b_j}^x\phi^{d}_j(z)\mathrm{d}z\right)\delta(z-b_j)\mathrm{d}z\right]\\&=\sum_{j}\left[e(x-a_j)\int_{a_j}^x\phi^{d}_j(z)\mathrm{d}z+e(x-b_j)\int_{x}^{b_j}\phi^{d}_j(z)\mathrm{d}z\right]\end{aligned}

Here we used the defining property of delta: f(x)\delta(x-x_0)=f(x_0)\delta(x-x_0), from which it follows, in particular that \left(\int_{a_j}^x\phi^{d}_j(z)\mathrm{d}z\right)\delta(x-a_j)=\left(\int_{a_j}^{a_j}\phi^{d}_j(z)\mathrm{d}z\right)\delta(x-a_j)=0.

Bending Moments

Now we shall calculate bending moments created by all types of forces involved. Consider F^{p}_{i} applied at x_i. The bending moment created by this force evaluated at x>x_i can be determined as follows:

\begin{aligned}M_{i}^{p}(x)&=F_{i}^{p}e(x-x_i)(x-x_i),\\M_{i}^{p}(x+\mathrm{d}x)&=F_{i}^{p}e(x+\mathrm{d}x-x_i)(x+\mathrm{d}x-x_i)=F^{p}_{i}e(x-x_i)(x-x_i+\mathrm{d}x),\\dM_{i}^{p}&=F_{i}^{p}e(x-x_i)\mathrm{d}x,\\M_{i}^{p}&=F_{i}^{p}\int_0^{x}e(z-x_i)\mathrm{d}z=F^{p}_{i}\frac{x-x_i+|x-x_i|}{2}.\end{aligned} Hence, the total moment due to point forces is M^{p}=\sum_{i}F_{i}^{p}\frac{x-x_i+|x-x_i|}{2}.

To calculate moment created by the distributed forces we use the following approach adopted in mechanics. Replace the distributed force to the left of the point of summation with a point force applied at the center of gravity of the figure enclosed by the graph of \phi_j(x) and lines x=a_j, x=b_j. If a_j<x<b_j: \begin{aligned}F^{d}_{j}(x)&=\int_{a_j}^x\phi^{d}_{j}(z)\mathrm{d}z,\\ M^{d}_{j}&=\int_{a_j}^x\phi^{d}{j}(z)\mathrm{d}z\left(x-\frac{\int_{a_j}^x z\phi^{d}_{j}(z)\,\mathrm{d}z}{\int_{a_j}^x\phi^{d}_{j}(z)\mathrm{d}z}\right)\\&=x\int_{a_j}^x\phi^{d}_{j}(z)\mathrm{d}z-\int_{a_j}^x z\phi^{d}_{j}(z)\mathrm{d}z.\end{aligned} Differentiating both parts with respect to x we obtain \frac{\mathrm{d}M^{d}_{j}}{\mathrm{d}x}=\int_{a_j}^x\phi^{d}_{j}(z)\mathrm{d}z+x\phi^{d}_{j}(x)-x\phi^{d}_{j}(x)=\int_{a_j}^x\phi^{d}_{j}(z)\mathrm{d}z=F^{d}_{j}(x)

In fact, we could as well include point forces in the above derivation considering total density \phi(x)=\phi^{p}(x)+\phi^{d}(x), which is the correct way to account for point forces in this proof. We can therefore derive the general relation between shear force and bending moment: F=\frac{\mathrm{d}M}{\mathrm{d}x}. Now we need to calculate the contribution made by moments of pairs of forces. Each of these moments can be considered as a pair of equal large forces F applied at a small distance h of each other and oppositely directed. They will make the following contribution to the expression for total density \mu(x)=F\delta(x+h)-F\delta(x)=Fh\frac{\delta(x+h)-\delta(x)}{h}=M\frac{\delta(x+h)-\delta(x)}{h}. Taking the limit as h \to 0: \mu(x)=M\delta'(x)

Note that the derivative of \delta(x) is called the unit doublet.
This does not imply that expression for shear force will contain terms of the form M\delta(x). This is due to the fact that shear force consists of vertical components of all forces and for pairs of forces those components will cancel out each other.
Therefore, the total bending moment of the pairs of forces is expressed as follows M=\sum_{k}M_ke(x-x_k). Finally we can write the expressions for shear force and bending moment in the most general case \begin{aligned}F(x)&=\sum_{i}F^{p}_{i}e(x-x_i)\\&\qquad+\sum_{j}\left[e(x-a_j)\int_{a_j}^x\phi^{d}_j(z)\mathrm{d}z+e(x-b_j)\int_{x}^{b_j}\phi^{d}_j(z)\mathrm{d}z\right],\\M(x)&=\sum_{k}M_ke(x-x_k)+\sum_{i}F^{p}_{i}\frac{x-x_i+|x-x_i|}{2}\\&\qquad+\sum_{j}\int_0^x\left[e(t-a_j)\int_{a_j}^t \phi^{d}_j(z)\mathrm{d}z+e(t-b_j)\int_{t}^{b_j}\phi^{d}_j(z)\mathrm{d}z\right]\mathrm{d}t.\end{aligned} In an important particular case where \phi^{d}_j(x)\equiv \phi^{d}_j=const these expressions reduce to the following \begin{aligned}F(x)&=\sum_{i}F^{p}_{i}e(x-x_i)+\sum_{j}\phi^{d}_j\left[(x-a_j)e(x-a_j)+(x-b_j)e(x-b_j)\right], \\ M(x)&=\sum_{k}M_ke(x-x_k)+\sum_{i}F^{p}_{i}\frac{x-x_i+|x-x_i|}{2}\\ &\qquad+\sum_{j}\frac{\phi^{d}_j}{2}\left[(x-a_j)^2e(x-a_j)+(x-b_j)^2e(x-b_j)\right].\end{aligned}


In this example we have demonstrated the application of Dirac delta and the related functions to a problem where the input data has a combination of continuous and discrete distributions. Singular functions allow us to work with this data in a uniform and relatively rigorous way. The advantage is that we get the final results in the closed form, as opposed to an algorithmic step-by-step method of solution. We can also obtain concise proofs, such as that for the relation between shear force and bending moment, and extend the results to very general inputs.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Should I accept this shipment?

Should I accept this shipment?

Suppose you work for an engineering or manufacturing firm, and you receive shipments of different parts from various suppliers. It’s not good business practice to just blindly accept a shipment, because some parts in your batch may be defective. Too many defective parts, and you’ll slow down manufacturing (in addition to wasting valuable time and money). You come up with a brilliant plan (which is used in industry already): you’ll take a sample from the shipment, test the sample, and if the number of defective parts in the sample is below some predetermined number, you’ll accept the shipment. Otherwise, you’ll send it back and probably place an angry call to your supplier.


(1) Why not just test everything?

For one, shipments may contain thousands of parts. That will take forever. Secondly, some tests are destructive, meaning you push an item to its breaking point, which renders it unusable. (In these cases, you’re testing to ensure the breaking point is high enough.) Thirdly, certain kinds of tests may be expensive or time-consuming. We have real costs — time and money and physical scarcity — we must consider now.

(2) How big should our sample be?

There are statistical considerations and business/cost considerations. There’s no perfect answer here. Sampling theory is a large branch of statistics, and there are tools for analysis of optimal sample size. However, we must also consider the specific case. If we have a destructive test, the “statistically optimal” size may destroy too many. If the test is time-consuming, we may lack the time to perform the test on a large number of items1

(3) What should be the “cutoff” for shipment rejection?

This discussion is the main focus of this article. We’re going to take a pretty small and simplistic overview of it, just to illustrate how powerful even basic probability can be for engineering and business problems. To do this, we’ll briefly describe the hypergeometric distribution, and then illustrate its use in an operating characteristic curve.

The Hypergeometric Distribution

Suppose we have N objects belonging to one of two classes, with N_{1} in the first class, and N_{2} in the second class, so N = N_{1} + N_{2}. (In this example, N_{1} is the total number of defective parts in the batch, and N_{2} is the number of good parts.) We select n objects without replacement2. Find the probability that exactly x belong to the first class and n-x to the second.

We can select x objects from the N_{1} in class one in {N_{1} \choose x} = \frac{N_{1}!}{x!(N_{1}-x)!} ways3. There are n-x left and they must come from N_{2} objects. We can select those in {N_{2} \choose n-x} ways. Then P(X=x) = \frac{{N_{1} \choose x}{N_{2} \choose n-x}}{{N \choose n}}.

Evaluating a shipment acceptance plan

We’ll create a small example with manageable numbers to illustrate the use of the hypergeometric distribution in acceptance sampling.

A supplier ships parts to another company in lots of 25. Some items in the shipment may be defective. The receiving company has an acceptance sampling plan to inspect n=5 parts from the lot without replacement. If none of the sample is defective, they accept the shipment, otherwise they reject it. Suppose we want to evaluate this plan.

If X is the random number of defectives in the sample, then X has a hypergeometric distribution. Except this time, N_{1} and N_{2} (number of defective and nondefective parts respectively) is unknown. We only know N_{1} + N_{2} = 25.

In designing an acceptance plan, we want the probability of accepting the lot to be large if N_{1} is small. That is, we want to have a high probability of accepting the lot if the true number of defective parts is very small. We also want the probability of “falsely accepting” the lot to be low. (That is, we want the probability of acceptance to be low when N_{1} is high).

When we treat these probabilities as a function of N_{1} (or equivalently, the fraction defective given by p = \frac{N_{1}}{25} in the lot) we call this the operating characteristic curve. Mathematically, the operating characteristic curve, denoted OC(p) is given in this case by:

OC(p) = P(X=0) = \frac{{N_{1} \choose 0}{25-N_{1} \choose 5}}{{25 \choose 5}}

Note here that OC(p) = P(X=0) because that’s the plan in this fictional example. If the plan changed, the operating characteristic curve would be defined by something different. For example, if the plan was to accept shipments that contain 1 or fewer defects, then OC(p) = P(X\leq 1) and we would recalculate those probabilities using the hypergeometric probability mass function given above.

Let’s look at some numerical values of OC(p) for our fictional example. Remember that p = \frac{N_{1}}{25}.


Is the acceptance plan satisfactory? With N_{1} = 1, OC(0.04) = 0.8 which may be considered too low (we may reject perfectly valid shipments), and with N_{1} = 4, OC(0.16) = 0.383 may be too high (we may not want that high a probability of accepting a shipment with that many defects). Thus, we may want to reconsider the number of items we test, or reconsider our acceptance plan.

Usually lot sizes are far larger than the numbers seen here, and sample sizes are in the hundreds, so as the values get large, this computation becomes cumbersome. We can approximate the hypergeometric distribution with the Poisson distribution, which we won’t cover here.


This is a small illustration of the very practical use of the hypergeometric distribution to devise an intelligent strategy for accepting/rejecting shipments of goods. This type of work falls within the purview of the services we offer.

Energy Levels of Molecules are Bounded Below

Energy Levels of Molecules are Bounded Below

Editor’s Note: This guest submission is from Nikita Lisitsa, a professional software developer and mathematician. You can follow him on Twitter.

Are Molecules Perpetual Motion Machines?

Short answer: no, of course not. Perpetual motion machines do not exist. There are deep theoretical reasons for that, as well as less deep but nevertheless convincing experimental data. That’s why we care so much about renewable energy and stuff: we know that energy supply is limited in any real physical system.

From the perspective of mechanics (classical, relativistic, or quantum) when you take some energy from a system, it jumps to a state with lower energy. If there were a system with no lower bound on energy levels, we could withdraw energy forever, forcing the system to fall lower and lower on the energy ladder (this is roughly what forced Dirac to introduce the concept of the Dirac sea). Thus, a sufficiently good mechanical system should have a lower bound on possible energy states.

We now focus on atoms and molecules. What would bound their energy levels from below? To begin, we need a good framework to describe these energy levels. Unfortunately, classical mechanics fails here: the virial theorem implies that for a potential of the form V(r) \sim \frac{1}{r} the time average of kinetic and potential energies, denoted \langle T\rangle and \langle V\rangle, respectively, are related by 2\langle T \rangle = -\langle V \rangle,and the total energy E (which is conserved) is given by E = \langle E \rangle = \langle T \rangle + \langle V \rangle = \frac{1}{2} \langle V \rangle.

For a single electron (which has charge -e) moving in a Coulomb field of a nucleus with charge Ze, the potential energy is V(r) = -\frac{kZe^2}{r}, where k is a Couloumb constant. Thus, if the distance r to the nucleus tends to zero, the potential energy tends to -\infty, and so does the total energy E. This, together with the fact that an electron orbiting a nucleus would lose energy in the form of electromagnetic waves, was one of the major problems of classical mechanics that needed a “quantum treatment”.

Annoyingly Fast and Heavy Introduction to Quantum Mechanics

Let’s dive into the (non-relativistic) quantum world. Unfortunately, we are immediately forced to accept an approximation; as of this writing, I’m unaware of a solution that doesn’t use one. The Born-Oppenheimer approximation suggests that since the protons and neutrons are about 2000 times heavier than electrons, they move much slower and can be considered static. Thus, we must now prove that a system of N electrons moving in a field of K fixed nuclei has a lower bound on possible values of energy.

Under this framework, a molecule is a carefully chosen separable Hilbert space of possible states together with a “nice” essentially self-adjoint operator acting on it. The Hilbert space we are will use is denoted (L^2(\mathbb{R}^3))^{\otimes N} \cong L^2(\mathbb{R}^{3N}), where \cong denotes an isomorphism. The notation indicates that we can represent an element of our Hilbert space in two equivalent ways; we’ll use the second, which is a square-integrable function of 3N real variables which are coordinates of the electrons – 3 coordinates for each of the N electrons. The inner product of two functions \psi and \phi is defined as an integral over the whole 3N-dimensional space:

\langle\psi,\phi\rangle = \int\limits_{\mathbb{R}^{3N}} \overline\psi(r_1\dots r_N) \phi(r_1 \dots r_N) dr_1 \dots dr_N

Remark: The actual space should be tensored by \mathbb C^{2^N} to account for spin, but the effect of spin on energy is too small for us to care in this case. Nevertheless, it exists. Furthermore, the elements of the space are actually not functions;they are equivalence classes of functions modulo function having zero norm. Following the usual colloquial terminology, we’ll call them functions nevertheless; we’ll refer to them as wavefunctions.

A physical state is not the same as a wavefunction: you can multiply the wavefunction by an arbitrary non-zero complex constant, and the result represents the same state. States form some peculiar kind of space; we shall stick to wavefunctions, which are elements of a vector space so we can use the full power of linearity. Thus, it is common to assume wavefunctions to be normalized such that1 \langle \psi,\psi\rangle=1. We shall assume this too.

Next, we need to define the Hamiltonian. Here, the Hamiltonian is similar to that in non-quantum mechanics, with the exception that we must quantize it: turn numbers into operators. The general solution to this problem is unknown. (See Groenewold’s theorem on the inconsistency of canonical quantization and John Baez’s article series on some modern approaches). In our case, however, the recipe is simple: turn the kinetic energy \frac{p_i^2}{2m} into -\frac{\hbar^2}{2m} \Delta_i and turn the potential energy V(r_1,r_2,\dots,r_N) into an operator that multiplies a wavefunction by V.

The energy consists of 

  • the kinetic energy of electrons,
  • the attraction of these electrons to the nuclei, and
  • the repulsion energy between electrons

The usual quantum-mechanical way to write this is:
H = -\sum\limits_i \frac{\hbar^2}{2m} \Delta_i – \sum\limits_{i,A} \frac{kZ_Ae^2}{|r_i – R_A|} + \sum\limits_{i,j} \frac{ke^2}{|r_i – r_j|} We now detail each item in the above equation

  • i,j are indices that run over the electrons in the system
  • A runs over nuclei
  • \Delta_{i} is the Laplacian with respect to i-th electron coordinates, i.e. \frac{\partial^2}{\partial x_i^2}+\frac{\partial^2}{\partial y_i^2}+\frac{\partial^2}{\partial z_i^2}
  • m is the electron mass
  • k is the Coulomb constant
  • Z_A is the number of protons in the A-th nucleus
  • e is the electron charge
  • r_i is the position of i-th electron
  • R_A is the position of A-th nucleus (which are fixed by Born-Oppenheimer)

Nobody likes constants, right? Let’s stick to atomic units instead, where m=k=e=\hbar=1, and the speed of light is 137 (this is exactly \frac{1}{\alpha}, where \alpha is the fine-structure constant). The Hamiltonian becomes a bit cleaner: H = -\sum\limits_i \frac{1}{2} \Delta_i-\sum\limits_{i,A} \frac{Z_A}{|r_i – R_A|} + \sum\limits_{i,j} \frac{1}{|r_i – r_j|}

What does it mean? An operator should act on wavefunctions. For a function \psi, the operator takes some second derivatives of \psi, multiply \psi by other functions, and adds all this up. The result is another function – the result of operator acting on \psi, called H\psi. One important thing is that this operator is essentially self-adjoint, though we shall treat it as if it’s adjoint: this follows from the Laplacian \Delta_i being self-adjoint, and multiplication byreal-valued function being a self-adjoint operator as well.

The energy (well, the average or expectation value energy), of a state \psi is \langle \psi, H\psi\rangle (the inner product of \psi and H\psi), which happens to be equal to \langle H\psi , \psi\rangle, thanks to self-adjointness. Physicists love to use bra-ket notation for expressions like this, and write it as \langle \psi | H | \psi\rangle.

Concerning Energies

We have a space of functions, a differential operator acting on these functions, and an integral that we wish to bound from below whenever the functions are normalized.2

Our way to prove that the expression \langle \psi, H\psi\rangle is indeed bounded below will be broken up into three parts.

Ignoring Repulsion

A good thing about the average energy is that it is linear in the energy operator. This means that, for a sum of two operators, we have \langle \psi, (H_1+H_2)\psi \rangle = \langle \psi, H_1\psi \rangle + \langle \psi, H_2\psi \rangle, which means that we can analyze the terms in the Hamiltonian separately.

Now, the last term is electron-electron repulsion. Let’s write it down again:
\sum\limits_{i,j} \frac{1}{|r_i – r_j|} The only thing that matters for our purposes is that this operator is multiplication by a nonnegative function, which we’ll call f(r_1,\dots,r_N)=\sum\limits_{i,j} \frac{1}{|r_i - r_j|} for now. Non-negativeness implies that the expectation value of this term is never below zero:
\langle \psi, f \cdot \psi \rangle = \int\limits_{\mathbb R^{3N}} |\psi|^2 f dr_1 \dots dr_N \geq 0

Thus, since the expectation value is linear, we can simply drop the electron-electron repulsion term: this action can only lower the energy; since we want to find a lower bound on energy levels, this is okay.

Separating Electrons

Now the simplified Hamiltonian looks like this:
H’ = -\sum\limits_i \frac{1}{2} \Delta_i – \sum\limits_{i,A} \frac{Z_A}{|r_i – R_A|} The electrons do not interact anymore, which makes the total Hamiltonian a sum of single-particle Hamiltonians of individual electrons: H’ = \sum\limits_i \left[ -\frac{1}{2} \Delta_i – \sum\limits_{A} \frac{Z_A}{|r_i – R_A|}  \right] = \sum\limits_i H_i,

where each of H_i depends only on the i-th electron coordinates. In the tensor-product-version of the description of our Hilbert space of states, this means that H_i acts on the i-th component of the tensor product (L^2(\mathbb R^3))^{\otimes N}. Whatever Hilbert space description you use, this implies that the whole system Hamiltonian H' is separable, so analyzing the expectation values of H_i is, in essence, a single-particle problem: the wavefunction \psi(r_1, \dots, r_N) does still depend on all electrons’ coordinates, but they are ignored by H_i. So, if we find a lower bound for each of H_i, we are done.

Reducing to a Hydrogen-Like Atom

Everything before this section was actually pretty straightforward; it is now that we’ll do a simple trick. Look at H_i again:
H_i = -\frac{1}{2} \Delta_i – \sum\limits_{A} \frac{Z_A}{|r_i – R_A|} It would be awesome if we could put the Laplacian under the sum, for the resulting summands would be just the Hamiltonians of a hydrogen-like atom, — a quantum system with any charged particle orbiting around another charged particle. This is one of the most important quantum systems and one of the few that has an exact and known analytical solution.

So, let’s actually put the Laplacian under the sum!

H_i = \sum\limits_{A} \left[ -\frac{1}{2M} \Delta_i – \frac{Z_A}{|r_i – R_A|} \right] = \sum\limits_A H_{i,A},

where M is the number of nuclei. Now, each term of the sum is the Hamiltonian of an atom-like system, with a nucleus of charge Z_A and an electron orbiting it, except that the electron has mass M instead of 1. Thankfully, this is not a problem: plug m_e=M into the exact solutions of the hydrogen-like atom, and we get solutions for each H_{i,A} term of the sum above. We don’t need the actual solutions, though; all we need is the fact that expectation value of energy is bounded below in this solution.

Summing It Up

So, what we’ve done is shown H \geq H’ = \sum\limits_i H_i = \sum\limits_{i,A} H_{i,A}, and the expectation value of each H_{i,A} is bounded below, therefore the expectation value of H itself is bounded below.

This whole idea was inspired by the paper of Tosio Kato “Fundamental properties of Hamiltonian operators of Shrödinger type”. You’ll find many more details on the topic there.

All this is just a tiny bit of the whole story of why the world around us doesn’t explode/freeze/break apart. For a full exposition, check out “The stability of matter: from atoms to stars” by Elliott H. Lieb. I should thank Valentin Fadeev for introducing me to this awesome work.


Equivalence v. Isomorphisms in Category Theory

Equivalence v. Isomorphisms in Category Theory


Editor’s Note: The article is co-written by Rachel Traylor (The Math Citadel/Marquette University) and Valentin Fadeev (The Open University, UK). Substantial additional review, contributions, and discussions were provided by Matt Kukla and Jason Hathcock. A pdf is available for download at the end of this post.

The biggest challenge we have found in studying category theory is the lack of “concrete” examples to illustrate concepts that aren’t themselves highly abstract. This may be due to our more “vulgar” nature as a primarily applied mathematician (from the viewpoint of many academic mathematicians). However, we firmly believe that if category theory wishes to deliver on the promises it makes to help bind many branches of mathematics (and even the sciences) together, it needs a little boring vulgarity sometimes when illustrating concepts as opposed to a never-ending march towards greater abstraction and generality. 

Perhaps we are better poised than many to do the dirty work of illustrating some of these concepts in a more boring way, on topics so familiar many mathematicians ignore them. For this particular article, we will elaborate on the difference between isomorphic categories and equivalent categories. The difference between the two has a degree of subtlety, but exploring a tangible example involving matrices and finite-dimensional vector spaces really clarified these concepts for us. 

Necessary Definitions

We will not assume a familiarity with all the terms we will be using, so some definitions for reference are provided here. It is perfectly acceptable to skim these until they are invoked later, then return for further examination. For reference, please see Abstract and Concrete Categories by Adámek, Herrlich, and Strecker (1990).

Definition (Category)

A category is a quadruple \mathbf{A} = (\mathscr{O}, \text{hom}, \text{id}, \circ) that consists of

  1. A class \mathscr{O} whose members are called \mathbf{A}-objects.1
  2. For each pair (A,B) of \mathbf{A}-objects, a set \hom(A,B) whose elements are called morphisms from A to B. Denoted A \xrightarrow{f} B or f: A \to B. The collection of all morphisms in \mathbf{A} is denoted \text{Mor}(\mathbf{A}).
  3. For each \mathbf{A}-object A, there is an identity morphism A \xrightarrow{id_{A}} A.
  4. A composition law that associates with each \mathbf{A}-morphism A \xrightarrow{f} B and B \xrightarrow{g} C an \mathbf{A}-morphism A \xrightarrow{g \circ f} C we call the composite of f and g.2 This composite map g\circ f is subject to the following conditions:
    • Composition is associative. That is, for A \xrightarrow{f} B, B \xrightarrow{g} C, and C \xrightarrow{h} D

      h \circ (g\circ f) = (h \circ g) \circ f.
    • For any morphism A \xrightarrow{f} B, id_{B} \circ f = f and f \circ id_{A} = f.3
    • The sets \hom(A,B) are pairwise-disjoint, meaning that a morphism f cannot belong to \hom(A,B) and \hom(A',B') at the same time. It can only be in one set of morphisms. This condition guarantees each morphism has a unique domain (where we’re mapping from) and codomain (where we map to).
Abstract and Concrete Categories, Adámek, Herrlich, Strecker (1990)

For those who have never seen any category theory at all, it is worth a pause to list a couple examples of categories. We shall only list the ones that of interest to this article. There are many more examples, and many more “types” of categories beyond these.

  • \mathbf{Set} is the category of all sets. \mathscr{O} is the class of all sets. (Here a particular set is an object. \mathbb{N} is an object in this category, as is the set of all permutations of the numbers \{1,2,3\}. We don’t care about individual elements of these sets. We move from one whole set to another whole set. \hom(A,B) is the set of all functions from set A to set B. id_{A} is just the identity function on a set A, and \circ is our standard composition of functions.
  • \mathbf{Vect} is the category whose objects are all real vector spaces. Morphisms in this category are linear transformations from one vector space to another, so \hom(A,B) is the set of all linear transformations from vector space A to vector space B.4 The identity morphisms are identity transformations, and composition in this category is the standard composition of linear maps.
  • \mathbf{Mat} an interesting category worth spending some time on. Objects here are not matrices, as one might guess. Objects in \mathbf{Mat} are natural numbers. Morphisms are m \times n real-valued matrices. (We can generalize this concept to \mathbf{Mat}_{R}, where the entries of the matrices are the elements of a particular specified ring R, but let’s just stick with real-valued matrices here. Put differently, morphisms take a natural number m “to” a natural number n by assigning it a matrix of dimensions m \times n with real-valued entries. \hom(m,n) is the set of all m\times n real-valued matrices. Structurally, there is only one morphism from m to n. While we can have different m\times n matrices that perform the task of mapping m to n, they are structurally equivalent in that they map two natural numbers to the same dimensions. In this category, the identity morphism id_{m} are m \times m identity matrices. Composition in this category is defined by A \circ B = BA, where BA is just the usual matrix multiplication. It’s worth noting here that a category doesn’t require that all morphisms be composable. \mathbf{Mat} provides a nice example of this. We can only multiply matrices when the dimensions are appropriate to do so, which means only certain morphisms are composable. This doesn’t violate the definition of a category or associativity, etc. Composition must be associative when composition can be done.

Morphisms allow us to move between objects in a category. If we want to move from one category to another, we’ll need the concept of a functor.

Definition (Functor)

If \mathbf{A} and \mathbf{B} are categories, then a functor F from \mathbf{A} to \mathbf{B} is a mapping that assigns to each \mathbf{A}-object A and \mathbf{B}-object FA. It also assigns to each \mathbf{A}-morphism A \xrightarrow{f} A' a \mathbf{B}-morphism FA \xrightarrow{Ff} FA' such that the following conditions hold:

  1. F(f\circ g) = Ff \circ Fg (Composition is preserved whenever f\circ g is defined.)
  2. For every \mathbf{A}-object A, F(id_{A}) = id_{FA}. This makes sure the functor maps the identity morphism on A to the identity morphism on the object that A maps to in the category \mathbf{B}.

Notationally, we’ll see functors written F: \mathbf{A} \to \mathbf{B}, or F(A\xrightarrow{f}B), which shows more clearly that functors move objects and morphisms. 

Functors themselves can have different properties, that are a bit similar to concepts from studying regular functions (e.g. injectivity, surjectivity). We can also compose functors, and that composite functor is indeed a functor.

Definition (Isomorphism as a functor)

A functor F : \mathbf{A} \to \mathbf{B} is called an isomorphism if there is a functor \mathbf{G}: \mathbf{B} \to \mathbf{A} such that G \circ F = id_{\mathbf{A}} and F \circ G = id_{\mathbf{B}}.

Note id_{\cdot} here is the identity functor for a category, not the identity morphism on an object. The identity functor will just take all objects to themselves, and all morphisms to themselves, but the identity functor acts on a category, whereas the identity morphism acts on an object within a category.

Definition (Faithful Functor)

A functor is faithful if all the hom-set restrictions F: \hom_{\mathbf{A}}(A,A') \to \hom_{\mathbf{B}}(FA, FA') are injective.

Here, this means that if a morphism f \in \text{hom}_{\mathbf{A}}(A,A') maps to Ff \in \hom_{\mathbf{B}}(FA, FA'), and another morphism g \in \hom_{\mathbf{A}}(A,A') maps to Fg \in \hom_{\mathbf{B}}(FA, FA') we have that Ff = Fg (the two “target” morphisms are the same), then f = g. This needs to hold for all sets of morphisms between any two objects.

Definition (Full Functor)

A functor F is full if all the hom-set restrictions F: \hom_{\mathbf{A}}(A,A') \to \hom_{\mathbf{B}}(FA, FA') are surjective.

In other words, for a functor to be full, we need a morphism g in \hom_{\mathbf{B}}(FA, FA') to have a morphism f in \hom_{\mathbf{A}}(A,A') such that Ff = g. g should have a “preimage” under the functor. 

Definition (Isomorphism Dense)

A functor is isomorphism-dense if for any \mathbf{B}-object B there exists some \mathbf{A}-object A such that FA \cong B

Two objects FA and B in a category \mathbf{B} are isomorphic if there exists a morphism between the two that is an isomorphism. It is worth clearly noting the definition of an isomorphism at it pertains to morphisms in a category:

Definition (Isomorphism for Morphisms)

A morphism f: A \to B in a category \mathbf{A} is an isomorphism if there exists a morphism g: B \to A in \text{Mor}(\mathbf{A}) such that f\circ g = id_{B} and g\circ f = id_{A} where id_{A} and id_{B} are the identity morphisms for the objects A and B.

Finally, we can combine a few of these to get the notion of an equivalence:

Definition (Equivalence}

A functor is an equivalence if it is full, faithful, and isomorphism-dense.

If we re-write the definition of an isomorphism (functor), then an isomorphism is a functor that is full, faithful, and bijective on objects, whereas an equivalence is a functor that is full, faithful, and isomorphism-dense. The only difference between these two is the notion of being bijective on objects v. being isomorphism-dense. An isomorphism is more restrictive than an equivalence in the sense that all  isomorphisms are equivalences, but we can exhibit equivalences that are not isomorphisms. When we talk about categories being either equivalent or isomorphic, these two words are not interchangeable. In the next sections, we’ll discuss this difference, and illustrate with an explicit example. 

Linear Algebra and Matrices

At this point, we’ll revisit briefly some standard results from linear algebra, focusing on real vector spaces. In most undergraduate linear algebra courses, the notion that a linear transformation can be represented in matrix form is simply stated (sometimes proven, but not often). In particular, those linear algebra courses designed for engineers may only focus on linear systems of equations (which are linear transformations) and omit much talk of linear transformations at all. I then consider it reasonable to revisit some of these ideas. For more details, please consult your favorite linear algebra text.

Linear Spaces (Vector Spaces)

Remark: The terms linear space and vector space are interchangeable. Different texts will use different terms.

Definition (Linear Space)

A linear space over a field \mathbb{F} is a set of elements (vectors) V with two operations +, \cdot. The operation + is called vector addition takes two elements v, w \in V and returns another element v+w \in V. The operation \cdot is scalar multiplication, and takes a scalar element \alpha from the field \mathbb{F} and multiplies it by an element v \in V and returns a “scaled” element \alpha v \in V. We require closure under linear combinations. For any \alpha_{1},\alpha_{2},\ldots,\alpha_{n} \in \mathbb{F} and \mathbf{v}_{1}, \mathbf{v}_{2},\ldots,\mathbf{v}_{n} \in V,
\sum_{i=1}^{n}\alpha_{i}\mathbf{v}_{i} \in V. This is a consolidation of the ten axioms a linear space must satisfy. The expanded axioms are relegated to the appendix.

We’re going to focus on real vector spaces rather than this abstract definition. Real vector spaces have \mathbb{R} as our field (so we multiply vectors by real numbers). 

The vector spaces most often encountered in engineering problems are \mathbb{R}^{n}–the standard n-dimensional real vector spaces on real-valued vectors. (Also known as a standard Euclidean space.) 

Now suppose we wish to move from one vector space V to another W. There are many ways we can define such a move, but the one we’re interested in are linear transformations.

Definition (Linear Transformation)

If V,W are linear spaces, then a function T: V \to W is a linear transformation if

  1. T(x+y) = T(x) + T(y) for all x,y \in V. If we add two vectors, and then transform the result into the space W, it should yield the same vector in W as if we transformed each of x and y into the space W, then added them together.
  2. T(\alpha x) = \alpha T(x) for all \alpha \in \mathbb{F}, x \in V. (We can “factor out” scalars.)

We can take linear transformations from \mathbb{R}^{m} \to \mathbb{R}^{n} by transforming coordinates as in the following example:

T:\mathbb{R}^{2} \to \mathbb{R}^{2} is given by T(x) = x+2y and T(y) = 3y

From linear algebra, we know we may rewrite the transformation in vector form, since we live in “vector space” (in 2 dimensions here):

T\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}x+2y\\3y\end{bmatrix}

Typically the next step in linear algebra is to say that we may represent the transformation T in matrix form such that multiplication of our “source vector” by the matrix yields the result of the linear transformation. For our example, the matrix A corresponding to the linear transformation T is 

A = \begin{bmatrix}1 & 2 \\ 0 & 3\end{bmatrix}, so that for \mathbf{x} = \begin{bmatrix}x\\y\end{bmatrix} A\mathbf{x} = \begin{bmatrix}x+2y \\ 3y\end{bmatrix}

At this point, this equivalence between linear transformations and matrix representation (with operation matrix/vector multiplication) is simply stated and expected to be accepted. However, we’re going to use the notion of equivalences in category theory to establish this formally.

Moving from Mat to Vect

As mentioned before, we’re going to focus only on finite dimensional vector spaces (\mathbb{R}^{n} for various values of n or other spaces of finite dimension ) and real-valued matrices to keep things simple. These concepts generalize much further, though. Since we’re operating on real-valued matrices, we’ll be using \mathbf{Mat}_{\mathbb{R}}5

To show the equivalence between\mathbf{Vect} and \mathbf{Mat}, we must establish an equivalence functor between the two that allows us to map \mathbf{Mat} to \mathbf{Vect}. We define the functor as follows:

F:\mathbf{Mat}\to\mathbf{Vect} assigns objects n in \mathbf{Mat} the vector space \mathbb{R}^{n}. Thus, we simply assign a natural number n in \mathbf{Mat} to the corresponding n-dimensional real vector space in \mathbf{Vect}, which is an object in \mathbf{Vect}.

Functors also map morphisms in the source category (\mathbf{Mat} in our case) to morphism in the destination category (\mathbf{Vect} here). Morphisms in \mathbf{Mat} are m \times n matrices with real values. Morphisms in \mathbf{Vect} are linear transformations. Thus, it’s natural to have our functor take an m \times n matrix A in\mathbf{Mat} and assign to it the linear map T in \text{Mor}(\textbf{Vect}) from \mathbb{R}^{m} \to \mathbb{R}^{n} that takes an m-dimensional vector \mathbf{x}=\begin{bmatrix}x_{1}\\\vdots \\x_{m}\end{bmatrix} \in \mathbb{R}^{m} maps it to the n-dimensional vector \mathbf{y} = \begin{bmatrix}y_{1}\\\vdots\\y_{n}\end{bmatrix}^{T} \in \mathbb{R}^{n} given by \mathbf{x}^{T}A.6

We need to check that this functor satisfies the definition of both a functor and an equivalence. That F is a functor follows from matrix multiplication on one hand, and from composition of linear maps on the other. To ensure it satisfies the definition of an equivalence, we must check to see that the functor is full, faithful, and isomorphism-dense.

To show the functor is faithful, take two linear transformations T and T' in \mathbf{Vect} such that T = T'. In other words, these two transformations do the exact same thing to input vectors. It’s clear there can if we had two different matrices A and A' such that multiplication by \mathbf{x} yields the exact same vector \mathbf{y}, then A = A'. Thus the functor is indeed faithful. To show the functor is full, we need to show that for any linear transformation T \in \text{Mor}(\mathbf{Vect}), there is a matrix A \in \text{Mor}(\mathbf{Mat}) such that F(A) = T. A linear transformation by definition (see above) results in a component x_{i} \in \mathbb{R}^{m} being taken to a linear combination of components x_{j} \in \mathbb{R}^{n}. Refer back to the example above, where we mapped from \mathbb{R}^{2} to \mathbb{R}^{2}.

T\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}x+2y\\3y\end{bmatrix}

Since multiplication of matrices results in components which are linear combinations of the source vector by definition of matrix multiplication, there is certainly a matrix A \in \text{Mor}(\mathbf{Mat}) that will map to T \in \text{Mor}(\mathbf{Vect}) for any linear transformation T. Thus, F is certainly a full functor. 

It remains to show that F is isomorphism-dense. To do so, we’re going to attempt to create its “inverse” functor G : \mathbf{Vect} \to \mathbf{Mat}. It is at this point we’ll really understand the difference between being isomorphism-dense and being a true isomorphism.

Creating the Inverse Functor

We’re going to create a functor G that maps \mathbf{Vect} to \mathbf{Mat}. For this, we’ll define it as the “natural” inverse. G will take vector spaces in \mathbf{Vect} to their dimension n \in \mathbb{N}, which is an object in \mathbf{Mat}. We’ll also take T \in \text{Mor}(\mathbf{Vect}) to its corresponding matrix A \in \text{Mor}(\mathbf{Mat}), simply “reversing” the direction. 

We can show in an identical manner to what we did for F that the functor G is full. Isomorphisms in \mathbf{Vect} are invertible linear maps between vector spaces. Since the objects of \mathbf{Mat} are natural numbers, we cannot have isomorphisms between different objects. That is, m cannot be isomorphic to n inside \mathbf{Mat} unless m=n.

We now illustrate the key difference between isomorphism and isomorphism-dense functors. Take the space of polynomials of order m with real coefficients. This is indeed a real vector space, and belongs to \mathbf{Vect}. This polynomial space is isomorphic to \mathbb{R}^{m} inside of \mathbf{Vect}, meaning that there is an invertible linear map K from the polynomial space to \mathbb{R}^{m}. Recall that functors not only map objects to objects, but they also map morphisms to morphisms. Since all the functor F originating \mathbf{Mat} cares about is the dimension of the space, the isomorphism K between the polynomial space and \mathbb{R}^{m} is “invisible” in the category \mathbf{Mat}, because \mathbf{Mat} sees the polynomial space and \mathbb{R}^{m} just as “some” m-dimensional spaces. Thus the morphism K \in \text{Mor}(\textbf{Vect}) has no morphism in \text{Mor}(\textbf{Mat}) that will map to it. 

The functor G is not faithful because T and L will map to the same matrix A_{m\times n} in \text{Mor}(\textbf{Mat})

Therefore, the two categories \mathbf{Vect} and \mathbf{Mat} are equivalent, but not isomorphic, because F is only isomorphism-dense, not fully isomorphic.


We exhibited an explicit example involving the category \mathbf{Mat} of matrices over \mathbb{R} and the category \mathbf{Vect} of finite dimensional real vector spaces to illustrate the difference between functors that are isomorphisms and functors that are equivalences. The key difference is the notion of isomorphism-dense, which means that an object in the target category of a functor may not have a “preimage” in the source space, but is isomorphic inside the target category to an object which does have a “preimage” in the source space. 


For a linear space V over a field \mathbb{F}, the following axioms regarding the two operations must be satisfied:

  1. For x,y \in V, x+y \in V. That is, if we perform the addition operation on two elements of V, the result can’t be outside the set V.
  2. For each \alpha \in \mathbb{F} and v \in V, \alpha v \in V. Multiplying an element by a scalar shouldn’t “knock” us outside the set V.
  3. For x,y \in V, x+y = y+x. (Commutativity)
  4. For x,y,z \in V, x+(y+z) = (x+y)+z (Associativity)
  5. There must be some element in V, denoted 0_{V} that is an \textit{additive identity}. That is, for any x \in V, x + 0_{V} = 0_{V} + x = x.
  6. For every x \in V, the element (-1)x has the property that x + (-1)x = 0_{V}. (Existence of “negatives”, or additive inverses)
  7. For \alpha, \beta \in \mathbb{F}, v \in V, \alpha(\beta v) = (\alpha\beta)v. (Associativity of multiplication)
  8. For all x,y \in V, and for all \alpha \in \mathbb{F}, \alpha(x+y) = \alpha x + \alpha y (Distributivity of vector addition)
  9. For all \alpha, \beta \in \mathbb{F}, v \in V, (\alpha + \beta)v = \alpha v + \beta v.7
  10. For every v \in V, we have that 1v = v, where 1 is the multiplicative identity in \mathbb{F}.


We’d like to gratefully acknowledge Matt Kukla and Jason Hathcock for their insights, review, and meaningful discussion.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The Cartesian Product of Two Graphs

The Cartesian Product of Two Graphs

(Ed. Note: A pdf version of this article is attached at the end of the post for offline reading.)

Introduction and Preliminaries

Graphs are objects like any other, mathematically speaking. We can define operations on two graphs to make a new graph. We’ll focus in particular on a type of graph product- the Cartesian product, and its elegant connection with matrix operations. 

We mathematically define a graph G to be a set of vertices coupled with a set of edges that connect those vertices. We write G = (V_{G}, E_{G}). As an example, the left graph in Figure 1 has three vertices V_{G} = \{v_{1}, v_{2}, v_{3}\}, and two edges E_{G} = \{v_{1}v_{2}, v_{2}v_{3}\}. The order of G is the number of vertices, and the size of G is the number of edges. So our graph G has order n=3 and size m=2. This graph in particular has a special name, P_{3}, because it’s a special type of graph called a path that consists of 3 vertices. Two vertices that are connected by an edge are adjacent and denoted with the symbol \sim. 

Cartesian Product of Graphs

Now that we’ve dispensed with necessary terminology, we shall turn our attention to performing operations on two graphs to make a new graph. In particular, a type of graph multiplication called the Cartesian product. Suppose we have two graphs, G and H, with orders n_{G} and n_{H} respectively. Formally, we define the Cartesian product G \times H to be the graph with vertex set V_{G} \times V_{H}. Pause here to note what we mean by this. The vertex set of the graph Cartesian  is the Cartesian product of the vertex sets of the two graphs: V_{G \times H} = V_{G} \times V_{H}. This means that the Cartesian product of a graph has n_{G}n_{H} vertices. As an example, P_{3} \times P_{2} has 3\cdot 2 = 6 vertices. The vertices of G \times H are ordered pairs formed by V_{G} and V_{H}. For P_{3} \times P_{2} we have, 

 V_{P_{3} \times P_{2}} = \{(v_{1}, x_{1}),(v_{1}, x_{2}),(v_{2}, x_{1}),(v_{2}, x_{2}), (v_{3}, x_{1}),(v_{1}, x_{2}) \}

The edge set of G \times H is defined as follows: (v_{i}, x_{k}) is adjacent to (v_{j}, x_{l}) if

  1. v_{i} = v_{j} and x_{k} \sim x_{l}, or
  2. x_{k} = x_{l} and v_{i} \sim v_{j}


Let’s create P_{3} \times P_{2} as an example:


The red edges are due to condition (1) above, and the red to (2). Interestingly enough, this operation is commutative up to isomorphism (a relabeling of vertices that maintains the graph structure). This will be examined in a later section.

Graph and Matrices

We can also operate on graphs using matrices. The pictures above are one way to represent a graph. An adjacency matrix is another. The adjacency matrix A_{G} of a graph G of order n is an n \times n square matrix whose entries a_{ij} are given by 

a_{ij} = \begin{cases} 1, & v_{i} \sim v_{j} \\0, & i=j \text{ or } v_{i} \not\sim v_{j}\end{cases}.

 The adjacency matrix for P_{3} and P_{2} respectively are 

A_{P_{3}} = \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 1\\0 & 1 & 0\end{bmatrix} \qquad A_{P_{2}} = \begin{bmatrix} 0 & 1\\1 & 0\end{bmatrix}

Note that a relabeling of vertices simply permutes the rows and columns of the adjacency matrix. 

Adjacency Matrix of the Cartesian product of Graphs

What is the adjacency matrix of G \times H? If G and H have orders n_{G} and n_{H} respectively, we can show that

A_{G \times H} = A_{G} \otimes I_{n_{H}} + I_{n_{G}} \otimes A_{H}

where \otimes is the Kronecker product, and I_{m} is the m\times m identity matrix. Recall that the Kronecker product of two matrices A_{m \times n} \otimes B_{k \times l} is the mk \times nl matrix given by 

A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B &\ldots & a_{1n}B \\ a_{21}B & a_{22}B & \ldots & a_{2n}B \\ \vdots & & \ddots& \vdots \\ a_{m1}B & a_{m2}B & \ldots & a_{mn}B\end{bmatrix}

In general, the Kronecker product is not commutative, so A \otimes B \neq B \otimes A. 

We can prove the formula for A_{G \times H} above using standard matrix theory, but this really wouldn’t be that illuminating. We’d see that the statement is indeed true, but we would gain very little insight into how this affects the graph and why. We shall break apart the formula to see why the Kronecker product is needed, and what the addition is doing. 

Let us return to finding P_{3} \times P_{2}. By our formula, A_{P_{3} \times P_{2}} = A_{P_{3}} \otimes I_{2} + I_{3} \otimes A_{P_{2}}. Notice first why the dimensions of the identity matrices “cross” each other; that is, we use the identity matrix of order n_{P_{2}} in the Kronecker product with A_{P_{3}} and vice versa n_{P_{3}} for A_{P_{2}}. This ensures we get the correct dimension for the adjacency matrix of P_{3} \times P_{2}, which is 6 \times 6. 

We’ll walk through the computation one term at a time. First,

\begin{aligned}A_{P_{3}} \otimes I_{2} &= \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 1\\0 & 1 & 0\\\end{bmatrix} \otimes \begin{bmatrix} 1 & 0\\0 & 1\end{bmatrix} \\&= \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0\\\end{bmatrix}\end{aligned}

This matrix is the adjacency matrix of a 6-vertex graph given below:

Notice that we now have two copies of P_{3} now in one graph. Similarly, I_{3} \otimes A_{P_{2}} is 

\begin{aligned}I_{3} \otimes A_{P_{2}} &= \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 1 & 0\end{bmatrix}\end{aligned}


Here the second Kronecker product term has made three copies of P_{2}. The last step is to add these two matrices together to obtain the adjacency matrix for P_{3} \times P_{2}. Graphically, what’s happening? This addition term overlays our 3 copies of P_{2} (usually written 3P_{2} or P_{2} \cup P_{2} \cup P_{2}) onto the respectively labeled vertices of our two copies of P_{3} (written 2P_{3} or P_{3} \cup P_{3}). That is,

\begin{aligned}A_{P_{3} \times P_{2}} &= A_{P_{3}} \otimes I_{2} + I_{3} \otimes A_{P_{2}}\\&= \begin{bmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1\\ 0 & 0 & 0 & 1 & 1 & 0\end{bmatrix}\end{aligned}

which is the adjacency matrix of our originally obtained graph


Note that here the vertex labels aren’t ordered pairs. That’s ok. Technically we shouldn’t have labeled the vertices of the two graphs identically, but the goal was to illustrate the action of the Cartesian product. We can always relabel the edges. Formally, the overlaying of 3P_{2} onto the vertices of 2P_{3} would be forming those ordered-pair vertices. 

Commutativity of the Graph Cartesian Product

A natural question for any operation is whether or not it possesses the property of commutativity. An operation is commutative if the order in which we take it produces the same result. That is, if \square is an operation, a\square b = b\square a.

Does G \times H yield the same graph as H \times G? In a way, yes. Let’s examine what happens if we switch the order of the graph Cartesian product and take P_{2} \times P_{3}. 

We won’t go quite as thoroughly through each step, and instead give the results. 

We know the adjacency matrix of P_{2} \times P_{3} is given by 

A_{P_{2} \times P_{3}} = A_{P_{2}} \otimes I_{3} + I_{2} \otimes A_{P_{3}}

\begin{aligned}A_{P_{2}} \otimes I_{3} &= \begin{bmatrix}0 & 0 & 0 & 1 & 0 & 0\\0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0\end{bmatrix}\end{aligned}

Notice again we have 3 copies of P_{2}, but the labeling and drawing is now different. If we create a mapping \phi that relabels v_{4} as v_{2}, v_{6} as v_{4}, and v_{2} as v_{6}, then we’d get back the graph we drew from the adjacency matrix formed by I_{3} \otimes A_{P_{2}}. Thus, the two graphs are structurally equivalent; it just looks different due to labeling and redrawing. The mapping \phi is called an isomorphism because it transforms one graph into the other without losing any structural properties, and the two graphs are isomorphic. 

This implies that A_{P_{2}} \otimes I_{3} \neq I_{3} \otimes A_{P_{2}}, but if we permute rows and columns, we can transform one matrix into the other. If the labelings on the vertices didn’t matter, then the operation is commutative up to isomorphism. If the labelings matter (more common in engineering), then we do not get the “same” graph by switching the order of the Kronecker product.

We’ll do the same thing and examine I_{2} \otimes A_{P_{3}} and compare it to A_{P_{3}}\otimes I_{2}:

\begin{aligned}I_{2} \otimes A_{P_{3}} &= \begin{bmatrix}0 & 1 & 0 & 0 & 0 & 0\\1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0\end{bmatrix}\end{aligned}

The corresponding graph is 

Notice again that here we have two copies of P_{3} just as when we created the graph formed by the matrix A_{P_{3}}\otimes I_{2}, but the labels are again permuted. We have generated an isomorphic graph. 

Finally, we add the two matrices together (or follow the definition to create the last step of the Cartesian product by overlaying vertices and fusing them together). 

\begin{aligned}A_{P_{2}} \otimes I_{3} + I_{2} \otimes A_{P_{3}} &= \begin{bmatrix}0 & 1 & 0 & 1 & 0 & 0\\1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 \end{bmatrix}\end{aligned}

This is isomorphic to P_{3} \times P_{2}. We can show this by finding a mapping \phi that “untangles” P_{2} \times P_{3} by relabeling vertices. We can also redraw the ladder structure we saw with P_{3} \times P_{2} and label the vertices so that we get the structure of P_{2} \times P_{3}

Our conclusion is that the graph Cartesian product is “sort of” commutative as an operation. If the vertices were unlabeled or didn’t matter, then the operation is commutative because we can always relabel vertices or “untangle” the structure. If the vertices are labeled and matter, then we don’t get the same graph back when we switch the order in the Cartesian product. Thus, we can say that the graph Cartesian product is commutative to isomorphism. 


A natural question after looking through all of this is the following:

Given an adjacency matrix, can we decompose it into a Cartesian product of two graphs?

The next article will address this.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Exploiting Chemistry for Better Packet Flow Management 5: Chemical Congestion Control, Design Motifs, and Conclusion

Exploiting Chemistry for Better Packet Flow Management 5: Chemical Congestion Control, Design Motifs, and Conclusion

This represents the final installment of the series reviewing the 2011 technical report by Meyer and Tschudin. Part 1 gave an overview of the report and the problems it aimed to solve, as well as the chemistry basics necessary for further understanding. Part 2 discussed artificial chemistries and the extension to an artificial packet chemistry, setting the mathematical framework for analyses and implementation. Part 3 discussed briefly some methods of network analysis available once the artificial packet chemistry was developed. Part 4 gave an implementation of a scheduler based on the packet chemistry model as well as the idea of a chemical control plane and its implementation. This final part will discuss one final application — a congestion control algorithm–as well as mention design motifs pointed out by the authors and conclude our analysis of this report. 

Chemical Congestion Control Algorithm

As a final application of chemical networks, the authors combine LoMA-scheduled queues and flow-filtering patterns to schedule segments of a transport protocol to control congestion. To illustrate, they re-implement the additive increase/multiplicative decrease of the congestion avoidance mode of TCP-Reno. The congestion control algorithm reacts to packet loss automatically and naturally. 
The reproduced figure above shows how the chemical congestion control is implemented as a chemical reaction network (and by extension, a queueing network). 
  1. Arriving packets are put into a queue D. The transmission rate \nu_{tx} is controlled  by the quantity of pacemaker molecules R, so \nu_{tx} = k_{1}c_{R}c_{D}, once again according to the Law of Mass Action. To mimic the additive (linear) increase mechanism of TCP-Reno, the number of pace-maker molecules is increased at a rate \nu_{\text{inc}}.
  2.  Before packets are transmitted, they are tagged with a sequence number. If there is a gap in the sequence number of acknowledgments from the destination, the source regenerates the packets at a queue L.
  3. A lost packet will catalyze the destruction of pacemaker molecules by another reaction r_{2}, which will lead to the exponential decay of R-molecules and thus decrease the transmission rate. However, we wish to prevent too fast a destruction of pacemaker molecules, so a third reaction r_{3} will delay the destruction. 
The authors encourage the use of such a reaction graph at the design phase of flow management policies. The feedback nature is much clearer. In addition, the papers give a formal proof that their congestion control model is TCP-fair at equilibrium; that is, the transmission rate is proportional to \frac{1}{\sqrt{p_{\text{loss}}}} where p_{\text{loss}} is the probability of packet loss between source and destination. They also discuss an extended version that reacts to variations in round trip time (RTT) variation to more fully exploit the link bandwidths. The traffic statistics are not computed symbolically with chemical reactions. Instead, another reaction builds a difference sch that at equilibrium the fill level of its queue is proportional to the excess transmission rate. That signal decays the pacemaker molecules. They also supply simulations to illustrate their implementations. 

Design Motifs

This section is of a particular interest to me personally, and was treated the least. No design process can ever be fully automated, but the authors claim to have developed several design motifs of chemical reaction networks for a variety of purposes, including arithmetic computation of fill levels to communication patterns (anycast, neighborhood discovery, etc). Unfortunately, they do not give a direct citation as to where to look further. This report will be updated when such information is found.
Figure 12 shows the only two motifs provided by the authors. One for rate limiting (a) and the other bimolecular reaction to compute the difference between arrival rates for two queues. The concept of using these as design elements is extremely intriguing, and it was unfortunate that the authors did not choose to expand this further. 


Meyer and Tschudin have given an extensive report showing how powerful the application of chemical kinetics and chemical networks can be for the computer networking space. There are several research opportunities available for further study and implementation. As yet, there have been no citations of this work of note (the report came out in 2011), and thus the opportunity seems ripe for exploration. 


  1. Dittrich, P., Ziegler, J., and Banzhaf, W. Artificial chemistries – a review. Artificial Life 7(2001), 225–275.
  1. Feinburg, M. Complex balancing in general kinetic systems. Archive for Rational Mechanics and Analysis 49 (1972).
  2. Gadgil, C., Lee, C., and Othmer, H. A stochastic analysis of first-order reaction networks. Bulletin of Mathematical Biology 67 (2005), 901–946.
  3. Gibson, M., and Bruck, J. Effcient stochastic simulation of chemical systems with many species and many channels. Journal of Physical Chemistry 104 (2000), 1876–1889.
  4. Gillespie, D. The chemical langevin equation. Journal of Chemical Physics 113 (2000).
  5. Gillespie, D. The chemical langevin and fokker-planck equations for the reversible isomerizationreaction. Journal of Physical Chemistry 106 (2002), 5063–5071.
  6. Horn, F. On a connexion between stability and graphs in chemical kinetics. Proceedings of the RoyalSociety of London 334 (1973), 299–330.
  7. Kamimura, K., Hoshino, H., and Shishikui, Y. Constant delay queuing for jitter-sensitive iptvdistribution on home network. IEEE Global Telecommunications Conference (2008).
  8. Laidler, K. Chemical Kinetics. McGraw-Hill, 1950.
  9. McQuarrie, D. Stochastic approach to chemical kinetics. Journal of Applied Probability 4 (1967), 413–478.
  10.  Meyer, T., and Tschudin, C. Flow management in packet networks through interacting queues and law-of-mass-action-scheduling. Technical report, University of Basel.
  11.  Pocher, H. L., Leung, V., and Gilles, D. An application- and management-based approach to atm scheduling. Telecommunication Systems 12 (1999), 103–122.
  12. Tschudin, C. Fraglets- a metabolistic execution model for communication protocols. Proceedings of the 2nd annual symposium on autonomous intelligent networks and systems (2003).

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Exploiting Chemistry for Better Packet Flow Management 4: Scheduler Implementation

Exploiting Chemistry for Better Packet Flow Management 4: Scheduler Implementation

This article is the fourth part in a series based on a report reviewing the technical report of Meyer and Tschudin[11] who have extended the notion of an artificial chemistry to an artificial packet chemistry with the intention of exploiting the natural behavior of chemical reactions to design better flow management policies for computer networks. Part 1 introduced the chemistry foundations necessary for study of this technical report, including the Law of Mass Action. Part 2 elaborated on the mathematical model of an artificial packet chemistry. Part 3 discussed the various types of mathematical analyses for various queueing questions now available with the expression of a computer network and its flow as an artificial packet chemistry. This part will discuss an actual engineering application of all the ideas discussed thus far–a scheduler and a chemical control plane.

Implementation of a Scheduler Based on the Law of Mass Action

Likely, this section will be of greatest interest to network engineers. The authors have indeed designed and implemented a scheduler that utilizes this approach in an elegant fashion. In addition, they discuss a “chemical control plane” that can automatically be compiled from the abstract model. In another application, they relax the static nature of the network to allow an active networking approach that reshapes the queuing network at run-time. The authors do discuss specifics of implementation, though this article will only briefly touch on it.


Each network node/reaction vessel has its own scheduler. The scheduler computes the next occurrence time of each rule r \in R_{i} in its local node (this is equivalent to “serving” or processing a packet or set of packets for bimolecular reactions) according to the Law of Mass Action. It then will sort the events into a priority queue, wait until the first event occurs, then execute. The main difficulty for a scheduler is to dynamically react and reschedule events properly as packets are added to or drained from its queues. The authors note that an efficient mass action scheduler can be implemented that requires only O(\log(|\mathcal{R}|)) time to enqueue or dequeue packets. This is based on the Next Reaction Method[4] of Gibson and Bruck. 

Here we’ll recount an explicit example that illustrates the concept. If we return to Figure 1 reproduced below, we can walk through Meyer and Tschudin’s scheduler implementation.


There are two queues, X and Y. Reaction 1 (Server 1) is bimolecular: X+Y \rightarrow Z, so the server pulls packets from two queues to execute the service. Reaction 2 (Server 2) is unimolecular, pulling only from queue Y. If we assume the reaction constants k_{1} = 1000/(\text{packet}\cdot s) and k_{2} = 1000/\text{s}, that X begins with two packets in its queue, and Y begins with 3 packets in its queue, then the reaction rates \nu_{r}, r=1,2 are respectively \nu_{1} = k_{1}c_{X}c_{Y} = 1000\cdot2\cdot3 = 6000 and \nu_{2} = k_{2}c_{Y} = 1000\cdot 3 = 3000. The occurrence time is the reciprocal of the reaction rate, so the occurrence times \tau_{r} are respectively \tau_{1} = \frac{1}{6} ms and \tau_{2} = \frac{1}{3} ms. That means the first server executes its action first, extracting packets from both X and Y. 

Since the occurrence time of r_{2} is coupled with r_{1} (both servers pull from queue Y), the action of r_{1} requires a rescheduling of r_{2}. After r_{1} pulls a packet each from X and Y, there is 1 packet left in X and 2 in Y, which means we have to recalculate the rate \nu_{2} = 1000\cdot 2 = 2000. The occurrence time of r_{2} is at ms \frac{1}{3}, so its time of execution hasn’t arrived. But thanks for r_{1}‘s effect, we have to rescale and reschedule the occurrence time of r_{2}. This is done by the following:

\tau_{r,\text{new}}-\frac{\nu_{r,\text{new}}}{\nu_{r,\text{old}}}(\tau_{r,\text{old}}-t_{\text{now}}) + t_{\text{now}},

where (\tau_{r,\text{old}} -t_{\text{now}}) is the time remaining between the original execution time and the current time. The multiplier in front is a scaling effect. 

In this example, at t_{\text{now}} = 1/6 ms, r_{2} was supposed to go at time 1/3 ms, but will now be prolonged. 

A note here, I did the math for their specific example, and it seems off. I think the multiplier should be as I’ve written above. The authors wrote the reciprocal, which prolongs too far. I’ll work to contact the authors to verify this. 

There are other timed scheduling algorithms utilized in computer networking, such as Earliest Deadline First, which require tagging each packet with a timestamp. This scheduler does not require such an imposition. 

The Chemical Control Plane

Here, the authors describe what they term as a chemical control plane that is intended to avoid the messy necessity of sending packets through a complex queueing network in order to shape packet flow as desired. The control plane takes advantage of concepts in enzymatic chemical reactions in order to control flow. This is a different application than the flow networks discussed thus far (as I understand it).


Here the forwarding plane which executes actions is separated from the control plane which will shape the flow of packets in the forwarding plane. 

The chemical control plane will dynamically determine the service rates; the servers do not have them predefined. There are some number of FIFO queues n, one for each type of ingress packet flow and they are drained by one server each, representing a unimolecular reaction. In the control plane, each queue is represented by an input species X_{i} and product species X_{i}^{*}. The chemical reaction network lives abstractly in the control plane, which is designed by a traffic engineer and can look like any digraph or network he wishes. 

Here we note the difference here between the prior sections, which dealt with physical flows modeled by a chemical reaction network, and moving the chemical reaction network to an abstract control plane. The queues now are not necessarily physically linked together, but we can choose to couple them abstractly to shape traffic. 

When a packet physically enters one of the queues, the control plane injects one instance of the corresponding molecule species into the abstract network. The scheduler described previously is implemented and eventually an instance of the output species is generated. Once this happens, the corresponding server in the forwarding plane physically processes the packet and dequeues the next. The advantage here is that the abstract molecules in the control plane have no payload, so implementation of this model only requires storing an integer value for each species that keeps track of the number of packets in each queue. This allows analysis of behavior at the design phase. 

In the simplest case, a unimolecular reaction X \to X^{*} in the chemical control plane acts like a low-pass filter to the packet flow, smoothing bursts with high frequency components. If the differential equation \dot{x} = \lambda-kx that approximates a unimolecular reaction is converted to the frequency domain via the Laplace transform, the transfer function F(s) has a cut-off frequency at k, the reaction constant:

F(s) = \frac{\mu(s)}{\lambda(s)} = \frac{k}{s+k}

That is, higher-frequency flows will be attenuated, much like dark glasses do with sunlight. Applying this filter at an ingress point of a network leads to less chaotic traffic patterns, but with a cost of a delay \frac{1}{k} and memory to buffer the packets. Therefore, the mean queue length for this single queue will grow proportionally with the delay and flow rate. That is, \hat{x} = \frac{\lambda}{k}.

Another consideration of the LoMA queues described by Meyer and Tschudin that differs from the standard M/M/1 queuing models is that the service rate is ultimately unbounded (for infinite capacity queues/networks), since it is proportional to the queue length. This is undesirable to allow in a network, and thus the authors borrow from biological systems and design an abstract enzymatic reaction to limit the rate of packet flow.

In biological systems, enzymes bind to reactant molecules X, called substrates in order to prevent a particular molecule from reacting immediately. Some amount of enzyme molecules E exist, and they can either exist free-form or bound in a complex (EX). The more enzyme molecules in bound form, the slower the rate of transmission grows for an increasing arrival rate. At equilibrium, the influx and efflux of substrate-enzyme complex molecules are equal according to Kirchoff’s Law, so 

k_{w}c_{X}c_{E} = k_{s}c_{EX}

Take a look at Figure 8 above in the chemical control plane to see this action. The number of enzymes is constant, so c_{E} + c_{EX} = e_{0}, which yields the Michaelis-Menten equation, expressing the transmission rate \mu in terms of the queue length c_{X}.

\mu = \nu_{\max}\frac{c_{X}}{K_{M} + c_{X}},

which yields a hyperbolic saturation curve. \nu_{\max} = k_{s}e_{0}, and K_{M} = \frac{k_{s}}{k_{w}} and specifies the concentration of X at which half of \nu_{\max} is reached. 

When the queue length at queue X is high, the transmission rate converges to \nu_{\max}, and behaves like a normal unimolecular reaction when queue length is short. 

The authors also extend this model to handle dynamic changes to the topology of the queuing network, which means that instances of queues and flow relations can be generated “on the fly,” as it were. Tschudin[13] has created an executable string and multiset rewriting system called Fraglets that allow for the implementation and running of protocols based on the ideas put forth thus far. They describe in the paper how to implement explicitly the enzymatic rate-limiter in the chemical control plane in Figure 8. In this implementation, rather than flow interactions being static and determined at the design phase, each fraglet (packet) sorts itself into a queue. After a packet is serviced, the header of a fraglet is treated as code, allowing a packet to determine its route comparable to active networking. The relationship between the abstract model and execution layer remains, which allows a mathematical model of the behavior of a Fraglets implementation to be generated automatically, and a queuing network to be design and then realized easily in Fraglets language. 


The final part of this work will discuss an application of artificial packet chemistry to the implementation of a congestion control algorithm, briefly discuss design motifs, and conclude. 


  1. Dittrich, P., Ziegler, J., and Banzhaf, W. Artificial chemistries – a review. Artificial Life 7(2001), 225–275.
  1. Feinburg, M. Complex balancing in general kinetic systems. Archive for Rational Mechanics and Analysis 49 (1972).
  2. Gadgil, C., Lee, C., and Othmer, H. A stochastic analysis of first-order reaction networks. Bulletin of Mathematical Biology 67 (2005), 901–946.
  3. Gibson, M., and Bruck, J. Effcient stochastic simulation of chemical systems with many species and many channels. Journal of Physical Chemistry 104 (2000), 1876–1889.
  4. Gillespie, D. The chemical langevin equation. Journal of Chemical Physics 113 (2000).
  5. Gillespie, D. The chemical langevin and fokker-planck equations for the reversible isomerizationreaction. Journal of Physical Chemistry 106 (2002), 5063–5071.
  6. Horn, F. On a connexion between stability and graphs in chemical kinetics. Proceedings of the RoyalSociety of London 334 (1973), 299–330.
  7. Kamimura, K., Hoshino, H., and Shishikui, Y. Constant delay queuing for jitter-sensitive iptvdistribution on home network. IEEE Global Telecommunications Conference (2008).
  8. Laidler, K. Chemical Kinetics. McGraw-Hill, 1950.
  9. McQuarrie, D. Stochastic approach to chemical kinetics. Journal of Applied Probability 4 (1967), 413–478.
  10.  Meyer, T., and Tschudin, C. Flow management in packet networks through interacting queues and law-of-mass-action-scheduling. Technical report, University of Basel.
  11.  Pocher, H. L., Leung, V., and Gilles, D. An application- and management-based approach to atm scheduling. Telecommunication Systems 12 (1999), 103–122.
  12. Tschudin, C. Fraglets- a metabolistic execution model for communication protocols. Proceedings of the 2nd annual symposium on autonomous intelligent networks and systems (2003).

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Exploiting Chemistry for Better Packet Flow Management 3: Formal Analysis

Exploiting Chemistry for Better Packet Flow Management 3: Formal Analysis

The previous two posts introduced the ideas of Meyer and Tschudin [11] involving the application and exploitation of chemical kinetic theory to flow management in computer networking. The first part introduced the ideas and gave an overview of the entire work, and the second part took a deeper look into the formal model of a packet chemistry. This section discusses the analysis options available once a packet chemistry model has been created. 

This section can also be skipped for those less interested in the formal mathematics. Suffice it to say that there are a multitude of already created methods now available for the elegant analysis of computer networks when modeled by an artificial packet chemistry.

Formal Analysis of Artificial Packet Chemistry

By representing packet flow in a computer network as an artificial chemistry, a multitude of analyses are available, from high to low granularity. The authors give a heavily brief survey (and a good bibliography) of works that can be utilized to analyze these networks pulled from the physics and chemistry literature. A particular advantage of this method is the ability to study the transient states of the network rather than just steady states. The authors also claim the ability to determine the stability of the network flow based only on topology, a heavy advantage in design.

Stochastic Analysis at the Microscopic Level

The stochastic behavior of chemical reaction networks is described by the chemical master equation[10] which takes the form
\frac{\text{d}\mathbf{P}}{\text{d}t} = \mathbf{A}\mathbf{P}

which is a differential equation describing the evolution of state probabilities for a system. Here the states are discrete, and time is continuous. The matrix \mathbf{A} describes the transition rates (which can also be kinetic or reaction rates), and the stochastic process described is a Markov jump-process Since we’re on a network, the Markov jump process exists in an \mathcal{S}-dimensional integer lattice. Some work has been done to analyze several classes of chemical reaction networks to find the steady-state probability distribution of the state space. For example, if the total number of packets in the network has a bound, and the network contains only first order (unimolecular to unimolecular) reactions, the steady state probability distribution for the lengths of the queues in the network is a multinomial distribution[3]. On the other hand, if the network is open (we allow packets to exit the network completely), then the steady state probability distribution of the lengths of the queues follows a product of Poisson distributions (which is also Poisson)[3]. (This is an extremely desirable property, called a product-form.)

Deterministic Approximations

This is the most common approach utilized in computer network analysis today, simply because networks are so large and complex that stochastic modeling becomes too cumbersome. Here, the average trajectory is represented by a system of ordinary differential equations, building a fluid model. One downside to this in the networking space is that the analysis of protocols by this method requires manual extraction from source code and accuracy is uncertain.

In the chemistry sector (and now in the packet chemistry model), obtaining a fluid approximation is not only easier, but shown to be accurate. There are links between the stochastic master equation to several approximations[5,6] including a deterministic ODE model. Gillespie[5] showed that the ODE model accurately predicts the network flow trajectory in many cases.

One thing the authors note here is that the ODE model can be directly and automatically generated from the network topology. For example, a single server with a single queue (M/M/1) is simply modeled as one chemical species X. The arrival rate (inflow) is \lambda, and the service rate is proportional to the queue length, so \mu = kx, where x is the queue length. Then we get a simple differential equation
\dot{x} = \lambda-kx
describing the change in queue length as the difference of inflow and outflow. In the steady state, \dot{x} = 0, which lets us look for a fixed point \hat{x} = \frac{\lambda}{k}. This is the steady-state queue length, which allows us to derive the expected waiting time T = \frac{1}{k}, showing that the latency of a packet under this model is independent of the arrival rate and fill level. This model when implemented automatically adjusts the service rate such that in the steady state, every packet sees the same latency.

It’s also important to determine just how stable this steady state is by analyzing the sensitivity of the network and states to perturbations. The authors list several citations to show that no new approaches are needed to do this; one can look to signal and control theory literature. In particular, a network designer would desire to predict the stability of a complex network by studying the topology as opposed to an analysis of the system of ODEs. Fortunately, modeling a network this way allows for the use of the Deficiency Zero Theorem for complex chemical networks that gives conditions for stability of steady-state[2,7].

The authors give a formal convergence proof that the example network above converges to a stable fixed point and is asymptotically stable, comparing it to the proof of a similar protocol Push-Sum (a gossip protocol in computer networks).


The next post in this series will discuss Meyer and Tschudin’s implementation of a scheduler based on the principles discussed thus far. 


  1. Dittrich, P., Ziegler, J., and Banzhaf, W. Artificial chemistries – a review. Artificial Life 7(2001), 225–275.
  1. Feinburg, M. Complex balancing in general kinetic systems. Archive for Rational Mechanics and Analysis 49 (1972).
  2. Gadgil, C., Lee, C., and Othmer, H. A stochastic analysis of first-order reaction networks. Bulletin of Mathematical Biology 67 (2005), 901–946.
  3. Gibson, M., and Bruck, J. Effcient stochastic simulation of chemical systems with many species and many channels. Journal of Physical Chemistry 104 (2000), 1876–1889.
  4. Gillespie, D. The chemical langevin equation. Journal of Chemical Physics 113 (2000).
  5. Gillespie, D. The chemical langevin and fokker-planck equations for the reversible isomerizationreaction. Journal of Physical Chemistry 106 (2002), 5063–5071.
  6. Horn, F. On a connexion between stability and graphs in chemical kinetics. Proceedings of the RoyalSociety of London 334 (1973), 299–330.
  7. Kamimura, K., Hoshino, H., and Shishikui, Y. Constant delay queuing for jitter-sensitive iptvdistribution on home network. IEEE Global Telecommunications Conference (2008).
  8. Laidler, K. Chemical Kinetics. McGraw-Hill, 1950.
  9. McQuarrie, D. Stochastic approach to chemical kinetics. Journal of Applied Probability 4 (1967), 413–478.
  10.  Meyer, T., and Tschudin, C. Flow management in packet networks through interacting queues and law-of-mass-action-scheduling. Technical report, University of Basel.
  11.  Pocher, H. L., Leung, V., and Gilles, D. An application- and management-based approach to atm scheduling. Telecommunication Systems 12 (1999), 103–122.
  12. Tschudin, C. Fraglets- a metabolistic execution model for communication protocols. Proceedings of the 2nd annual symposium on autonomous intelligent networks and systems (2003).

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.