Browsed by
Author: Rachel Traylor

Exploiting Chemistry for Better Packet Flow Management 2: Formal Model

Exploiting Chemistry for Better Packet Flow Management 2: Formal Model

This post is the second breaking down a report/review of a technical report by Meyer and Tschudin [11] that modifies the formal notion of an artificial chemistry and creates an artificial packets chemistry with the goal of designing better flow management by exploiting the natural behavior of chemical reactions. 
Note: for those more interested in the application and implementation review and discussion, this section can be skipped. 

Formal Model of Artificial Packet Chemistry

Artificial Chemistry

The notion of an formal artificial chemistry has been around for some time. There is an excellent paper by Dittrich, Ziegler, and Banzhaf that gives a survey of the work in this area[1]. Put simply, an artificial chemistry is a tuple (\mathcal{S},\mathcal{R},\mathcal{A}) where \mathcal{S} = \{s_{1},s_{2},\ldots s_{n}\} is a set of all valid molecules, \mathcal{R} is a set of rules r that describe interactions between molecules, and \mathcal{A} is the reactor algorithm that determines how the set of rules in \mathcal{R} is applied to a collection of molecules termed \mathcal{P}. \mathcal{P} may be a reaction vessel, reactor, or “soup” (as Dittrich et al call it). It’s also notable that \mathcal{P} cannot be identical to \mathcal{S}.1
To expand a bit more, we’ll note that the rules r \in \mathcal{R} all take the form
s_{1} + s_{2} + \ldots s_{n}\longrightarrow \tilde{s}_{1}+\tilde{s}_{2}+\ldots \tilde{s}_{m}
where all s, \tilde{s} \in \mathcal{S}. These rules are fairly abstract, and don’t explicitly seem to describe just a reactant to product type reaction. These can be collisions or other types of interactions. The set \mathcal{S} of valid molecules presumably can be partitioned into disjoint subsets of different species of molecule as well, though the representation in [1] is more general. 
Regarding the reactor algorithm \mathcal{A}, Dittrich et al [1]  give several different descriptions/approaches by which it can be defined, depending on whether each molecule is treated explicitly, or all molecules of a type are represented by a single number (frequency or concentration):
  1. Stochastic Molecular Collisions.  Every single molecule is worked with, where a sample of molecules from the reaction vessel \mathcal{P} is drawn and the algorithm checks to see if a particular rule r \in \mathcal{R} applies. 
  2. Differential Rate Equations: This approach seeks to describe the dynamics of a chemical system using concentrations of molecular species. The rules under this algorithm take a species approach:  r: a_{1}s_{1} + a_{2}s_{2} + \ldots a_{N}s_{N} \longrightarrow b_{1}s_{1} + b_{2}s_{2} + \ldots + b_{N}s_{N}
    Here, the s_{i}‘s are species, not individual molecules. The coefficients are stoichiometric factors of the reaction. They are simply indicator functions to denote whether species s_{i} is a reactant or product. That is a_{i} = 1 if and only if s_{i} is a reactant in the rule r, and b_{i} = 1 if and only if s_{i} is a product in the rule r. It is this form of \mathcal{A} that Meyer and Tschudin [11] utilize in their packet chemistry. 
    The change of overall concentration (concentration denoted c_{s_{i}}) is given by a system of differential equations
    \frac{\text{d}c_{s_{i}}}{\text{d}t} = (b_{i}-a_{i})\prod_{j=1}^{N}c_{s_{j}}^{a_{j}}, \quad i=1,\ldots,N
    according to the Law of Mass Action discussed earlier. There may be multiple rules/reactions r \in \mathcal{R} that affect the concentration of species s_{i}, so 
    \frac{\text{d}c_{s_{i}}}{\text{d}t} = \sum_{r\in \mathcal{R}}\left[(b_{i}^{r}-a_{i}^{r})\prod_{j=1}^{N}c_{s_{j}}^{a_{j}^{r}}\right], \quad i=1,\ldots,N
  3. Others: There are other options, such as metadynamics (where the number of species and thus differential equations may change over time), mixed approaches, or symbolic analysis of the differential equations. As this article would be far too cumbersome to discuss these, they are omitted, but may be found in [1].
According to Dittrich et al [1], the reactor algorithm \mathcal{A} depends on the representation of the elements of s_{i} and thus the population. Meyer and Tschudin [11] utilize the second approach, though they do not explicitly state this. 

Artificial Packet Chemistry

Meyer and Tschudin adapt the artificial chemistry in the previous section to suit their queuing networks in a given computer network. They add an element \mathcal{G} to the artificial chemistry tuple to get an artificial packet chemistry PC = (\mathcal{G},\mathcal{S},\mathcal{R},\mathcal{A}), where \mathcal{G} is the digraph that gives the topology of the computer network. \mathcal{G} consists of a set of nodes V_{\mathcal{G}} which represent the chemical reaction vessels. (These were the \mathcal{P} in the previous section.), and E_{\mathcal{G}} is the set of directed arcs that represent network links connecting adjacent nodes.2
Here, since a molecular species is the analogue of a queue, \mathcal{S} = \cup_{i \in V}\{S_{i}\}. Here, \{S_{i}\} is a set of all queue instances in a particular node i. At this point, some discussion on clarity is warranted. It is possible to have more than one queueing instance (more than one molecular species) inside each reaction vessel (here the nodes of the network). I don’t think this is meant to be a disjoint union, since a reaction species can show up in more than one reaction vessel, so there may be repeats of certain species in this representation of \mathcal{S} when written this way. Perhaps it’s just a nitpick, but it’s worth mentioning.  
\mathcal{R} = \cup_{i \in V}\{R_{i}\} gives all the flow relations among the queues. Here the rules r take form 2 of \mathcal{A}:
r \in R_{i}: \sum_{s \in S_{i}}a_{s,r}s \longrightarrow \sum_{s \in S_{i} \cup \{S_{j}\} : j \in N_{i}}b_{s,r}s
The reaction rules basically describe what’s going on in a particular reaction vessel. We can send packets to neighboring nodes/vessels (N_{i} is the notation for the neighborhood of node i, or the set of adjacent nodes), or we can keep packets in the same node after the reaction is done. The reactions that send packets to neighboring nodes are transmissions. 
The mean reaction rate \nu_{r} of each reaction is given by the Law of Mass Action as applied to forward reactions:
\nu_{r} = k_{r}\prod_{s\in S}c_{s}^{a_{s,r}}
just as described in the previous section. 

Figure 3 from Meyer and Tschudin [11] gives an explicit example to help solidify these abstract ideas. The network consists of 4 nodes, so V = \{n_{1}, n_{2}, n_{3}, n_{4}\}. Each node has a bidirectional link with its neighbors, so E = \{n_{1}n_{2}, n_{2}n_{1}, n_{2}n_{3}, n_{3}n_{2}, n_{2}n_{4}, n_{4}n_{2}, n_{3}n_{4}, n_{4}n_{3}\}. In this case, we only have one species of molecule (one queue) per node, so \mathcal{S} = \{X_{1}, X_{2}, X_{3}, X_{4}\}. The set of reactions is simply a first-order reaction per arc: \mathcal{R} = \{r_{a,b}: X_{a} \to X_{b}: ab \in E\}

From a review standpoint, I would have liked to see a less trivial example, such as one with multiple queues in a node, and rules that may keep packets in a node instead of just transmitting. These types of scenarios would be interesting to model this way, and demonstrate better the power of this approach.


The next post in the series will discuss the mathematical analyses of the artificial packet chemistry described here. 



  1. Dittrich, P., Ziegler, J., and Banzhaf, W. Artificial chemistries – a review. Artificial Life 7(2001), 225–275.
  1. Feinburg, M. Complex balancing in general kinetic systems. Archive for Rational Mechanics and Analysis 49 (1972).
  2. Gadgil, C., Lee, C., and Othmer, H. A stochastic analysis of first-order reaction networks. Bulletin of Mathematical Biology 67 (2005), 901–946.
  3. Gibson, M., and Bruck, J. Effcient stochastic simulation of chemical systems with many species and many channels. Journal of Physical Chemistry 104 (2000), 1876–1889.
  4. Gillespie, D. The chemical langevin equation. Journal of Chemical Physics 113 (2000).
  5. Gillespie, D. The chemical langevin and fokker-planck equations for the reversible isomerizationreaction. Journal of Physical Chemistry 106 (2002), 5063–5071.
  6. Horn, F. On a connexion between stability and graphs in chemical kinetics. Proceedings of the RoyalSociety of London 334 (1973), 299–330.
  7. Kamimura, K., Hoshino, H., and Shishikui, Y. Constant delay queuing for jitter-sensitive iptvdistribution on home network. IEEE Global Telecommunications Conference (2008).
  8. Laidler, K. Chemical Kinetics. McGraw-Hill, 1950.
  9. McQuarrie, D. Stochastic approach to chemical kinetics. Journal of Applied Probability 4 (1967), 413–478.
  10.  Meyer, T., and Tschudin, C. Flow management in packet networks through interacting queues and law-of-mass-action-scheduling. Technical report, University of Basel.
  11.  Pocher, H. L., Leung, V., and Gilles, D. An application- and management-based approach to atm scheduling. Telecommunication Systems 12 (1999), 103–122.
  12. Tschudin, C. Fraglets- a metabolistic execution model for communication protocols. Proceedings of the 2nd annual symposium on autonomous intelligent networks and systems (2003).

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Exploiting Chemistry for Better Packet Flow Management 1: Introduction

Exploiting Chemistry for Better Packet Flow Management 1: Introduction

Perhaps the phrase “don’t reinvent the wheel” is overused. However, many newer disciplines, particularly in the technology sector, seem to insist on it. One thing physical engineers learned long ago was to study the world around them, work with it, and emulate it in their designs. Network engineering should be no different. In a technical report from 2011, authors Thomas Meyer and Christian Tschudin from the University of Basel describe a highly elegant natural flow management method [11] that exploits much of the hard work done in the chemistry sector in chemical kinetics. They describe a scheduling approach that creates an artificial chemistry as an analogue of a queueing network, and uses the Law of Mass Action to schedule events naturally. The analysis of such networks utilizing their implementation is simplified regardless of the depth one wishes to go into. In addition, they show a congestion control algorithm based on this framework is TCP fair and give evidence of actual implementation (a relief to practitioners).

This report will discuss their paper at length with a goal of covering not only their work, but the underlying ideas. Since the paper requires knowledge of chemical kinetics, probability, queueing theory, and networking, it should be of interest to specialists in these disciplines, but a comprehensive discussion of the fundamentals glossed over in the paper would make dissemination more likely.

Note: the full pdf of the report is attached to each section of this series at the bottom. 

Overall Review of the Paper

The paper is well written, readable, and quite clear despite the breadth of scope it contains. It’s a major benefit to the authors and readers that their theoretical model has been tested in a proof-of-concept network. The work shows much promise for the networking space as a whole.

Overview of the Paper and Chemistry Basics

Just as in chemistry and physics, packet flow in a network has microscopic behavior controlled by various protocols, and macro-level dynamics. We see this in queueing theory as well–we can study (typically in steady-state to help us out, but in transient state as well) the stochastic behavior of a queue, but find in many cases that even simple attempts to scale the analysis up to networks (such as retaining memorylessness) can become overwhelming. What ends up happening in many applied cases is a shift to an expression of the macro-level properties of the network in terms of average flow. The cost of such smoothing is an unpreparedness to model and thus deal effectively with erratic behavior. This leads to overprovisioning and other undesirable and costly design choices to mitigate those risks.

Meyer and Tschudin have adapted decades of work in the chemical and physical literature to take advantage of the Law of Mass Action, designing an artifical chemistry that takes an unconventional non-work-conserving approach to scheduling. Non-work-conserving queues add a delay to packets and have tended to be avoided for various reasons, typically efficiency. Put simply, they guarantee a constant wait time of a packet regardless of the number of packets in a queue by varying the processing rate with fill level of the queue. The more packets in queue, the faster the server processes those packets.

Law of Mass Action in Chemistry

If we have some chemical reaction with reactants A_{1}, A_{2},\ldots, A_{n} and products B_{1}, B_{2}, \ldots, B_{m}, and the reaction is only forward1, then we may express the reaction as

A_{1} + A_{2} + \ldots + A_{n} \longrightarrow_{k} B_{1} + B_{2} + \ldots + B_{n}

where k is the rate constant. In a simple reaction A \to P, with P as the product, we can see the rate expressed nicely in a very basic differential equation form[9]:

-\frac{\text{d}c_{A}}{\text{d}t} = k\cdot c_{A}

This should actually look somewhat similar to problems seem in basic calculus courses as well. The rate of change of draining the reactant is a direct function of the current concentration.

The reaction rate r_{f} of a forward reaction is proportional to the concentrations of the reactants:

r_{f} = k_{f}c_{A_{1}}c_{A_{1}}\cdots c_{A_{N}}

for a set of reactants \{A_{i}\}.

The Queuing Analogue and Assumptions

Meyer and Tschudin[11] express the networking version of these chemical reactions in a very natural way. Packets are molecules. A molecular species is a queue, so molecules of species X go into queue X. The molecular species is a temporary buffer that stores particular packets types until they are consumed by a reaction (processed by some server in the queueing space). FIFO (first-in-first-out) discipline is assumed.


The figure above from the technical report shows how a small system of reactions looks in the chemical space and the queuing space. Where analysis and scheduling can get complicated is in the coupled nature of the two reactions. The servers both drain packets from queue Y, so they are required to coordinate their actions in some way. It’s important to note here that this equivalence rests on treating the queuing system as M/M/1 queues with a slightly modified birth-death process representation.

Typically, in an M/M/1 queue, the mean service rate is constant. That is, the service rate is independent of the state the birth-death process is in. However, if we model the Law of Mass Action using a birth-death process, we’d see that the rate of service (analogously, the reaction rate) changes depending on the fill-level of the queue (or concentration of the reactant). We’ll investigate this further in the next sections, discussing their formal analysis.

Related Work and Precedent

The authors noted that adding packet delay is not unheard of in the networking space. Delay Frame Queuing[12] utilizes non-work-conserving transmission at edge nodes in an ATM network in order to guarantee upper bounds on delay and jitter for virtual circuits. Three other researchers in 2008 (Kamimura et al) proposed a Constant Delay Queuing policy that assigns a constant delay to each packet of a particular priority stream and forward other best-effort packets during the delay[8].


The next article will discuss the formal mathematical model of an artificial packet chemistry. 



  1. Dittrich, P., Ziegler, J., and Banzhaf, W. Artificial chemistries – a review. Artificial Life 7(2001), 225–275.
  1. Feinburg, M. Complex balancing in general kinetic systems. Archive for Rational Mechanics and Analysis 49 (1972).
  2. Gadgil, C., Lee, C., and Othmer, H. A stochastic analysis of first-order reaction networks. Bulletin of Mathematical Biology 67 (2005), 901–946.
  3. Gibson, M., and Bruck, J. Effcient stochastic simulation of chemical systems with many species and many channels. Journal of Physical Chemistry 104 (2000), 1876–1889.
  4. Gillespie, D. The chemical langevin equation. Journal of Chemical Physics 113 (2000).
  5. Gillespie, D. The chemical langevin and fokker-planck equations for the reversible isomerizationreaction. Journal of Physical Chemistry 106 (2002), 5063–5071.
  6. Horn, F. On a connexion between stability and graphs in chemical kinetics. Proceedings of the RoyalSociety of London 334 (1973), 299–330.
  7. Kamimura, K., Hoshino, H., and Shishikui, Y. Constant delay queuing for jitter-sensitive iptvdistribution on home network. IEEE Global Telecommunications Conference (2008).
  8. Laidler, K. Chemical Kinetics. McGraw-Hill, 1950.
  9. McQuarrie, D. Stochastic approach to chemical kinetics. Journal of Applied Probability 4 (1967), 413–478.
  10.  Meyer, T., and Tschudin, C. Flow management in packet networks through interacting queues and law-of-mass-action-scheduling. Technical report, University of Basel.
  11.  Pocher, H. L., Leung, V., and Gilles, D. An application- and management-based approach to atm scheduling. Telecommunication Systems 12 (1999), 103–122.
  12. Tschudin, C. Fraglets- a metabolistic execution model for communication protocols. Proceedings of the 2nd annual symposium on autonomous intelligent networks and systems (2003).

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Expecting the Unexpected: Borel’s Paradox

Expecting the Unexpected: Borel’s Paradox

One of the best ways to shorten a proof in statistics or probability is to use conditioning arguments. I myself have used the Law of Total Probability extensively in my work, as well as other conditioning arguments in my PhD dissertation. Like many things in mathematics, there are subtleties that, if ignored, can cause quite a bit of trouble. It’s a theme on which I almost feel like I sound preachy, because subtlety, nuance, and deliberation followed by cautious proceeding is about as old-fashioned as my MS Statistics1

One particularly good paper that discusses this was written by Michael Proschan and Brett Presnell  in The American Statistician  in August 1998 titled “Expect the Unexpected from Conditional Expectation”. In it, the authors noted the following seemingly innocuous question posed on a statistics exam

If X and Y are independent standard normal random variables, what is the conditional distribution of Y given Y=X?

There are three approaches to this problem.

(1) Interpret the statement that Y=X by declaring a new random variable Z_{1} = Y-X where Z_{1}=0.

Here, the argument proceeds as follows:

Y and Z_{1} have a bivariate normal distribution with \mu = (0,0), \sigma_{Y}^{2}=1, \sigma_{Z_{1}}^{2}=2, and correlation \rho = \tfrac{1}{\sqrt{2}}. Thus, we know that the conditional distribution of Y given Z_{1}=0 is itself normal with mean \mu_{Y}+\rho\frac{\sigma_{Y}}{\sigma_{Z_{1}}}(0-\mu_{Z_{1}})=0 and variance \sigma_{Y}^{2}(1-\rho^{2}) = \tfrac{1}{2}. Thus, the conditional density is 

f(y|Z_{1}=0) = \frac{1}{\sqrt{2\pi}}e^{-y^{2}}

This was the expected answer. However, one creative student did a different argument:

(2) Interpret the statement that Y=X by declaring a new random variable Z_{2} = \tfrac{Y}{X} where Z_{2}=1.

This student used the Jacobian method 2 and transformed the variables via declaring s=y, z_{2}=\tfrac{y}{x} and finding the joint density of S and Z_{2}. The marginal density for S was then found by dividing the joint density by the marginal density of Z_{2} evaluating at z_{2}=1. The reason for the ratio is that the marginal density of Z_{2} being the ratio of independent standard normal random variables has a Cauchy distribution3. This student’s answer was 

f(y|Z_{2}=1) = |y|e^{-y^{2}}

Not the expected answer. This is a correct interpretation of the condition Y=X, so the calculations are correct. There was a third different answer, from a student who had taken a more advanced probability course.

(3) Interpret the statement Y=X as Z_{3} = 1 where Z_{3} = \mathbb{I}[Y=X]

Here \mathbb{I}(\cdot) is the indicator function, where the variable is 1 if the condition is met, and 0 otherwise. The argument here is that Y and Z_{3} are independent. Why? Z_{3} is a constant zero with probability 1. A constant is independent of any random variable. Thus the conditional distribution of Y given Z_{3}=1 must be the same as the unconditional distribution of Y, which is standard normal.

This is also a correct interpretation of the condition. 

From the paper, “At this point the professor decided to drop the question from the exam and seek therapy.”

What happened?

At this point, both the paper did and we shall revisit exactly what conditional probability means. Suppose we have continuous random variables X and Y. We’ll usually write expressions like \mathbb{P}(Y\leq y|X=x) or \mathbb{E}(Y|X=x). However, an acute reader might already ask the question about conditioning on sets of probability 0. For a continuous random variable, the probability that we land on any specific real value x is indeed 0, which hearkens back to the measure-theoretic basis of probability. As it turns out, this little subtlety is indeed the culprit.

Formal definition of conditional distribution and conditional expectation

First, we take a look at the formal definition of conditional expectation

Definition.  A conditional expected value \mathbb{E}(Y|X) of Y given X is any Borel function g(X) = Z that satisfies 

\mathbb{E}(Z\mathbb{I}(X \in B)) = \mathbb{E}(Y\mathbb{I}(X \in B))

for every Borel set B. Then g(x)4 is the conditional expected value of Y given X=x and we can write \mathbb{E}(Y|X=x)

What this means is that the conditional expectation is actually defined as a random variable whose integral over any Borel set agrees with that of X5. Of import here is the fact that the conditional expectation is defined only in terms of an integral. From Billingsley, 1979, there always exists at least one such Borel function g, but the problem here is that there may be infinitely many. Each Borel function g that satisfies the above definition is called a version of \mathbb{E}(Y|X)

So let’s say we have two versions of \mathbb{E}(Y|X), called Z_{1} = g_{1}(X) and Z_{2} = g_{2}(X). Then we have that \mathbb{P}(Z_{1}=Z_{2})=1. Still seems pedantic, but if we look at this from a measure-theoretic perspective, this means that two versions are equal except on a set N of x such that \mathbb{P}(X \in N)=0.

What does this have to do with conditional distributions? 

For each y we fix we can find some function6 G_{y}(x) = G(y|x) such that 

G(y|x) = \mathbb{P}(Y\leq y|X=x)

In other words7, G(y|X) is a version of \mathbb{E}(\mathbb{I}(Y\leq y)|X). This last expression here is a conditional distribution of Y given X=x

Notice I said “a” conditional distribution, and not “the” conditional distribution. The words were chosen carefully. This leads us into the Borel Paradox, and the answer to why all three answers of that exam question are technically correct. 

The Borel Paradox

Also known as the Equivalent Event Fallacy, this paradox was noted by Rao (1988) and DeGroot (1986). If we attempt to condition on a set of probability (or measure) 0, then the conditioning may not be well-defined, which can lead to different conditional expectations and thus different conditional distributions. 

In the exam question, Z_{1}, Z_{2} and Z_{3} are all equivalent events. The fallacy lies in assuming that this equivalence would mean that conditioning on, say, \{Y-X=0\} is the same as conditioning on the event \{Y/X=1\}. In almost all8 cases, this is true. If the events in question have nonzero probability, then the professor’s assumption was true. However, when we’re conditioning on events that have zero probability, the classic Bayes’s formula interpretation doesn’t hold anymore, because the denominator is now 0.9

If we have an event like \{Y=X\}, then from the previous section we know that there are infinitely many versions of this event. We have to think of conditioning on a random variable, not just a value. There were three versions given above:

  • Z_{1} = Y-X, which has value z_{1}=0
  • Z_{2} = Y/X, with value z_{2}=1
  • Z_{3} = \mathbb{I}(Y=X) with value z_{3} =1

Proschan and Presnell dig a bit deeper and discuss the details here on exactly where these equivalent interpretations diverge10 and yield different conditional distributions. They also discuss the substitution fallacy, which again notes the consequences of having infinitely many versions of \mathbb{E}(Y|X), and why the convolution argument typically given to derive the distribution of the sum of independent random variables is nowhere near as air-tight as it appears.11

What’s the solution?

The Borel Paradox reared its ugly head because different yet correct interpretations of the conditioning events/sets of probability 0 yielded different conditional expectations and different conditional distributions. The core of the problem was those sets having probability 0. How do we avoid this? We actually would like to calculate \mathbb{E}(Y|X=x) (and conditional distributions that result from it), so how do we get around this paradox of infinite versions?

We take the analysts’ favorite trick: sequences and limits. Take a sequence of sets (B_{n}) where B_{n} \downarrow x. Now we’re only ever actually computing \mathbb{E}(Y|X \in B_{n}) where \mathbb{P}(X \in B_{n})>0. Now define

\mathbb{E}(Y|X=x) = \lim_{n\to\infty}\mathbb{E}(Y|X \in B_{n})


(I admit I’m seriously sidestepping some careful considerations of convergence and choices of sequence here. There is more subtlety and consideration in the analytic study of sequences of sets, but I don’t want to push too far down the rabbit hole here. The point is that we can avoid the paradox with care.)


Subtleties and paradoxes occur all over mathematics. This doesn’t mean mathematics is broken, that all statistics are lies, or any other variation of frustrated exasperation I’ll hear when discussing these. What these subtleties, fallacies, and paradoxes do show is that careful consideration is paramount to the study, practice, and application of mathematics. 


  1. Billingsley P,. (1979), Probability and Measure(1st ed.),New York:Wiley
  2. Casella, G. and Berger, R.(2002) Statistical Inference (2nd ed.), Wadsworth
  3. DeGroot,M.H. (1986), Probability and Statistics (2nd ed.), Reading,MA: Addison-Wesley.
  4. Proschan, M.A. and Presnel, B. (1998), “Expect the Unexpected from Conditional Expectation”, American Statistician, 48, 248-252
  5. Rao,M.M.(1988),”Paradoxes in Conditional Probability,” Journal of Multivariate Analysis,27, 434-446
Using Boolean Algebra to Find all Maximal Independent Sets in a Graph

Using Boolean Algebra to Find all Maximal Independent Sets in a Graph

Graph theory may be one of the most widely applicable topics I’ve seen in mathematics. It’s used in chemistry, coding theory, operations research, electrical and network engineering, and so many other places. The subject is mainly credited to have begun with the famous  Seven Bridges of Königsberg problem posed by Leonard Euler in 1736. Frank Harary should also be credited with his massive work in bringing applications of graph theory to the sciences and engineering with his famous textbook written in 1969. 

My own research forced me to stumble into this area once my  research partner, Jason Hathcock, suggested we explore the idea of viewing dependency relations in the sequences of variables we were studying as digraphs. Since then, I’ve been buried in graph theory texts, finding a wealth of fascinating topics to explore.

Of this article’s particular interest is finding all maximally independent sets in a graph using Boolean algebra. 

What’s a maximally independent set?

Firstly, what’s an independent set? 

Definition (Independent Set): A set of vertices of a graph is independent if no two vertices in the set are adjacent. 

If we take a look at the digraph above (from our paper on vertical dependence), and look at the underlying graph1, \{1,6,11\} form an independent set, as an example. There are lots more, and of varying sizes. Of particular interest here are maximal independent sets.

Definition:(Maximal Independent Set): An independent set to which no other vertex in the graph can be added to retain the independence property

An example from the graph above is \{2,3,4,5,13\}. If we added any other vertex to that set, it would be adjacent to some vertex already in there. 

A few notes:

(1) There are many maximal independent sets in a graph, and they may not all have the same cardinality. 

(2) Maximal and maximum are not the same thing. An independent set may be a maximal independent set without being the largest independent set in the graph. The largest cardinality among all the maximal independent sets is called the independence number of the graph and is denoted \beta(G).

Why do we care about maximal independent sets?

Of the many applications that arise, one in particular is in coding theory. We want to find the largest error correcting codes we can, particularly in internet transmissions that can lose packets. A paper discussing this can be found here. (Paywall warning). We’ve discussed some basics of coding theory on this site as well. Finding error correcting codes with desirable properties is equivalent to solving the problem of finding maximal independent sets. The purpose of this article isn’t to discuss the applications here, but I’ve learned long ago that no one will keep reading unless I mention at least one application. 

Finding a maximal independent set

Dependency Graph for a monotonic example

Finding a maximal independent set is relatively simple. Start with any vertex v \in V(G). Add another vertex u that is not adjacent to v. Continue adding vertices that are not adjacent to any already in the set. For a finite graph2, this process will terminate and the result will be a maximally independent set. 

Will it be one of largest cardinality? Not necessarily. 

For example, using one more of our dependency graphs generated by \alpha(n) = \sqrt{n}, we can take the order to be 24 as shown, and build a maximal independent set starting with vertex 3. Note that none of vertices 9-15 or 1 can be in the set, since they’re all adjacent to vertex 3. Vertex 2 is not adjacent to vertex 3, so we add it into our set: V = \{2,3\}. Now, the next vertex we add can’t be adjacent to either 2 or 3, so that rules out 1, 9-15, and 4-8. Grab vertex 16. Now V = \{2,3,16\}. Notice that none of the remaining vertices are adjacent to any of the previous vertices. Continuing this process, we’ll get that V = \{2,3,16,17,18,19,20,21,22,23,24\}. Notice that if we add any other vertices to this set, they’ll be adjacent to something already in it. 

Finding all Maximal Independent Sets

We’re rarely interested in just finding one maximal independent set. We’d prefer to find them all, and doing it by inspection is not very palatable. The heart of the article is an admittedly not optimal but still interesting way to find all maximal independent sets for reasonably small graphs. 

Image credit:

We’ll illustrate the method on the 6-node graph above. 

Getting started

First, we’ll assign a Boolean variable to each vertex according to its inclusion in a maximal independent set. For example A = 1 implies A is in the maximal independent set. Recall from Boolean algebra that 

x+y = \left\{\begin{array}{lr}1, & x = 1 \text{ or } y = 1 \text{ or } (x=y=1)\\0,&x=0 \text{ and } y=0\end{array}\right.

Remark: x+y is just another way of writing a union. This isn’t addition mod 2 here. 

xy=\left\{\begin{array}{lr}1, & x = 1 =y\\0,&\text{ otherwise}\end{array}\right.

What we’ve done here is set up inclusion into our maximal independent sets in a Boolean fashion. So x+y = 1 corresponds to the inclusion of either vertex x OR vertex y OR both vertices x and y. Similarly, xy = 1 corresponds to the inclusion of both vertices x and y.

Now, we can express an edge of a graph as a Boolean product xy, where x and y are the vertices at either end of the edge. 

Finally, set up the sum of all edges and call it \phi:

\phi = \sum xy \text{ for all } (x,y) \in G

For our graph above,

\phi = AB + AD + AE + BC + CE + CF + DE + EF

Why did we do this?

For a vertex to be in an independent set, it can’t be adjacent to any other vertices in the set. Put another way, for each edge, we can only have at most one of the vertices that make it up. If we include A in the independent set V, then B cannot be in there. 

Returning to our \phi, note that its value under Boolean algebra can only be 0 or 1. If \phi = 1, then at least one edge has both of its vertices “on”. This means, only combinations of A, B, C, D, E, F that yield \phi = 0 will give us a maximally independent set. 

Solving the problem

Our goal now is to find all combinations of our Boolean vertex variables that yield \phi = 0. As it turns out, solving this directly is pretty annoying3. If we want \phi = 0, that’s logically equivalent to seeking \phi^{c} = 1, where \phi^{c} is the Boolean complement (or negation) of \phi

Recall from Boolean algebra the following4:

\begin{aligned}(xy)^{c}&=x^{c}+y^{c}\\(x+y)^{c} &= x^{c}y^{c}\end{aligned}

So, if we take \phi^{c} for our graph above, 


What does the negation here actually mean? By taking the complement, instead of finding vertices to include, now we’re finding vertices to excludeWhen we multiply this expression out, we’ll get a sum of terms, where each term is a product of complements of our original Boolean variables. To get \phi^{c} = 1, all we need is one of those terms to be 1. To get a term to be 1, all members of the product must themselves be 1, meaning each term gives us a set of variables to exclude. Excluding these variables gives us one maximally independent set for each term, so this gives us all the maximally independent sets. 

The nice thing about dealing with Boolean arithmetic is that we can program a computer to do this for us. Any time we can invoke a relationship with Boolean algebra, we can enlist a friendly helpful computer. 

Finishing the example

We’ll do it by hand here, because I’m old-school like that. For larger graphs, obviously one would want to enlist some computational help, or just be very patient. We’ll remember a few other rules for Boolean algebra before we finish5:

\begin{aligned}xx &=x\\x+x &=x\\x +xy&=x\end{aligned}

After an insane amount of tedious Boolean algebra,

\phi^{c} = A^{c}C^{c}E^{c}+A^{c}B^{c}E^{c}F^{c}+A^{c}C^{c}D^{c}F^{c}+B^{c}C^{c}D^{c}E^{c}+B^{c}D^{c}E^{c}F^{c}

Recall that each term now tell us which sets of vertices to exclude from a maximal independent set. We negated the question logically. That means we have 5 maximal independent sets:

\{B,D,F\}, \{C,D\}, \{B,E\}, \{A,F\}, \{A,C\}

We can actually say what the independence number is as well, since we just have to find the maximum cardinality among the sets listed. For this graph, \beta(G) = 3.


I happened to find this interesting, and ended up obsessed with it for a day, much to the chagrin of my daily planner, which expected me to be working on writing my research monograph. I tried several different ways of solving this beyond the one given. I tried using the direct equation \phi, and tried using regular arithmetic on just \{0,1\}, setting up a sort-of structure function similar to the reliability block diagrams detailed here. 

I always hesitate to blame the method, and rather my own arithmetic errors, but I didn’t have much luck with the structure-function method, though I may try again to see if it’s an equivalent method. I believe it should be. 

Looking at \phi^{c} makes more sense after playing with this problem for some hours. The sum/union is quite nice, because it neatly separates out the various sets to exclude. It’s a better exploitation of Boolean algebra than trying to work with \phi but aiming for a sum of 0. I still think it should be possible to work with it directly, even if not advisable. If I decide to torture myself with it further, and end up with something to write about, perhaps I’ll append here.

I always end up ending my articles with some takeaway. I don’t have much of one here, except it was a curiosity worth sharing. Perhaps a decent takeaway is to reveal a bit of the tedium and dead-ends mathematicians can run into when exploring something. That’s just part of research and understanding. It’s entirely possible to spend hours, days, weeks on something and all you conclude is that the original method you saw is definitely superior than the one you were trying to develop. 

Beyond Cookbook Mathematics, Part 2

Beyond Cookbook Mathematics, Part 2

The previous article discussed the importance of definitions to mathematical thought. We looked at a definition (of an end-vertex in a graph), and picked it apart by finding multiple ways to look at it. We also directly used the definition in a practical manner to find “weak links” in a network. This time, we’ll look at the structure of mathematical theorems and proofs, and how we can read the proofs to understand the theorems. 

A mathematical theorem (informally) is a statement that takes the following structure:

IF {we have this stuff}, THEN {we get to say this about other stuff}.

Let’s pick this apart a bit even at the abstract level, because even this simple structure is important when we consider using theorems:

IF {we have this stuff}:

That IF is really important. You can’t move to the THEN without the IF part being true. Let’s say a particular theorem (say, the Central Limit theorem), has a set of requirements in the IF part. We have to have all of those parts satisfied before we can invoke the conclusion. I see data scientists just invoking “Central Limit Theorem” like a chant when they are analyzing a dataset. However, in many cases, their data does not fit the IF conditions given. The Central Limit Theorem requires, among other things, that the random variables be independent. No independence, no Central Limit Theorem. Do not move forward. 

Many complaints I hear, especially in statistics, revolve around “you can use statistics to say anything you like.” No. No you cannot. It only appears that way because practitioners are applying theorems blindly without ensuring the hypotheses (the IF bits) are all met. A theorem written and proven is not some magic silver bullet that allows for universal application and use. 


When we write (or read) a proof, we assume the IF part is true, and use those conditions in the IF to logically deduce the conclusions in the THEN. 

(A remark: I just described a particular method of proof—the direct proof. There are other methods by which we may prove a statement that are logically equivalent to a direct proof such as proof by contradiction, or proof by contrapositive. Since the idea here is to understand how all the parts of a theorem and proof work, we’ll stick with direct proofs here. We’re also discussing a particular type of logical statement—a one directional implication. We can have bidirectional statements as well, but these are an extension of understanding this first type, so we’ll start here.)

The first part discussing definitions is essential to understanding the IF part of our theorem and how this part connects to our “then” conclusion. Suppose we take the statement 

If a real-valued function f is differentiable, then f is continuous. 

We can use the statement without understanding. As long as we understand the definitions reasonably well (as in, we know how to tell if a function is differentiable), then we say it’s continuous and move on with our day. This isn’t good enough anymore. Why does differentiability imply continuity? A good proof should illuminate this for us. My suggestion at this point is to find a calculus book that proves this statement and have it open for this next part. I’m going to outline a series of thought-steps to consider as you read a proof.

(1) Write down definitions.

Write down the definition of differentiability, and that of continuity. Study both. Play with examples of differentiable and continuous functions. See if all the differentiable functions you play with are also continuous. Try to play with some weird ones. 

Examples: f(x) = x^{2}. Definitely differentiable. Definitely continuous. (I do suggest actually using the definitions here to really show that f fits these.)

f(x) = x+5. Fits nicely. Try some nonpolynomials now.

f(x) = \sin(x). Still good.

f(x) = e^{x}. Really nice. Love differentiating that guy.

As you’re playing with these, try to move between formally showing these functions fit the definitions and developing a visual picture of what it means for these functions to fit the definitions. (My high school calculus teacher, Andy Kohler, gave a nice intuitive picture of continuity—you shouldn’t have to pick up your pen to draw the function.) 

This step helps you visualize your starting point and your destination. Every single time I develop a theorem, this is the process I go through. This isn’t always short either. I’ve spent weeks on this part—exploring connections between my start and my target to develop an intuition. Often, this leads me to a way to prove the desired theorem. Occasionally I manage to create an example that renders my statement untrue. This isn’t a trivial or unimportant part, so I implore you to temper any building frustration with the speed of this bit. However, since here we’re focused on reading proofs rather than writing them, we’ll likely assume we’re working with a true statement.

(2) Now take a look at the proof. 

A good proof should take you gently by the hand and guide you through the author’s reasoning in nice, comfortable steps of logic. The author shouldn’t require you to fill in gaps or make huge leaps, and especially never require you to just take something on faith. 

The proof contains all the pieces we need for understanding. We just need to know how to read it. At some point in the proof, every single part of the IF should be invoked as a justification for a logical step. Find these parts. 

“Because f is differentiable, [STATEMENT]”

Study this part of the proof. Flag it. Do this for each time an IF condition is invoked. Make sure you can use the definition or condition invoked to make the leap the author does. Here I implore you to not just “pass over”. Really take the time to convince yourself this is true. Go back to the definition. Return to the proof. Again, this study isn’t necessarily quick.1 

Doing this for each line in the proof will help you see what’s going on between IF and THEN.

(3) Break things.

The last piece of advice I’ll give is one that was drilled into me by my graduate analysis professor at UTA, Dr. Barbara Shipman. Breaking things (or trying to) in mathematics is the best way to really cement your understanding. 

Go back to all those points you flagged in the proof where the IF conditions were invoked. Now imagine you don’t have that condition anymore. The proof should fail in some way. Either you end up with a false conclusion (“if I don’t have X anymore, then I definitely cannot have Y”), or you just end up stuck (“if I don’t have X anymore, I can’t say anything about Y”.) This failure point helps you understand why that condition was necessary in the IF. Doing this for each condition in the proof helps you see the interplay among all the conditions, and showing you how and why all those parts were needed to get you to your conclusion. 

None of this is trivial. Please don’t take the informal descriptions of this process to mean that you can breeze through and develop this skill as quickly as you memorized derivatives of functions. This is a skill that is developed over years. The reason I recommend using material you already know to study proofs is that it removes the challenge of learning new material while studying the “why”. There’s a reason mathematicians take calculus classes before real analysis courses (which spend time deeply developing and proving many things from calculus.) You’re familiar with how the subject works and that it works, then you can understand why it works. 

This happens in engineering too. We flew a plane before we understood airflow deeply, but due to our understanding of airflow, we were able to advance flight technology far beyond what I imagine early aviators and inventors even thought possible. There’s a beautiful feedback loop between mathematics, formal proofs, and engineering. Mathematical proofs aren’t just the dry formalisms we use and throw away—they’re the keys to understanding why things work the way they do. Developing the skill to read and understand these arguments is not a waste of time for an engineer; it will help propel engineering forward.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Beyond Cookbook Mathematics, Part 1

Beyond Cookbook Mathematics, Part 1

This post is due to the requests of several independent engineers and programmers. They expressed disappointment at their mathematics education and its failure to impart a deeper understanding of the formulas and algorithms they were taught to use. 

This also reflects my observations of teaching university mathematics over the years. I started as a TA (and frequent substitute lecturer) in 2008, and have taught all levels of calculus, basic statistics, and advanced undergraduate statistics thus far. I’ve certainly noticed even since 2008 a de-emphasis on proofs and “why” in favor of more examples, applications, and formulas in general. In fact, proofs were passed over in lectures in calculus courses meant for general engineers because “there wasn’t enough time”, or it was not considered something engineers needed to know. 

This attitude is actually fairly recent. Many of the older calculus books in my library were written for engineers in an undergraduate program, and these books are quite proof-heavy. Some examples are F. Hildebrand’s Advanced Calculus for Engineers (1949), Tom Apostol’s Calculus (Volumes 1 and 2), and R. Courant’s Differential and Integral Calculus (1938), to name a few. For a more modern text that still has some proof treatment, the 10th edition of Calculus: One and Several Variables, by Salas et al is a good resource. I used this one both as a student at Georgia Tech and an instructor.1. However, when I taught at University of Texas at Arlington as a grad student from 2013-2015 and then Foothill College in early 2018, the texts they chose to use was woefully inadequate to suit a college-level calculus course; proofs were nonexistent, and reasoning was thrown away in favor of contrived examples. The course was designed to steer engineers away from any proofs or thorough reasoning, showing the experience is somewhat widespread.2

I do not want to discuss the reasons for this departure; this isn’t an education site. What is important now is to discuss how to satisfy the desire of an engineer or other highly technical person using various mathematical topics/formulas to develop a deeper understanding of what he or she is doing. It wouldn’t be particularly helpful to simply suggest acquiring books and reading the proofs. 

The best thing an education can give is the ability to teach oneself through developing methods of logical thought and creativity. There are books on formal logic and mathematical proofs circulating, but they can be a bit daunting to start with, as they are quite abstract and discuss mathematical logic and proof theory in general.3What I’ll give here can be perhaps considered a friendly introduction as to how to begin using mathematical proofs to facilitate understanding of material. 

Start with what you know

Most of you who are reading this have likely taken a calculus course or two, and probably some basic linear algebra/matrix theory (especially if you’re in computer science). You’ve been exposed to limits, differentiation, continuity, determinants, and linear maps. You know more math than you think. We’ll use this material to begin learning how to deconstruct mathematical statements and arguments in order to understand how the pieces all fit together. (It’s really not unlike diagramming sentences.)

Pick a topic you know well. That way we’re not trying to introduce new material and learn how to read proofs at the same time. Dust off your old calculus book, or differential equations book. 


Definitions are the most important thing in mathematics, and perhaps the most ignored by those using it. “Continuous” has a meaning. “Differentiable” has a meaning. Spending time to really understand the definition of a mathematical term will provide an unshakable foundation. 

Example: Let’s take something visual: the degree of a vertex in a graph. 

You might see this definition of degree of a vertex:

Def. 1: The degree of a vertex v in a graph G is the number of edges connected to v

Here’s a graph from one of my research articles. I realize this is a digraph, but we can still say that every node except the first is an end-vertex.

This is fairly intuitive and straightforward. One way to go about understanding this definition is to find other equivalent ways to express it. For example, we know that if there’s an edge sticking out of a vertex v, there must be something on the other end of the edge. (Graphs don’t allow dangling edges.) Thus, we might reframe this definition as 

Def. 2: The degree of a vertex v is the number of vertices adjacent (connected to) v

If we go one step further and collect all the vertices adjacent to v into a set or bucket, and name that bucket the neighborhood of v, we can write one more equivalent definition of the degree of a vertex.

Def. 3: The degree of a vertex v is the number of vertices in (or cardinality of) the neighborhood of v. (Formally, mathematicians would say that the number of elements in a set is the cardinality of the set.)

Notice what we’ve done here. We’ve taken one simple definition and expressed it three equivalent ways. Each one of these gives us a slightly different facet of what the degree of a vertex is. We can look at it from the perspective of edges or vertices.

Let’s take this definition and use it in another definition.

An end-vertex or pendant vertex in a graph is a vertex of degree 1. 

New definition using our previously defined degree. But what does it really tell us here? Can we visualize this? If a vertex only has degree 1, then we know that only one edge sticks out of that vertex. Equivalently, we can also say that it has only one neighbor vertex adjacent to it. The size of its neighborhood is 1. We now can picture an end-vertex quite nicely. 

Using Definitions

Many applications rely on checking to see if a definition is satisfied. 

  • Is f(x) = \sin(x) continuous?
  • Do I have any end vertices in my network?

Here we are taking a specific example and looking to see if a definition is satisfied, typically because we know that (due to theorems) we get certain properties we either want (or maybe don’t want) if the definition is satisfied. 

For example, network engineers like to have resilient networks. By “resilient”, I mean that they’d like to be able to tolerate a link failure and still be able to send information anywhere on the network. Intuitively, it would be really bad if a particular link failure isolated a node/switch so that no information could reach it. 

Let’s try to frame that in mathematical terms. A network can be drawn as a graph, with circles representing nodes/switches/computers/whatever, and edges between nodes representing the physical links connecting them. We want to design a network so that a single link failure anywhere in the network will not isolate a node. 

We can mathematically represent a link failure by the removal of an edge. So we can take our graph representing our network, and start testing edges by deleting them to see if a node ends up isolated with no edges emanating from it. Or…we could return to a definition from earlier and think about this mathematically. 

If a single link failure isolates a particular node, then that means only one edge sticks out from it. That means that node has degree 1, by our first definition of degree. Thus, we may now conclude that this node also fits the definition of an end-vertex. 

Moving back to the practical space, we conclude: a vertex can only be isolated via a single-link failure if it’s an end-vertex. We can also write the statement the other way:  If a vertex is an end-vertex, then the deletion of its incident edge isolates it. 

Now, thanks to these definitions, and our understanding of them, we can find a way to spot all the end-vertices in a network. Since we have multiple ways of looking at this problem, we can find the way that suits us best.

Using definition 1, we can count the number of links from each node. Any node that has only one link connected to it is an end-vertex, and the failure of that link will isolate that node. 

Using definition 2, we can count the number of adjacent neighbors, especially if we have a forwarding table stored for each node. If any node only has one other node in its table, it’s an end-vertex, and the removal of the link connecting the two nodes would isolate our vertex. 

I used a fairly visual, practical definition and example here. Other definitions in mathematics can get fairly involved; I’ve spent hours simply picking apart a definition to understand it. But the strategy doesn’t change. The first step in developing a deeper understanding of mathematics is to pay attention to definitions–not just what they say, but what they mean. The next article will discuss how we use definitions to write theorems and understand proofs. 

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The Hathlor Classification System

The Hathlor Classification System

Many researchers have their own libraries, and The Math Citadel is no different. Both Jason and I have spent many hours buried in the shelves of bookstores new and used, the stacks of university library shelves, and the rows of books in public libraries across four states now. During this time, we’ve amassed our own collection of new, used, out-of-print, and rare math and engineering texts. Some we acquired as part of our formal studies, most are acquired tangentially, and sometimes haphazardly. 

As we’re both very interested in organization and cataloguing in general, we noticed the different methods libraries tend to use to shelve their vast collections, particularly the technical books. Most university libraries we’ve visited utilize the more modern Library of Congress classification system, while public libraries tend to be a mix of the Dewey Decimal System and the Library of Congress system, varying by library. 

What we ultimately realized in our time exploring these shelves is that both of these methods for cataloguing books are insufficient for our needs. One of our biggest desires in a classification system was a way to determine some of the content of the text in a database or card catalogue listing, without necessarily having to pull the book from the shelf (which may be in storage). One other shortcoming we noticed is that neither system accounts for a “continuum-style” classification. As an example, many abstract algebra texts contain chapters on number theory, but some do not. Perhaps two different abstract algebra texts have different focus topics; one may focus more heavily on group theory, while another spends much of its pages discussing advanced matrix theory. The titles do not always reflect this, nor do the classification codes used by either the Dewey Decimal System nor the Library of Congress system.

So, we invented our own. Born of months of discussion and trial, we finally settled on a new one that takes the best aspects of both original systems, and also expands to include additional information. We’ll begin by briefly describing the two original classification systems, then describe our new Hathlor system. Our system is customizable to suit anyone’s needs, and we’re excited to share a passion project of ours. 

The Dewey Decimal System

This system is the one most are familiar with from school libraries. It was created by Melvil Dewey in 1876, and introduced a relative index that allowed for library expansion. Wikipedia has a great account of the history and development of the system, so we’ll not dive too deeply here; we’ll merely describe it. The system separates books by discipline, ten major classes overall, each divided into ten sections. It’s a hierarchical classification system that endeavors to classify a book as specifically as possible by adding classes after the decimal points. 

Wikipedia gives an example of a book on Finsler geometry:

  1. The book’s broadest class is 500-Natural Sciences and Mathematics 
  2. Next, it’s specifically a mathematics text (not physics or biology), which is the first section, so we are at 510
  3. Within mathematics, its topic is geometry, the 6th topic listen, thus 516
  4. Now we may move to decimal places to further classify as specifically as we like. This text is an analytic geometry, the third subtopic in geometry. Thus, we may now list the text as 516.3
  5. We wish to further classify, so among analytic geometries, this one discusses metric differential geometries, the 7th subdivision of analytic geometry. If the library classes the books to two decimal places, the code is 516.37
  6. We may even wish to subdivide metric differential geometry and move to three decimal places, finishing with 516.375–its final classification code.


This system allows very fine granularity and specificity in classification; one could decide to divide further and move to four, five, or more decimal places. It provides a total ordering, so shelving is sensical. 


One thing I personally don’t like about this system is the length the codes can reach without really offering tons more information. It breaks single topics down into very fine divisions if desired, but one would need to know the possible subdivisions to look at the code of a book and discern all that information. One other issue is that the book is restricted to one and only one class. Many mathematics and engineering texts don’t fit neatly into one class, division, or subdivision. A book on algebraic statistics would span abstract algebra and statistics, yet under the Dewey Decimal system, I’m not only forced to classify it under one or the other, but the “other” I don’t use disappears, and the text can get lost. A statistician might not always just browse the algebra shelf, yet this text might be of interest to him. 

Library of Congress

The LCC was developed by the Library of Congress, and is the most common system deployed in academic and research libraries in the United States. Herbert Putnam invented the system in 1897. For additional history, check here. This system has a larger number of broad classes, and uses letters to denote general classes and narrower topics. There are 21 broad classes, using all letters of the alphabet excepting I, O, W, X, and Y. Each broad class is divided into subclasses, though the number of subclasses may vary. For example, Class B (Philosophy, Psychology, and Religion) is divided into 15 subclasses, one of which is BV-Practical Theology. Class Q (Science) is divided into 12 subclasses, one of which is QA-Mathematics. 

Underneath these broad classes is a narrower topic of 4 digits, then a cutter number that represents the author, corporation, or title, and finally the last line notes the year of publication. 

As an example, from the University of Mississippi library

Title: Price Control under Fair Trade Legislation

Author: Ewald T. Grether

  1. This text falls under HF (Social Sciences → Commerce).
  2. The library codes it 5415, signaling Business → Marketing → General Works
  3. Next, the cutter number for Grether is .G67, and finally, the year of publication is 1939. 

In summary, the LCC call number would look like this:




There are more classes than the Dewey Decimal system.


We thoroughly dislike this system for a number of reasons. The primary concern is that looking at the call number on the spine of the book tells very little useful information, and gives some information that is irrelevant to evaluation of the content, such as the cutter number and publication year. It also shares the same issue as the Dewey Decimal system in that multiple subjects contained in a text are lost by forcing the text into one class.

In both of these systems, it’s difficult to determine where some more multidisciplinary or multitopic texts might be found without knowing the exact name of the text. It would be impossible to use either of these systems to locate some text that discusses graph theory and algorithms, but also mentions applications to chemistry. It is with this motivation that we devised our new system.

The Hathlor Classification Codes

The main goal of the new system was to allow a user without intimate familiarity with the minutest details of the scheme, but with knowledge of the specific subjects, to discern information from the spine or given code of a text. Neither LCC nor DDC give this benefit. We describe our system and reasoning here. 

Hierarchical Yet Lateral

We have divided material in the library into a hierarchical set of groups. The largest group a text can be a member of is the Subject. Our library currently contains 5 subjects, each given a sensical one or two-letter code:

Computer ScienceCS

Within each Subject are a number of Topics, where each topic has a three letter code. Again, these codes were designed sensibly, so that one can infer the topic from the three letter code (as opposed to needing to know that QA is Mathematics in the LCC system, for example). To illustrate, the table below shows how we divided Mathematics into ten Topics

Applied and Engineering Mathematics AEM
Differential/Integral EquationsDIE
Discrete MathematicDSC
Number TheoryNUT
Statistics/Probability TheorySPT

Finally, within each Topic are a varying number of Subtopics. Texts can contain many Subtopics, and in the spirit of the MAC address, we give a variable length binary code that indicates which Subtopics are contained. For example, the Topic of Fundamentals contains four Subtopics: Calculus in bit 1, Logic in bit 2, Set Theory in bit 3, and Trigonometry in bit 4. 

A general code will look like this:

Subject – Topic.XXXX

where the single-letter subject code is given, the three letter topic code is given, and the X is an indicator of whether or not a particular subtopic is contained in the work. Note that the number of X’s may vary by topic. 

Example: Strang’s Linear Algebra and its Applications is a basic linear algebra text covering fundamental matrix theory, Gaussian elimination, linear systems, orthogonal projections, determinants, eigenvalues and eigenvectors, and positive-definite matrices. Subtopics in Algebra are Abstract Algebra, Category Theory, Linear Algebra, and Matrix Theory, in that order (alphabetically). 

To code this book, we note that its subject is Mathematics, with topic Algebra, and containing linear algebra and matrix theory, but not Abstract Algebra, or Category Theory.

Thus, the Hathlor code for the book is 


Cross-Topic Texts

Many texts in mathematics contain material that spans topics. Our codes reflect this, and may be extended to include those additional topics in descending order of importance or inclusion to the work, separating the Topic.Subtopic codes by a colon. One may do this as many times as necessary to encapsulate all appropriate topics. Thus, we see that now the code may take the form of 


Example: Apostol’s Linear Algebra: A First Course with Applications to Differential Equations is primarily a linear algebra text, but does have a significant discussion of differential equations. Thus, the subject is still clearly Mathematics, but now we have a primary topic of Algebra and a secondary topic of Differential/Integral Equations. Differential/Integral Equations has three subtopics: integral equations, ordinary differential equations, and partial differential equations. Apostol only addresses the second of these in his text. 

Thus, the Hathlor code for this book is 


As a note, one can have as many topics indicated as necessary. Joyner’s Adventures in Group Theory is an engaging, multi-topic text that touches on many fundamental areas of mathematics, presenting abstract algebra matrix theory, logic, set theory, and a tiny bit of graph theory through the lens of recreational mathematics. Its Hathlor code is M-ALG.1001:FUN.0110:DSC.0100.

Final Generalization: Cross-Subject Texts

We also noticed in reading many technical works that authors may span entire subjects. Some texts in analysis focus only on the pure mathematics, and others discuss applications of analysis to physics or electrical engineering. The Hathlor codes account for this as well, giving the book a primary subject and as many subsequent subjects as necessary. One finishes the entire subject-topic-subtopic classification prior to moving onto the next, noting a new subject by the set <> of symbols. The final general Hathlor code thus takes on the following form:


Example: Hohn’s Applied Boolean Algebra discusses Boolean algebra and some of its applications, particularly in electrical engineering and relay circuits.The text is primarily a mathematical treatment, but it would be foolish not to note its electrical engineering motivations. Thus is spans the subjects of mathematics and engineering. Its Hathlor code is


Shelving: Simple Lexicographic Order

Shelving books by the Hathlor classification code is done in lexicographic order, moving left to right, maintaining a simple, unambiguous shelving system. Thus, all chemistry books are shelved before all computer science books. Within each subject, we move by alphabetical order by topic according to the three letter codes. All (Primary) Algebra (ALG) books come before all (primary) Analysis (ANA) books. Within each topic, we shelve by the binary indicators. Thus, M-ALG.1000 comes before M-ALG.01001. If the text contains secondary/tertiary topics, we file those after those with no subsequent topics, the same way that “can” precedes “candle” in a dictionary. The secondary topics are filed in alphabetical order, and so forth. The texts that span subjects then follow those that do not span subjects, still shelved by their primary subject. 


Why Reinvent the Wheel?

As mentioned before, we felt that the current systems in use both omit useful information regarding the content of the works and add extra information a user doesn’t typically care about, such as the LCC’s cutter number. In addition, a researcher or browser may simply have a general idea of the types of things he would like a text to contain, but neither the DDC nor the LCC provides a simple way to search for such things. Ours provides a way to search via a simple regular expression query, returning a set of texts previously unknown to the user that fit the subjects, topics, and subtopics he seeks, particularly books that contain all he seeks. 

Though neither of us have formally worked in a library, nor to either of us hold any formal degrees in library science, we both have a deep and abiding passion for reading, mathematics, organization, and classification. Beyond that, our collection now spans over 220 books, so we definitely needed a better way to shelve our library. We both think it’s a pity that very few technical libraries outside of universities exist anymore, particularly in the private sector. Company libraries for engineers and scientists such as the (highly underrated and unknown) library at NASA Ames Research Center should be cared for, revitalized, and restarted. We’re happy to help with that. 


–Rachel Traylor and Jason Hathcock


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

On the Essential Nature of Foundations

On the Essential Nature of Foundations

We get asked frequently a valid question: why fund our research? Why fund mathematics research, when I can’t see what the finished product will be, and you can’t give me a guaranteed code library next quarter?

I’m going to use an analogy of building and stray from my usual temptation to use math analogies. Every house built (well, every house that will last more than 5 years) sits on a foundation. That foundation is unseen, unsexy, and isn’t part of the curb appeal a realtor uses to sell the house. However, no amount of curb appeal can overcome a crumbling or poorly built foundation, and one good natural disaster ends the house. 

If one is going to build a house, one should ensure the foundation is built well and funded appropriately, otherwise any work built upon it doesn’t stand a chance in the long run. When we’re asked the “value prop” of our research, much of the questions come from a misconception of research funding. Funding research isn’t like funding a quick consulting project, or an app for the App Store. What is really being asked is for the value proposition of the foundation for a house. You’re not funding flashy curb appeal and a new stone facade for a house, you’re funding the very foundation the house will stand on.  A good foundation can support many stories, up to the highest skyscrapers. 

The next question one would ask when building a house (or funding research) is: “When will the house be completed?” 

The answer: it depends. 

I’ll illustrate with some examples. We have a project that proposes a different method to detect anomalies in time series based on using fuzzy numbers. (Our new “project pages” feature is coming soon, and we’ll update here with links.) In this case, enough infrastructure has been laid (like electrical lines and water and sewer pipes), that we can not only immediately begin pouring a foundation, but we can also clearly see how the house will look at completion. This is like building in a city on an empty lot. In this case, I can give a reasonable completion date for certain stages of the project of around 1-2 years, with the framework in 6 months or less. Of course, just as in building a house, some small things may come up to adjust the exact timeframe, but the path to completion is fairly clear. 

Now, suppose you’re a bit bolder. You desire to fund something less certain and less concrete, so to speak, such as pure research in, say, dependency theory . This would be analogous to wanting to purchase land in rural Wyoming and building a house there. Because no one already resides there, you can purchase quite a bit of land (a broad research direction) reasonably inexpensively. However, in this location, there are no sewer lines, no electrical wires, no city-style infrastructure to work with. The land needs to be surveyed and understood before selecting a building site, and wells need to be drilled, a septic tank installed, and electrical lines extended, in addition to the foundation of the house itself. There’s a reason most lack the courage to build here. This is a multi-year commitment, but fortune favors the bold.

One thing this type of building allows for is surprises. Perhaps during the survey of the land, you find valuable mineral deposits that will net you an unexpectedly large profit when you sell or lease the mineral rights to a mining company, thus recuperating your initial land purchase investment almost immediately and providing decades of passive income. You decide to develop that in addition to building the house, which delays the house building slightly. 

It’s also possible to find an underground river close to the surface on the site you originally intended to build, thus forcing you to relocate the site of the house 1000 feet north. Those are the less-desirable surprises, but now you can drill a well in that original spot. All of this is part of the exploratory nature of mathematical and fundamental research so many tend to scoff at as “wandering” and “impractical.”

One cannot employ the same “laser focus” building techniques in an unknown area as one would on a city lot. While the prospecting may not be obviously applicable to building the house, it’s essential in remote areas. Finding pitfalls and potential issues for a building site can save hundreds of thousands of dollars later. Finding mineral deposits can bring enormous unexpected wealth. 

Funding pure math research is purchasing land in rural Wyoming. It takes patience and courage, but can bring incalculable wealth and a set of foundations sturdy enough to build an entire city. Funding applied mathematics is building the foundation of a house on a city lot. The Math Citadel works in both areas, and also commits to helping you build the framework as well, which is the second most important feature of a good house. 

Fancy windows, pretty shingles, nice facades, and landscaping are all features of a house that makes it beautiful and attractive to passersby, and increases the value of your home. But all those are meaningless if the foundation is cracked and the framework rotten. We encourage you directly to consider the real value-proposition of mathematics research. Our business is in building foundations that are immortal. No matter how many times you update the landscaping or other surface features to stay current and modern, the foundation will still stand. 

We offer pricing far below traditional funding methods, with greater flexibility and agility. Our foundations don’t come with heavy administrative overhead, and our passion guides us to leave no stone unturned while we prospect your site for any possible additional value. 

Please feel free to contact us to set up a discussion on research projects that are of interest to your business. We are happy to discuss our current directions, and we’re also happy to forge a path in an entirely new direction as well.  

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


Paper Review: Active Queue Management with Non-Linear Packet Dropping Function

Paper Review: Active Queue Management with Non-Linear Packet Dropping Function

As promised in the previous article, I plan to review Reference 2, Active Queue Management with Non-Linear Packet Dropping Function, by D. Augustyn, A. Domanski, and J. Domanska, published in HET-NETs 2010, which discusses a change in the structure of the packet drop probability function using the average queue length in a buffer. I mentioned previously that choosing a linear function of the average queue length can be viewed as a bit of an arbitrary choice, since we’re designing a control mechanism here, and this paper attempts to define a new form of this packet drop probability function.

In summary, the best lesson one can take from this paper is that publication in a journal or conference proceedings does not guarantee that the paper withstands scrutiny. The paper is linked above for the interested reader to peruse himself, and to investigate the claims. 


The paper intended to give a new function to calculate the probability of proactively dropping a packet in a queue in order to prevent a full buffer. It seemed to be presented as an alternative to RED, described in my previous article. The authors define this new function, then set up a simulation in order to examine the effects. 

When Orthogonal is Abused

The authors describe using a finite linear combination of orthogonal basis polynomials defined on a finite interval as the underlying mathematical structure. 

First, we should discuss what we mean by orthogonal in context of functions. Orthogonal is most commonly understood in terms of vectors, and when we’re in two dimensions, orthogonal becomes our familiar perpendicular. 


Beginning with the familiar notion of perpendicular, we can generalize this to understand orthogonality. The geometric interpretation of two vectors  being perpendicular is that the angle between them is 90^{\circ}. Once we leave two and three dimensions (or jump to the space of polynomials, as we’ll do soon), the concept of an angle isn’t as helpful. 

Another way to define perpindicular is through an operation known as the dot productSuppose we take two 2D vectors, \mathbf{x} and \mathbf{y}. Each vector will have coordinates: \mathbf{x} = (x_{1},x_{2}) and \mathbf{y} = (y_{1}, y_{2}). The dot product is a special type of multiplication defined on vectors, and denoted \cdot:

\mathbf{x}\cdot\mathbf{y} = x_{1}y_{1} + x_{2}y_{2}

The dot product can be described in words as the sum of the component-wise multiplication of the coordinates of the two vectors. 

Now, we can say that two vectors are perpendicular if their dot product is 0. That is, \mathbf{x} and \mathbf{y} are perpindicular if \mathbf{x}\cdot\mathbf{y} = 0. (This article gives a nice overview of how we move from the algebraic to the geometric definition of perpendicular and orthogonal.)

Remember, perpendicular doesn’t make sense once we get into higher dimensions. Orthogonal is a more general notion of vectors being perpendicular, and is defined for two vectors (of any length) as their dot product equalling zero.

From Dot Product to Inner Product

The dot product is used on vectors, and defines another type of product that is different from the scalar multiplication we know. In fact, we can generalize the notion of a dot product to something called an inner product, which can be defined on many different spaces than just vectors. We can define operations and products however we like, but for our definition to qualify as an inner product (denoted \langle \cdot, \cdot\rangle), it must meet certain criteria

For instance, on the set of real valued functions with domain [a,b], we define the inner product of two functions f(x) and g(x) as 

\langle f, g\rangle := \int_{a}^{b}f(x)g(x)dx

The concept of orthogonality generalizes to an inner product as well. If the inner product of two functions is zero (as defined above), we say the functions are orthogonal. 

Back to the paper

The authors claim to be using a set of orthogonal polynomials to define their drop probability function, and they give the structure of such functions. For a domain [a,b], and for \phi_{j} in the set of polynomials, they define \phi_{j} = (x-a)^{j-1}(b-x). So, for example, \phi_{1} = (b-x), and \phi_{5} = (x-a)^{4}(b-x)

Now, in order to be an orthogonal basis for a space1, the set of functions that are claimed to form the basis of the set must be pairwise orthogonal. That is, I need to be able to take the inner product of any two functions \phi_{i} and \phi_{j} and get 0. If that isn’t true for even one pair, then the set is not orthogonal. 

As it turns out, if we take the inner product of any two functions in the basis set over the domain given, we find that there are no pairs that are orthogonal. To do this in general, we compute the integral

\int_{a}^{b}(x-a)^{i-1}(b-x)\cdot (x-a)^{j-1}(b-x)dx

The integral computation is one of simple polynomial integration, and can be done either by hand or using your favorite software (Mathematica) of choice. What we find here is that this set of functions defined in general this way is never orthogonal, yet the paper claims they are. 

Applying to the particular situation of designing a drop probability function, they give the following for average queue length thresholds T_{\min} and T_{\max} p(x,a_{1},a_{2}) = \left\{\begin{array}{lr}0, &x < T_{\min}\\\phi_{0} + a_{1}\phi_{1}(x) + a_{2}\phi_{2}(x),&T_{\min}\leq x \leq T_{\max}\\1,&x > T_{\max}\end{array}\right.

where the basis functions are 

\begin{aligned}\phi_{0}(x) &= p_{m}\frac{x-T_{\min}}{T_{\max}-T_{\min}}\\\phi_{1}(x) &= (x-T_{\min})(T_{\max}-x)\\\phi_{2}(x) &= (x-T_{\min})^{2}(T_{\max}-x)\end{aligned}

The reader will recognize \phi_{0} as the original drop probability function from the RED algorithm. These functions are absolutely not orthogonal though (as the authors claim), and a simple check as we did above will verify it.

Other issues

Another issue is that these mysterious coefficients a_{1} and a_{2} need to be determined. How, you ask? The authors do not say, other than to note that one can define “a functional” implicit on the unknown p(x,a_{1},a_{2}) that can be minimized to find the optimal values for those coefficients. They write that this mysterious functional2 can be based on either the average queue length or average waiting time, yet provide no details whatsoever as to the functional they have chosen for this purpose. They provide a figure with a sample function, but give no further details as to how it was obtained.

One other issue I have in their methodology is discussing the order of estimation. For those familiar with all sorts of ways to estimate unknown functions, from Taylor series, to splines, to Fourier series, we know that a function is exactly equal to an infinite sum of such functions. Any finite sum is an estimation, and the number of terms we use for estimation with a desired accuracy may change with the function being estimated. 

For instance, if I want to use Taylor series (a linear combination of polynomials) to estimate a really icky function, how many terms should I include to ensure my accuracy at a point is within 0.001 of the real value? It depends on the function.3.

The authors simply claim that a second order term is appropriate in this case. The issue I take with that is these queueing management drop probability functions are designed. We’re not estimating a function describing a phenomenon, we are designing a control policy to seek some sort of optimal behavior. Fundamentally, the authors posing this as an estimation problem of some unknown drop probability function is incorrect. This isn’t a law of nature we seek to observe; it’s a control policy we seek to design and implement and optimize. Using language that implies the authors are estimating some “true function” is misleading.

Regarding the simulation itself, since the design was arbitrary, and not based on sound mathematical principles, I cannot give any real comment to the simulations and the results. The authors briefly discuss and cite some papers that explore the behavior of network traffic, and claim to take this into account in their simulations. To those, I cannot comment yet. 


Always verify a paper for yourself, and don’t accept anything at face value. Research and technical publications should be completely transparent as to choices and methodologies (and obviously free of mathematical inaccuracies), and derivations and proofs should be present when necessary, even if in an appendix. 

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


Networking Mathematics: Random Early Detection and TCP synchronization

Networking Mathematics: Random Early Detection and TCP synchronization

Computer networks are something most of us take for granted–speed, reliability, availability are expectations. In fact, network problems tend to make us very angry, whether it’s dropped packets (yielding jittery Skype calls), congestion (that huge game download eating all the bandwidth), or simply a network outage. There’s an awful lot going on underneath the hood of all devices on a network to load that webpage or download that song for you. 

Much of the reliability in networking relies on maintaining good Quality of Service (QoS) policies, which involve buffering and queue management. Networks aren’t unlike the roads we travel on; traffic isn’t steady, and congestion at interfaces (or intersections on the road) happen. How do we handle that? We’ll explore some basic networking principles in order to uncover some interesting mathematics governing buffering and queue management at congested network interfaces.

Update: Mr. Fred Baker reached out to me with a few corrections regarding my interchangeable use of queue and buffer. I’ve inserted his comments into the “Buffer” section. Incidentally, Mr. Baker was the inventor of the WRED (Weighted Random Early Detection) algorithm mentioned as an extension below. 

Read More Read More