
On the optimal storage of bulk cargoes: an exercise in calculus.


Editor’s Note: This article is authored by Valentin Fadeev, a physics PhD candidate at The Open University, UK. You can reach him via Twitter at @valfadeev.

Despite their undeniable and enormous role in applications, the methods of calculus are often seen by students, present and past, as a form of mathematical art removed from everyday life. Evaluating integrals may seem a sport for the initiated. Extremal problems are mostly demonstrated on stripped-down textbook examples, and coming across a use case in an unusual yet easily understandable context is rare. We present one such use case here and hope the reader finds it instructive.

We begin with some background from maritime logistics. The term “bulk cargo” denotes the type of cargo, such as coal, ores, fertilizers, grain and the like, carried by merchant vessels “in bulk”, that is, put directly in the holds and not stuffed in boxes or any other containers. In some cases, while waiting to be picked up by vessels or other modes of transportation, such cargoes are stored in stacks piled up by belt conveyors or other handling mechanisms.

We consider here a simplified problem of calculating the optimal dimensions of one such stack given its total volume. For the criteria of optimality we pick the ground area occupied by the stack and the work against gravity required to form it.

Figure 1

We shall approximate the stack with a geometrical body in the shape of an obelisk of height H, having a rectangle L_2\times B at its base and 4 facets inclined towards the centre at equal angles, thus forming the upper horizontal edge of length L_1 (see Figure 1).

The volume of the stack can be calculated as follows:\begin{aligned}V& =\int_0^HS(x)\,\mathrm{d}x , & (1)\end{aligned} where the x-axis is directed downwards and S(x) is the area of the horizontal cross-section as a function of x, given explicitly as \begin{aligned}S(x)&=a(x)\cdot b(x)\\&=\left(L_{1}+\frac{L_{2}-L_{1}}{H}x\right)\cdot\frac{Bx}{H}\\&=B\left(L_{1}\frac{x}{H}+\left(L_{2}-L_{1}\right)\left(\frac{x}{H}\right)^{2}\right).&(2)\end{aligned}
Substituting (2) in (1) and integrating we find \begin{aligned}V&=BH\int_0^1\left(L_1\frac{x}{H}+(L_2-L_1)\left(\frac{x}{H}\right)^2\right)\,\mathrm{d}\left(\frac{x}{H}\right)\\&=\frac{HB}{6}(2L_2+L_1) &(3)\end{aligned}

Now we shall determine the amount of work (energy spent) performed against the forces of gravity that is required to form the stack. Assume that density of the cargo is \gamma\; \mathrm{kg}/\mathrm{m}^3. The work \mathrm{d}A required to lift a layer of thickness \mathrm{d}x to height H-x is given as follows:
dA=\gamma gdV\cdot(H-x)=\gamma gB\left(L_1\frac{x(H-x)}{H}+\frac{(L_2-L_1)x^2(H-x)}{H^2}\right)\,\mathrm{d}x,
where g\approx 9.81 \,\mathrm{m} / \mathrm{s}^2 is the acceleration due to gravity. Thus, the total work is \begin{aligned}A&=\gamma gB\int_0^H\left(L_1\frac{x(H-x)}{H}+\frac{(L_2-L_1)x^2(H-x)}{H^2}\right)\,\mathrm{d}x\\&=\frac{\gamma gBH^2}{12}(L_1+L_2). & (4)\end{aligned}
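As a sanity check, the closed forms (3) and (4) can be compared against direct numerical quadrature of the defining integrals; the dimensions and density in this sketch are purely illustrative, not taken from the article.

```python
# Sanity check of the closed forms (3) and (4) by midpoint-rule quadrature.
# All dimensions and the density are illustrative.
L1, L2, B, H = 4.0, 10.0, 6.0, 3.0     # metres
gamma_g = 1500.0 * 9.81                # weight density gamma*g, N/m^3

def S(x):
    # cross-section area at depth x, equation (2)
    return B * (L1 * x / H + (L2 - L1) * (x / H) ** 2)

n = 100_000
dx = H / n
V_num = sum(S((i + 0.5) * dx) for i in range(n)) * dx
A_num = sum(gamma_g * S((i + 0.5) * dx) * (H - (i + 0.5) * dx)
            for i in range(n)) * dx

V_closed = H * B / 6 * (2 * L2 + L1)                 # equation (3)
A_closed = gamma_g * B * H ** 2 / 12 * (L1 + L2)     # equation (4)

assert abs(V_num - V_closed) / V_closed < 1e-6
assert abs(A_num - A_closed) / A_closed < 1e-6
```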

Now we can ask the following question: if the volume V is given, what should the base length L_{2} and width B be so that the stack occupies the smallest possible area? First, we simplify our parametrization of the problem by introducing the angle of natural slope \chi. This is a term from soil mechanics denoting the angle that an unsupported slope of the material forms with the horizontal surface at mechanical equilibrium. It depends on properties of the material such as density and viscosity, and can be looked up in special tables.

In our model \begin{aligned}\frac{2H}{B}=\tan{\chi}.\end{aligned} We can, therefore, express everything in terms of L\equiv L_2, B and \chi: \begin{aligned}H=\frac{1}{2}B\tan{\chi}, &\qquad L_1=L_2-\frac{2H}{\tan{\chi}}=L-B. & (5)\end{aligned}
Then the volume (3) can be rewritten as follows \begin{aligned}V&=\frac{B^2\tan{\chi}}{12}(3L-B).&(6)\end{aligned} Solving (6) for L we obtain \begin{aligned}L&=\frac{4V}{B^2\tan{\chi}}+\frac{B}{3}.&(7)\end{aligned} The base area S can then be expressed as a function of B: S(B)=LB=\frac{4V}{B\tan{\chi}}+\frac{B^2}{3}, whereby taking the derivative with respect to B and solving S'(B) = 0 we find the stationary point B_0 \begin{aligned}B_0&=\sqrt[3]{\frac{6V}{\tan{\chi}}},& (8)\end{aligned} which happens to be a point of minimum for S. Using (8), L_0 is found as follows \begin{aligned}L_0&=\left(\frac{4}{\sqrt[3]{36}}+\frac{\sqrt[3]{6}}{3}\right)\sqrt[3]{\frac{V}{\tan{\chi}}}=\sqrt[3]{\frac{6V}{\tan{\chi}}}=B_0,&(9)\end{aligned} so the optimal stack has a square base; by (5) it is in fact a pyramid, since L_1=L_0-B_0=0. The minimal area is \begin{aligned}S_0=S(B_0)&=L_0B_0=B_0^2=\sqrt[3]{36}\left(\frac{V}{\tan{\chi}}\right)^{2/3}\approx 3.30\left(\frac{V}{\tan{\chi}}\right)^{2/3}.&(10)\end{aligned}

Next we shall determine the values of L, B which minimize the amount of work required to form the stack. Rewriting (4) using (5) and (7) we obtain A=\frac{\gamma g \tan^2{\chi}}{144}\left(\frac{24VB}{\tan{\chi}}-B^4\right).
Differentiating with respect to B, we find the stationary point to be given precisely by (8) again! Thus, given the volume and the natural slope, (8) and (9) produce the optimal solution both in terms of the ground area and the energy cost. Finally, we can use the above results to determine the largest admissible volume of a single stack given a constraint on the ground pressure q\,\mathrm{kg}/\mathrm{m}^2. Given V, S and \gamma, the safety condition is expressed by the inequality \frac{V\gamma}{S} < q.
Solving for V and using (10) we derive \begin{aligned}V&<\frac{qS_0}{\gamma}=\frac{q}{\gamma}\sqrt[3]{36}\left(\frac{V}{\tan{\chi}}\right)^{2/3},\\V&<\frac{36\,q^3}{\gamma^3\tan^2{\chi}}.\end{aligned}
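The stationary point (8) can be confirmed numerically with a simple grid search; the volume and slope angle below are illustrative values, not from the article.

```python
# Numeric confirmation that B0 = (6V / tan(chi))**(1/3) from (8) minimizes
# S(B) = 4V/(B*tan(chi)) + B**2/3. Values of V and chi are illustrative.
import math

V, chi = 500.0, math.radians(40)
t = math.tan(chi)

def S(B):
    return 4 * V / (B * t) + B ** 2 / 3

B0 = (6 * V / t) ** (1 / 3)
grid = [B0 * (0.5 + 0.001 * i) for i in range(1001)]   # 0.5*B0 .. 1.5*B0
B_best = min(grid, key=S)
assert abs(B_best - B0) / B0 < 0.002
```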

The model can be modified to account for the fact that in practice the edges of the stack develop surfaces resembling half-cones. In this case the reader will find, following steps similar to the above, that the total area and the work required to form the stack are given by \begin{aligned}S(B) = \frac{4V}{B\tan{\chi}} + \frac{\pi}{12}B^2, &\qquad A(B) = \frac{\gamma g \tan^2{\chi}}{144}\left(\frac{24VB}{\tan{\chi}}-\frac{3\pi}{4}B^4\right),\end{aligned}
respectively. The values B_{S} and B_{A} minimizing the above expressions no longer coincide:
\begin{aligned}B_{S}=\sqrt[3]{3}B_{A}, &\qquad B_{A} = 2\sqrt[3]{\frac{V}{\pi\tan{\chi}}}.\end{aligned}
We have thus determined the maximum allowed volume, given the safety constraint and the properties of the material, assuming the optimal dimensions of the stack.
In conclusion, we have considered a simple but non-trivial application of the very basic tools of calculus to a problem that has important engineering applications. However, the main point is that calculus is ubiquitous and sometimes can deliver useful and elegant results in places one would least expect it to.

Using Dirac Delta Formalism to Calculate Shear Force and Bending Moment


Editor’s note: This article was written by Valentin Fadeev, a physics PhD candidate in the UK. He also co-authored this article on equivalences and isomorphisms in category theory. You can find him on Twitter via the handle @ValFadeev.

The delta function was introduced by P.A.M. Dirac, one of the founders of quantum electrodynamics. It belongs to the abstract concepts of function theory: in a rigorous sense it is a functional that picks out the value of a given function at a given point. However, it also arises as the result of differentiating discontinuous functions. One of the most appealing properties of the delta function is that it allows us to work with discretely and continuously distributed quantities in the same way, replacing discrete sums with integrals. The delta function and its derivative match physical intuition in cases where we deal with quantities of large magnitude whose impact is limited to a small region of space or a very short interval of time (impulses). The inspiration for this particular article came during a university course on mechanics.

In structural mechanics, it is often desired to understand the relationship between bending moment and shear force for objects, particularly beams, under applied loads. An explanation of the typical method is given here.

In the text I was using at the time, the proof of the relation between bending moment and shear force did not make explicit use of the delta function. The final result was valid, but there was a serious fault in the proof. Point forces were considered as mere constant terms in the sum and differentiated away, despite the fact that their contribution was accounted for only starting from certain points (suggesting delta terms). I gave this problem a mathematically rigorous treatment which led to some nice closed formulae applicable to a very general distribution of load.

Relationship Between Shear Force and Bending Moment

A beam with various forces

Consider a rigid horizontal beam of length L, and introduce a horizontal axis x directed along it. Assume the following types of forces are applied:

  • point forces F^{p}_{i} at x_i, i=1,2,…,n, 0<x_i<L
  • distributed forces with linear density given by continuous functions \phi^{d}_{j}(x) on intervals (a_j,b_j),(j=1,2,…,m), 0<a_{j-1}<b_{j-1}<a_j<b_j<L
  • moments (pairs of forces) M_k with axes of rotation at x_k, (k=1,2,…,p), 0<x_k<L.

Note that we ignore the beam's own mass for the sake of clarity; it can easily be accounted for if necessary.

Point Forces

Although the point forces F^{p}_{i} are applied at certain points by definition, in reality the force is always applied to a certain finite area. In this case we can consider the density of the force being very large within this area and dropping to 0 outside it. Hence, it is convenient to define the distribution density as follows: \phi^{p}_{i}(x)=F^{p}_{i}\delta(x-x_i). Shear force F at point x is defined as the sum of all forces applied before that point (we are moving from the left end of the beam to the right one): F=\sum_{x_i<x}F_{i}
F^{p}(x)=\sum_{i}\int_0^x \phi^{p}_{i}(z)\,\mathrm{d}z=\sum_{i}F^{p}_{i}e(x-x_i), where e(x)=\begin{cases}1,\qquad x>0\\0, \qquad x\leq 0\end{cases} is the step function.
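The point-force shear F^{p}(x)=\sum_i F^{p}_{i}e(x-x_i) is easy to sketch in code; the loads and positions below are illustrative.

```python
# Shear due to point forces: F_p(x) = sum_i F_i * e(x - x_i).
def e(x):
    # unit step: 1 for x > 0, 0 otherwise
    return 1.0 if x > 0 else 0.0

point_forces = [(10.0, 2.0), (-4.0, 5.0)]   # pairs (F_i, x_i), illustrative

def shear_point(x):
    return sum(Fi * e(x - xi) for Fi, xi in point_forces)

assert shear_point(1.0) == 0.0    # before any load
assert shear_point(3.0) == 10.0   # after the first load only
assert shear_point(6.0) == 6.0    # after both loads
```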

Distributed Forces

Now we shall find the contribution from the distributed forces. \phi^{d}_j(x) may be formally defined on the whole axis, so we must cut off the unnecessary intervals outside (a_j,b_j). Consider the following expressions: \phi^{d}_{j}(x,a_j,b_j)=\phi^{d}_j(x)[e(x-a_j)-e(x-b_j)]. Indeed, it is easy to ascertain that the right side is equal to \phi^{d}_j(x) within (a_j,b_j) and vanishes everywhere else. Calculating the shear force due to distributed forces demonstrates some useful properties of the delta function: \begin{aligned}F^d&=\sum_{j}\int_{0}^{x}\phi_{j}^{d}(z)[e(z-a_j)-e(z-b_j)]\mathrm{d}z\\&=\sum_{j}\left[\int_0^x\phi^{d}_j(z)e(z-a_j)\mathrm{d}z-\int_0^x \phi^{d}_j(z)e(z-b_j)\mathrm{d}z\right]\\&=\sum_{j}\left[\int_{a_j}^x\phi^{d}_{j}(z)e(z-a_j)\mathrm{d}z-\int_{b_j}^{x}\phi^{d}_j(z)e(z-b_j)\mathrm{d}z\right]\\&=\sum_{j}\left[\left.\left(e(z-a_j)\int_{a_j}^x\phi^{d}_j(z)\mathrm{d}z\right)\right|_{a_j}^x-\int_{a_j}^x\left(\int_{a_j}^x\phi^{d}_j(z)\mathrm{d}z\right)\delta(z-a_j)\mathrm{d}z\right.\\&\left.-\left.\left(e(z-b_j)\int_{b_j}^x\phi^{d}_j(z)\mathrm{d}z\right)\right|_{b_j}^x+\int_{b_j}^x\left(\int_{b_j}^x\phi^{d}_j(z)\mathrm{d}z\right)\delta(z-b_j)\mathrm{d}z\right]\\&=\sum_{j}\left[e(x-a_j)\int_{a_j}^x\phi^{d}_j(z)\mathrm{d}z+e(x-b_j)\int_{x}^{b_j}\phi^{d}_j(z)\mathrm{d}z\right]\end{aligned}

Here we used the defining property of delta: f(x)\delta(x-x_0)=f(x_0)\delta(x-x_0), from which it follows, in particular that \left(\int_{a_j}^x\phi^{d}_j(z)\mathrm{d}z\right)\delta(x-a_j)=\left(\int_{a_j}^{a_j}\phi^{d}_j(z)\mathrm{d}z\right)\delta(x-a_j)=0.

Bending Moments

Now we shall calculate bending moments created by all types of forces involved. Consider F^{p}_{i} applied at x_i. The bending moment created by this force evaluated at x>x_i can be determined as follows:

\begin{aligned}M_{i}^{p}(x)&=F_{i}^{p}e(x-x_i)(x-x_i),\\M_{i}^{p}(x+\mathrm{d}x)&=F_{i}^{p}e(x+\mathrm{d}x-x_i)(x+\mathrm{d}x-x_i)=F^{p}_{i}e(x-x_i)(x-x_i+\mathrm{d}x),\\dM_{i}^{p}&=F_{i}^{p}e(x-x_i)\mathrm{d}x,\\M_{i}^{p}&=F_{i}^{p}\int_0^{x}e(z-x_i)\mathrm{d}z=F^{p}_{i}\frac{x-x_i+|x-x_i|}{2}.\end{aligned} Hence, the total moment due to point forces is M^{p}=\sum_{i}F_{i}^{p}\frac{x-x_i+|x-x_i|}{2}.

To calculate the moment created by the distributed forces we use the following approach adopted in mechanics. Replace the distributed force to the left of the point of summation with a point force applied at the center of gravity of the figure enclosed by the graph of \phi_j(x) and the lines x=a_j, x=b_j. If a_j<x<b_j: \begin{aligned}F^{d}_{j}(x)&=\int_{a_j}^x\phi^{d}_{j}(z)\mathrm{d}z,\\ M^{d}_{j}&=\int_{a_j}^x\phi^{d}_{j}(z)\mathrm{d}z\left(x-\frac{\int_{a_j}^x z\phi^{d}_{j}(z)\,\mathrm{d}z}{\int_{a_j}^x\phi^{d}_{j}(z)\mathrm{d}z}\right)\\&=x\int_{a_j}^x\phi^{d}_{j}(z)\mathrm{d}z-\int_{a_j}^x z\phi^{d}_{j}(z)\mathrm{d}z.\end{aligned} Differentiating both parts with respect to x we obtain \frac{\mathrm{d}M^{d}_{j}}{\mathrm{d}x}=\int_{a_j}^x\phi^{d}_{j}(z)\mathrm{d}z+x\phi^{d}_{j}(x)-x\phi^{d}_{j}(x)=\int_{a_j}^x\phi^{d}_{j}(z)\mathrm{d}z=F^{d}_{j}(x)

In fact, we could just as well have included point forces in the above derivation by considering the total density \phi(x)=\phi^{p}(x)+\phi^{d}(x), which is the correct way to account for point forces in this proof. We can therefore derive the general relation between shear force and bending moment: F=\frac{\mathrm{d}M}{\mathrm{d}x}. Now we need to calculate the contribution made by moments of pairs of forces. Each such moment can be considered as a pair of equal, large, oppositely directed forces F applied at a small distance h from each other. They make the following contribution to the total density: \mu(x)=F\delta(x+h)-F\delta(x)=Fh\frac{\delta(x+h)-\delta(x)}{h}=M\frac{\delta(x+h)-\delta(x)}{h}. Taking the limit as h \to 0: \mu(x)=M\delta'(x)

Note that the derivative of \delta(x) is called the unit doublet.
This does not imply that the expression for shear force will contain terms of the form M\delta(x): shear force consists of the vertical components of all forces, and for pairs of forces these components cancel each other out.
Therefore, the total bending moment of the pairs of forces is expressed as follows M=\sum_{k}M_ke(x-x_k). Finally we can write the expressions for shear force and bending moment in the most general case \begin{aligned}F(x)&=\sum_{i}F^{p}_{i}e(x-x_i)\\&\qquad+\sum_{j}\left[e(x-a_j)\int_{a_j}^x\phi^{d}_j(z)\mathrm{d}z+e(x-b_j)\int_{x}^{b_j}\phi^{d}_j(z)\mathrm{d}z\right],\\M(x)&=\sum_{k}M_ke(x-x_k)+\sum_{i}F^{p}_{i}\frac{x-x_i+|x-x_i|}{2}\\&\qquad+\sum_{j}\int_0^x\left[e(t-a_j)\int_{a_j}^t \phi^{d}_j(z)\mathrm{d}z+e(t-b_j)\int_{t}^{b_j}\phi^{d}_j(z)\mathrm{d}z\right]\mathrm{d}t.\end{aligned} In the important particular case where \phi^{d}_j(x)\equiv \phi^{d}_j=\mathrm{const} these expressions reduce to the following \begin{aligned}F(x)&=\sum_{i}F^{p}_{i}e(x-x_i)+\sum_{j}\phi^{d}_j\left[(x-a_j)e(x-a_j)-(x-b_j)e(x-b_j)\right], \\ M(x)&=\sum_{k}M_ke(x-x_k)+\sum_{i}F^{p}_{i}\frac{x-x_i+|x-x_i|}{2}\\ &\qquad+\sum_{j}\frac{\phi^{d}_j}{2}\left[(x-a_j)^2e(x-a_j)-(x-b_j)^2e(x-b_j)\right].\end{aligned} (Note the minus signs: past the right end b_j of a loaded interval, the shear contribution must reduce to the total load \phi^{d}_j(b_j-a_j), which the difference of the two ramp terms delivers.)
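These closed forms translate directly into code. The sketch below handles point forces, constant distributed loads, and point moments; all load values and positions are illustrative, and the uniform-load terms are written as differences of ramp functions so that past a loaded interval the shear reduces to the total load.

```python
# Closed-form shear F(x) and bending moment M(x) for point forces,
# constant distributed loads and point moments. Values are illustrative.
def e(x):
    # unit step
    return 1.0 if x > 0 else 0.0

def ramp(x):
    # (x + |x|) / 2 = x * e(x)
    return x if x > 0 else 0.0

point_forces = [(10.0, 2.0)]        # (F_i, x_i)
uniform_loads = [(3.0, 4.0, 6.0)]   # (phi_j, a_j, b_j)
point_moments = [(5.0, 1.0)]        # (M_k, x_k)

def F(x):
    total = sum(Fi * e(x - xi) for Fi, xi in point_forces)
    total += sum(phi * (ramp(x - a) - ramp(x - b))
                 for phi, a, b in uniform_loads)
    return total

def M(x):
    total = sum(Mk * e(x - xk) for Mk, xk in point_moments)
    total += sum(Fi * ramp(x - xi) for Fi, xi in point_forces)
    total += sum(phi / 2 * (ramp(x - a) ** 2 - ramp(x - b) ** 2)
                 for phi, a, b in uniform_loads)
    return total

# past the whole distributed load, its shear contribution is phi * (b - a)
assert abs(F(8.0) - (10.0 + 3.0 * 2.0)) < 1e-12
# moment at x = 8: couple + point force arm + uniform load at its centroid
assert abs(M(8.0) - (5.0 + 10.0 * 6.0 + 3.0 * 2.0 * 3.0)) < 1e-12
```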


In this example we have demonstrated the application of Dirac delta and the related functions to a problem where the input data has a combination of continuous and discrete distributions. Singular functions allow us to work with this data in a uniform and relatively rigorous way. The advantage is that we get the final results in the closed form, as opposed to an algorithmic step-by-step method of solution. We can also obtain concise proofs, such as that for the relation between shear force and bending moment, and extend the results to very general inputs.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Should I accept this shipment?


Suppose you work for an engineering or manufacturing firm, and you receive shipments of different parts from various suppliers. It’s not good business practice to just blindly accept a shipment, because some parts in your batch may be defective. Too many defective parts, and you’ll slow down manufacturing (in addition to wasting valuable time and money). You come up with a brilliant plan (which is used in industry already): you’ll take a sample from the shipment, test the sample, and if the number of defective parts in the sample is below some predetermined number, you’ll accept the shipment. Otherwise, you’ll send it back and probably place an angry call to your supplier.


(1) Why not just test everything?

For one, shipments may contain thousands of parts; testing them all would take far too long. Secondly, some tests are destructive, meaning you push an item to its breaking point, which renders it unusable. (In these cases, you’re testing to ensure the breaking point is high enough.) Thirdly, certain kinds of tests may be expensive or time-consuming. These are real costs we must consider: time, money, and physical scarcity.

(2) How big should our sample be?

There are statistical considerations and business/cost considerations. There’s no perfect answer here. Sampling theory is a large branch of statistics, and there are tools for analysis of optimal sample size. However, we must also consider the specific case. If we have a destructive test, the “statistically optimal” size may destroy too many items. If the test is time-consuming, we may lack the time to perform it on a large number of items.

(3) What should be the “cutoff” for shipment rejection?

This discussion is the main focus of this article. We’re going to take a pretty small and simplistic overview of it, just to illustrate how powerful even basic probability can be for engineering and business problems. To do this, we’ll briefly describe the hypergeometric distribution, and then illustrate its use in an operating characteristic curve.

The Hypergeometric Distribution

Suppose we have N objects belonging to one of two classes, with N_{1} in the first class, and N_{2} in the second class, so N = N_{1} + N_{2}. (In this example, N_{1} is the total number of defective parts in the batch, and N_{2} is the number of good parts.) We select n objects without replacement. Find the probability that exactly x belong to the first class and n-x to the second.

We can select x objects from the N_{1} in class one in {N_{1} \choose x} = \frac{N_{1}!}{x!(N_{1}-x)!} ways. There are n-x left and they must come from N_{2} objects. We can select those in {N_{2} \choose n-x} ways. Then P(X=x) = \frac{{N_{1} \choose x}{N_{2} \choose n-x}}{{N \choose n}}.
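The counting argument above assembles into a probability mass function in a few lines; the numbers here are illustrative and use Python's exact binomial coefficients.

```python
# Hypergeometric pmf built from the counting argument.
from math import comb

def hypergeom_pmf(x, N1, N2, n):
    # P(X = x): choose x of the N1 class-one objects, n - x of the N2
    return comb(N1, x) * comb(N2, n - x) / comb(N1 + N2, n)

# the probabilities over all feasible x must sum to 1
N1, N2, n = 4, 21, 5
total = sum(hypergeom_pmf(x, N1, N2, n) for x in range(min(n, N1) + 1))
assert abs(total - 1.0) < 1e-12
```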

Evaluating a shipment acceptance plan

We’ll create a small example with manageable numbers to illustrate the use of the hypergeometric distribution in acceptance sampling.

A supplier ships parts to another company in lots of 25. Some items in the shipment may be defective. The receiving company has an acceptance sampling plan to inspect n=5 parts from the lot without replacement. If none of the sample is defective, they accept the shipment, otherwise they reject it. Suppose we want to evaluate this plan.

If X is the random number of defectives in the sample, then X has a hypergeometric distribution. Except this time, N_{1} and N_{2} (the numbers of defective and nondefective parts respectively) are unknown. We only know N_{1} + N_{2} = 25.

In designing an acceptance plan, we want the probability of accepting the lot to be large if N_{1} is small. That is, we want to have a high probability of accepting the lot if the true number of defective parts is very small. We also want the probability of “falsely accepting” the lot to be low. (That is, we want the probability of acceptance to be low when N_{1} is high).

When we treat these probabilities as a function of N_{1} (or equivalently, the fraction defective given by p = \frac{N_{1}}{25} in the lot) we call this the operating characteristic curve. Mathematically, the operating characteristic curve, denoted OC(p) is given in this case by:

OC(p) = P(X=0) = \frac{{N_{1} \choose 0}{25-N_{1} \choose 5}}{{25 \choose 5}}

Note here that OC(p) = P(X=0) because that’s the plan in this fictional example. If the plan changed, the operating characteristic curve would be defined by something different. For example, if the plan was to accept shipments that contain 1 or fewer defects, then OC(p) = P(X\leq 1) and we would recalculate those probabilities using the hypergeometric probability mass function given above.

Let’s look at some numerical values of OC(p) for our fictional example. Remember that p = \frac{N_{1}}{25}.

N_{1} = 0,\; p = 0.00:\quad OC(p) = 1.000
N_{1} = 1,\; p = 0.04:\quad OC(p) = 0.800
N_{1} = 2,\; p = 0.08:\quad OC(p) = 0.633
N_{1} = 3,\; p = 0.12:\quad OC(p) = 0.496
N_{1} = 4,\; p = 0.16:\quad OC(p) = 0.383
N_{1} = 5,\; p = 0.20:\quad OC(p) = 0.292
Is the acceptance plan satisfactory? With N_{1} = 1, OC(0.04) = 0.8, which may be considered too low (we may reject perfectly valid shipments), and with N_{1} = 4, OC(0.16) = 0.383 may be too high (we may not want that high a probability of accepting a shipment with that many defects). Thus, we may want to reconsider the number of items we test, or reconsider our acceptance plan.
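The OC values for this lot-of-25, sample-of-5 plan can be computed directly; this is a sketch of the calculation described above.

```python
# OC(p) = P(X = 0) for the acceptance plan: sample 5 of 25, accept if
# no defectives are found.
from math import comb

def oc(N1, N=25, n=5):
    # all n sampled parts must come from the N - N1 good ones
    return comb(N - N1, n) / comb(N, n)

assert abs(oc(1) - 0.8) < 1e-9      # the value cited for N1 = 1
assert round(oc(4), 3) == 0.383     # the value cited for N1 = 4
```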

Usually lot sizes are far larger than the numbers seen here, and sample sizes are in the hundreds, so as the values get large, this computation becomes cumbersome. We can approximate the hypergeometric distribution with the Poisson distribution, which we won’t cover here.


This is a small illustration of the very practical use of the hypergeometric distribution to devise an intelligent strategy for accepting/rejecting shipments of goods. This type of work falls within the purview of the services we offer.

Exploiting Chemistry for Better Packet Flow Management 1: Introduction


Perhaps the phrase “don’t reinvent the wheel” is overused. However, many newer disciplines, particularly in the technology sector, seem to insist on it. One thing physical engineers learned long ago was to study the world around them, work with it, and emulate it in their designs. Network engineering should be no different. In a technical report from 2011, authors Thomas Meyer and Christian Tschudin from the University of Basel describe a highly elegant natural flow management method [11] that exploits much of the hard work done in the chemistry sector on chemical kinetics. They describe a scheduling approach that creates an artificial chemistry as an analogue of a queueing network, and uses the Law of Mass Action to schedule events naturally. Analysis of networks built on their implementation is simplified, regardless of the depth to which one wishes to go. In addition, they show that a congestion control algorithm based on this framework is TCP-fair, and they give evidence of an actual implementation (a relief to practitioners).

This report discusses their paper at length, with a goal of covering not only their work but also the underlying ideas. Since the paper requires knowledge of chemical kinetics, probability, queueing theory, and networking, it should be of interest to specialists in these disciplines; a discussion of the fundamentals the paper glosses over should also make the ideas accessible to a wider audience.

Note: the full pdf of the report is attached to each section of this series at the bottom. 

Overall Review of the Paper

The paper is well written, readable, and quite clear despite the breadth of scope it contains. It’s a major benefit to the authors and readers that their theoretical model has been tested in a proof-of-concept network. The work shows much promise for the networking space as a whole.

Overview of the Paper and Chemistry Basics

Just as in chemistry and physics, packet flow in a network has microscopic behavior controlled by various protocols, and macro-level dynamics. We see this in queueing theory as well: we can study the stochastic behavior of a queue (typically in steady state to help us out, but in the transient state as well), but find in many cases that even simple attempts to scale the analysis up to networks (such as retaining memorylessness) become overwhelming. What ends up happening in many applied cases is a shift to expressing the macro-level properties of the network in terms of average flow. The cost of such smoothing is an unpreparedness to model, and thus deal effectively with, erratic behavior. This leads to overprovisioning and other undesirable and costly design choices to mitigate those risks.

Meyer and Tschudin have adapted decades of work in the chemical and physical literature to take advantage of the Law of Mass Action, designing an artificial chemistry that takes an unconventional non-work-conserving approach to scheduling. Non-work-conserving queues add a delay to packets and have tended to be avoided for various reasons, typically efficiency. Put simply, they guarantee a constant wait time for a packet, regardless of the number of packets in the queue, by varying the processing rate with the fill level of the queue: the more packets in the queue, the faster the server processes them.

Law of Mass Action in Chemistry

If we have some chemical reaction with reactants A_{1}, A_{2},\ldots, A_{n} and products B_{1}, B_{2}, \ldots, B_{m}, and the reaction is forward-only, then we may express the reaction as

A_{1} + A_{2} + \ldots + A_{n} \longrightarrow_{k} B_{1} + B_{2} + \ldots + B_{m}

where k is the rate constant. In a simple reaction A \to P, with P as the product, we can see the rate expressed nicely in a very basic differential equation form[9]:

-\frac{\text{d}c_{A}}{\text{d}t} = k\cdot c_{A}

This should look familiar from problems seen in basic calculus courses: the rate at which the reactant drains is a direct function of its current concentration.
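That differential equation has the closed-form solution c_{A}(t) = c_{A}(0)e^{-kt}; a quick numeric sketch (with illustrative rate constant and initial concentration) confirms it.

```python
# The rate law -dc_A/dt = k * c_A integrates to c_A(t) = c_A(0) * exp(-k t).
# Forward-Euler integration checked against the closed form; k and the
# initial concentration are illustrative.
import math

k, c0, T, steps = 0.5, 2.0, 4.0, 200_000
dt = T / steps
c = c0
for _ in range(steps):
    c -= k * c * dt                 # Euler step of dc/dt = -k c
assert abs(c - c0 * math.exp(-k * T)) < 1e-4
```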

The reaction rate r_{f} of a forward reaction is proportional to the concentrations of the reactants:

r_{f} = k_{f}c_{A_{1}}c_{A_{2}}\cdots c_{A_{n}}

for a set of reactants \{A_{i}\}.

The Queuing Analogue and Assumptions

Meyer and Tschudin[11] express the networking version of these chemical reactions in a very natural way. Packets are molecules. A molecular species is a queue, so molecules of species X go into queue X. The molecular species is a temporary buffer that stores particular packet types until they are consumed by a reaction (processed by some server in the queueing space). FIFO (first-in-first-out) discipline is assumed.


The figure above from the technical report shows how a small system of reactions looks in the chemical space and the queuing space. Where analysis and scheduling can get complicated is in the coupled nature of the two reactions. The servers both drain packets from queue Y, so they are required to coordinate their actions in some way. It’s important to note here that this equivalence rests on treating the queuing system as M/M/1 queues with a slightly modified birth-death process representation.

Typically, in an M/M/1 queue, the mean service rate is constant. That is, the service rate is independent of the state the birth-death process is in. However, if we model the Law of Mass Action using a birth-death process, we’d see that the rate of service (analogously, the reaction rate) changes depending on the fill-level of the queue (or concentration of the reactant). We’ll investigate this further in the next sections, discussing their formal analysis.
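The fill-level-dependent service rate can be illustrated with a toy stochastic simulation. This is not the authors' implementation, just a sketch: a single queue drained at rate k times its fill level (the mass-action analogue of state-dependent service), whose mean occupancy should track n_0 e^{-kt}.

```python
# Gillespie-style simulation of a queue drained at rate k * (fill level).
# Parameters are illustrative; the mean fill should follow n0 * exp(-k t).
import math
import random

random.seed(42)

def simulate(n0, k, t_end):
    n, t = n0, 0.0
    while n > 0:
        t += random.expovariate(k * n)   # exponential wait at rate k * n
        if t > t_end:
            break
        n -= 1
    return n

n0, k, t_end, runs = 50, 1.0, 1.0, 2000
mean = sum(simulate(n0, k, t_end) for _ in range(runs)) / runs
# compare with the analytic mean of the linear death process
assert abs(mean - n0 * math.exp(-k * t_end)) < 1.0
```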

Related Work and Precedent

The authors noted that adding packet delay is not unheard of in the networking space. Delay Frame Queuing[12] utilizes non-work-conserving transmission at edge nodes in an ATM network in order to guarantee upper bounds on delay and jitter for virtual circuits. Three other researchers (Kamimura et al.) proposed in 2008 a Constant Delay Queuing policy that assigns a constant delay to each packet of a particular priority stream and forwards other best-effort packets during the delay[8].


The next article will discuss the formal mathematical model of an artificial packet chemistry. 



  1. Dittrich, P., Ziegler, J., and Banzhaf, W. Artificial chemistries – a review. Artificial Life 7 (2001), 225–275.
  2. Feinburg, M. Complex balancing in general kinetic systems. Archive for Rational Mechanics and Analysis 49 (1972).
  3. Gadgil, C., Lee, C., and Othmer, H. A stochastic analysis of first-order reaction networks. Bulletin of Mathematical Biology 67 (2005), 901–946.
  4. Gibson, M., and Bruck, J. Efficient stochastic simulation of chemical systems with many species and many channels. Journal of Physical Chemistry 104 (2000), 1876–1889.
  5. Gillespie, D. The chemical Langevin equation. Journal of Chemical Physics 113 (2000).
  6. Gillespie, D. The chemical Langevin and Fokker-Planck equations for the reversible isomerization reaction. Journal of Physical Chemistry 106 (2002), 5063–5071.
  7. Horn, F. On a connexion between stability and graphs in chemical kinetics. Proceedings of the Royal Society of London 334 (1973), 299–330.
  8. Kamimura, K., Hoshino, H., and Shishikui, Y. Constant delay queuing for jitter-sensitive IPTV distribution on home network. IEEE Global Telecommunications Conference (2008).
  9. Laidler, K. Chemical Kinetics. McGraw-Hill, 1950.
  10. McQuarrie, D. Stochastic approach to chemical kinetics. Journal of Applied Probability 4 (1967), 413–478.
  11. Meyer, T., and Tschudin, C. Flow management in packet networks through interacting queues and law-of-mass-action scheduling. Technical report, University of Basel.
  12. Pocher, H. L., Leung, V., and Gilles, D. An application- and management-based approach to ATM scheduling. Telecommunication Systems 12 (1999), 103–122.
  13. Tschudin, C. Fraglets – a metabolistic execution model for communication protocols. Proceedings of the 2nd Annual Symposium on Autonomous Intelligent Networks and Systems (2003).


Paper Review: Active Queue Management with Non-Linear Packet Dropping Function


As promised in the previous article, I plan to review Reference 2, Active Queue Management with Non-Linear Packet Dropping Function, by D. Augustyn, A. Domanski, and J. Domanska, published in HET-NETs 2010, which discusses a change in the structure of the packet drop probability function using the average queue length in a buffer. I mentioned previously that choosing a linear function of the average queue length can be viewed as a bit of an arbitrary choice, since we’re designing a control mechanism here, and this paper attempts to define a new form of this packet drop probability function.

In summary, the best lesson one can take from this paper is that publication in a journal or conference proceedings does not guarantee that the paper withstands scrutiny. The paper is linked above for interested readers to peruse and to investigate the claims themselves.


The paper intended to give a new function to calculate the probability of proactively dropping a packet in a queue in order to prevent a full buffer. It seemed to be presented as an alternative to RED, described in my previous article. The authors define this new function, then set up a simulation in order to examine the effects. 

When Orthogonal is Abused

The authors describe using a finite linear combination of orthogonal basis polynomials defined on a finite interval as the underlying mathematical structure. 

First, we should discuss what we mean by orthogonal in context of functions. Orthogonal is most commonly understood in terms of vectors, and when we’re in two dimensions, orthogonal becomes our familiar perpendicular. 


Beginning with the familiar notion of perpendicular, we can generalize this to understand orthogonality. The geometric interpretation of two vectors being perpendicular is that the angle between them is 90^{\circ}. Once we leave two and three dimensions (or jump to the space of polynomials, as we'll do soon), the concept of an angle isn't as helpful.

Another way to define perpendicular is through an operation known as the dot product. Suppose we take two 2D vectors, \mathbf{x} and \mathbf{y}. Each vector will have coordinates: \mathbf{x} = (x_{1},x_{2}) and \mathbf{y} = (y_{1}, y_{2}). The dot product is a special type of multiplication defined on vectors, and denoted \cdot:

\mathbf{x}\cdot\mathbf{y} = x_{1}y_{1} + x_{2}y_{2}

The dot product can be described in words as the sum of the component-wise multiplication of the coordinates of the two vectors. 

Now, we can say that two vectors are perpendicular if their dot product is 0. That is, \mathbf{x} and \mathbf{y} are perpendicular if \mathbf{x}\cdot\mathbf{y} = 0. (This article gives a nice overview of how we move from the algebraic to the geometric definition of perpendicular and orthogonal.)

Remember, the geometric picture of perpendicularity doesn't carry over once we get into higher dimensions. Orthogonality is the more general notion, and is defined for two vectors (of any dimension) as their dot product equalling zero.
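As a quick illustration (the helper `dot` is mine, not from the paper), the orthogonality test amounts to one line of Python:

```python
def dot(x, y):
    """Sum of the component-wise products of the coordinates of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

# In 2D this recovers the familiar perpendicularity test:
print(dot((1, 2), (-2, 1)))   # 1*(-2) + 2*1 = 0, so the vectors are orthogonal

# The same test works in higher dimensions, where "perpendicular"
# no longer has an obvious picture:
print(dot((1, 0, 2, -1), (2, 3, 0, 2)))   # 2 + 0 + 0 - 2 = 0, orthogonal
```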

From Dot Product to Inner Product

The dot product is defined on vectors, and gives us a type of product different from the scalar multiplication we know. In fact, we can generalize the notion of a dot product to something called an inner product, which can be defined on many spaces other than just vectors. We can define operations and products however we like, but for our definition to qualify as an inner product (denoted \langle \cdot, \cdot\rangle), it must meet certain criteria.

For instance, on the set of real-valued functions with domain [a,b], we define the inner product of two functions f(x) and g(x) as

\langle f, g\rangle := \int_{a}^{b}f(x)g(x)dx

The concept of orthogonality generalizes to an inner product as well. If the inner product of two functions is zero (as defined above), we say the functions are orthogonal. 
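A minimal numeric sketch of this, using a midpoint Riemann sum to approximate the integral (the helper `inner` is my own, and the examples are illustrative):

```python
import math

def inner(f, g, a, b, n=10_000):
    """Approximate <f, g> = integral of f(x) g(x) over [a, b]
    with a midpoint Riemann sum."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

# sin and cos are orthogonal on [-pi, pi]: sin(x)cos(x) is odd, so it cancels.
print(inner(math.sin, math.cos, -math.pi, math.pi))   # approximately 0

# So are x and x^2 on [-1, 1]: x * x^2 = x^3 is also odd.
print(inner(lambda x: x, lambda x: x**2, -1.0, 1.0))  # approximately 0

# But x and x^2 are NOT orthogonal on [0, 1]: the integral of x^3 there is 1/4.
print(inner(lambda x: x, lambda x: x**2, 0.0, 1.0))   # approximately 0.25
```

Note that orthogonality of functions depends on the interval as well as on the functions themselves, which matters for the check below.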

Back to the paper

The authors claim to be using a set of orthogonal polynomials to define their drop probability function, and they give the structure of such functions. For a domain [a,b], and for \phi_{j} in the set of polynomials, they define \phi_{j} = (x-a)^{j-1}(b-x). So, for example, \phi_{1} = (b-x), and \phi_{5} = (x-a)^{4}(b-x).

Now, in order to be an orthogonal basis for a space1, the functions claimed to form the basis must be pairwise orthogonal. That is, I need to be able to take the inner product of any two functions \phi_{i} and \phi_{j} and get 0. If that isn't true for even one pair, then the set is not orthogonal.

As it turns out, if we take the inner product of any two functions in the basis set over the domain given, we find that there are no pairs that are orthogonal. To do this in general, we compute the integral

\int_{a}^{b}(x-a)^{i-1}(b-x)\cdot (x-a)^{j-1}(b-x)dx

The integral computation is one of simple polynomial integration, and can be done either by hand or using your favorite computer algebra system (say, Mathematica). What we find is that a set of functions defined this way is never orthogonal, yet the paper claims it is.
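To make the claim checkable, here is a small pure-Python computation, specialising to a = 0, b = 1 for concreteness (the function `phi_inner` and the expansion are my own sketch, not the paper's). The integrand x^{i+j-2}(1-x)^2 is nonnegative and not identically zero, so every pairwise inner product comes out strictly positive:

```python
from fractions import Fraction
from itertools import combinations

def phi_inner(i, j):
    """Exact inner product <phi_i, phi_j> on [0, 1], where
    phi_j(x) = x^(j-1) (1 - x).  The integrand expands as
    x^m - 2 x^(m+1) + x^(m+2) with m = i + j - 2, which
    integrates term by term to an exact rational value."""
    m = i + j - 2
    return Fraction(1, m + 1) - Fraction(2, m + 2) + Fraction(1, m + 3)

# Every pairwise inner product is strictly positive, so no pair is orthogonal.
for i, j in combinations(range(1, 6), 2):
    print(f"<phi_{i}, phi_{j}> = {phi_inner(i, j)}")
```

For instance \langle\phi_{1},\phi_{2}\rangle = 1/12, not 0, already contradicting the orthogonality claim.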

Applying this to the particular situation of designing a drop probability function, they give the following for average queue length thresholds T_{\min} and T_{\max}:

p(x,a_{1},a_{2}) = \left\{\begin{array}{lr}0, &x < T_{\min}\\\phi_{0}(x) + a_{1}\phi_{1}(x) + a_{2}\phi_{2}(x),&T_{\min}\leq x \leq T_{\max}\\1,&x > T_{\max}\end{array}\right.

where the basis functions are 

\begin{aligned}\phi_{0}(x) &= p_{m}\frac{x-T_{\min}}{T_{\max}-T_{\min}}\\\phi_{1}(x) &= (x-T_{\min})(T_{\max}-x)\\\phi_{2}(x) &= (x-T_{\min})^{2}(T_{\max}-x)\end{aligned}

The reader will recognize \phi_{0} as the original drop probability function from the RED algorithm. Contrary to the authors' claim, however, these functions are not orthogonal, and a simple check like the one above verifies it.
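Here is one such check, with illustrative thresholds T_{\min} = 5 and T_{\max} = 15 (my choice; the paper does not fix the values, and the conclusion does not depend on them):

```python
T_MIN, T_MAX = 5.0, 15.0   # illustrative thresholds, not taken from the paper

def phi1(x):
    return (x - T_MIN) * (T_MAX - x)

def phi2(x):
    return (x - T_MIN) ** 2 * (T_MAX - x)

def inner(f, g, a, b, n=100_000):
    """Midpoint-rule approximation of the inner product integral on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

# The integrand (x - T_min)^3 (T_max - x)^2 is nonnegative on the interval
# and not identically zero, so the inner product cannot vanish.
# (Exactly: 10^6 * 3! * 2! / 6! = 50000/3, about 16666.67.)
print(inner(phi1, phi2, T_MIN, T_MAX))
```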

Other issues

Another issue is that the mysterious coefficients a_{1} and a_{2} need to be determined. How, you ask? The authors do not say, other than to note that one can define “a functional” depending on the unknown p(x,a_{1},a_{2}) that can be minimized to find the optimal values of those coefficients. They write that this mysterious functional2 can be based on either the average queue length or the average waiting time, yet provide no details whatsoever as to the functional they have chosen for this purpose. They provide a figure with a sample function, but give no further details as to how it was obtained.

One other issue I have with their methodology is the discussion of the order of estimation. For those familiar with the many ways to approximate unknown functions (Taylor series, splines, Fourier series), we know that a suitably well-behaved function is exactly equal to an infinite sum of such terms. Any finite sum is an approximation, and the number of terms needed to achieve a desired accuracy changes with the function being approximated.

For instance, if I want to use a Taylor series (a linear combination of polynomials) to estimate a really icky function, how many terms should I include to ensure my accuracy at a point is within 0.001 of the real value? It depends on the function.3
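A small experiment makes the point; the helper `terms_needed`, the two example series, and the 0.001 tolerance are my own illustration:

```python
import math

def terms_needed(coeffs, x, true_value, tol=1e-3):
    """Count the Taylor terms needed before the partial sum is within tol
    of the true value (coeffs yields the series coefficients c_k for x^k)."""
    total, n = 0.0, 0
    for k, c in enumerate(coeffs):
        total += c * x ** k
        n = k + 1
        if abs(total - true_value) < tol:
            return n
    return n

# e^x around 0 has coefficients 1/k!; at x = 1 the factorials shrink fast,
# so only a handful of terms are needed.
exp_coeffs = (1 / math.factorial(k) for k in range(100))
print(terms_needed(exp_coeffs, 1.0, math.e))

# 1/(1-x) around 0 has coefficients all 1; near the edge of convergence
# at x = 0.9, the same tolerance demands far more terms.
geom_coeffs = (1.0 for _ in range(1000))
print(terms_needed(geom_coeffs, 0.9, 1 / (1 - 0.9)))
```

The same tolerance costs a handful of terms for one function and dozens for the other, so fixing the order in advance is a design choice, not a mathematical necessity.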

The authors simply claim that a second-order term is appropriate in this case. The issue I take with that is that these queue-management drop probability functions are designed. We're not estimating a function describing a phenomenon; we are designing a control policy to seek some sort of optimal behavior. Fundamentally, the authors' posing this as an estimation problem of some unknown drop probability function is incorrect. This isn't a law of nature we seek to observe; it's a control policy we seek to design, implement, and optimize. Using language that implies the authors are estimating some “true function” is misleading.

Regarding the simulation itself, since the design was arbitrary and not based on sound mathematical principles, I cannot give any real comment on the simulations and their results. The authors briefly discuss and cite some papers that explore the behavior of network traffic, and claim to take this into account in their simulations. On those, I cannot yet comment.


Always verify a paper for yourself, and don’t accept anything at face value. Research and technical publications should be completely transparent as to choices and methodologies (and obviously free of mathematical inaccuracies), and derivations and proofs should be present when necessary, even if in an appendix. 

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


Networking Mathematics: Random Early Detection and TCP synchronization

Computer networks are something most of us take for granted: speed, reliability, and availability are expectations. In fact, network problems tend to make us very angry, whether it's dropped packets (yielding jittery Skype calls), congestion (that huge game download eating all the bandwidth), or simply a network outage. There's an awful lot going on underneath the hood of all devices on a network to load that webpage or download that song for you.

Much of the reliability in networking relies on maintaining good Quality of Service (QoS) policies, which involve buffering and queue management. Networks aren't unlike the roads we travel on: traffic isn't steady, and congestion at interfaces (like intersections on the road) happens. How do we handle that? We'll explore some basic networking principles in order to uncover some interesting mathematics governing buffering and queue management at congested network interfaces.

Update: Mr. Fred Baker reached out to me with a few corrections regarding my interchangeable use of queue and buffer. I’ve inserted his comments into the “Buffer” section. Incidentally, Mr. Baker was the inventor of the WRED (Weighted Random Early Detection) algorithm mentioned as an extension below. 

Read More

Poisson Processes and Data Loss

There are many applications for counting arrivals over time. Perhaps I want to count the arrivals into a store, or shipments into a postal distribution center, or node failures in a cloud cluster, or hard drive failures in a traditional storage array. It’s rare that these events come neatly, one after the other, with a constant amount of time between each event or arrival. Typically those interarrival times, the time between two events in a sequence arriving, are random. How then do we study these processes?

Read More