Month: April 2018

Simulating Soundscapes Using Convolutions


One of the most powerful areas of electrical engineering that flourished in the 20th century is the field of signal processing. The field is broad and rich in some beautiful mathematics, but by way of introduction, here we’ll take a look at some basic properties of signals and how we can use these properties to find a nice compact representation of operations on them. As a motivating application, we’ll use what we study today to apply certain effects to audio signals. In particular, we’ll take a piece of audio, and be able to make it sound like it’s being played in a cathedral, or in a parking garage, or even through a metal spring.

First things first: what is a signal? For this discussion we’ll limit ourselves to looking at the space \ell = \{x :\mathbb{Z} \rightarrow \mathbb{R}\} – the set of functions which take an integer and return a real number. Another way to think of a signal then is as an infinite sequence of real numbers. We’re limiting ourselves to functions where the domain is discrete (the integers), rather than continuous (the real numbers), since in many applications we’re looking at signals that represent some measurement taken at a bunch of different times1. It’s worth noting that any signal that’s been defined on a countable domain \{..., t_{n-1}, t_n, t_{n+1},...\} can be converted to one defined on the integers via an isomorphism. We like to place one further restriction on the signals, in order to make certain operations possible. We restrict the space to so-called finite-energy signals:

\ell_2 = \left\{x \in \ell : \sum_{n = -\infty}^{\infty} |x(n)|^2 < \infty\right\}.

This restriction makes these signals much easier to study and prove things about, while still leaving us plenty of useful signals to work with, and spares us from dealing with messy things like divergent sums. In practice, when dealing with audio we usually have a signal with finite length and range, so this finite-energy property holds trivially.

Studying signals is only useful if we can also define operations on them. We’ll study the interaction of signals with systems, which take one signal and transform it into another – essentially, a function operating on signals. Here, we’ll say that a system H : \ell_2 \rightarrow \ell_2 takes an input signal x(n) and produces the output signal H\{x(n)\} = y(n).

Linearity and Time Invariance

There are certain properties that are useful for systems to have. The first is linearity. A system H is considered linear if for every pair of inputs x_1, x_2 \in \ell_2, and for any scalar values \alpha, \beta \in \mathbb{R}, we have

H\{\alpha x_1 + \beta x_2\} = \alpha H\{x_1\} + \beta H\{x_2\}

This is very useful, because it allows us to break down a signal into simpler parts, study the response of the system to each of those parts, and understand the response to the more complex original signal.

The next property we’re going to impose on our systems is time-invariance:

\forall s \in \mathbb{Z}, H\{x(n)\} = y(n) \Rightarrow H\{x(n-s)\} = y(n-s)

This means that shifting the input by s corresponds to a similar shift in the output. In our example of playing music in a cathedral, we expect our system to be time-invariant, since it shouldn’t matter whether we play our music at noon or at midnight, we’d expect it to sound the same. However, if we were playing in a building that, for example, lowered a bunch of sound-dampening curtains at 8pm every night, then the system would no longer be time-invariant.

So what are some more concrete examples of systems that are linear and time-invariant?
Let’s consider an audio effect which imitates an echo – it outputs the original signal, plus a quieter, delayed version of that signal. We might express such a system as

H_{\Delta, k}\{x(n)\} = x(n) + kx(n-\Delta)

where \Delta \in \mathbb{Z} is the time delay of the echo (in terms of the number of samples), and k \in \mathbb{R} is the relative volume of the echoed signal. We can see that this system is time-invariant, because the time variable never appears outside the argument of the input. If we replaced k by a function k(n) = \sin(n), for example, we would lose this time-invariance. Additionally, the system is plainly linear:

\begin{aligned}H_{\Delta, k}\{\alpha x_1(n) + \beta x_2(n)\} &= \alpha x_1(n) + \beta x_2(n) + \alpha k x_1(n-\Delta) + \beta k x_2(n-\Delta) \\&= \alpha H_{\Delta, k}\{x_1(n)\} + \beta H_{\Delta, k}\{x_2(n)\}\end{aligned}
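As a concrete sketch (the function name is mine, and the signal is a plain list of samples, with samples before the start treated as zero), the echo system can be implemented directly:

```python
def echo(x, delay, k):
    """Echo system H_{delay,k}: y(n) = x(n) + k * x(n - delay).

    x is a finite list of samples; samples before the start of x are zero.
    """
    return [x[n] + (k * x[n - delay] if n >= delay else 0.0)
            for n in range(len(x))]
```

For instance, `echo([1, 0, 0, 0], 2, 0.5)` yields an impulse followed by a half-volume copy two samples later.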

A common non-linearity in audio processing is called clipping — we limit the output to be between -1 and 1: H\{x(n)\} = \max(\min(x(n), 1), -1). This is clearly non-linear since doubling the input will not generally double the output.
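To make the non-linearity of clipping concrete, here is a minimal sketch (again, the function name is mine) showing that doubling the input does not double the output:

```python
def clip(x):
    """Hard clipping: limit every sample to the range [-1, 1]."""
    return [max(min(v, 1.0), -1.0) for v in x]

# Doubling the input does not double the output:
x = [0.8]
doubled_then_clipped = clip([2 * v for v in x])   # clip(1.6) -> 1.0
clipped_then_doubled = [2 * v for v in clip(x)]   # 2 * 0.8  -> 1.6
```

Since the two orders of operations disagree, H\{2x\} \neq 2H\{x\}, and the system fails linearity.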

The Kronecker delta signal

There is a very useful signal that I would be remiss not to mention here: the Kronecker delta signal. We define this signal as

\delta(n) = \begin{cases} 1 & n = 0 \\ 0 & n \neq 0 \end{cases}

The delta defines an impulse, and we can use it to come up with a nice compact description of linear, time-invariant systems. One property of the delta is that it can be used to “extract” a single element from another signal, by multiplying:

\forall s \in \mathbb{Z}, \delta(n-s)x(s) = \begin{cases} x(n) & n=s \\ 0 & n \neq s\end{cases}

Similarly, we can then write any signal as an infinite sum of these multiplications:

x(n) = \sum_{s=-\infty}^{\infty} \delta(n-s)x(s) = \sum_{s=-\infty}^{\infty}\delta(s)x(n-s)

Why would we want to do this? Let H be a linear, time-invariant system, and let h(n) = H\{\delta(n)\}, the response of the system to the delta signal. Then we have

\begin{aligned} H\{x(n)\} &= H\left\{\sum_{s=-\infty}^{\infty} \delta(n-s)x(s)\right\}\\&=\sum_{s=-\infty}^{\infty}H\{\delta(n-s)\}x(s) \text{ by linearity}\\&=\sum_{s=-\infty}^{\infty}h(n-s)x(s) \text{ by time-invariance}\end{aligned}

We can write any linear, time-invariant system in this form. We call the function h the impulse response of the system, and it fully describes the behaviour of the system. This operation, where we sum up the products of shifted signals, is called a convolution, and appears in lots of different fields of math2.
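The convolution sum above can be implemented directly for finite-length signals. This is a naive O(n^2) sketch of the definition (names are mine), not an efficient FFT-based implementation:

```python
def convolve(x, h):
    """Direct convolution: y(n) = sum over s of h(n - s) * x(s),
    for finite-length signals x and h (zero outside their supports)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for s in range(len(x)):
            if 0 <= n - s < len(h):
                y[n] += h[n - s] * x[s]
    return y
```

Two sanity checks tie this back to the text: convolving with the Kronecker delta `[1]` returns the signal unchanged, and the echo system H_{\Delta,k} is exactly convolution with the impulse response `[1, 0, ..., 0, k]` (k in position \Delta).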

Firing a Gun in The Math Citadel

The power of this representation of a system is that if we want to understand how it will act on any arbitrary signal, it is sufficient to understand how it responds to an impulse. To demonstrate this, we’ll look at the example of how audio is affected by the environment it is played in. Say we were a sound engineer, and we wanted to get an instrument to sound like it was being played in a big, echoing cathedral. We could try to find such a place, and actually record the instrument, but that could be expensive, requiring a lot of setup and time. Instead, if we could record the impulse response of that space instead, we could apply the convolution to a recording we did back in a studio. How do we capture an impulse response? We just need a loud, very short audio source – firing a pistol or popping a balloon are common. To demonstrate, here are some example impulse responses, taken from OpenAirLib, and their effects on different audio signals.

First, here is the unprocessed input signal – a short piece of jazz guitar:

Here is the same clip, as if it were played in a stairwell at the University of York. First, the impulse response, then the processed audio.

That sounds a little different, but we can try a more extreme example: the gothic cathedral of York Minster. Again, here is the impulse response, followed by the processed signal.

In this case, we have a much more extreme reverberation effect, and we get the sound of a guitar in a large, ringing room. For our last example, we’ll note that impulse responses don’t have to be natural recordings, but instead could be entirely synthetic. Here, I’ve simply reversed the first impulse response from the stairwell, which creates this pre-echo effect, which doesn’t exist naturally.

This is just one of the most basic examples of what can be done with signal processing, but I think it’s a particularly good one – by defining some reasonable properties for signals and systems, we’re able to derive a nice compact representation that also makes a practical application very simple.

Sequences & Tendency: Topology Basics Pt. 2



In my previous post I presented abstract topological spaces by way of two special characteristics. These properties are enough to endow a given set with vast possibilities for analysis. Fundamental to mathematical analysis of all kinds (real, complex, functional, etc.) is the sequence.

We have covered the concept of sequences in some of our other posts here at The Math Citadel. As Rachel pointed out in her post on Cauchy sequences, one of the most important aspects of the character of a given sequence is convergence.

In spaces like the real numbers, there is a convenient framework available to quantify closeness and proximity, which allows naturally for a definition of limit or tendency for sequences. In a general topological space missing this skeletal feature, convergence must be defined.

This post will assume only some familiarity with sequences as mathematical objects and, of course, the concepts mentioned in Part 1. For a thorough treatment of sequences, I recommend Mathematical Analysis by Tom M. Apostol.


Suppose (X,\mathscr{T}) is a given topological space, and nothing more is known. At our disposal so far are only open sets (elements of \mathscr{T}), and so it is on these that a concept of vicinity relies.

Definition. Given a topological space (X,\mathscr{T}), a neighborhood of a point x\in X is an open set which contains x.

That is, we say an element T\in\mathscr{T} such that x\in T is a neighborhood1 of x. To illustrate, take the examples from my previous post.

The Trivial Topology

When the topology in question is the trivial one: \{\emptyset,X\}, the only nonempty open set is X itself, hence it is the only neighborhood of any point x\in X.

The Discrete Topology

Take X=\{2,3,5\} and \mathscr{T} to be the collection of all subsets of X:

\emptyset \quad \{2\} \quad \{3\} \quad \{5\}
\{2,3\} \quad \{2,5\} \quad \{3,5\} \quad \{2,3,5\}


Then, for, say x=5, neighborhoods include \{5\}, \{2,5\}, \{3,5\}, and \{2,3,5\}.

The Standard Topology on \mathbb{R}

The standard topology on \mathbb{R} is defined to be the family of all sets of real numbers containing an open interval around each of its points. In this case, there are infinitely2 many neighborhoods of every real number. Taking x=\pi for instance, (3,4) and (-2\pi,2\pi) are both neighborhoods of \pi, as is any other open set containing an interval around \pi.

Remark. A special type of neighborhood in the standard topology is the symmetric open interval. Given a point x and a radius r>0, the set

(x-r,\, x+r)

is a neighborhood of x. These sets form what is referred to as a basis for the standard topology and are important to the definition of convergence in \mathbb{R} as a metric space.


“…the topology of a space can be described completely in terms of convergence.” —John L. Kelley, General Topology

At this point in our discussion of topological spaces, the only objects available for use are open sets and neighborhoods, and so it is with these that convergence of a sequence is built3.

Definition. A sequence (\alpha_n) in a topological space (X,\mathscr{T}) converges to a point L\in X if for every neighborhood U of L, there exists an index N\in\mathbb{N} such that \alpha_n\in U whenever n\geq N. The point L is referred to as the limit of the sequence (\alpha_n).

Visually, this definition can be thought of as demanding the points of the sequence cluster around the limit point L. In order for the sequence (\alpha_n) to converge to L, it must be the case that after finitely many terms, every one that follows is contained in the arbitrarily posed neighborhood U.

As you might expect, the class of neighborhoods available has a dramatic effect on which sequences converge, as well as where they tend. Just how close to L are the neighborhoods built around it in the topology?

We will use the example topologies brought up so far to exhibit the key characteristics of this definition, and what these parameters permit of sequences.

The Trivial Topology

In case it was to this point hazy just how useful the trivial topology is, sequences illustrate the issue nicely. For the sake of this presentation, take the trivial topology on \mathbb{R}. There is precisely one neighborhood of any point, namely \mathbb{R} itself. As a result, any sequence of real numbers converges, since every term belongs to \mathbb{R}. Moreover, every real number is a limit of any sequence. So, yes, the sequence (5,5,5,\ldots) of all 5’s converges to \sqrt{2} here.

The Discrete Topology

Whereas with the trivial topology a single neighborhood exists, the discrete topology is as packed with neighborhoods as can be. So, as the trivial topology allows every sequence to converge to everything, we can expect the discrete topology to be comparatively restrictive. Taking the set \{2,3,5\} with the discrete topology as mentioned above, we can pinpoint the new limitation: every set containing exactly one point is a neighborhood of that point. Notice the sets4 \{2\}, \{3\}, and \{5\} are all open sets.

What does this mean? Any sequence that converges to one of these points, say 3, must eventually have all its terms in the neighborhood \{3\}. But that requires all convergent sequences to be eventually constant! This seems to be a minor issue with the finite set \{2,3,5\}, but it presents an undesirable, counter-intuitive problem in other sets.
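These finite examples can be checked mechanically. The sketch below (my own construction; the names are hypothetical) uses the fact that, in a finite space, a sequence converges to L exactly when every value it takes infinitely often lies in every neighborhood of L:

```python
from itertools import chain, combinations

def powerset(X):
    """All subsets of a finite set X (the discrete topology)."""
    xs = list(X)
    return [set(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

def converges_to(inf_values, L, topology):
    """In a finite space, (a_n) -> L iff every value occurring infinitely
    often lies in every neighborhood of L.

    inf_values: the set of values the sequence takes infinitely often.
    """
    neighborhoods = [U for U in topology if L in U]
    return all(inf_values <= U for U in neighborhoods)

X = {2, 3, 5}
discrete = powerset(X)
trivial = [set(), X]
```

Under the discrete topology, `converges_to({3}, 3, discrete)` holds (an eventually constant sequence), while the alternating values `{2, 3}` fail to converge to 3; under the trivial topology, everything converges to everything.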

Take \mathbb{R} with the discrete topology, for example. Under these rules, the sequence

\alpha_n = \frac{1}{n},

though expected to converge to 0, does not converge at all.

So, the discrete topology is too restrictive, and the trivial topology lets us get away with anything. Fortunately, a happy middle ground exists by being a little more selective with neighborhoods.

The Standard Topology

By requiring an open set to contain an open interval around each of its points, it is impossible that a singleton be an open set. Therefore a singleton cannot be a neighborhood, and we eliminate the trouble posed by the discrete topology. Yet every open interval around a real number L contains a smaller one, and each of these is a neighborhood.

This effectively corrals the points of any convergent sequence, requiring the distance between the terms and the limit to vanish as n increases. Take again the sequence

\alpha_n = \frac{1}{n}.
We suspect (\alpha_n) converges to 0, but this requires proof. Therefore, we must consider an arbitrary neighborhood of 0, and expose the index N\in\mathbb{N} such that all terms, from the Nth onward, exist in that neighborhood.

Suppose U is a given neighborhood of 0, so that U contains an open interval surrounding 0. Without loss of generality, we may assume this interval is symmetric; that is, the interval has the form (-r,r) for some radius r>0. Take N to be any integer greater than \tfrac{1}{r}. Then, whenever n\geq N,

\alpha_n = \frac{1}{n} \leq \frac{1}{N} < \frac{1}{1/r} = r.

But this means \alpha_n\in(-r,r)\subset U so long as n\geq N. Since we chose U arbitrarily, it follows (\alpha_n) converges to 0.
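The argument can be sanity-checked numerically for a concrete radius; this tiny sketch just instantiates the choice of N from the proof:

```python
# For a concrete radius r, any integer N greater than 1/r works:
# whenever n >= N, |1/n| <= 1/N < r, so 1/n lands inside (-r, r).
r = 0.01
N = int(1 / r) + 1  # N = 101 > 1/r = 100
assert all(abs(1 / n) < r for n in range(N, 100 * N))
```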


The behavior of a sequence in a given set can change rather drastically depending on the network of neighborhoods the topology allows. However, with careful construction, it is possible to have all the sequential comforts of metric spaces covered under a briefly put definition.

My next post in this series will push the generalization of these concepts much further, by relaxing a single requirement. In order to define convergence in the preceding discussion, the set of indices \mathbb{N} was important not for guaranteeing infinitely many terms, but instead for providing order. This allows us to speak of all terms subsequent to one in particular. It turns out that if we simply hold on to order, we can loosen the nature of the set on which it is defined. That is the key to Moore-Smith Convergence, to be visited next.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Paper Review: Active Queue Management with Non-Linear Packet Dropping Function


As promised in the previous article, I plan to review Reference 2, Active Queue Management with Non-Linear Packet Dropping Function, by D. Augustyn, A. Domanski, and J. Domanska, published in HET-NETs 2010. The paper proposes a change in the structure of the packet drop probability function that uses the average queue length in a buffer. I mentioned previously that choosing a linear function of the average queue length can be viewed as a bit of an arbitrary choice, since we’re designing a control mechanism here, and this paper attempts to define a new form of this packet drop probability function.

In summary, the best lesson one can take from this paper is that publication in a journal or conference proceedings does not guarantee that the paper withstands scrutiny. The paper is linked above for the interested reader to peruse himself, and to investigate the claims. 


The paper intended to give a new function to calculate the probability of proactively dropping a packet in a queue in order to prevent a full buffer. It seemed to be presented as an alternative to RED, described in my previous article. The authors define this new function, then set up a simulation in order to examine the effects. 

When Orthogonal is Abused

The authors describe using a finite linear combination of orthogonal basis polynomials defined on a finite interval as the underlying mathematical structure. 

First, we should discuss what we mean by orthogonal in context of functions. Orthogonal is most commonly understood in terms of vectors, and when we’re in two dimensions, orthogonal becomes our familiar perpendicular. 


Beginning with the familiar notion of perpendicular, we can generalize this to understand orthogonality. The geometric interpretation of two vectors being perpendicular is that the angle between them is 90^{\circ}. Once we leave two and three dimensions (or jump to the space of polynomials, as we’ll do soon), the concept of an angle isn’t as helpful.

Another way to define perpendicular is through an operation known as the dot product. Suppose we take two 2D vectors, \mathbf{x} and \mathbf{y}. Each vector will have coordinates: \mathbf{x} = (x_{1},x_{2}) and \mathbf{y} = (y_{1}, y_{2}). The dot product is a special type of multiplication defined on vectors, and denoted \cdot:

\mathbf{x}\cdot\mathbf{y} = x_{1}y_{1} + x_{2}y_{2}

The dot product can be described in words as the sum of the component-wise multiplication of the coordinates of the two vectors. 

Now, we can say that two vectors are perpendicular if their dot product is 0. That is, \mathbf{x} and \mathbf{y} are perpendicular if \mathbf{x}\cdot\mathbf{y} = 0. (This article gives a nice overview of how we move from the algebraic to the geometric definition of perpendicular and orthogonal.)

Remember, the geometric picture of perpendicularity doesn’t extend easily once we get into higher dimensions. Orthogonal is a more general notion of vectors being perpendicular, and is defined for two vectors (of any dimension) as their dot product equalling zero.
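A minimal illustration of the algebraic definition (the function name is mine):

```python
def dot(x, y):
    """Dot product: the sum of the component-wise products."""
    return sum(a * b for a, b in zip(x, y))

# (1, 2) and (2, -1) are orthogonal: 1*2 + 2*(-1) = 0.
# The same definition works in any dimension, e.g. length-4 vectors.
```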

From Dot Product to Inner Product

The dot product is used on vectors, and defines a type of product different from the scalar multiplication we know. In fact, we can generalize the notion of a dot product to something called an inner product, which can be defined on many spaces other than vectors. We can define operations and products however we like, but for our definition to qualify as an inner product (denoted \langle \cdot, \cdot\rangle), it must meet certain criteria.

For instance, on the set of real valued functions with domain [a,b], we define the inner product of two functions f(x) and g(x) as 

\langle f, g\rangle := \int_{a}^{b}f(x)g(x)dx

The concept of orthogonality generalizes to an inner product as well. If the inner product of two functions is zero (as defined above), we say the functions are orthogonal. 

Back to the paper

The authors claim to be using a set of orthogonal polynomials to define their drop probability function, and they give the structure of such functions. For a domain [a,b], and for \phi_{j} in the set of polynomials, they define \phi_{j}(x) = (x-a)^{j-1}(b-x). So, for example, \phi_{1}(x) = (b-x), and \phi_{5}(x) = (x-a)^{4}(b-x).

Now, in order to be an orthogonal basis for a space1, the set of functions that are claimed to form the basis of the set must be pairwise orthogonal. That is, I need to be able to take the inner product of any two functions \phi_{i} and \phi_{j} and get 0. If that isn’t true for even one pair, then the set is not orthogonal. 

As it turns out, if we take the inner product of any two functions in the basis set over the domain given, we find that there are no pairs that are orthogonal. To do this in general, we compute the integral

\int_{a}^{b}(x-a)^{i-1}(b-x)\cdot (x-a)^{j-1}(b-x)dx

The integral computation is one of simple polynomial integration, and can be done either by hand or using your favorite software (Mathematica) of choice. What we find here is that this set of functions defined in general this way is never orthogonal, yet the paper claims they are. 
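The check is easy to reproduce numerically; here is a small sketch (names are mine) that approximates the inner product with a midpoint Riemann sum. On [0,1], \langle\phi_1,\phi_2\rangle = \int_0^1 x(1-x)^2\,dx = \tfrac{1}{12}, which is decidedly not zero:

```python
def inner(f, g, a, b, n=200_000):
    """Approximate <f, g> = integral from a to b of f(x)g(x) dx
    with a midpoint Riemann sum on n subintervals."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h)
               for k in range(n)) * h

def phi(j, a, b):
    """The paper's family: phi_j(x) = (x - a)^(j-1) * (b - x)."""
    return lambda x: (x - a) ** (j - 1) * (b - x)

# <phi_1, phi_2> on [0, 1] comes out to 1/12, not 0:
val = inner(phi(1, 0, 1), phi(2, 0, 1), 0, 1)
```

Substituting a = T_{\min} and b = T_{\max} runs the same check against the specific functions in the paper, with the same conclusion.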

Applying this to the particular situation of designing a drop probability function, for average queue length thresholds T_{\min} and T_{\max} they give

p(x,a_{1},a_{2}) = \begin{cases}0, & x < T_{\min}\\ \phi_{0}(x) + a_{1}\phi_{1}(x) + a_{2}\phi_{2}(x), & T_{\min}\leq x \leq T_{\max}\\ 1, & x > T_{\max}\end{cases}

where the basis functions are 

\begin{aligned}\phi_{0}(x) &= p_{m}\frac{x-T_{\min}}{T_{\max}-T_{\min}}\\\phi_{1}(x) &= (x-T_{\min})(T_{\max}-x)\\\phi_{2}(x) &= (x-T_{\min})^{2}(T_{\max}-x)\end{aligned}

The reader will recognize \phi_{0} as the original drop probability function from the RED algorithm. These functions are absolutely not orthogonal though (as the authors claim), and a simple check as we did above will verify it.

Other issues

Another issue is that these mysterious coefficients a_{1} and a_{2} need to be determined. How, you ask? The authors do not say, other than to note that one can define “a functional” implicit on the unknown p(x,a_{1},a_{2}) that can be minimized to find the optimal values for those coefficients. They write that this mysterious functional2 can be based on either the average queue length or average waiting time, yet provide no details whatsoever as to the functional they have chosen for this purpose. They provide a figure with a sample function, but give no further details as to how it was obtained.

One other issue I have in their methodology is discussing the order of estimation. For those familiar with all sorts of ways to estimate unknown functions, from Taylor series, to splines, to Fourier series, we know that a function is exactly equal to an infinite sum of such functions. Any finite sum is an estimation, and the number of terms we use for estimation with a desired accuracy may change with the function being estimated. 

For instance, if I want to use a Taylor series (a linear combination of polynomials) to estimate a really icky function, how many terms should I include to ensure my accuracy at a point is within 0.001 of the real value? It depends on the function.

The authors simply claim that a second order term is appropriate in this case. The issue I take with that is these queueing management drop probability functions are designed. We’re not estimating a function describing a phenomenon, we are designing a control policy to seek some sort of optimal behavior. Fundamentally, the authors posing this as an estimation problem of some unknown drop probability function is incorrect. This isn’t a law of nature we seek to observe; it’s a control policy we seek to design and implement and optimize. Using language that implies the authors are estimating some “true function” is misleading.

Regarding the simulation itself, since the design was arbitrary, and not based on sound mathematical principles, I cannot give any real comment to the simulations and the results. The authors briefly discuss and cite some papers that explore the behavior of network traffic, and claim to take this into account in their simulations. To those, I cannot comment yet. 


Always verify a paper for yourself, and don’t accept anything at face value. Research and technical publications should be completely transparent as to choices and methodologies (and obviously free of mathematical inaccuracies), and derivations and proofs should be present when necessary, even if in an appendix. 



Networking Mathematics: Random Early Detection and TCP synchronization


Computer networks are something most of us take for granted–speed, reliability, availability are expectations. In fact, network problems tend to make us very angry, whether it’s dropped packets (yielding jittery Skype calls), congestion (that huge game download eating all the bandwidth), or simply a network outage. There’s an awful lot going on underneath the hood of all devices on a network to load that webpage or download that song for you. 

Much of the reliability in networking relies on maintaining good Quality of Service (QoS) policies, which involve buffering and queue management. Networks aren’t unlike the roads we travel on; traffic isn’t steady, and congestion at interfaces (or intersections on the road) happen. How do we handle that? We’ll explore some basic networking principles in order to uncover some interesting mathematics governing buffering and queue management at congested network interfaces.

Update: Mr. Fred Baker reached out to me with a few corrections regarding my interchangeable use of queue and buffer. I’ve inserted his comments into the “Buffer” section. Incidentally, Mr. Baker was the inventor of the WRED (Weighted Random Early Detection) algorithm mentioned as an extension below. 


Little’s Law: For Estimation Only


I had been intending on writing some posts on queuing theory for a while now, as this branch is the closest to my research interests and was the spark that sent me down the road that eventually led to my PhD dissertation. Most are quite familiar with the concepts of queuing theory, at least intuitively, so this is one of the more tangible topics in mathematics. We’ve all stood in queues at grocery stores, grouched in rush hour traffic, or had to refresh a webpage when the connection drops. I was reminded of my intentions when Datrium (to my surprise) mentioned a common queuing theory result called Little’s Law in their Storage Field Day presentation, and they even have an article they’ve written that makes use of it1. What exactly is Little’s Law, and what does and doesn’t it say?

Some queuing theory terms

To be rigorous in our discussion, we’ll need to formalize some concepts. A queuing system is composed of two components: a queue and a server. The queue is the line you stand in, and the server is the object performing some kind of task (like scanning your groceries). A queuing system can have more than one server2, and different kinds of service policies3.

Counting the number of customers in each stage

A customer is in a queuing system when he is standing in the queue itself, and when he is being served by a server. So if we let N_{q} be the number of customers waiting in a queue, and N_{s} be the number of customers currently being served, then the total number of customers in a queuing system (N) is given by 

N = N_{q} + N_{s}

It’s important to note here that customers arrive randomly and are served with random service times. So the number of customers in a queue or in service is a random variable, and changing with time. The arrival times, the service times, and the number of customers in the queuing system all can be assigned probability distributions, but these are random variables. 

Looking at the time

When standing in a queue, you’re now in the queue; that is, you’ve arrived. Now you have to wait. How long will you wait before service? How long will service take? As anyone moving through everyday life can attest, both of those times are random. We denote W as the waiting time a customer spends in the queue before service, and X the amount of time a customer spends being served. Thus, the total is 

T = W+X

and is commonly referred to as the sojourn time, or the total time spent in the queuing system.

Once again, W and X are random variables, and therefore so is T.

Arrival Rates

Arrivals to a queuing system are random, and governed by a counting process with a probability distribution. The most common counting process used in queuing theory is the Poisson process due to its very comfortable mathematical properties. 

If we watched the arrivals to a queue for a very long time, a simple method of analyzing a queue is to find the mean arrival rate, or the average number of arrivals per unit time, typically denoted by \lambda. We could make a rough guess at it by dividing the number of arrivals in a long period of time by the length of our “watch period”. A Poisson process has a mean arrival rate, and the number of arrivals in a period of time follows a Poisson distribution. 

At the risk of sounding like a broken record, I’d like to point out that a mean arrival rate of 1/hr does not mean that each hour, only one customer will arrive. It’s an average. Some hours will have no arrivals, some will have 5 arrivals, or 3 arrivals. An average is simply a point estimate of a typical hour.4

The Statement of Little’s Law

Little’s Law is an elegant little formula that relates the mean number of customers in a queuing system, the mean sojourn time, and the mean arrival rate. It’s quite simply written

\text{E}[N] = \lambda\text{E}[T]

There’s not much to analyze here, which in many ways is nice. If you want to get the mean or expected number of customers in your queuing system at a given time, simply multiply the average arrival rate with the average time in the system. 
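To see the law at work, here is a toy single-server (M/M/1) simulation of my own construction, not from the post: it generates Poisson arrivals and exponential service times, then computes the time-average number in system (the area under N(t), accumulated customer by customer) alongside \lambda\text{E}[T]. The two agree, as the sample-path version of Little's Law predicts.

```python
import random

def mm1_simulation(lam, mu, n_customers, seed=42):
    """Simulate an M/M/1 FIFO queue.

    Returns (time-average number in system, lam * mean sojourn time),
    the two sides of Little's Law.
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n_customers):
        t += rng.expovariate(lam)          # exponential interarrival times
        arrivals.append(t)
    free_at, departures = 0.0, []
    for a in arrivals:
        start = max(a, free_at)            # wait if the server is busy
        free_at = start + rng.expovariate(mu)
        departures.append(free_at)
    horizon = departures[-1]
    sojourns = [d - a for a, d in zip(arrivals, departures)]
    mean_T = sum(sojourns) / n_customers
    mean_N = sum(sojourns) / horizon       # area under N(t) / total time
    return mean_N, lam * mean_T
```

With `lam = 1.0` and `mu = 2.0`, standard M/M/1 theory gives \text{E}[N] = \rho/(1-\rho) = 1, and a long simulated run lands both quantities near that value. Note this only confirms the averages; as discussed below, the averages are far from the whole story.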

What’s great about this law is that it’s just that–a law. It’s true regardless of the service distribution, the arrival process, or the service policies. Sounds like a silver bullet, right? Queuing theory solved!

Some subtleties

I keep insisting the reader pay close attention to the fact that we’re dealing with random variables the whole way through. Everything is random, which means that nothing really hits the mean that often. Little’s Law is great for quick calculations and estimations, but relying too heavily on it oversimplifies a queuing system, which can be damaging. (This is similar to wantonly applying the Central Limit Theorem, even when your assumptions are not met.)

Little’s Law essentially smooths out the randomness into a deterministic, steady state equation. The means are numbers, not random variables. What we need to understand about this formula is what it doesn’t help us with. Random variables have variances, and Little’s Law just gives us an average number calculated from other averages.


A queuing system is dynamic, not a steadily flowing river. Little’s Law is great in practice for estimation, and having such an elegant relationship between the means of several random variables is useful to get a general idea of what’s going on in your queuing system. Traffic planning and analysis is a difficult task because of all the randomness involved.

The danger of using Little’s Law as your silver bullet (much the way the Central Limit Theorem is commonly used as a silver bullet) is that you risk losing valuable information about the variation in the random variables that make up your queuing system, which can wreak havoc on the best-laid plans.

Example: Suppose a store owner applies Little’s Law in order to determine how many cashiers he should call in each day. He calculates via Little’s Law that he expects 10 customers/hr, and thus feels perfectly justified in only having two cashiers per day in the store. What he didn’t realize is that the weekends tended to be very busy, with swarms of people coming in the store, and Wednesdays were practically a ghost town in the store. 

Everything averaged out to 10 customers/hr, but that wasn’t much help for the two cashiers who had to handle the weekend rush, and it wasn’t much help to the owner who ended up paying two cashiers that weren’t needed on Wednesdays. He probably could have just handled the store alone on Wednesdays. 
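The effect behind this story is easy to quantify with the textbook steady-state formula for the mean number in an M/M/1 queue, \text{E}[N] = \rho/(1-\rho) with \rho = \lambda/\mu. The numbers below (a joint service rate of 15 customers/hr and a hypothetical week of daily arrival rates averaging 10/hr) are made up purely for illustration. Because the formula is nonlinear in \lambda, congestion at the average rate badly understates the average congestion:

```python
def mm1_mean_number(lam, mu):
    """Steady-state mean number in an M/M/1 queue: rho / (1 - rho)."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError("queue is unstable: arrivals outpace service")
    return rho / (1 - rho)

mu = 15.0                                # hypothetical joint service rate
daily_lam = [10, 10, 2, 10, 10, 14, 14]  # Mon..Sun; the mean is 10/hr
at_mean = mm1_mean_number(sum(daily_lam) / len(daily_lam), mu)
per_day = sum(mm1_mean_number(l, mu) for l in daily_lam) / len(daily_lam)
print(at_mean, per_day)  # congestion at the mean rate vs. mean daily congestion
```

Plugging in the average rate predicts a modest queue, while averaging the day-by-day queues gives a figure more than twice as large: the busy weekend days dominate, exactly as they did for the store owner.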

The example above is a simple one, but there is an important lesson in here. Too much smoothing causes valuable information loss. Little’s Law is great for estimation, and should only be used as such.


There is a distributional version of Little’s Law, published in 1993, which is much better suited to capturing a system’s variability than the original, because it relates the probability distributions of the random variables in the queuing system rather than simply their averages.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Building a Ground Floor: Topology Basics Pt. 1

Like some other terms in mathematics (“algebra” comes to mind), topology is both a discipline and a mathematical object. Moreover, like algebra, topology as a subject of study is at heart an artful mathematical branch devoted to generalizing existing structures, like the field of real numbers, for their most convenient properties. It is also a favorite subject of mine, ever since my first introduction to it. This is due in large part to its exceedingly simple first principles, which make the wealth of expansion they allow all the more impressive.

It is my intent to discuss some of these starting points here, in the first of a short series of posts toward the goal of presenting one of my favorite results arising in topology: Moore-Smith convergence, a vast extension of the notion of the limit of a sequence. My presentation here follows the explanation given by John L. Kelley in his classic text General Topology, which I recommend to all curious readers.

What is a topology?

Definition. By a topology is meant any collection \mathscr{T} of sets satisfying two conditions:

\begin{array}{lrcl}\text{(1)}&A,B\in\mathscr{T}&\Rightarrow&A\cap B\in\mathscr{T};\\\text{(2)}&\mathscr{C}\subset\mathscr{T}&\Rightarrow&\bigcup\{C\in\mathscr{C}\}\in\mathscr{T}\end{array}

It is worthwhile to break this definition down. Condition (1) requires that the intersection of any two elements of the collection \mathscr{T} must itself be a member of \mathscr{T}. Condition (2) states that the union of any subcollection of \mathscr{T} must also belong to \mathscr{T}. These are referred to in some texts as closure under finite intersection and closure under arbitrary union, respectively.
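For a finite collection, both conditions can be checked mechanically. Here is a small sketch in Python (the function name and the frozenset encoding are my own choices, not standard notation); for a finite family, closure under pairwise union implies closure under arbitrary union, provided the union over the empty subcollection (the empty set) and the union over the whole family also belong:

```python
from itertools import combinations

def is_topology(family):
    """Check conditions (1) and (2) for a finite collection of sets."""
    T = {frozenset(s) for s in family}
    X = frozenset().union(*T) if T else frozenset()
    # Condition (2) applied to the empty subcollection and to all of T
    # forces both the empty set and the union of everything into T.
    if frozenset() not in T or X not in T:
        return False
    # (1) pairwise intersections; pairwise unions suffice when T is finite.
    return all(A & B in T and A | B in T for A, B in combinations(T, 2))

print(is_topology([set(), {2, 3, 5}]))            # True: a valid topology
print(is_topology([set(), {2}, {3}, {2, 3, 5}]))  # False: {2,3} is missing
```

The second collection fails precisely because the union \{2\}\cup\{3\}=\{2,3\} escapes it, a direct violation of condition (2).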

Notably, the definition speaks only of a collection of sets with no specification beyond the two conditions. Yet, even with these, one can deduce some further characteristic properties.

Corollary. If \mathscr{T} is a topology, then \emptyset\in\mathscr{T} and \bigcup\{T\in\mathscr{T}\}\in\mathscr{T}.

Since \emptyset\subset S for every set S, and \mathscr{T}\subset\mathscr{T}, it is enough to apply (2) to both of these cases to prove the corollary. In fact, many texts make the definition X\mathrel{:=}\bigcup\{T\in\mathscr{T}\}, and refer to the pair (X,\mathscr{T}) as the topological space defined by \mathscr{T}.

This way, the space is given its character by way of the scheme that builds \mathscr{T}, rather than the set X. It is an important distinction, for many topologies are possible on a given set. With that, we can look at some examples.

From Trivial to Complicated

1. The Trivial Topology

Based on the corollary just presented, it is enough to gather a given set X and the empty set \emptyset into a collection \{\emptyset,X\} and have created a topology on X. Because X and \emptyset are its only members, the collection is easily closed to arbitrary union and finite intersection of its elements. This is known as the trivial or indiscrete topology, and it is somewhat uninteresting, as its name suggests, but it is important as an instance of how simple a topology may be. As per the corollary, every topology on X must contain \emptyset and X, and so will feature the trivial topology as a subcollection.

2. The Discrete Topology

For this example, one can start with an arbitrary set, but in order to better illustrate, take the set of the first three primes: \{2,3,5\}. Suppose we consider the collection of all possible subsets of \{2,3,5\}. This is also referred to as the power set of \{2,3,5\}, and denoted \wp(\{2,3,5\}). Fortunately, the set is small enough to list exhaustively1. Here they are listed from top-to-bottom in order of increasing inclusion:

\emptyset
\{2\} \{3\} \{5\}
\{2,3\} \{2,5\} \{3,5\}
\{2,3,5\}


Note these are all possible subsets of \{2,3,5\}. It is clear any union or intersection of the pieces in the table above exists as an entry, and so this meets criteria (1) and (2). This is a special example, known as the discrete topology. Because the discrete topology collects every existing subset, any topology on \{2,3,5\} is a subcollection of this one.
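One can confirm this claim mechanically. The sketch below (an illustrative example, not part of any library) enumerates the power set of \{2,3,5\} and verifies closure under pairwise intersection and union, which is enough for a finite family:

```python
from itertools import chain, combinations

def power_set(X):
    """Every subset of X, returned as frozensets."""
    X = list(X)
    return {frozenset(c) for c in
            chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))}

P = power_set({2, 3, 5})
print(len(P))  # 2^3 = 8 subsets, matching the table above
print(all(A & B in P and A | B in P for A in P for B in P))
```

Since every subset is present, no intersection or union can escape the collection, which is why the discrete topology is the largest topology possible on a set.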

For example, taking the sets


from the collection in the table is enough to produce a topology2.

Remark. Given a topological space (X,\mathscr{T}), the elements of \mathscr{T} are referred to as open sets. This nomenclature is motivated in the next example.

3. \mathbb{R} and Open Intervals

This example will be more constructive than the previous ones. Consider the set of real numbers, \mathbb{R}. Let us define a special collection \mathscr{T} of subsets of real numbers the following way: a set T belongs to \mathscr{T} if, and only if, for every x\in T, there exist real numbers a and b such that x\in(a,b) and (a,b)\subset T. That is, we say T\in\mathscr{T} to mean T contains an open interval around each of its elements.

It is good practice to take the time to prove this collection defines a topology on \mathbb{R}. To do so, it must be shown that \bigcup\{T\in\mathscr{T}\}=\mathbb{R}, and that \mathscr{T} meets conditions (1) and (2).

Proof. To show \bigcup\{T\in\mathscr{T}\}=\mathbb{R}, it must be verified that \bigcup\{T\in\mathscr{T}\}\subset\mathbb{R} and \mathbb{R}\subset\bigcup\{T\in\mathscr{T}\}. The first containment holds because every T\in\mathscr{T} is by definition a subset of \mathbb{R}, so only the reverse containment remains. Let x\in\mathbb{R} be given. Then certainly x\in(x-1,x+1), and surely (x-1,x+1)\in\mathscr{T}, as it contains an open interval around all its points by its very design. Thus x\in\bigcup\{T\in\mathscr{T}\}.

On to proving \mathscr{T} satisfies (1) and (2). For (1), let A,B\in\mathscr{T} be given and suppose3 x\in A\cap B. This holds if, and only if, x\in A and x\in B. Since A and B both belong to \mathscr{T}, there exist real numbers a, b, c, and d such that x\in(a,b)\subset A and x\in(c,d)\subset B. But this means x\in(a,b)\cap(c,d). Fortunately, the intersection of two open intervals is easy to describe: (a,b)\cap(c,d)=(\max\{a,c\},\min\{b,d\}), which is nonempty since it contains x. Thus x lies in an open interval contained in A\cap B (precisely as desired), and it follows A\cap B\in\mathscr{T}.

To show (2) is much easier. Let \{T_\alpha\}_{\alpha\in\mathscr{A}} be a collection4 of sets belonging to \mathscr{T}, and suppose x\in\bigcup_{\alpha\in\mathscr{A}}T_\alpha. Then there exists an index, say \alpha_0\in\mathscr{A}, such that x\in T_{\alpha_0}. Since T_{\alpha_0}\in\mathscr{T}, there exist real numbers a and b such that x\in(a,b)\subset T_{\alpha_0}. But this means x\in(a,b)\subset\bigcup_{\alpha\in\mathscr{A}}T_\alpha. Since x was chosen arbitrarily, it follows \bigcup_{\alpha\in\mathscr{A}}T_\alpha\in\mathscr{T}.

The proof above shows (\mathbb{R},\mathscr{T}) is a topological space; the collection \mathscr{T} is referred to as the standard topology on \mathbb{R}. The open sets in this space are exactly those subsets of real numbers that contain an open interval around each of their points. Fittingly, then, open intervals are open sets in the standard topology.
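The one real computation in the proof of condition (1), intersecting two overlapping open intervals, comes down to a max and a min. A tiny sketch (the function name is mine, chosen for illustration):

```python
def intersect(i, j):
    """Intersection of open intervals (a,b) and (c,d); None if they miss."""
    (a, b), (c, d) = i, j
    lo, hi = max(a, c), min(b, d)
    return (lo, hi) if lo < hi else None

print(intersect((0, 3), (1, 5)))  # (1, 3): still an open interval
print(intersect((0, 1), (2, 3)))  # None: the intervals are disjoint
```

The case lo >= hi is exactly the case where the proof never arises: if x belongs to both intervals, the intersection is guaranteed nonempty.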


This first post is meant to express the simple starting points of topology as a subject of study. It only takes the two criteria mentioned here to define a topology of sets, and yet an entire realm of theory rests upon them. This is a recurring theme in topology, algebra, and mathematics in general: the fully-featured universes that hold the answers for more specific inquiry, whether the complete ordered field of real numbers5 \mathbb{R}, the space \mathcal{C}^{\infty}(\mathbb{C}) of infinitely differentiable functions f\mathrel{:}\mathbb{C}\to\mathbb{C}, or the class of all real-valued Lebesgue-integrable functions on \mathbb{R}, each require a well-made foundation.

The next post in this series will cover the nature of sequences in topological spaces, particularly those instances where the convenient features afforded by the real numbers are no longer available. With the metric space structure stripped away, how does one define convergence and limit of sequences? What does it mean for elements in topological spaces to be close when distance is otherwise without definition?

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.