Paper Review: Active Queue Management with Non-Linear Packet Dropping Function

Paper Review: Active Queue Management with Non-Linear Packet Dropping Function

C. Traylor

This article reviews a network engineering paper and points out some issues.

I plan to review Reference 2, Active Queue Management with Non-Linear Packet Dropping Function,by D. Augustyn, A. Domanski, and J. Domanska, published in HET-NETs 2010,which discusses a change in the structure of the packet drop probability function using the average queue length in a buffer. I mentioned previously that choosing a linear function of the average queue length can be viewed as a bit of an arbitrary choice, since we're designing a control mechanism here, and this paper attempts to define a new form of this packet drop probability function.

In summary, the best lesson one can take from this paper is that publication in a journal or conference proceedings does not guarantee that the paper withstands scrutiny. The paper is linked above for the interested reader to peruse himself, and to investigate the claims.

Summary

The paper intended to give a new function to calculate the probability of proactively dropping a packet in a queue in order to prevent a full buffer. It seemed to be presented as an alternative to RED, described in my previous article. The authors define this new function, then set up a simulation in order to examine the effects.

When Orthogonality is Abused

The authors describe using a finite linear combination of orthogonal basis polynomials defined on a finite interval as the underlying mathematical structure.

First, we should discuss what we mean by orthogonal in context of functions.Orthogonalis most commonly understood in terms of vectors, and when we're in two dimensions,orthogonalbecomes our familiarperpendicular.

Orthogonality

Beginning with the familiar notion of perpendicular, we can generalize this to understand orthogonality. The geometric interpretation of two vectors being perpendicular is that the angle between them is $90^{\circ}$. Once we leave two and three dimensions (or jump to the space of polynomials, as we'll do soon), the concept of an angle isn't as helpful.

Another way to define perpendicular is through an operation known as thedot product.Suppose we take two 2D vectors, $\mathbf{x}$ and $\mathbf{y}$. Each vector will have coordinates: $\mathbf{x} = (x_{1},x_{2})$ and $\mathbf{y} = (y_{1}, y_{2})$. Thedot product is a special type of multiplication defined on vectors, and denoted $\cdot$:

$$\mathbf{x}\cdot\mathbf{y} = x_{1}y_{1} + x_{2}y_{2}$$

The dot product can be described in words as the sum of the component-wise multiplication of the coordinates of the two vectors.

Now, we can say that two vectors are perpendicular if their dot product is 0. That is, $\mathbf{x}$ and $\mathbf{y}$ are perpendicular if $\mathbf{x}\cdot\mathbf{y} = 0$. (This article gives a nice overview of how we move from the algebraic to the geometric definition of perpendicular and orthogonal.)

Remember, perpendicular doesn't make sense once we get into higher dimensions.Orthogonalis a more general notion of vectors being perpendicular, and is defined for two vectors (of any length) as their dot product equalling zero.

From Dot Product to Inner Product

The dot product is used on vectors, and defines another type of product that is different from the scalar multiplication we know. In fact, we can generalize the notion of a dot product to something called aninner product, which can be defined on many different spaces than just vectors. We can define operations and products however we like, but for our definition to qualify as an inner product (denoted $\langle \cdot, \cdot\rangle$), it must meet certain criteria.

For instance, on the set of real valued functions with domain $[a,b]$, we define the inner product of two functions $f(x)$ and $g(x)$ as $$\langle f, g\rangle := \int_{a}^{b}f(x)g(x)dx$$

The concept of orthogonality generalizes to an inner product as well. If the inner product of two functions is zero (as defined above), we say the functions are orthogonal.

Back to the paper

The authors claim to be using a set of orthogonal polynomials to define their drop probability function, and they give the structure of such functions. For a domain $[a,b]$, and for $\phi_{j}$ in the set of polynomials, they define $\phi_{j} = (x-a)^{j-1}(b-x)$. So, for example, $\phi_{1} = (b-x)$, and $\phi_{5} = (x-a)^{4}(b-x)$.

Now, in order to be an orthogonal basis for a space, the set of functions that are claimed to form the basis of the set must be pairwise orthogonal. That is, I need to be able to take the inner product of any two functions $\phi_{i}$ and $\phi_{j}$ and get 0. If that isn't true for even one pair, then the set is not orthogonal.

As it turns out, if we take the inner product of any two functions in the basis set over the domain given, we find that there areno pairs that are orthogonal. To do this in general, we compute the integral

$$\int_{a}^{b}(x-a)^{i-1}(b-x)\cdot (x-a)^{j-1}(b-x)dx$$

The integral computation is one of simple polynomial integration, and can be done either by hand or using your favorite software (Mathematica) of choice. What we find here is that this set of functions defined in general this way is never orthogonal, yet the paper claims they are.

Applying to the particular situation of designing a drop probability function, they give the following for average queue length thresholds $T_{\min}$ and $T_{\max}$

$$p(x,a_{1},a_{2}) = \left\{\begin{array}{lr}0, &x < T_{\min}\\\phi_{0} + a_{1}\phi_{1}(x) + a_{2}\phi_{2}(x),&T_{\min}\leq x \leq T_{\max}\\1,&x > T_{\max}\end{array}\right.$$ where the basis functions are $$\begin{aligned}\phi_{0}(x) &= p_{m}\frac{x-T_{\min}}{T_{\max}-T_{\min}}\\\phi_{1}(x) &= (x-T_{\min})(T_{\max}-x)\\\phi_{2}(x) &= (x-T_{\min})^{2}(T_{\max}-x)\end{aligned}$$ The reader will recognize $\phi_{0}$ as the original drop probability function from the RED algorithm. These functions are absolutely not orthogonal though (as the authors claim), and a simple check as we did above will verify it.

Other issues

Another issue is that these mysterious coefficients $a_{1}$ and $a_{2}$ need to be determined. How? The authors do not say, other than to note that one can define "a functional" implicit on the unknown $p(x,a_{1},a_{2})$ that can be minimized to find the optimal values for those coefficients. They write that this mysterious functional can be based on either the average queue length or average waiting time, yet provide no details whatsoever as to the functional they have chosen for this purpose. (I did not make a typo. A functional is a mapping that takes functions as input.) They provide a figure with a sample function, but give no further details as to how it was obtained.

One other issue I have in their methodology is discussing the order of estimation. For those familiar with all sorts of ways to estimate unknown functions, from Taylor series, to splines, to Fourier series, we know that a function is exactly equal toan infinite sum of such functions. Any finite sum is an estimation, and the number of terms we use for estimation with a desired accuracy may change with the function being estimated.

For instance, if I want to use Taylor series (a linear combination of polynomials) to estimate a really icky function, how many terms should I include to ensure my accuracy at a point is within 0.001 of the real value? It depends on the function. Many engineers use a first or second order approximation as a rule of thumb, though it really should still be verified that higher order terms contribute a negligible amount first.

The authors simply claim that a second order term is appropriate in this case. The issue I take with that is these queueing management drop probability functions are designed. We're not estimating a function describing a phenomenon, we are designing a control policy to seek some sort of optimal behavior. Fundamentally, the authors posing this as an estimation problem of some unknown drop probability function is incorrect. This isn't a law of nature we seek to observe; it's a control policy we seek to design and implement and optimize. Using language that implies the authors are estimating some "true function" is misleading.

Regarding the simulation itself, since the design was arbitrary, and not based on sound mathematical principles, I cannot give any real comment to the simulations and the results. The authors briefly discuss and cite some papers that explore the behavior of network traffic, and claim to take this into account in their simulations. To those, I cannot comment yet.

Conclusion

Always verify a paper for yourself, and don't accept anything at face value. Research and technical publications should be completely transparent as to choices and methodologies (and obviously free of mathematical inaccuracies), and derivations and proofs should be present when necessary, even if in an appendix.